How Git Thinks – Snapshots, the DAG, and SHA-1 Hashes

Snapshots, Not Diffs

The most fundamental thing to understand about Git is how it stores data. Most older version control systems (CVS, SVN, Perforce) think of history as a list of file-based changes — they store the initial version of each file, then record only the lines that changed in each subsequent version.

Git is completely different. Git thinks of its data as a series of snapshots of your entire project. Every time you commit, Git takes a picture of all your tracked files at that exact moment and stores a reference to that complete snapshot. If a file has not changed since the last commit, Git does not store it again — it simply stores a reference (pointer) to the identical file it stored previously. This is both storage-efficient and conceptually simple: each commit is a complete, self-contained state of your project.

Approach	How it stores history	Examples
Delta-based (diff-based)	Initial file + list of changes per commit	SVN, CVS, Perforce
Snapshot-based	Complete state of all files at each commit	Git

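This snapshot model is one reason operations like branching, merging, and switching between commits are fast. A branch is just a pointer to a commit, and checking out a commit means reading that commit's snapshot, not replaying each file's chain of changes from the beginning. (To save space, Git does delta-compress similar objects inside its packfiles, but that is a storage detail: each commit still refers to a complete snapshot.)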

ℹ️

Unchanged files are stored once

If your project has 100 files and you change only 3 in a commit, Git stores 3 new blob objects (the new file contents), plus new tree objects for any directories whose contents changed, and reuses the existing 97 unchanged blobs. Nothing is duplicated, because Git uses content-addressable storage: identical content is stored only once, no matter how many commits reference it.
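You can check this reuse yourself. The sketch below assumes Git 2.28 or newer (for `git init -b`) and uses hypothetical file names `a.txt` and `b.txt` in a throwaway repository. It makes two commits, changes only `a.txt` in the second one, and then compares the blob ID each commit records for each file. The `<commit>:<path>` syntax accepted by `git rev-parse` names the blob stored for that path in that commit.

```bash
# Assumes Git 2.28+ (for init -b); file names are hypothetical
set -e
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'one\n' > a.txt
printf 'two\n' > b.txt
git add a.txt b.txt
git commit -q -m "First snapshot"

printf 'one, edited\n' > a.txt    # change only a.txt
git commit -q -am "Second snapshot"

# <commit>:<path> resolves to the blob recorded for that path in that commit
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)

echo "a.txt: $a_old -> $a_new"    # two different blobs: the content changed
echo "b.txt: $b_old -> $b_new"    # the same blob ID: reused, not stored twice
```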

The Three Areas

A tracked file can have a version in each of three distinct areas, and those versions may differ. Moving content between these areas is what the core Git commands actually do. Getting this model clear in your head removes much of the confusion beginners experience with Git.

Area	Also Called	Where It Lives	What It Contains
Working Tree	Working Directory	Your project folder	The actual files you see and edit in your text editor
Staging Area	Index, Cache	`.git/index`	A list of files and their content that will go into the next commit
Repository	Git Database, .git dir	`.git/` folder	All commits, branches, tags — the complete project history

Bash

# Working Tree → Staging Area
# "git add" copies files from your working tree into the staging area
git add index.html

# Staging Area → Repository
# "git commit" takes everything in the staging area and saves a snapshot
git commit -m "Add homepage"

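# Repository → Working Tree
# "git restore --source=<commit>" copies a file from a commit back to your working tree
# (without --source, "git restore" copies it from the staging area instead;
# the older "git checkout HEAD -- index.html" also updates the staging area)
git restore --source=HEAD index.html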

The staging area is the most misunderstood area for beginners. Why does it exist? It gives you fine-grained control over what goes into each commit. If you changed 10 files but only want to commit 3 of them together (because they are logically related), you stage just those 3. The other 7 remain modified in your working tree, ready to be staged in a future commit. This produces a clean, meaningful commit history — which is one of Git's most valuable professional features.
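Here is a small sketch of that workflow, assuming Git 2.28+ and two hypothetical files: both are edited, only one is staged, and the resulting commit contains just that one while the other stays modified in the working tree. `git diff-tree --no-commit-id --name-only -r HEAD` lists the files a commit changed, and `git diff --name-only` lists unstaged modifications.

```bash
# Assumes Git 2.28+ (for init -b); file names are hypothetical
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'v1\n' > header.html
printf 'v1\n' > footer.html
git add . && git commit -q -m "Initial commit"

printf 'v2\n' > header.html       # edit both files...
printf 'v2\n' > footer.html
git add header.html               # ...but stage only one of them
git commit -q -m "Update header"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)   # files in the new commit
pending=$(git diff --name-only)                                 # modified, not yet staged
echo "committed: $committed"      # header.html
echo "still modified: $pending"   # footer.html
```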

The Commit DAG

Commits in Git do not form a simple linear list; they form a Directed Acyclic Graph (DAG). Each commit stores the hash of its parent commit. The very first (root) commit has no parent, and a merge commit has two or more parents. This chain of parent references is what makes Git's history work.

"Directed" means the arrows in the graph point in one direction: each commit points to its parent, backwards in time. "Acyclic" means the graph has no cycles: you can never follow parent references and end up back at the same commit. Git guarantees this by construction, because a commit's hash covers its parents' hashes, so a commit can only point at commits that already existed when it was created.
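The parent links live inside each commit object, and `git cat-file -p` prints an object's raw contents. In this sketch (assuming Git 2.28+ and a throwaway repository with empty commits to keep it short), the root commit has no `parent` line, the second commit names the first, and `git rev-list --parents` walks the chain backwards from HEAD.

```bash
# Assumes Git 2.28+ (for init -b); uses empty commits for brevity
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Root commit"
git commit -q --allow-empty -m "Second commit"

git cat-file -p HEAD              # tree, parent, author, committer, then the message
root=$(git rev-parse HEAD~1)
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')   # pull out the parent line
echo "HEAD's parent: $parent"

# Each output line is a commit followed by its parents; the root line has none
git rev-list --parents HEAD
```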

Bash

# View the commit graph as a visual tree (very useful!)
git log --oneline --graph --all

▶ Example output — two branches

* a1b2c3d (HEAD -> main) Add footer
* 9f8e7d6 Merge branch 'feature/nav'
|\
| * 5c4d3e2 (feature/nav) Add navigation bar
| * 2b1a0f9 Add nav styles
|/
* 7e6d5c4 Initial commit

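When you create a branch, make commits on it, and merge it back after the original branch has also gained commits (or whenever you use git merge --no-ff), Git creates a merge commit that has two parents, one from each branch. If the target branch has no new commits of its own, a plain git merge simply fast-forwards the branch pointer and no merge commit is created. The |\ and |/ lines in the graph above show history splitting off and joining back. The DAG structure is what makes Git's powerful merge and rebase operations possible.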
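To see the two parents directly, the sketch below (assuming Git 2.28+ and reusing the branch name feature/nav from the example) merges a branch with `git merge --no-ff`. Without `--no-ff`, Git would fast-forward main here, because main has no new commits of its own. `HEAD^1` is a merge commit's first parent and `HEAD^2` its second.

```bash
# Assumes Git 2.28+ (for init -b; git switch needs 2.23+); empty commits for brevity
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"
git switch -q -c feature/nav                      # branch off and commit there
git commit -q --allow-empty -m "Add navigation bar"
git switch -q main
git merge -q --no-ff -m "Merge branch 'feature/nav'" feature/nav

git log --oneline --graph --all                   # draws the fork and the join
# The merge commit has two parents: main's old tip (^1) and feature/nav (^2)
git rev-parse HEAD^1 HEAD^2
```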

SHA-1 Hashes — Git's Unique IDs

Every single object in Git (every commit, every file version, every directory snapshot) is identified by a SHA-1 hash: a 40-character hexadecimal string like a1b2c3d4e5f6789012345678901234567890abcd. This hash is computed from the object's content, prefixed with a short header that records the object's type and size.
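For a file, the hashed bytes are the word `blob`, a space, the content length in bytes, a NUL byte, and then the raw content. This sketch recomputes a blob ID by hand with `sha1sum` (assumed available, as in GNU coreutils; macOS ships `shasum` instead) and compares it with `git hash-object`, which prints the ID Git would assign without writing anything to a repository.

```bash
# Assumes sha1sum from GNU coreutils is on the PATH
set -e
content='test content'

# What Git hashes: "blob <size>\0<content>"; the trailing \n is part of the content
manual=$(printf 'blob %d\0%s\n' "$(( ${#content} + 1 ))" "$content" | sha1sum | cut -d' ' -f1)

# What Git reports for the same bytes (no repository needed without -w)
from_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $from_git"    # both: d670460b4b4aece5915caf5c68d12f560a9fe3e4
```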

This content-based addressing has profound implications:

Integrity: If any bit of a file or commit is corrupted, the stored data no longer matches its hash, so Git can detect the corruption, for example when git fsck re-hashes the stored objects.
Deduplication: Identical content always produces the identical hash, so Git stores duplicate content exactly once.
Immutability: You cannot change a past commit without changing its hash and the hashes of every commit that descends from it. Git history is tamper-evident.
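Immutability is easy to observe: "editing" a commit with `git commit --amend` actually writes a brand-new commit object with a new hash and moves the branch to it, while the original object stays in the database untouched. A sketch, assuming Git 2.28+ and a throwaway repository:

```bash
# Assumes Git 2.28+ (for init -b); empty commits for brevity
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Add homepage"
before=$(git rev-parse HEAD)

git commit -q --amend --allow-empty -m "Add home page"   # only the message changes
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"             # a different hash
git cat-file -t "$before"         # still a commit: the old object was not modified
```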

Bash

# See full commit hashes in the log
git log

# See abbreviated (short) hashes: at least 7 characters by default,
# lengthened automatically when needed to stay unique
git log --oneline

# You can reference a commit by any prefix of its hash
# (at least 4 characters) as long as that prefix is unique
git show a1b2c3d

# Print the full hash of the current commit
git rev-parse HEAD

▶ git log --oneline output

a1b2c3d (HEAD -> main) Add footer
9f8e7d6 Merge branch 'feature/nav'
5c4d3e2 (feature/nav) Add navigation bar
2b1a0f9 Add nav styles
7e6d5c4 Initial commit

ℹ️

SHA-1 collision risks

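SHA-1 is no longer considered collision-resistant: researchers produced a practical collision in 2017 (the SHAttered attack). Git responded by switching to a hardened SHA-1 implementation that detects known collision attacks, and it also supports SHA-256 repositories as an opt-in (git init --object-format=sha256). SHA-1 is still the default, and SHA-256 repositories have limited interoperability with SHA-1 repositories and hosting services. For practical day-to-day use, collisions in Git repositories remain a non-issue: an accidental collision is vanishingly unlikely, and a deliberate one requires enormous computational effort.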
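If you want to try the SHA-256 format, the sketch below creates two throwaway repositories, one per object format, and hashes the same content in each. It assumes Git 2.29 or newer, where SHA-256 repositories are available. Object IDs in a SHA-256 repository are 64 hexadecimal characters instead of 40.

```bash
# Assumes Git 2.29+ (SHA-256 repository support)
set -e
sha1_repo=$(mktemp -d)
sha256_repo=$(mktemp -d)
git init -q --object-format=sha1 "$sha1_repo"      # the default format, made explicit
git init -q --object-format=sha256 "$sha256_repo"  # opt-in format

sha1_id=$(printf 'hello\n' | git -C "$sha1_repo" hash-object --stdin)
sha256_id=$(printf 'hello\n' | git -C "$sha256_repo" hash-object --stdin)

echo "SHA-1   (${#sha1_id} chars): $sha1_id"
echo "SHA-256 (${#sha256_id} chars): $sha256_id"
```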

HEAD — Where You Are Right Now

In Git, HEAD is a special pointer that tells you where you currently are in the repository. Most of the time, HEAD points to a branch name, which in turn points to the latest commit on that branch. When you make a new commit, the branch moves forward to the new commit. HEAD itself does not change, but because it names the branch, it always resolves to the branch tip.

Bash

# See what HEAD is pointing at
cat .git/HEAD
# Output: ref: refs/heads/main

# After switching branches, HEAD updates
git switch feature/login
cat .git/HEAD
# Output: ref: refs/heads/feature/login

# "Detached HEAD" — HEAD points directly to a commit hash, not a branch
git checkout a1b2c3d
cat .git/HEAD
# Output: a1b2c3d4e5f6789012345678901234567890abcd

The concept of detached HEAD confuses many beginners. It simply means HEAD is pointing directly at a commit instead of at a branch. You are "looking at" that specific commit. If you make commits in this state, no branch points to them; once you switch away, they are reachable only through the reflog, and Git will eventually garbage-collect them. To keep those commits, create a new branch while you are still on them: git switch -c new-branch-name.
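A sketch of that rescue, assuming Git 2.28+ and a hypothetical branch name `experiment`: it detaches HEAD at an older commit, commits there, and then gives the work a branch. `git symbolic-ref -q HEAD` prints the branch HEAD points to and fails quietly when HEAD is detached.

```bash
# Assumes Git 2.28+ (for init -b); "experiment" is a hypothetical branch name
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "First"
git commit -q --allow-empty -m "Second"

git switch -q --detach HEAD~1     # same effect as "git checkout <hash>"
git symbolic-ref -q HEAD || echo "HEAD is detached at $(git rev-parse --short HEAD)"

git commit -q --allow-empty -m "Experiment"   # no branch points here yet
work=$(git rev-parse HEAD)

git switch -q -c experiment       # now a branch keeps the work reachable
git symbolic-ref HEAD             # refs/heads/experiment
```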

Git's Four Object Types

Git's object database stores exactly four types of objects. Understanding these types gives you a complete picture of how Git represents your project:

Object Type	What It Represents	Contains
blob	A single file's content at a specific version	Raw file contents (no filename, no metadata)
tree	A directory snapshot	References to blobs (files) and other trees (subdirectories), plus filenames and permissions
commit	A project snapshot in time	Reference to a root tree, parent commit(s), author, timestamp, and commit message
tag	An annotated tag: a named label, usually for a commit	Reference to the tagged object, tag name, tagger name, date, and message (lightweight tags are plain references, not objects)

When you run git commit, Git creates a new commit object that references a tree object (the root of your project directory), which in turn references a blob for each file and a subtree for each subdirectory. The commit object also records its parent commit's hash, forming the chain. This is the entire storage model: everything in Git is built from these four object types.
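The whole chain can be built by hand with Git's low-level (plumbing) commands, which makes the blob → tree → commit relationship concrete. This is a sketch assuming Git 2.28+ and a hypothetical file name `hello.txt`; in everyday work, `git add` and `git commit` perform these steps for you.

```bash
# Assumes Git 2.28+ (for init -b); hello.txt is a hypothetical file name
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

blob=$(printf 'hello\n' | git hash-object -w --stdin)          # 1. store a blob
git update-index --add --cacheinfo 100644,"$blob",hello.txt     # 2. stage it under a filename
tree=$(git write-tree)                                          # 3. turn the index into a tree
commit=$(git commit-tree -m "Hand-made commit" "$tree")         # 4. wrap the tree in a commit
git update-ref refs/heads/main "$commit"                        # 5. point the branch at it

git cat-file -p "$commit"   # tree, author, committer, message (no parent: a root commit)
git cat-file -p "$tree"     # one entry: mode 100644, type blob, the blob ID, hello.txt
git log --oneline           # the commit shows up like any other
```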

✅

Why this mental model matters

Once you understand that commits are snapshots in a DAG, that branches are just lightweight pointers to commits, and that HEAD is just a pointer to the current branch — commands like git rebase, git reset, git cherry-pick, and detached HEAD become intuitive rather than mysterious. The mental model is the unlock for everything that follows.

Git's Three Trees, and Why Commits Are Snapshots

Almost every confusing Git command makes sense once you picture the three trees a change moves through:

Tree	What it is	Moved by
Working directory	the files you edit	your editor
Staging area (index)	what you've marked for the next commit	`git add`
Repository (.git)	committed history	`git commit`

A change flows working → staging → repository. git add promotes edits to staging; git commit seals staging into permanent history. git status shows you which tree each change currently sits in.
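A quick way to watch a file move between the trees is `git status --porcelain`, whose two-letter code shows the staging-area state in the first column and the working-directory state in the second. A sketch, assuming Git 2.28+ and a hypothetical file `notes.txt`:

```bash
# Assumes Git 2.28+ (for init -b); notes.txt is a hypothetical file name
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'draft\n' > notes.txt
s1=$(git status --porcelain)     # "?? notes.txt": untracked, only in the working directory
git add notes.txt
s2=$(git status --porcelain)     # "A  notes.txt": staged as a new file
git commit -q -m "Add notes"
s3=$(git status --porcelain)     # empty: all three trees agree
printf 'final\n' > notes.txt
s4=$(git status --porcelain)     # " M notes.txt": modified, not staged
git add notes.txt
s5=$(git status --porcelain)     # "M  notes.txt": modified and staged

printf '%s\n' "$s1" "$s2" "$s3" "$s4" "$s5"
```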

Snapshots, not diffs

The other key idea: a Git commit stores a snapshot of the entire project at that moment, not a list of changes. (It's efficient because unchanged files are stored once and reused.) History is a chain of these snapshots; a branch is a pointer to one of them, and HEAD points to the branch you're on.
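Branches and HEAD really are that lightweight. In this sketch (assuming Git 2.28+ and a hypothetical branch name `feature`), creating a branch copies no files: it just records the current commit's hash under a new name, and HEAD keeps naming main until you switch.

```bash
# Assumes Git 2.28+ (for init -b); "feature" is a hypothetical branch name
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

git branch feature              # a new pointer to the same commit
git rev-parse main feature      # prints the same hash twice
git symbolic-ref HEAD           # refs/heads/main: HEAD names a branch, not a commit

git commit -q --allow-empty -m "Work on main"
git rev-parse main feature      # main moved forward; feature did not
```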

Why this matters: understanding "snapshot + pointers" demystifies reset (move a pointer), checkout/switch (point HEAD elsewhere), and merge (join two snapshot chains). You stop memorizing commands and start reasoning about where the pointers move.
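For example, `git reset --soft` moves only the current branch pointer: the abandoned commit still exists as an object, and its changes remain in the staging area. A sketch, assuming Git 2.28+ and a hypothetical file `app.txt`:

```bash
# Assumes Git 2.28+ (for init -b); app.txt is a hypothetical file name
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'v1\n' > app.txt && git add app.txt && git commit -q -m "v1"
first=$(git rev-parse HEAD)
printf 'v2\n' > app.txt && git commit -q -am "v2"
second=$(git rev-parse HEAD)

git reset -q --soft HEAD~1      # move main back one commit; index and files untouched
git rev-parse main              # now the "v1" commit
git cat-file -t "$second"       # still a commit: the object was not deleted
git status --porcelain          # "M  app.txt": the v2 change is still staged
```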

🏋️ Practical Exercise

Name the three areas: working directory, staging, repository.
Move a file through the stages (edit → add → commit).
Note what HEAD points to.
Visualize commits as a linked chain.
Run git status to see each state.

🔥 Challenge Exercise

Describe the three Git areas — working directory, staging area (index), and repository — and trace a file as it moves edit → git add → git commit. Explain what HEAD is and how commits form a chain.

📋 Summary

Git stores snapshots of your entire project at each commit — not line-by-line diffs like SVN.
A tracked file can have a version in each of three areas: working tree (what you edit), staging area / index (what will be committed), and repository / .git (permanent history).
Commits form a Directed Acyclic Graph (DAG) — each commit points to its parent(s), enabling branching and merging.
Every object is identified by a SHA-1 hash of its content — commits, files, and directories all have unique, content-derived hashes.
HEAD is a pointer to your current location: it usually points to a branch, which points to the latest commit on that branch.
Git has four object types: blob (file content), tree (directory), commit (snapshot + metadata), tag (named reference).

Interview Questions

What are the three areas in Git’s model?
What is the staging area?
What is HEAD?
What is the difference between the working directory and the repository?
How are commits linked together?

FAQ

Why does Git use SHA-1 instead of just numbering commits 1, 2, 3...?

Sequential numbers only work in centralised systems where one server assigns them. Git is distributed — two developers on opposite sides of the world can make commits simultaneously with no central authority to assign numbers. SHA-1 hashes solve this: because the hash is derived from the commit's content (including timestamp, author, parent hash, and message), two different commits will virtually always produce different hashes, making them unique across all repositories without any coordination.

What is a "detached HEAD" and should I be worried?

Detached HEAD means HEAD points directly at a commit hash instead of at a branch name. It is not harmful — you can look around, run git log, even make experimental commits. The only risk is that commits made in detached HEAD state have no branch tracking them, so they can be garbage-collected by Git eventually. To fix detached HEAD, either go back to a branch (git switch main) or create a new branch from your current position (git switch -c my-experiment).

Does Git store complete file copies in every commit? Isn't that wasteful?

Git stores complete snapshots conceptually, but is storage-efficient in practice for two reasons: (1) identical content is stored as the same blob object (referenced by hash), so unchanged files take zero extra space; (2) Git periodically runs a "pack" operation that compresses objects using delta compression, storing only the differences for objects with similar content. The result is that a large repository's .git folder is typically smaller than you would expect from the "complete snapshots" model.
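You can watch packing happen with `git count-objects -v`, which reports loose objects (`count`) separately from packed ones (`in-pack`). A sketch in a throwaway repository, assuming Git 2.28+ and a hypothetical file `numbers.txt`; `git gc` is the same housekeeping Git triggers on its own (as `git gc --auto`) when loose objects pile up.

```bash
# Assumes Git 2.28+ (for init -b); numbers.txt is a hypothetical file name
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

for i in 1 2 3; do                  # three commits of a slowly growing file
  seq 1 "$((i * 100))" > numbers.txt
  git add numbers.txt
  git commit -q -m "Version $i"
done

git count-objects -v                # before: the objects are loose ("count")
git gc -q                           # pack objects, delta-compressing similar ones
git count-objects -v                # after: the objects are counted under "in-pack"
in_pack=$(git count-objects -v | sed -n 's/^in-pack: //p')
```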

Is the staging area really necessary? Can I skip it?

You can skip it for simple commits using git commit -a, which stages all tracked modified files and commits in one step. But the staging area is genuinely useful: it lets you craft focused, logical commits even when you have made many unrelated changes. Professional developers use the staging area (and git add -p for partial file staging) to produce clean, reviewable commit histories. Good commit hygiene is a professional skill that makes code reviews faster and git bisect more effective.
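One detail worth knowing: `git commit -a` stages modifications and deletions of files Git already tracks, but it never picks up new, untracked files. A sketch, assuming Git 2.28+ and hypothetical file names:

```bash
# Assumes Git 2.28+ (for init -b); file names are hypothetical
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'v1\n' > tracked.txt
git add tracked.txt && git commit -q -m "Initial commit"

printf 'v2\n' > tracked.txt       # modify a tracked file
printf 'new\n' > untracked.txt    # create a brand-new file
git commit -q -am "Skip the staging step"

git status --porcelain            # "?? untracked.txt": -a left it out
git show --stat HEAD              # only tracked.txt is in the commit
```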