📖 Introduction

How Git Thinks – Snapshots, the DAG, and SHA-1 Hashes

Most developers learn Git by memorising commands. That works — until something goes wrong and you have no idea why. The developers who are genuinely fluent in Git have a correct mental model of how Git thinks: Git stores snapshots, not diffs. Commits form a graph, not a list. Every object has a cryptographic hash. HEAD is a pointer, not a fixed concept. In this lesson you will build that mental model from the ground up — and once it clicks, every Git command, every error message, and every tricky scenario will start to make intuitive sense.

⏱️ 20 min read 🎯 Beginner 📅 Updated 2026

Snapshots, Not Diffs

The most fundamental thing to understand about Git is how it stores data. Most older version control systems (CVS, SVN, Perforce) think of history as a list of file-based changes — they store the initial version of each file, then record only the lines that changed in each subsequent version.

Git is completely different. Git thinks of its data as a series of snapshots of your entire project. Every time you commit, Git takes a picture of all your tracked files at that exact moment and stores a reference to that complete snapshot. If a file has not changed since the last commit, Git does not store it again — it simply stores a reference (pointer) to the identical file it stored previously. This is both storage-efficient and conceptually simple: each commit is a complete, self-contained state of your project.

| Approach | How it stores history | Examples |
| --- | --- | --- |
| Delta-based (diff-based) | Initial file + list of changes per commit | SVN, CVS, Perforce |
| Snapshot-based | Complete state of all files at each commit | Git |

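This snapshot model is a large part of why checking out and comparing commits is fast. To get any version of a file, Git looks up the stored object by its hash; it never has to replay a history of diffs starting from the first revision. (Git does compress similar objects against each other when it packs them, but that is a storage detail hidden beneath the snapshot model.) Branching is fast for a related reason: a branch is only a pointer to a commit.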

ℹ️
Unchanged files are stored once

If your project has 100 files and you change only 3 in a commit, Git stores 3 new file objects (plus new directory listings for the folders that contain them) and points to the existing 97 unchanged file objects. There is no duplication. This is efficient because Git uses content-addressable storage: identical content is only stored once, regardless of how many commits reference it.
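The reuse of unchanged files can be observed directly. The following is a minimal sketch in a throwaway repository; the file names, contents, and identity settings are hypothetical and chosen only for illustration.

```shell
# Scratch repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First commit: two tracked files.
echo "<h1>Home</h1>" > index.html
echo "body { margin: 0; }" > style.css
git add index.html style.css
git commit -q -m "Initial commit"

# Second commit: only index.html changes.
echo "<h1>Welcome</h1>" > index.html
git add index.html
git commit -q -m "Update heading"

# Object IDs recorded for each file in the previous and the current snapshot.
css_before=$(git rev-parse HEAD~1:style.css)
css_after=$(git rev-parse HEAD:style.css)
html_before=$(git rev-parse HEAD~1:index.html)
html_after=$(git rev-parse HEAD:index.html)

echo "style.css:  $css_before -> $css_after"
echo "index.html: $html_before -> $html_after"

# The newest snapshot still lists every tracked file, changed or not.
git ls-tree -r --name-only HEAD
```

The two IDs printed for style.css should be identical, because both snapshots point at the same stored object, while index.html gets a new ID in the second commit.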

The Three Areas

Git tracks your files across three distinct areas, and the same file can have a different version in each one. Moving content between these areas is what the core Git commands actually do. Getting this model clear in your head removes much of the confusion beginners experience with Git.

| Area | Also Called | Where It Lives | What It Contains |
| --- | --- | --- | --- |
| Working Tree | Working Directory | Your project folder | The actual files you see and edit in your text editor |
| Staging Area | Index, Cache | .git/index | A list of files and their content that will go into the next commit |
| Repository | Git Database, .git dir | .git/ folder | All commits, branches, tags: the complete project history |
Bash
# Working Tree → Staging Area
# "git add" copies files from your working tree into the staging area
git add index.html

# Staging Area → Repository
# "git commit" takes everything in the staging area and saves a snapshot
git commit -m "Add homepage"

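# Repository → Working Tree
# "git restore --source=<commit>" copies a file from a commit back to your working tree
# (without --source, "git restore" copies the staged version from the index instead)
git restore --source=HEAD index.html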

The staging area is the most misunderstood area for beginners. Why does it exist? It gives you fine-grained control over what goes into each commit. If you changed 10 files but only want to commit 3 of them together (because they are logically related), you stage just those 3. The other 7 remain modified in your working tree, ready to be staged in a future commit. This produces a clean, meaningful commit history — which is one of Git's most valuable professional features.
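Selective staging is easy to try out. This sketch assumes a scratch repository with two hypothetical files, nav.html and footer.html; it edits both but commits only one.

```shell
# Scratch repository (file names and identity are hypothetical).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > nav.html
echo "v1" > footer.html
git add nav.html footer.html
git commit -q -m "Initial commit"

# Edit both files, but stage only one of them.
echo "v2" > nav.html
echo "v2" > footer.html
git add nav.html

staged=$(git diff --cached --name-only)   # index compared with the last commit
unstaged=$(git diff --name-only)          # working tree compared with the index
echo "staged:   $staged"
echo "unstaged: $unstaged"

# Only the staged change goes into the commit.
git commit -q -m "Update navigation"

# footer.html is still reported as modified in the working tree.
git status --short
```

After the commit, footer.html remains modified and can go into a separate commit with its own message.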

The Commit DAG

Commits in Git do not form a simple linear list; they form a Directed Acyclic Graph (DAG). Each commit stores references to its parents: none for the very first commit, one for an ordinary commit, and two (occasionally more) for a merge. This chain of parent references is what makes Git's history work.

"Directed" means the arrows in the graph point in one direction (each commit points to its parent, backwards in time). "Acyclic" means the graph has no cycles — you can never follow parent references and end up back at the same commit.

Bash
# View the commit graph as a visual tree (very useful!)
git log --oneline --graph --all
▶ Example output — two branches
* a1b2c3d (HEAD -> main) Add footer
* 9f8e7d6 Merge branch 'feature/nav'
|\
| * 5c4d3e2 (feature/nav) Add navigation bar
| * 2b1a0f9 Add nav styles
|/
* 7e6d5c4 Initial commit

When you create a branch and make commits on it, and then merge it back, Git creates a merge commit that has two parents, one from each branch. (If the target branch has not moved since you branched, Git fast-forwards instead of creating a merge commit unless you pass --no-ff.) This is what the |\ and |/ in the graph above represent. The DAG structure is what makes Git's powerful merge and rebase operations possible.
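The parent links can be inspected on real objects. The sketch below rebuilds a history shaped like the example graph in a scratch repository, using empty commits for brevity; the branch name and messages mirror the example and are otherwise arbitrary.

```shell
# Scratch repository; empty commits keep the example short.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the local default is
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

git switch -q -c feature/nav
git commit -q --allow-empty -m "Add nav styles"
git commit -q --allow-empty -m "Add navigation bar"

git switch -q main
# --no-ff forces a merge commit even though main has not moved.
git merge -q --no-ff -m "Merge branch 'feature/nav'" feature/nav

# A commit object lists one "parent" line per parent.
git cat-file -p HEAD
merge_parents=$(git cat-file -p HEAD | grep -c '^parent ')

# The root commit has no parent lines at all.
root=$(git rev-list --max-parents=0 HEAD)
root_parents=$(git cat-file -p "$root" | grep -c '^parent ' || true)

echo "merge commit parents: $merge_parents"
echo "root commit parents:  $root_parents"
git log --oneline --graph --all
```

The merge commit should report two parent lines and the root commit none, matching the shape that the --graph output draws.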

SHA-1 Hashes — Git's Unique IDs

Every single object in Git (every commit, every file version, every directory snapshot) is identified by a SHA-1 hash: a 40-character hexadecimal string like a1b2c3d4e5f6789012345678901234567890abcd. This hash is computed from a short header (the object's type and size) followed by the object's content.

This content-based addressing has profound implications:

  • Integrity: If any bit of a stored object is corrupted, its content no longer matches its hash, and Git reports the mismatch when it reads or verifies the object (for example, during git fsck).
  • Deduplication: Identical content always produces the identical hash, so Git stores duplicate content exactly once.
  • Immutability: You cannot change a past commit without changing its hash and the hash of every commit that descends from it. Git history is tamper-evident.
Bash
# See full commit hashes in the log
git log

# See abbreviated (short) hashes: 7 characters by default, more in large repositories
git log --oneline

# You can reference a commit by any prefix of its hash (at least 4 characters)
# as long as that prefix is unique in the repository
git show a1b2c3d

# Git tells you the hash of the current commit
git rev-parse HEAD
▶ git log --oneline output
a1b2c3d (HEAD -> main) Add footer
9f8e7d6 Merge branch 'feature/nav'
5c4d3e2 (feature/nav) Add navigation bar
2b1a0f9 Add nav styles
7e6d5c4 Initial commit
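The statement that an ID is derived from content can be checked by hand. This sketch assumes the default SHA-1 object format and a sha1sum utility on the PATH (on some systems the equivalent tool is shasum); the sample content is arbitrary.

```shell
content='hello'

# Git hashes a header ("blob", a space, the byte length, a NUL byte)
# followed by the content itself.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d ' ' -f 1)

# The same bytes hashed by Git; --stdin does not need a repository.
from_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $from_git"
```

Both lines should show the same 40-character ID, which is why identical content always lands on the same object no matter which repository or machine computes it.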
ℹ️
SHA-1 collision risks

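SHA-1 is no longer considered collision-resistant: researchers published a practical collision in 2017 (the SHAttered attack). Git's maintainers responded by adopting a hardened SHA-1 implementation that detects the known attack pattern, and by adding SHA-256 as an alternative object format (git init --object-format=sha256, available since Git 2.29). SHA-1 is still the default in Git 2.x. For practical day-to-day use, an accidental collision is not a realistic concern, and producing a deliberate one still takes substantial computational effort.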
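For the curious, the SHA-256 format can be tried in a scratch repository. This sketch assumes Git 2.29 or later; older versions reject the --object-format option.

```shell
# Requires Git 2.29+; the repository lives in a temporary directory.
repo=$(mktemp -d) && cd "$repo"
git init -q --object-format=sha256

# Prints the hash algorithm this repository uses.
git rev-parse --show-object-format

# Object IDs in this repository are SHA-256 values.
id=$(echo "hello" | git hash-object --stdin)
echo "$id"
echo "length: ${#id}"
```

The ID is 64 hexadecimal characters instead of 40; everything else about the object model stays the same.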

HEAD — Where You Are Right Now

In Git, HEAD is a special pointer that tells you where you currently are in the repository. Most of the time, HEAD points to a branch name, which in turn points to the latest commit on that branch. When you make a new commit, the branch moves forward to the new commit — and HEAD moves with it, always pointing to the branch tip.

Bash
# See what HEAD is pointing at
cat .git/HEAD
# Output: ref: refs/heads/main

# After switching branches, HEAD updates
git switch feature/login
cat .git/HEAD
# Output: ref: refs/heads/feature/login

# "Detached HEAD" — HEAD points directly to a commit hash, not a branch
git checkout a1b2c3d
cat .git/HEAD
# Output: a1b2c3d4e5f6789012345678901234567890abcd

The concept of detached HEAD confuses many beginners. It simply means HEAD is pointing directly at a commit instead of at a branch. You are "looking at" that specific commit. If you make commits in this state, no branch points to them; once you switch away, they are reachable only through the reflog and become eligible for garbage collection after those reflog entries expire (weeks later with default settings). To keep those commits, create a new branch before switching away: git switch -c new-branch-name.
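The whole detached HEAD cycle (detach, commit, rescue) fits in a few commands. This is a sketch in a scratch repository; the branch name my-experiment is a placeholder.

```shell
# Scratch repository with a single commit on main.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

# Detach HEAD at the current commit, then commit without a branch.
git switch -q --detach HEAD
git commit -q --allow-empty -m "Experiment"
experiment=$(git rev-parse HEAD)

# symbolic-ref fails when HEAD is detached, so the fallback text is printed.
git symbolic-ref -q --short HEAD || echo "HEAD is detached"

# No branch contains the experimental commit yet.
branches=$(git for-each-ref --contains="$experiment" refs/heads)
echo "branches containing the commit: ${branches:-none}"

# Give the commit a branch so it stays reachable.
git switch -q -c my-experiment
git symbolic-ref --short HEAD
```

Once my-experiment exists, the commit is reachable from a branch and is no longer a candidate for garbage collection.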

Git's Four Object Types

Git's object database stores exactly four types of objects. Understanding these types gives you a complete picture of how Git represents your project:

| Object Type | What It Represents | Contains |
| --- | --- | --- |
| blob | A single file's content at a specific version | Raw file contents (no filename, no metadata) |
| tree | A directory snapshot | References to blobs (files) and other trees (subdirectories), plus filenames and file modes |
| commit | A project snapshot in time | Reference to a root tree, parent commit(s), author, committer, timestamps, and commit message |
| tag | An annotated tag: a named, described reference to another object | Reference to the tagged object (usually a commit), tagger name, date, and message. Lightweight tags are plain references and create no object |

When you run git commit, Git creates a new commit object that references a tree object (the root of your project directory), which in turn references blob objects for each tracked file. The commit object also records its parent commit's hash, forming the chain. This is the entire storage model — everything in Git is built from these four object types.
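Each object type can be inspected with git cat-file. The sketch below builds a tiny repository (the paths and tag names are hypothetical) and walks from a commit to its tree, a blob, and a tag.

```shell
# Scratch repository with one file at the top level and one in a subdirectory.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
mkdir css
echo "<h1>Home</h1>" > index.html
echo "body { margin: 0; }" > css/style.css
git add index.html css/style.css
git commit -q -m "Add homepage"

commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')
blob=$(git rev-parse HEAD:index.html)

# -t prints an object's type, -p pretty-prints its content.
for id in "$commit" "$tree" "$blob"; do
  echo "$(git cat-file -t "$id") $id"
done
git cat-file -p "$commit"   # tree, author, committer, message (a root commit has no parent line)
git cat-file -p "$tree"     # mode, type, ID, and name per entry; css is a nested tree
git cat-file -p "$blob"     # raw file content only

# An annotated tag creates a tag object; a lightweight tag does not.
git tag -a v1.0 -m "First release"
git tag lightweight-v1.0
git cat-file -t v1.0               # tag
git cat-file -t lightweight-v1.0   # commit
```

Note that the file name index.html appears only in the tree listing, never in the blob, which is why a renamed but unchanged file reuses the same blob.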

Why this mental model matters

Once you understand that commits are snapshots in a DAG, that branches are just lightweight pointers to commits, and that HEAD is usually just a pointer to the current branch, commands like git rebase, git reset, and git cherry-pick, and states like detached HEAD, become intuitive rather than mysterious. The mental model is the unlock for everything that follows.
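The "branches are pointers" idea can be verified in a scratch repository. In this sketch the branch name feature/login is a placeholder; creating it adds no objects, and a later commit moves only the checked-out branch.

```shell
# Scratch repository with one commit on main.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

# Creating a branch writes a reference, not new objects.
before=$(git count-objects | cut -d ' ' -f 1)
git branch feature/login
after=$(git count-objects | cut -d ' ' -f 1)
echo "loose objects before/after creating the branch: $before/$after"

# Both names resolve to the same commit.
git rev-parse main feature/login

# A new commit moves only the branch that HEAD points to.
git commit -q --allow-empty -m "Second commit"
main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature/login)
parent_of_main=$(git rev-parse main~1)
echo "main:          $main_tip"
echo "feature/login: $feature_tip"
```

After the second commit, feature/login still names the first commit, which is now the parent of main's tip.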

📋 Summary

  • Git stores snapshots of your entire project at each commit — not line-by-line diffs like SVN.
  • Files move between three areas: working tree (what you edit), staging area / index (what will be committed), repository / .git (committed history).
  • Commits form a Directed Acyclic Graph (DAG) — each commit points to its parent(s), enabling branching and merging.
  • Every object is identified by a SHA-1 hash of its content — commits, files, and directories all have unique, content-derived hashes.
  • HEAD is a pointer to your current location. It usually points to a branch, which points to the latest commit on that branch.
  • Git has four object types: blob (file content), tree (directory), commit (snapshot + metadata), tag (annotated tag pointing at another object).

FAQ

Why does Git use SHA-1 instead of just numbering commits 1, 2, 3...?

Sequential numbers only work in centralised systems where one server assigns them. Git is distributed — two developers on opposite sides of the world can make commits simultaneously with no central authority to assign numbers. SHA-1 hashes solve this: because the hash is derived from the commit's content (including timestamp, author, parent hash, and message), two different commits will virtually always produce different hashes, making them unique across all repositories without any coordination.
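The dependence of a commit ID on its inputs can be shown with the low-level git commit-tree command. This sketch pins the author, committer, and dates through environment variables (all values hypothetical) so that the only thing varying is what the script changes on purpose.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q

# Pin every input that goes into a commit object.
# Dates use Git's internal "<unix seconds> <timezone>" format.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
export GIT_AUTHOR_DATE="1704067200 +0000" GIT_COMMITTER_DATE="1704067200 +0000"

tree=$(git write-tree)   # tree object for the (empty) index

# Identical inputs give an identical commit ID.
first=$(git commit-tree "$tree" -m "Add homepage")
again=$(git commit-tree "$tree" -m "Add homepage")

# Changing the message or the timestamp gives a different ID.
other_message=$(git commit-tree "$tree" -m "Add footer")
other_date=$(GIT_AUTHOR_DATE="1704153600 +0000" git commit-tree "$tree" -m "Add homepage")

echo "first:         $first"
echo "again:         $again"
echo "other message: $other_message"
echo "other date:    $other_date"
```

Two developers would need the same tree, parents, identity, timestamps, and message to produce the same ID, in which case they have made the same commit anyway.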

What is a "detached HEAD" and should I be worried?

Detached HEAD means HEAD points directly at a commit hash instead of at a branch name. It is not harmful — you can look around, run git log, even make experimental commits. The only risk is that commits made in detached HEAD state have no branch tracking them, so they can be garbage-collected by Git eventually. To fix detached HEAD, either go back to a branch (git switch main) or create a new branch from your current position (git switch -c my-experiment).

Does Git store complete file copies in every commit? Isn't that wasteful?

Git stores complete snapshots conceptually, but is storage-efficient in practice for two reasons: (1) identical content is stored as the same blob object (referenced by hash), so unchanged files take zero extra space; (2) Git periodically runs a "pack" operation that compresses objects using delta compression, storing only the differences for objects with similar content. The result is that a large repository's .git folder is typically smaller than you would expect from the "complete snapshots" model.
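Packing can be observed with git count-objects. This sketch creates three revisions of a hypothetical notes.txt in a scratch repository and then triggers packing manually with git gc, which Git would otherwise run on its own schedule.

```shell
# Scratch repository with three revisions of one file.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for i in 1 2 3; do
  echo "line $i" >> notes.txt
  git add notes.txt
  git commit -q -m "Revision $i"
done

# Before packing: every blob, tree, and commit is a separate loose object.
loose_before=$(git count-objects -v | sed -n 's/^count: //p')

# Pack the objects; inside a pack, Git may store similar objects as deltas.
git gc -q

loose_after=$(git count-objects -v | sed -n 's/^count: //p')
packed_after=$(git count-objects -v | sed -n 's/^in-pack: //p')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "objects in packs:        $packed_after"
```

The objects move from individual loose files into a single pack, while every commit still behaves as a complete snapshot.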

Is the staging area really necessary? Can I skip it?

You can skip it for simple commits using git commit -a, which stages all tracked modified files and commits in one step. But the staging area is genuinely useful: it lets you craft focused, logical commits even when you have made many unrelated changes. Professional developers use the staging area (and git add -p for partial file staging) to produce clean, reviewable commit histories. Good commit hygiene is a professional skill that makes code reviews faster and git bisect more effective.
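One detail of git commit -a worth seeing in practice is that it only picks up files Git already tracks. A minimal sketch, with hypothetical file names:

```shell
# Scratch repository with one tracked file.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > index.html
git add index.html
git commit -q -m "Add homepage"

echo "v2" > index.html     # tracked file, modified
echo "draft" > notes.txt   # new file, never added

# -a stages modified tracked files, then commits them.
git commit -q -a -m "Update homepage"

# The untracked file is left out and still shows up in the status.
git status --short
```

The change to index.html is committed, while notes.txt stays untracked until it is added explicitly with git add.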