Ad – 728×90
🔧 Advanced SQL

SQL Normalization – 1NF, 2NF, 3NF Explained

Normalization is the process of organizing your database tables to minimize data redundancy and prevent data integrity problems. Each "normal form" (1NF, 2NF, 3NF) adds a stricter rule on top of the previous one. This lesson walks through all three levels with a concrete example, explains the anomalies they prevent, and covers when to intentionally denormalize.

⏱️ 25 min read 🎯 Intermediate 📅 Updated 2026

What Is Normalization?

Normalization is a systematic approach to designing relational database tables so that data is stored in one place and one place only — no duplication. When the same fact is stored in multiple places, updating it requires changing multiple rows, and forgetting one creates inconsistency.

The normalization process is guided by a series of rules called normal forms. Each form builds on the previous, adding a stricter constraint. In practice, most production databases aim for Third Normal Form (3NF).

Starting Point: An Unnormalized Table

Imagine an e-commerce system that tracks orders in a single table:

SQL – Unnormalized
-- A flat, unnormalized "God table" — everything in one place
CREATE TABLE orders_flat (
  order_id       INT,
  customer_id    INT,
  customer_name  VARCHAR(100),
  customer_email VARCHAR(100),
  customer_city  VARCHAR(50),
  product_ids    VARCHAR(200),   -- "101,102,103" — multi-value!
  product_names  VARCHAR(500),   -- "Laptop,Mouse,Keyboard"
  quantities     VARCHAR(100),   -- "1,2,1"
  unit_prices    VARCHAR(200),   -- "999.00,25.00,79.00"
  order_total    DECIMAL(10,2),
  order_date     DATE
);

-- Sample data:
-- order_id=1, customer_name='Alice', product_ids='101,102', ...
-- order_id=2, customer_name='Alice', product_ids='103', ...
-- (Alice's email stored twice — redundancy!)

This design has three types of problems:

  • Update anomaly — if Alice changes her email, you must update every row where she appears. Miss one row and you have inconsistent data.
  • Insert anomaly — you cannot add a customer until they place an order (there's no customer-only row).
  • Delete anomaly — if Alice's only order is deleted, all knowledge of Alice disappears from the database.

First Normal Form (1NF)

Rule: Every column must contain atomic (indivisible) values. No repeating groups or multi-valued columns. Every row must be uniquely identifiable.

Fix: Replace multi-value columns with separate rows; establish a primary key.

SQL – After 1NF
-- Split multi-value columns into separate rows.
-- The PK is now composite: (order_id, product_id)
CREATE TABLE orders_1nf (
  order_id       INT,
  customer_id    INT,
  customer_name  VARCHAR(100),
  customer_email VARCHAR(100),
  customer_city  VARCHAR(50),
  product_id     INT,          -- one product per row now
  product_name   VARCHAR(100),
  quantity       INT,
  unit_price     DECIMAL(10,2),
  order_date     DATE,
  PRIMARY KEY (order_id, product_id)
);
-- ✅ Atomic values, unique rows, no repeating groups
-- ❌ Still has redundancy: customer_name stored for every order line

Second Normal Form (2NF)

Rule: Must be in 1NF, PLUS no partial dependency — every non-key column must depend on the entire primary key, not just part of it. (This rule only applies when the primary key is composite.)

The problem: In orders_1nf, the PK is (order_id, product_id). But customer_name depends only on order_id — not on the product. And product_name depends only on product_id. These are partial dependencies.

Fix: Split into separate tables so each non-key column depends on the full PK of its table.

SQL – After 2NF
-- Products table — product_name depends only on product_id
CREATE TABLE products (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100),
  unit_price   DECIMAL(10,2)
);

-- Orders table — customer info depends only on order_id
CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT,
  customer_name  VARCHAR(100),
  customer_email VARCHAR(100),
  customer_city  VARCHAR(50),
  order_date  DATE
);

-- Order items — junction table, PK is (order_id, product_id)
-- Both order_id and product_id are needed to identify a line item
CREATE TABLE order_items (
  order_id   INT,
  product_id INT,
  quantity   INT,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id)   REFERENCES orders(order_id),
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);
-- ✅ No partial dependencies
-- ❌ Still has redundancy: customer_city depends on customer, not order
Ad – 336×280

Third Normal Form (3NF)

Rule: Must be in 2NF, PLUS no transitive dependency — non-key columns must depend on the primary key only, not on other non-key columns.

The problem: In the orders table, customer_city depends on customer_id, not on order_id. This is a transitive dependency: order_id → customer_id → customer_city.

Fix: Move customer information to its own table.

SQL – After 3NF
-- Customers table — all customer attributes depend only on customer_id
CREATE TABLE customers (
  customer_id   INT PRIMARY KEY,
  customer_name VARCHAR(100) NOT NULL,
  email         VARCHAR(100) UNIQUE,
  city          VARCHAR(50)
);

-- Orders table — only order-specific data; FK to customers
CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT NOT NULL,
  order_date  DATE NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Products table
CREATE TABLE products (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100) NOT NULL,
  unit_price   DECIMAL(10,2) NOT NULL
);

-- Order items junction table
CREATE TABLE order_items (
  order_id   INT,
  product_id INT,
  quantity   INT NOT NULL,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id)   REFERENCES orders(order_id),
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- ✅ Fully in 3NF:
-- Each non-key column depends ONLY on the PK of its table.
-- No redundancy. Update customer email in ONE place.
💡
The normalization mantra

"Every non-key attribute must depend on the key, the whole key, and nothing but the key." This covers 1NF (the key), 2NF (the whole key), and 3NF (nothing but the key) in a single memorable sentence.

Boyce-Codd Normal Form (BCNF)

BCNF is a slightly stricter version of 3NF. It covers edge cases where a table has multiple overlapping candidate keys. The rule: every determinant (left side of a functional dependency) must be a candidate key.

BCNF violations are rare in typical application schemas. If your tables are in 3NF and you don't have unusual overlapping composite keys, you're usually also in BCNF.

SQL – BCNF Example
-- Classic BCNF violation: student-course-teacher
-- (student_id, course_id) → teacher_id   [one teacher per course per student]
-- teacher_id → course_id                 [each teacher teaches one course]
-- But teacher_id is NOT a candidate key — it determines course_id,
-- which violates BCNF.

-- BCNF fix: decompose into two tables
CREATE TABLE teacher_courses (
  teacher_id INT PRIMARY KEY,
  course_id  INT,
  FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

CREATE TABLE student_teachers (
  student_id INT,
  teacher_id INT,
  PRIMARY KEY (student_id, teacher_id),
  FOREIGN KEY (student_id) REFERENCES students(student_id),
  FOREIGN KEY (teacher_id) REFERENCES teacher_courses(teacher_id)
);

When to Denormalize

Denormalization is the deliberate introduction of redundancy for performance reasons. It makes reads faster at the cost of making writes more complex (you must update duplicated data in multiple places).

Common scenarios where denormalization is justified:

  • Reporting and data warehouses — OLAP workloads have infrequent writes and very frequent complex aggregation reads. Denormalized "star schemas" with pre-joined fact tables are standard practice.
  • Cached aggregate columns — storing a pre-computed order_count on the customers table avoids a COUNT join on every page load. A trigger or application code keeps it updated.
  • Read performance on very large tables — when a normalized JOIN is a bottleneck even with indexes, adding redundant columns to avoid the JOIN can be necessary.
⚠️
Denormalize as a measured decision

Start normalized. Measure actual query performance. Denormalize only the specific bottlenecks you have identified — not as a pre-emptive strategy. Premature denormalization creates data integrity problems that are hard to fix later.

Summary

📋 Summary

  • Normalization organizes tables to eliminate redundancy and prevent update, insert, and delete anomalies.
  • 1NF — atomic column values, no repeating groups, every row uniquely identifiable.
  • 2NF — 1NF + no partial dependencies (non-key columns depend on the full composite PK).
  • 3NF — 2NF + no transitive dependencies (non-key columns depend only on the PK, not on other non-key columns).
  • BCNF — 3NF + every determinant of a functional dependency is a candidate key. Covers rare edge cases.
  • The normalization mantra: depend on the key, the whole key, and nothing but the key.
  • Denormalize deliberately and measurably — start normalized, identify real bottlenecks, then add controlled redundancy.

FAQ

Do I always need to go all the way to 3NF? +

For OLTP (transactional) databases, 3NF is the standard target and the right default. In practice, most well-designed application schemas reach 3NF naturally. Going beyond 3NF to BCNF or 4NF is only necessary when you have specific structural problems (overlapping candidate keys, multi-valued dependencies) — which are rare in typical app databases.

What is the difference between 2NF and 3NF? +

2NF eliminates partial dependencies — where a non-key column depends on only part of a composite primary key. 3NF eliminates transitive dependencies — where a non-key column depends on another non-key column, which in turn depends on the PK. In other words: 2NF is about the relationship between non-key columns and the PK; 3NF is about relationships between non-key columns and each other.

Is a single-column primary key always in 2NF? +

Yes. Partial dependencies only arise with composite primary keys. If your table has a single-column PK, it is automatically in 2NF (assuming it's in 1NF). You still need to check for 3NF violations (transitive dependencies), which can exist even with a single-column PK.

How does normalization affect query performance? +

Normalized tables generally require more JOINs to reassemble data, which adds query complexity and can be slower on very large datasets without proper indexes. For OLTP workloads with many small transactions, normalized schemas perform well. For OLAP/reporting workloads with complex aggregations over large datasets, denormalized schemas (like star/snowflake schemas in data warehouses) often perform better. Always index your foreign key columns in normalized schemas.