SQL Indexes – Speed Up Queries with CREATE INDEX

What Are Indexes?

An index is a data structure — most commonly a B-tree — maintained by the database engine alongside your table. It maps the values of one or more columns to the physical locations of the corresponding rows, enabling the database to find matching rows without scanning every single row in the table.

Think of a book's index at the back: instead of reading the entire book to find mentions of a topic, you look up the term and jump directly to the relevant pages. SQL indexes work the same way.

ℹ️

Without an index: full table scan

When no index exists for a column used in a WHERE clause, the database reads every row from top to bottom. On a table with millions of rows, this is extremely slow. An index lets the engine jump directly to the matching rows — O(log n) instead of O(n).

Creating Indexes

The basic syntax is CREATE INDEX followed by a name, the table, and the column(s) to index.

SQL

-- Basic index on a single column
CREATE INDEX idx_email ON users(email);

-- Unique index — enforces uniqueness AND speeds up lookups
CREATE UNIQUE INDEX idx_email_unique ON users(email);

-- Index on a foreign key column (very common)
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Drop an index
DROP INDEX idx_email ON users;           -- MySQL
DROP INDEX idx_email;                    -- PostgreSQL

💡

Naming convention

Use a consistent naming pattern like idx_tablename_columnname. This makes it easy to identify indexes in query plans and schema inspection queries. For unique indexes, some teams use uix_ as a prefix.

Types of Indexes

Different database engines support different index types. Here are the most important ones:

Type	Best For	Notes
B-tree	Range queries, equality, ORDER BY, LIKE 'prefix%'	Default in almost all databases. Works for most use cases.
Hash	Exact equality only (`=`)	Faster than B-tree for equality, but useless for range queries (>, <, BETWEEN).
Composite	Multi-column queries	Column order matters — see below.
Partial / Filtered	Subset of rows	Smaller and faster when you only query a filtered subset.
Full-text	Text search (CONTAINS, MATCH)	Specialized for keyword search in text columns.

SQL

-- Hash index (PostgreSQL)
CREATE INDEX idx_user_hash ON users USING HASH (username);

-- Partial index — only index active users (PostgreSQL / SQLite)
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';

-- Full-text index (MySQL)
CREATE FULLTEXT INDEX idx_article_body ON articles(body);

-- Full-text index (PostgreSQL — uses GIN)
CREATE INDEX idx_article_tsv ON articles USING GIN (to_tsvector('english', body));

Composite Indexes and Column Order

A composite index covers multiple columns. The order of columns in the index definition is critical — the index is only usable when the query filters on the leftmost column(s) first (the leading column rule).

SQL

-- Create a composite index on (department, salary)
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- ✅ Uses the index — filters on leading column
SELECT * FROM employees WHERE department = 'Engineering';

-- ✅ Uses the index — filters on both columns
SELECT * FROM employees WHERE department = 'Engineering' AND salary > 80000;

-- ❌ Does NOT use the index — skips the leading column
SELECT * FROM employees WHERE salary > 80000;

-- ✅ Correct order for range + equality: put equality column first
CREATE INDEX idx_status_created ON orders(status, created_at);
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2024-01-01';

⚠️

Leading column rule

Put the column with the highest cardinality (most distinct values) first in a composite index, UNLESS you always filter by an equality column — in that case, put the equality column first so the database can narrow down rows before applying the range filter.

When to Add an Index

Indexes are not free — they consume storage and slow down writes. Add an index only when you have a clear reason:

Foreign key columns — always index FK columns to speed up JOINs and prevent lock issues on DELETE.
WHERE clause columns — columns you filter on frequently in hot queries.
JOIN columns — columns used in ON clauses between large tables.
ORDER BY / GROUP BY columns — indexes can eliminate sort operations.
High-cardinality columns — columns with many distinct values (email, user_id). Low-cardinality columns (is_active with only 0/1) rarely benefit from an index.

SQL

-- Find existing indexes on a table (MySQL)
SHOW INDEX FROM employees;

-- Find existing indexes (PostgreSQL)
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'employees';

-- Identify missing FK indexes (PostgreSQL)
SELECT conname, conrelid::regclass AS table_name,
       a.attname AS column_name
FROM pg_constraint c
JOIN pg_attribute a ON a.attrelid = c.conrelid AND a.attnum = c.conkey[1]
WHERE c.contype = 'f'
  AND NOT EXISTS (
    SELECT 1 FROM pg_index i
    WHERE i.indrelid = c.conrelid AND i.indkey[0] = a.attnum
  );

Reading the EXPLAIN Plan

EXPLAIN shows the query execution plan — how the database intends to retrieve your data. EXPLAIN ANALYZE actually runs the query and shows real timing information.

SQL

-- MySQL: see the execution plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- MySQL: with actual runtime stats (MySQL 8.0+)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- PostgreSQL: estimated plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- PostgreSQL: actual execution plan with timings
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Key things to look for in the output:

Output Term	Meaning	Good or Bad?
Seq Scan / ALL	Full table scan — reads every row	Bad on large tables; investigate
Index Scan	Uses an index to find rows, then fetches full row	Good for low-cardinality matches
Index Only Scan	All needed data is in the index — never touches the table	Excellent
rows	Estimated number of rows to process	Lower is better
cost	Relative cost estimate (not wall-clock time)	Lower is better

Trade-offs of Indexes

Every index you add has a cost:

Slower writes — every INSERT, UPDATE, and DELETE must also update all relevant indexes. Tables with many indexes suffer noticeably on write-heavy workloads.
Storage space — indexes consume disk space. A large table with many indexes can use more space for indexes than for the actual data.
Maintenance overhead — over time, indexes can become fragmented and need to be rebuilt (REINDEX in PostgreSQL, OPTIMIZE TABLE in MySQL).

⚠️

Don't over-index

More indexes is not always better. Profile your actual queries, identify the slow ones, and add targeted indexes. A common anti-pattern is adding an index on every column "just in case" — this degrades write performance without benefiting most queries.

Summary

Indexes: the Trade That Makes Reads Fast and Writes Slower

An index is a sorted lookup structure (usually a B-tree) the database keeps alongside your table. Without one, finding a row means a full table scan — reading every row. With one, the engine jumps straight to the match, like a book's index instead of flipping every page.

-- Slow at scale: scans the whole table
SELECT * FROM users WHERE email = 'a@b.com';

CREATE INDEX idx_users_email ON users(email);
-- Now the same query is a near-instant lookup

Indexes aren't free

Helps	Costs
`WHERE`, `JOIN`, `ORDER BY` lookups	every `INSERT/UPDATE/DELETE` must update the index too
enforcing `UNIQUE`	extra disk space

Index the columns you filter and join on — not every column. On a write-heavy table, too many indexes slow inserts to a crawl.

The composite index left-prefix rule

An index on (last_name, first_name) speeds queries filtering by last_name alone, or both — but not first_name alone. Like a phone book sorted by last name then first: useless for finding everyone named "John." Order composite columns most-selective / most-filtered first.

🏋️ Practical Exercise

Create an index on a column.
Create a composite index on two columns.
Create a unique index.
Inspect a query plan with EXPLAIN.
Drop an index.

🔥 Challenge Exercise

Add an index to speed up lookups by email, then use EXPLAIN to compare query plans before and after. Explain how indexes speed reads but slow writes, and when a composite index helps.

📋 Summary

An index is a B-tree (or other) data structure that speeds up row lookups at the cost of write overhead and storage.
Use CREATE INDEX idx_name ON table(column); to add an index; CREATE UNIQUE INDEX to enforce uniqueness.
Main types: B-tree (default), Hash (equality only), Composite (multi-column, order matters), Partial (filtered subset), Full-text.
In composite indexes, queries must filter on the leading column; skipping it means the index is not used.
Index FK columns, high-cardinality WHERE/JOIN/ORDER BY columns; avoid low-cardinality columns.
Use EXPLAIN / EXPLAIN ANALYZE to verify that indexes are being used.
Every index slows INSERT/UPDATE/DELETE — don't over-index write-heavy tables.

Interview Questions

What is a database index?
How do indexes improve query performance?
What is the downside of having too many indexes?
What is a composite index, and does column order matter?
What is the difference between a clustered and a non-clustered index?

FAQ

Does adding a PRIMARY KEY automatically create an index? +

Yes. In MySQL (InnoDB) and PostgreSQL, declaring a PRIMARY KEY automatically creates a unique index on that column. In MySQL InnoDB, the primary key is the clustered index — the table data is physically stored in primary key order. You do not need to manually create an index for primary key columns.

Why isn't my index being used? +

The query optimizer may skip your index for several reasons: the table is small (a full scan is faster than an index lookup for tiny tables); you used a function on the indexed column (WHERE LOWER(email) = ... breaks the index — use a functional index instead); the column has low cardinality; or statistics are stale. Run ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) to update statistics.

What is a covering index? +

A covering index includes all columns that a query needs — both the filter columns and the SELECT columns — so the database never has to access the actual table rows. This is called an "Index Only Scan" in PostgreSQL and produces the best read performance. You can create one by including extra columns: CREATE INDEX idx_cover ON orders(customer_id) INCLUDE (total, status); (PostgreSQL 11+).

How many indexes should a table have? +

There is no fixed number, but a common guideline is 3–6 indexes on OLTP tables. Start with a primary key index (automatic), indexes on all foreign keys, and indexes on your most frequently queried WHERE columns. Then profile slow queries and add targeted indexes as needed. Tables in reporting/data warehouse databases can have many more since they are read-heavy.