What Are Indexes?
An index is a data structure β most commonly a B-tree β maintained by the database engine alongside your table. It maps the values of one or more columns to the physical locations of the corresponding rows, enabling the database to find matching rows without scanning every single row in the table.
Think of a book's index at the back: instead of reading the entire book to find mentions of a topic, you look up the term and jump directly to the relevant pages. SQL indexes work the same way.
When no index exists for a column used in a WHERE clause, the database reads every row from top to bottom. On a table with millions of rows, this is extremely slow. An index lets the engine jump directly to the matching rows β O(log n) instead of O(n).
Creating Indexes
The basic syntax is CREATE INDEX followed by a name, the table, and the column(s) to index.
-- Basic index on a single column
CREATE INDEX idx_email ON users(email);
-- Unique index β enforces uniqueness AND speeds up lookups
CREATE UNIQUE INDEX idx_email_unique ON users(email);
-- Index on a foreign key column (very common)
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Drop an index
DROP INDEX idx_email ON users; -- MySQL
DROP INDEX idx_email; -- PostgreSQL
Use a consistent naming pattern like idx_tablename_columnname. This makes it easy to identify indexes in query plans and schema inspection queries. For unique indexes, some teams use uix_ as a prefix.
Types of Indexes
Different database engines support different index types. Here are the most important ones:
| Type | Best For | Notes |
|---|---|---|
| B-tree | Range queries, equality, ORDER BY, LIKE 'prefix%' | Default in almost all databases. Works for most use cases. |
| Hash | Exact equality only (=) | Faster than B-tree for equality, but useless for range queries (>, <, BETWEEN). |
| Composite | Multi-column queries | Column order matters β see below. |
| Partial / Filtered | Subset of rows | Smaller and faster when you only query a filtered subset. |
| Full-text | Text search (CONTAINS, MATCH) | Specialized for keyword search in text columns. |
-- Hash index (PostgreSQL)
CREATE INDEX idx_user_hash ON users USING HASH (username);
-- Partial index β only index active users (PostgreSQL / SQLite)
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
-- Full-text index (MySQL)
CREATE FULLTEXT INDEX idx_article_body ON articles(body);
-- Full-text index (PostgreSQL β uses GIN)
CREATE INDEX idx_article_tsv ON articles USING GIN (to_tsvector('english', body));
Composite Indexes and Column Order
A composite index covers multiple columns. The order of columns in the index definition is critical β the index is only usable when the query filters on the leftmost column(s) first (the leading column rule).
-- Create a composite index on (department, salary)
CREATE INDEX idx_dept_salary ON employees(department, salary);
-- β
Uses the index β filters on leading column
SELECT * FROM employees WHERE department = 'Engineering';
-- β
Uses the index β filters on both columns
SELECT * FROM employees WHERE department = 'Engineering' AND salary > 80000;
-- β Does NOT use the index β skips the leading column
SELECT * FROM employees WHERE salary > 80000;
-- β
Correct order for range + equality: put equality column first
CREATE INDEX idx_status_created ON orders(status, created_at);
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2024-01-01';
Put the column with the highest cardinality (most distinct values) first in a composite index, UNLESS you always filter by an equality column β in that case, put the equality column first so the database can narrow down rows before applying the range filter.
When to Add an Index
Indexes are not free β they consume storage and slow down writes. Add an index only when you have a clear reason:
- Foreign key columns β always index FK columns to speed up JOINs and prevent lock issues on DELETE.
- WHERE clause columns β columns you filter on frequently in hot queries.
- JOIN columns β columns used in ON clauses between large tables.
- ORDER BY / GROUP BY columns β indexes can eliminate sort operations.
- High-cardinality columns β columns with many distinct values (email, user_id). Low-cardinality columns (is_active with only 0/1) rarely benefit from an index.
-- Find existing indexes on a table (MySQL)
SHOW INDEX FROM employees;
-- Find existing indexes (PostgreSQL)
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'employees';
-- Identify missing FK indexes (PostgreSQL)
SELECT conname, conrelid::regclass AS table_name,
a.attname AS column_name
FROM pg_constraint c
JOIN pg_attribute a ON a.attrelid = c.conrelid AND a.attnum = c.conkey[1]
WHERE c.contype = 'f'
AND NOT EXISTS (
SELECT 1 FROM pg_index i
WHERE i.indrelid = c.conrelid AND i.indkey[0] = a.attnum
);
Reading the EXPLAIN Plan
EXPLAIN shows the query execution plan β how the database intends to retrieve your data. EXPLAIN ANALYZE actually runs the query and shows real timing information.
-- MySQL: see the execution plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- MySQL: with actual runtime stats (MySQL 8.0+)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- PostgreSQL: estimated plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- PostgreSQL: actual execution plan with timings
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
Key things to look for in the output:
| Output Term | Meaning | Good or Bad? |
|---|---|---|
| Seq Scan / ALL | Full table scan β reads every row | Bad on large tables; investigate |
| Index Scan | Uses an index to find rows, then fetches full row | Good for low-cardinality matches |
| Index Only Scan | All needed data is in the index β never touches the table | Excellent |
| rows | Estimated number of rows to process | Lower is better |
| cost | Relative cost estimate (not wall-clock time) | Lower is better |
Trade-offs of Indexes
Every index you add has a cost:
- Slower writes β every INSERT, UPDATE, and DELETE must also update all relevant indexes. Tables with many indexes suffer noticeably on write-heavy workloads.
- Storage space β indexes consume disk space. A large table with many indexes can use more space for indexes than for the actual data.
- Maintenance overhead β over time, indexes can become fragmented and need to be rebuilt (
REINDEXin PostgreSQL,OPTIMIZE TABLEin MySQL).
More indexes is not always better. Profile your actual queries, identify the slow ones, and add targeted indexes. A common anti-pattern is adding an index on every column "just in case" β this degrades write performance without benefiting most queries.
Summary
π Summary
- An index is a B-tree (or other) data structure that speeds up row lookups at the cost of write overhead and storage.
- Use
CREATE INDEX idx_name ON table(column);to add an index;CREATE UNIQUE INDEXto enforce uniqueness. - Main types: B-tree (default), Hash (equality only), Composite (multi-column, order matters), Partial (filtered subset), Full-text.
- In composite indexes, queries must filter on the leading column; skipping it means the index is not used.
- Index FK columns, high-cardinality WHERE/JOIN/ORDER BY columns; avoid low-cardinality columns.
- Use
EXPLAIN/EXPLAIN ANALYZEto verify that indexes are being used. - Every index slows INSERT/UPDATE/DELETE β don't over-index write-heavy tables.
FAQ
Yes. In MySQL (InnoDB) and PostgreSQL, declaring a PRIMARY KEY automatically creates a unique index on that column. In MySQL InnoDB, the primary key is the clustered index β the table data is physically stored in primary key order. You do not need to manually create an index for primary key columns.
The query optimizer may skip your index for several reasons: the table is small (a full scan is faster than an index lookup for tiny tables); you used a function on the indexed column (WHERE LOWER(email) = ... breaks the index β use a functional index instead); the column has low cardinality; or statistics are stale. Run ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) to update statistics.
A covering index includes all columns that a query needs β both the filter columns and the SELECT columns β so the database never has to access the actual table rows. This is called an "Index Only Scan" in PostgreSQL and produces the best read performance. You can create one by including extra columns: CREATE INDEX idx_cover ON orders(customer_id) INCLUDE (total, status); (PostgreSQL 11+).
There is no fixed number, but a common guideline is 3β6 indexes on OLTP tables. Start with a primary key index (automatic), indexes on all foreign keys, and indexes on your most frequently queried WHERE columns. Then profile slow queries and add targeted indexes as needed. Tables in reporting/data warehouse databases can have many more since they are read-heavy.