What Is a Correlated Subquery?
In a regular (non-correlated) subquery, the inner query is independent and runs once. In a correlated subquery, the inner query references a column from the outer query, so it must run once for each row processed by the outer query.
| Feature | Regular Subquery | Correlated Subquery |
|---|---|---|
| References outer query? | No | Yes |
| Runs how many times? | Once | Once per outer row |
| Can be cached? | Yes (run once) | No (changes per row) |
| Performance | Usually fast | Potentially slow (O(N) executions) |
How It Works
The outer query passes the current row's values into the inner query. The inner query uses those values, returns a result, and the outer query decides whether to include that row:
-- Find employees earning more than their department average
SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e.department -- e.department comes from the outer query
);
-- For each row in the outer query, the inner query computes AVG for THAT department
Look for a column reference in the inner query that uses an alias defined in the outer FROM clause (e.g., e.department where e is the outer table alias). If the inner query has no reference to the outer query, it is a regular subquery.
Correlated vs Regular Subquery
-- Regular subquery: inner query runs ONCE, returns the overall average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Correlated subquery: inner query runs ONCE PER ROW, returns dept-specific average
SELECT name, department, salary
FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.department = e.department);
More Correlated Subquery Examples
Find the most recent order for each customer:
SELECT o.customer_id, o.order_date, o.amount
FROM orders o
WHERE o.order_date = (
SELECT MAX(o2.order_date)
FROM orders o2
WHERE o2.customer_id = o.customer_id -- correlated reference
);
Find products that have never been ordered:
SELECT p.name, p.category
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.id -- correlated reference
);
Performance Considerations
Correlated subqueries execute once per outer row. For a table with 100,000 rows, that is 100,000 subquery executions. Most can be rewritten as JOINs for better performance:
-- Correlated subquery (potentially slow for large tables)
SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department = e.department
);
-- Equivalent JOIN with derived table (faster)
SELECT e.name, e.department, e.salary
FROM employees e
JOIN (
SELECT department, AVG(salary) AS dept_avg
FROM employees
GROUP BY department
) dept_stats ON e.department = dept_stats.department
WHERE e.salary > dept_stats.dept_avg;
Some correlated subqueries cannot be trivially rewritten as JOINs β particularly those using EXISTS/NOT EXISTS for semi-join or anti-join semantics. Modern query optimisers (PostgreSQL, SQL Server, MySQL 8) often convert correlated subqueries to joins internally. Always EXPLAIN/profile your query on real data.
Summary
π Summary
- A correlated subquery references a column from the outer query β making it dependent on each outer row.
- It re-executes once per outer row, which can be slow on large datasets.
- Classic use: find rows where a value compares against a per-group aggregate (e.g., salary vs department average).
- Most correlated subqueries can be rewritten as JOINs with derived tables for better performance.
- EXISTS-based correlated subqueries are often the best way to express semi-join or anti-join logic.
FAQ
Yes. A correlated scalar subquery in SELECT computes a value for each row: SELECT name, (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count FROM customers c. This is often called a "correlated column subquery." It is convenient but slow for large tables β a LEFT JOIN with GROUP BY is usually faster.
Check whether the inner query references any alias or column from the outer FROM clause. If you can run the inner query independently (without the outer query's context) and get a meaningful result, it is a regular subquery. If it references outer columns and would fail or produce no result standalone, it is correlated.
Major databases (PostgreSQL, MySQL 8, SQL Server, Oracle) apply "subquery unnesting" to convert many correlated subqueries into more efficient join plans automatically. However, this is not guaranteed for all query patterns. Always check the EXPLAIN plan to see whether the optimiser actually avoided a nested loop execution.