SQL Correlated Subqueries – Row-by-Row Subqueries

What Is a Correlated Subquery?

In a regular (non-correlated) subquery, the inner query is independent and runs once. In a correlated subquery, the inner query references a column from the outer query, so it must run once for each row processed by the outer query.

Feature	Regular Subquery	Correlated Subquery
References outer query?	No	Yes
Runs how many times?	Once	Once per outer row
Can be cached?	Yes (run once)	No (changes per row)
Performance	Usually fast	Potentially slow (O(N) executions)

How It Works

The outer query passes the current row's values into the inner query. The inner query uses those values, returns a result, and the outer query decides whether to include that row:

SQL

-- Find employees earning more than their department average
SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department = e.department   -- e.department comes from the outer query
);
-- For each row in the outer query, the inner query computes AVG for THAT department

ℹ️

Spot a correlated subquery

Look for a column reference in the inner query that uses an alias defined in the outer FROM clause (e.g., e.department where e is the outer table alias). If the inner query has no reference to the outer query, it is a regular subquery.

Correlated vs Regular Subquery

SQL

-- Regular subquery: inner query runs ONCE, returns the overall average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated subquery: inner query runs ONCE PER ROW, returns dept-specific average
SELECT name, department, salary
FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.department = e.department);

More Correlated Subquery Examples

Find the most recent order for each customer:

SQL

SELECT o.customer_id, o.order_date, o.amount
FROM orders o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id   -- correlated reference
);

Find products that have never been ordered:

SQL

SELECT p.name, p.category
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.id    -- correlated reference
);

Performance Considerations

Correlated subqueries execute once per outer row. For a table with 100,000 rows, that is 100,000 subquery executions. Most can be rewritten as JOINs for better performance:

SQL

-- Correlated subquery (potentially slow for large tables)
SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department = e.department
);

-- Equivalent JOIN with derived table (faster)
SELECT e.name, e.department, e.salary
FROM employees e
JOIN (
    SELECT department, AVG(salary) AS dept_avg
    FROM employees
    GROUP BY department
) dept_stats ON e.department = dept_stats.department
WHERE e.salary > dept_stats.dept_avg;

⚠️

When to keep correlated subqueries

Some correlated subqueries cannot be trivially rewritten as JOINs — particularly those using EXISTS/NOT EXISTS for semi-join or anti-join semantics. Modern query optimisers (PostgreSQL, SQL Server, MySQL 8) often convert correlated subqueries to joins internally. Always EXPLAIN/profile your query on real data.

Summary

Correlated Subqueries: One That Runs Per Row

A regular subquery runs once, independently. A correlated subquery references a column from the outer query, so it re-evaluates for every outer row — like a loop. Powerful, but potentially slow.

-- each employee vs their OWN department's average — inner query
-- depends on the outer row's e1.dept, so it re-runs per employee
SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.dept = e1.dept        -- ← correlation: references e1
);

	Regular subquery	Correlated
References outer row?	no	yes
Runs	once	per outer row
Best for	a fixed value/list	row-by-row comparison

Performance: because it repeats per row, a correlated subquery over a big table can be expensive. Two common rewrites are faster: a window function (AVG(salary) OVER (PARTITION BY dept)) computes the per-group value in one pass, or a JOIN to an aggregated subquery. Reach for correlation when it reads clearest (especially with EXISTS), but know the window-function alternative when speed matters.

🏋️ Practical Exercise

Write a correlated subquery that references the outer row.
Find rows above their group’s average.
Compare correlated vs non-correlated subqueries.
Use EXISTS with correlation.
Observe that it executes per outer row.

🔥 Challenge Exercise

For each employee, find those earning more than the average salary in their own department using a correlated subquery. Explain how a correlated subquery re-runs per outer row and its performance trade-offs vs a window function.

📋 Summary

A correlated subquery references a column from the outer query — making it dependent on each outer row.
It re-executes once per outer row, which can be slow on large datasets.
Classic use: find rows where a value compares against a per-group aggregate (e.g., salary vs department average).
Most correlated subqueries can be rewritten as JOINs with derived tables for better performance.
EXISTS-based correlated subqueries are often the best way to express semi-join or anti-join logic.

Interview Questions

What is a correlated subquery?
How does it differ from a regular subquery?
Why can correlated subqueries be slow?
How does EXISTS use correlation?
How might a window function replace a correlated subquery?

FAQ

Can a correlated subquery appear in the SELECT clause? +

Yes. A correlated scalar subquery in SELECT computes a value for each row: SELECT name, (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count FROM customers c. This is often called a "correlated column subquery." It is convenient but slow for large tables — a LEFT JOIN with GROUP BY is usually faster.

How do I know if my subquery is correlated or not? +

Check whether the inner query references any alias or column from the outer FROM clause. If you can run the inner query independently (without the outer query's context) and get a meaningful result, it is a regular subquery. If it references outer columns and would fail or produce no result standalone, it is correlated.

Do all database engines optimise correlated subqueries? +

Major databases (PostgreSQL, MySQL 8, SQL Server, Oracle) apply "subquery unnesting" to convert many correlated subqueries into more efficient join plans automatically. However, this is not guaranteed for all query patterns. Always check the EXPLAIN plan to see whether the optimiser actually avoided a nested loop execution.