What Are Aggregate Functions?
An aggregate function takes a set of rows as input and returns a single value. When used with GROUP BY, each group returns one row. When used without GROUP BY, the entire table is treated as one group.
You cannot mix aggregate functions with non-aggregated column references in SELECT unless all non-aggregate columns appear in the GROUP BY clause. Violating this is one of the most common SQL errors for beginners.
COUNT()
COUNT(*) counts all rows in the result, including rows with NULLs. COUNT(column) counts only the non-NULL values in that column. The difference matters when columns contain NULLs.
-- Count all rows
SELECT COUNT(*) FROM employees; -- 150
-- Count non-NULL values in a column
SELECT COUNT(manager_id) FROM employees; -- e.g. 145 (5 employees have no manager)
-- With GROUP BY: count employees per department
SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
ORDER BY employee_count DESC;
-- Count with condition (conditional counting)
SELECT
COUNT(*) AS total_orders,
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_orders
FROM orders;
SUM() and AVG()
SUM() adds up all non-NULL values in a numeric column. AVG() returns the arithmetic mean of non-NULL values — it is equivalent to SUM() / COUNT(column) (not COUNT(*)).
SELECT SUM(total_amount) FROM orders; -- total revenue
SELECT AVG(salary) FROM employees; -- average salary
-- Per group
SELECT
department_id,
SUM(salary) AS total_payroll,
ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department_id;
-- Running total with subquery
SELECT
order_date,
total_amount,
(SELECT SUM(o2.total_amount)
FROM orders o2
WHERE o2.order_date <= o1.order_date) AS running_total
FROM orders o1
ORDER BY order_date;
-- Careful: AVG with NULLs
-- If 3 of 5 rows have a salary and 2 are NULL:
-- AVG(salary) = sum_of_3 / 3 (NOT sum / 5)
MIN() and MAX()
MIN() and MAX() work on numbers, strings (alphabetical order), and dates. Both ignore NULLs.
SELECT MIN(salary) FROM employees; -- lowest salary
SELECT MAX(salary) FROM employees; -- highest salary
-- On dates
SELECT MIN(order_date) AS first_order,
MAX(order_date) AS latest_order
FROM orders;
-- On strings (alphabetical)
SELECT MIN(last_name) AS first_alpha,
MAX(last_name) AS last_alpha
FROM employees;
-- Find the most expensive product per category
SELECT category_id, MAX(price) AS max_price
FROM products
GROUP BY category_id;
-- Combine all aggregates for a full summary
SELECT
COUNT(*) AS total_employees,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary,
ROUND(AVG(salary),2) AS avg_salary,
SUM(salary) AS total_payroll
FROM employees;
COUNT(DISTINCT col)
COUNT(DISTINCT column) counts the number of unique non-NULL values. Use it to answer "how many unique customers placed an order?" rather than "how many orders were placed?"
-- Unique customers who placed at least one order
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
-- Unique products ordered per month
SELECT
YEAR(order_date) AS yr,
MONTH(order_date) AS mo,
COUNT(DISTINCT product_id) AS unique_products,
COUNT(*) AS total_line_items
FROM order_items
GROUP BY yr, mo;
-- Compare: COUNT(*) vs COUNT(col) vs COUNT(DISTINCT col)
-- Given: orders with customer_ids [1, 2, 2, 3, NULL, NULL]
SELECT
COUNT(*) AS total_rows, -- 6
COUNT(customer_id) AS non_null, -- 4
COUNT(DISTINCT customer_id) AS unique_customers -- 3
FROM orders;
How Aggregates Handle NULLs
| Function | NULL rows | All NULLs |
|---|---|---|
COUNT(*) | Included | Returns row count |
COUNT(col) | Excluded | Returns 0 |
SUM(col) | Ignored | Returns NULL |
AVG(col) | Ignored (denominator uses non-NULL count) | Returns NULL |
MIN(col) | Ignored | Returns NULL |
MAX(col) | Ignored | Returns NULL |
If a GROUP BY group has no non-NULL values, SUM and AVG return NULL, not 0. Wrap with COALESCE(SUM(col), 0) to default to zero when that matters for your output.
Using Aggregates Without GROUP BY
When you omit GROUP BY, the entire result set is treated as one group and returns a single summary row. This is perfect for quick whole-table statistics.
-- Dashboard-style summary row
SELECT
COUNT(*) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(total_amount) AS gross_revenue,
ROUND(AVG(total_amount), 2) AS avg_order_value,
MIN(order_date) AS first_order,
MAX(order_date) AS latest_order
FROM orders
WHERE YEAR(order_date) = 2026;
-- Scalar subquery using an aggregate
SELECT
product_name,
price,
price - (SELECT AVG(price) FROM products) AS diff_from_avg
FROM products;
Summary
📋 Summary
- COUNT(*) — Counts all rows including NULLs. Use for total row count.
- COUNT(col) — Counts non-NULL values. Different from COUNT(*) when NULLs exist.
- COUNT(DISTINCT col) — Counts unique non-NULL values.
- SUM(col) — Totals non-NULL numeric values. Returns NULL if all are NULL.
- AVG(col) — Mean of non-NULL values. Denominator is non-NULL count, not row count.
- MIN(col) / MAX(col) — Work on numbers, strings, and dates. Both ignore NULLs.
- Without GROUP BY — Returns a single summary row for the whole result set.
- Wrap with COALESCE(SUM(col), 0) — Avoid NULL results on empty groups.
FAQ
COUNT(*) counts every row in the group, regardless of NULL values in any column. COUNT(column) counts only the rows where that specific column is not NULL. If a column has no NULLs, they return the same number. Always use COUNT(*) for row counts and COUNT(col) when you specifically want to count non-missing values in a column.
It depends on your business logic. SQL's AVG divides by the count of non-NULL values. If you want NULL to mean "zero" (e.g. a customer placed 0 orders this month), use AVG(COALESCE(col, 0)) to substitute 0 for NULL before averaging. If NULL genuinely means "unknown" or "not applicable", the default behavior (ignoring NULLs in the denominator) is usually correct.
No — aggregate functions cannot be used in WHERE because WHERE filters individual rows before aggregation happens. Use HAVING to filter on aggregate results: HAVING COUNT(*) > 5. The only way to use an aggregate in WHERE is inside a subquery: WHERE salary > (SELECT AVG(salary) FROM employees).
Common approaches: SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees). Or using window functions (covered later): SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees) t WHERE rnk = 2. The window function approach handles ties correctly.