SQL GROUP BY – Grouping Rows for Aggregation

GROUP BY Syntax

The GROUP BY clause comes after WHERE and before HAVING and ORDER BY in a SELECT statement:

SQL

SELECT   column1, aggregate_function(column2)
FROM     table_name
WHERE    condition          -- optional: filters rows BEFORE grouping
GROUP BY column1
ORDER BY column1;           -- optional

💡

Every non-aggregate column in SELECT must appear in GROUP BY

This is the fundamental rule. If you select department and COUNT(*), then department must be in the GROUP BY clause. Aggregate functions like COUNT, SUM, AVG are exempt — they summarise the group.

How Grouping Works

Think of GROUP BY as sorting the rows into buckets. Rows with the same value in the GROUP BY column are placed in the same bucket. Then each aggregate function runs once per bucket, producing one output row per group.

Given this employees table:

id	name	department	salary
1	Alice	Engineering	90000
2	Bob	Engineering	85000
3	Carol	Marketing	72000
4	Dave	Marketing	68000
5	Eve	HR	60000
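The bucket analogy can be sketched in plain Python. This is only an illustration of the idea, not how a database engine implements grouping; the variable names (`buckets`, `summary`) are made up for the example, and the rows are the ones from the table above.

```python
from collections import defaultdict

# Rows from the employees table above: (id, name, department, salary)
employees = [
    (1, "Alice", "Engineering", 90000),
    (2, "Bob", "Engineering", 85000),
    (3, "Carol", "Marketing", 72000),
    (4, "Dave", "Marketing", 68000),
    (5, "Eve", "HR", 60000),
]

# Step 1: place each row's salary in the bucket for its department.
buckets = defaultdict(list)
for emp_id, name, department, salary in employees:
    buckets[department].append(salary)

# Step 2: run the aggregates once per bucket, giving one output row per group.
summary = {
    department: {
        "headcount": len(salaries),
        "avg_salary": sum(salaries) / len(salaries),
    }
    for department, salaries in buckets.items()
}

for department, stats in summary.items():
    print(department, stats)
```

Five input rows become three output rows, one per distinct department.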

Using GROUP BY with Aggregate Functions

Count employees and compute average salary per department:

SQL

SELECT
    department,
    COUNT(*)        AS headcount,
    AVG(salary)     AS avg_salary,
    SUM(salary)     AS total_payroll,
    MIN(salary)     AS lowest_salary,
    MAX(salary)     AS highest_salary
FROM employees
GROUP BY department
ORDER BY headcount DESC;

▶ Result

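Engineering → headcount: 2, avg_salary: 87500, total_payroll: 175000, lowest_salary: 85000, highest_salary: 90000
Marketing → headcount: 2, avg_salary: 70000, total_payroll: 140000, lowest_salary: 68000, highest_salary: 72000
HR → headcount: 1, avg_salary: 60000, total_payroll: 60000, lowest_salary: 60000, highest_salary: 60000
(Engineering and Marketing tie on headcount, so their relative order may vary.)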

ℹ️

COUNT(*) vs COUNT(column)

COUNT(*) counts every row in the group, including rows that contain NULLs. COUNT(column) counts only the rows where that column is not NULL. Use COUNT(*) for headcounts and COUNT(column) when you want to know how many rows have a value in a specific field.
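A small sketch of the difference, run against an in-memory SQLite database through Python's standard `sqlite3` module. The `phone` column and its values are assumptions added for the illustration; they are not part of the sample table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: `phone` is an optional column that may be NULL.
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alice", "Engineering", "555-0100"),
        ("Bob", "Engineering", None),   # no phone on file
        ("Carol", "Marketing", None),   # no phone on file
    ],
)

counts = conn.execute(
    """
    SELECT department,
           COUNT(*)     AS all_rows,         -- every row in the group
           COUNT(phone) AS rows_with_phone   -- only rows where phone is not NULL
    FROM employees
    GROUP BY department
    ORDER BY department
    """
).fetchall()

print(counts)  # [('Engineering', 2, 1), ('Marketing', 1, 0)]
```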

Grouping by Multiple Columns

You can group by more than one column. Each unique combination of the grouped columns forms its own group:

SQL

SELECT
    department,
    job_title,
    COUNT(*)    AS headcount,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, job_title;

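This groups by the combination of department AND job_title (the query assumes the employees table also has a job_title column, which the sample table above omits). "Engineering / Senior Engineer" and "Engineering / Junior Engineer" become separate groups even though they share the same department.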
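To see the combinations form separate groups, here is a minimal sketch using Python's `sqlite3` module. The `job_title` values and the extra employee Frank are hypothetical, added only so that one combination contains more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, job_title TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        # job titles are illustrative assumptions
        ("Alice", "Engineering", "Senior Engineer", 90000),
        ("Bob", "Engineering", "Junior Engineer", 85000),
        ("Frank", "Engineering", "Junior Engineer", 75000),  # hypothetical row
        ("Carol", "Marketing", "Manager", 72000),
    ],
)

groups = conn.execute(
    """
    SELECT department, job_title, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department, job_title
    ORDER BY department, job_title
    """
).fetchall()

for row in groups:
    print(row)
# ('Engineering', 'Junior Engineer', 2, 80000.0)
# ('Engineering', 'Senior Engineer', 1, 90000.0)
# ('Marketing', 'Manager', 1, 72000.0)
```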

SQL

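-- Sales totals per year and month
-- (YEAR() and MONTH() are MySQL / SQL Server functions; PostgreSQL uses EXTRACT(YEAR FROM order_date))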
SELECT
    YEAR(order_date)  AS yr,
    MONTH(order_date) AS mo,
    SUM(amount)       AS monthly_sales
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY yr, mo;
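Date functions differ between databases. As a sketch of the same year/month grouping in SQLite, which has no YEAR() or MONTH(), the example below uses `strftime` through Python's `sqlite3` module. The sample orders are invented, and dates are assumed to be stored as ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2024-01-05", 100.0),  # illustrative data
        ("2024-01-20", 50.0),
        ("2024-02-03", 75.0),
        ("2025-01-10", 20.0),
    ],
)

monthly = conn.execute(
    """
    SELECT strftime('%Y', order_date) AS yr,
           strftime('%m', order_date) AS mo,
           SUM(amount)                AS monthly_sales
    FROM orders
    GROUP BY strftime('%Y', order_date), strftime('%m', order_date)
    ORDER BY yr, mo
    """
).fetchall()

print(monthly)  # [('2024', '01', 150.0), ('2024', '02', 75.0), ('2025', '01', 20.0)]
```

Grouping by year as well as month keeps January 2024 and January 2025 in separate groups.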

GROUP BY Rules

Follow these rules to write correct GROUP BY queries:

Rule	Explanation
Non-aggregate = GROUP BY	Every column in SELECT that is not inside an aggregate function must appear in the GROUP BY list.
NULL forms its own group	All rows where the GROUP BY column is NULL are placed in a single NULL group.
HAVING for post-group filtering	To filter groups (e.g., only departments with more than 5 people), use HAVING, not WHERE.
ORDER BY can use aliases	You can ORDER BY an aggregate alias, as in `ORDER BY headcount DESC`, because ORDER BY is evaluated after SELECT.

⚠️

NULL grouping behaviour

If the GROUP BY column contains NULLs, all NULL rows are grouped together as one group. The aggregate functions compute over those NULL rows just like any other group. Keep this in mind when your data has optional columns.

SQL

-- Demonstrate NULL group: employees with no department assigned
SELECT
    department,          -- will show NULL for unassigned employees
    COUNT(*) AS headcount
FROM employees
GROUP BY department;
-- The NULL group appears as a row where department IS NULL
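The NULL group can be observed with a short sketch using Python's `sqlite3` module. The unassigned employees (Grace and Heidi) are hypothetical rows added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Alice", "Engineering"),
        ("Bob", "Engineering"),
        ("Grace", None),  # hypothetical: no department assigned
        ("Heidi", None),  # hypothetical: no department assigned
    ],
)

rows = conn.execute(
    "SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department"
).fetchall()

# SQL NULL arrives in Python as None, so the NULL group is keyed by None.
headcount_by_department = dict(rows)
print(headcount_by_department)
```

Both unassigned employees land in one group, even though NULL = NULL is not true in an ordinary comparison.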

The GROUP BY Rule Everyone Breaks Once

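The cardinal rule: every column in SELECT must either be inside an aggregate function or listed in GROUP BY. Break it and most databases raise an error; MySQL without ONLY_FULL_GROUP_BY (the default before 5.7.5) and SQLite instead return a value from an arbitrary row in the group: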

-- ❌ customer_name is neither aggregated nor grouped
SELECT customer_name, COUNT(*)
FROM orders
GROUP BY customer_id;

-- ✅ group by what you select
SELECT customer_name, COUNT(*)
FROM orders
GROUP BY customer_name;

Think of GROUP BY as collapsing many rows into one per group. A non-grouped, non-aggregated column has many possible values for that single output row — the database can't pick one.
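One caveat when grouping by a display column: if two different customers share a name, grouping by the name alone merges them. A minimal sketch with Python's `sqlite3` module, assuming a hypothetical orders table that stores both `customer_id` and `customer_name`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, customer_name TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 1, "Sam Lee"),  # customer 1
        (2, 1, "Sam Lee"),  # customer 1
        (3, 2, "Sam Lee"),  # customer 2: a different person with the same name
    ],
)

# Grouping by name alone collapses both customers into one group.
by_name = conn.execute(
    "SELECT customer_name, COUNT(*) FROM orders GROUP BY customer_name"
).fetchall()

# Grouping by id AND name keeps them apart and still satisfies the rule.
by_id_and_name = conn.execute(
    """
    SELECT customer_id, customer_name, COUNT(*)
    FROM orders
    GROUP BY customer_id, customer_name
    ORDER BY customer_id
    """
).fetchall()

print(by_name)         # [('Sam Lee', 3)]
print(by_id_and_name)  # [(1, 'Sam Lee', 2), (2, 'Sam Lee', 1)]
```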

WHERE vs HAVING — filter rows vs filter groups

SELECT region, SUM(amount) AS total
FROM orders
WHERE order_date >= '2024-01-01'   -- filters ROWS before grouping
GROUP BY region
HAVING SUM(amount) > 10000;         -- filters GROUPS after aggregating

WHERE can't see aggregates (it runs before grouping); HAVING is the only place to filter on SUM, COUNT, etc.
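The two filters can be watched in action with a short sketch using Python's `sqlite3` module. The order rows are invented for the illustration, and the exact error text for the misplaced aggregate varies by database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, order_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("North", "2023-12-15", 50000),  # removed by WHERE (before 2024)
        ("North", "2024-02-01", 4000),
        ("North", "2024-03-01", 3000),   # North total after WHERE: 7000
        ("South", "2024-01-10", 8000),
        ("South", "2024-04-22", 6000),   # South total after WHERE: 14000
    ],
)

big_regions = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM orders
    WHERE order_date >= '2024-01-01'   -- row filter, applied first
    GROUP BY region
    HAVING SUM(amount) > 10000         -- group filter, applied after aggregation
    """
).fetchall()
print(big_regions)  # [('South', 14000)]

# Putting the aggregate in WHERE is rejected when the statement is prepared.
try:
    conn.execute("SELECT region FROM orders WHERE SUM(amount) > 10000 GROUP BY region")
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)
print(where_error)
```

North would have passed the HAVING test if its 2023 order had counted, which shows that WHERE runs before the totals are computed.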

🏋️ Practical Exercise

Group rows by one column.
Count rows per group.
Group by multiple columns.
Combine a group with an aggregate.
Order the grouped results.

🔥 Challenge Exercise

On a sales table, compute total and average revenue per region, and then per (region, product) combination. Explain the rule that every non-aggregated SELECT column must appear in GROUP BY.

📋 Summary

GROUP BY collapses all rows sharing the same column value into a single output row.
Aggregate functions (COUNT, SUM, AVG, MIN, MAX) compute one result per group.
Every non-aggregate column in SELECT must also appear in GROUP BY.
You can group by multiple columns — each unique combination becomes a separate group.
NULL values form their own group.
Use HAVING (not WHERE) to filter groups after aggregation.

Interview Questions

What does GROUP BY do?
Which columns can appear in SELECT alongside GROUP BY?
How does GROUP BY interact with aggregates?
What is the difference between GROUP BY and DISTINCT?
In what order do WHERE, GROUP BY, and HAVING execute?

FAQ

Can I use a WHERE clause with GROUP BY?

Yes. WHERE filters individual rows BEFORE grouping occurs. Only rows that pass the WHERE condition are included in the groups. To filter groups themselves (after aggregation), use HAVING. The logical evaluation order is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY.

What happens if I GROUP BY a column not in SELECT?

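That is perfectly valid. You can GROUP BY a column without including it in the SELECT list. The grouping still happens by that column; you just do not see it in the output. This is useful when you want to group by a technical key but display only aggregates or a human-readable label. Any label you display must still follow the main rule, so either add it to GROUP BY as well or wrap it in an aggregate such as MIN().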
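As a sketch of grouping by a key that never appears in the output, the example below groups by `customer_id` but displays only a name and a count. The table and rows are hypothetical, and wrapping the name in MIN() is one possible way to keep the query valid everywhere, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, customer_name TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10, "Ada"),  # illustrative data
        (2, 10, "Ada"),
        (3, 11, "Ben"),
    ],
)

orders_per_customer = conn.execute(
    """
    SELECT MIN(customer_name) AS customer_name,  -- aggregate keeps the label legal
           COUNT(*)           AS order_count
    FROM orders
    GROUP BY customer_id                          -- grouping key is not selected
    ORDER BY customer_id
    """
).fetchall()

print(orders_per_customer)  # [('Ada', 2), ('Ben', 1)]
```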

Can I GROUP BY an expression or function result?

Yes. You can GROUP BY any scalar expression: GROUP BY YEAR(order_date), GROUP BY UPPER(country), or GROUP BY price * quantity. In PostgreSQL and some other databases you can also use a SELECT alias in GROUP BY, but standard SQL requires repeating the expression.
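A minimal sketch of grouping by an expression, using Python's `sqlite3` module. The `customers` table and its inconsistently cased country codes are assumptions made for the example; the expression is repeated in GROUP BY so the query does not rely on alias support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Ada", "us"),  # illustrative data with mixed casing
        ("Ben", "US"),
        ("Cem", "de"),
    ],
)

per_country = conn.execute(
    """
    SELECT UPPER(country) AS country_code, COUNT(*) AS customers
    FROM customers
    GROUP BY UPPER(country)   -- 'us' and 'US' fall into the same group
    ORDER BY country_code
    """
).fetchall()

print(per_country)  # [('DE', 1), ('US', 2)]
```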

Is GROUP BY the same as ORDER BY?

No. GROUP BY organises rows into groups for aggregation — it changes the number of output rows. ORDER BY just sorts the output rows without changing them. A GROUP BY query does not guarantee any particular sort order — add an explicit ORDER BY if you need sorted results.