GROUP BY Syntax
The GROUP BY clause comes after WHERE and before ORDER BY in a SELECT statement:
SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition -- optional: filters rows BEFORE grouping
GROUP BY column1
ORDER BY column1; -- optional
This is the fundamental rule. If you select department and COUNT(*), then department must be in the GROUP BY clause. Aggregate functions like COUNT, SUM, AVG are exempt β they summarise the group.
How Grouping Works
Think of GROUP BY as sorting the rows into buckets. Rows with the same value in the GROUP BY column are placed in the same bucket. Then each aggregate function runs once per bucket, producing one output row per group.
Given this employees table:
| id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 90000 |
| 2 | Bob | Engineering | 85000 |
| 3 | Carol | Marketing | 72000 |
| 4 | Dave | Marketing | 68000 |
| 5 | Eve | HR | 60000 |
Using GROUP BY with Aggregate Functions
Count employees and compute average salary per department:
SELECT
department,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary,
SUM(salary) AS total_payroll,
MIN(salary) AS lowest_salary,
MAX(salary) AS highest_salary
FROM employees
GROUP BY department
ORDER BY headcount DESC;
Marketing β headcount: 2, avg_salary: 70000, total_payroll: 140000
HR β headcount: 1, avg_salary: 60000, total_payroll: 60000
COUNT(*) counts every row in the group including NULLs. COUNT(column) counts only non-NULL values in that column. Use COUNT(*) for headcounts and COUNT(column) when you want to know how many rows have a value in a specific field.
Grouping by Multiple Columns
You can group by more than one column. Each unique combination of the grouped columns forms its own group:
SELECT
department,
job_title,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, job_title;
This groups by the combination of department AND job_title. "Engineering / Senior Engineer" and "Engineering / Junior Engineer" become separate groups even though they share the same department.
-- Sales totals per year and month
SELECT
YEAR(order_date) AS yr,
MONTH(order_date) AS mo,
SUM(amount) AS monthly_sales
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY yr, mo;
GROUP BY Rules
Follow these rules to write correct GROUP BY queries:
| Rule | Explanation |
|---|---|
| Non-aggregate = GROUP BY | Every column in SELECT that is not inside an aggregate function must appear in the GROUP BY list. |
| NULL forms its own group | All rows where the GROUP BY column is NULL are placed in a single NULL group. |
| HAVING for post-group filtering | To filter groups (e.g., only departments with more than 5 people), use HAVING, not WHERE. |
| ORDER BY can use aliases | You can ORDER BY an aggregate alias like ORDER BY headcount DESC even though it was defined in SELECT. |
If the GROUP BY column contains NULLs, all NULL rows are grouped together as one group. The aggregate functions compute over those NULL rows just like any other group. Keep this in mind when your data has optional columns.
-- Demonstrate NULL group: employees with no department assigned
SELECT
department, -- will show NULL for unassigned employees
COUNT(*) AS headcount
FROM employees
GROUP BY department;
-- The NULL group appears as a row where department IS NULL
Summary
π Summary
- GROUP BY collapses all rows sharing the same column value into a single output row.
- Aggregate functions (
COUNT,SUM,AVG,MIN,MAX) compute one result per group. - Every non-aggregate column in
SELECTmust also appear inGROUP BY. - You can group by multiple columns β each unique combination becomes a separate group.
- NULL values form their own group.
- Use HAVING (not WHERE) to filter groups after aggregation.
FAQ
Yes. WHERE filters individual rows BEFORE grouping occurs. Only rows that pass the WHERE condition are included in the groups. To filter groups themselves (after aggregation), use HAVING. The order is: FROM β WHERE β GROUP BY β HAVING β SELECT β ORDER BY.
That is perfectly valid. You can GROUP BY a column without including it in the SELECT list. The grouping still happens by that column β you just do not see it in the output. This is useful when you want to group by a technical key but only display a human-readable label.
Yes. You can GROUP BY any scalar expression: GROUP BY YEAR(order_date), GROUP BY UPPER(country), or GROUP BY price * quantity. In PostgreSQL and some other databases you can also use a SELECT alias in GROUP BY, but standard SQL requires repeating the expression.
No. GROUP BY organises rows into groups for aggregation β it changes the number of output rows. ORDER BY just sorts the output rows without changing them. A GROUP BY query does not guarantee any particular sort order β add an explicit ORDER BY if you need sorted results.