SQL DISTINCT – Removing Duplicate Rows

DISTINCT Syntax

Place DISTINCT immediately after the SELECT keyword. It applies to the entire SELECT list — all columns together are used to determine uniqueness.

SQL

-- Without DISTINCT: returns every row (may include duplicates)
SELECT department FROM employees;
-- Returns: Engineering, Marketing, Engineering, Marketing, Engineering
-- (one row per employee, repeated departments)

-- With DISTINCT: returns each unique department once
SELECT DISTINCT department FROM employees;
-- Returns: Engineering, Marketing

Without DISTINCT	With DISTINCT
Engineering	Engineering
Marketing	Marketing
Engineering
Marketing
Engineering

ℹ️

Common use cases for DISTINCT

DISTINCT is useful for data exploration: "what unique values exist in this column?" It answers questions like: "what departments do we have?", "which countries have placed orders?", "what product categories exist in the database?". It is also used in joins to collapse repeated rows that result from one-to-many relationships.

DISTINCT with Multiple Columns

When you list multiple columns after SELECT DISTINCT, DISTINCT operates on the combination of all listed columns. A row is a duplicate only if every column value in the row matches another row.

SQL

-- Unique city-country combinations
SELECT DISTINCT city, country
FROM customers
ORDER BY country, city;

-- Example data:
-- (Paris, France), (London, UK), (Paris, France) → removes second Paris/France
-- Returns: (Paris, France), (London, UK)

-- Unique department-role combinations
SELECT DISTINCT department, job_title
FROM employees
ORDER BY department;
-- Returns each unique pairing, even if the same department appears multiple times
-- with different job titles

💡

DISTINCT applies to the entire row

SELECT DISTINCT city, country does not mean "distinct city AND distinct country separately". It means "distinct combinations of (city, country) as a pair". Two rows with the same city but different countries are NOT duplicates and both appear in the result.

COUNT(DISTINCT column)

A very common pattern is using DISTINCT inside the COUNT() aggregate function to count the number of unique values in a column:

SQL

-- How many distinct departments exist?
SELECT COUNT(DISTINCT department) AS num_departments
FROM employees;
-- Returns: 2 (Engineering, Marketing)

-- How many unique customers have placed at least one order?
SELECT COUNT(DISTINCT customer_id) AS active_customers
FROM orders;

-- How many unique products have been sold?
SELECT COUNT(DISTINCT product_id) AS products_sold
FROM order_items;

-- Compare: total rows vs distinct values
SELECT
    COUNT(*)                AS total_employees,
    COUNT(DISTINCT department) AS distinct_departments
FROM employees;
-- total_employees: 5, distinct_departments: 2

DISTINCT vs GROUP BY

For simple deduplication, DISTINCT and GROUP BY on the same columns produce identical results. The choice comes down to intent and whether you need aggregations:

SQL

-- These two queries return the same result:
SELECT DISTINCT department FROM employees ORDER BY department;

SELECT department FROM employees GROUP BY department ORDER BY department;

-- But GROUP BY is more powerful — it can aggregate:
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY headcount DESC;

-- DISTINCT cannot do this — DISTINCT alone cannot produce aggregates

Aspect	DISTINCT	GROUP BY
Removes duplicates	Yes	Yes
Can aggregate (SUM, COUNT, AVG)	No	Yes
Can filter groups (HAVING)	No	Yes
Syntax simplicity	Simpler	More explicit
Best for	Simple deduplication only	Grouping with aggregation

ℹ️

Performance: DISTINCT requires a sort or hash

To find unique values, the database must either sort all rows and remove adjacent duplicates, or build a hash table of seen values. Both operations can be expensive on large tables. If you need deduplication frequently on the same column, ensure that column is indexed. For tables with millions of rows, test whether DISTINCT or GROUP BY performs better in your specific database and query plan.

DISTINCT: Removing Duplicate Rows

DISTINCT drops duplicate rows from the result. The key subtlety: it looks at the whole row of selected columns, not just one — two rows are "duplicates" only if every selected column matches.

SELECT DISTINCT city FROM customers;          -- unique cities
SELECT DISTINCT city, country FROM customers;  -- unique city+country PAIRS

The second query can return two rows with the same city if their countries differ — because distinctness applies across both columns together, not per column.

Query	Counts
`COUNT(*)`	all rows
`COUNT(city)`	non-NULL cities
`COUNT(DISTINCT city)`	unique cities

DISTINCT vs GROUP BY: SELECT DISTINCT city and SELECT city ... GROUP BY city give the same rows — but use GROUP BY when you also need aggregates per group (a count, a sum), and DISTINCT for a plain unique list. Performance note: DISTINCT forces a sort or hash of the data, so it's not free — if duplicates come from a bad join, fix the join instead of masking it with DISTINCT.

🏋️ Practical Exercise

Return unique values with SELECT DISTINCT.
Count distinct values.
Use DISTINCT on multiple columns.
Compare row counts with and without DISTINCT.
Find unique combinations of two columns.

🔥 Challenge Exercise

From an orders table, list each unique country customers ordered from, then count how many distinct countries exist. Explain how DISTINCT works across multiple columns (unique combinations, not per column).

📋 Summary

SELECT DISTINCT column — returns only unique values; duplicate rows are removed from the result.
With multiple columns, DISTINCT operates on the combination of all columns — a row is a duplicate only if every value matches.
COUNT(DISTINCT column) — counts the number of unique values in a column. A common pattern for analytics.
DISTINCT and GROUP BY produce the same result for simple deduplication, but GROUP BY is more flexible — it enables aggregation, HAVING filters, and complex grouping.
DISTINCT requires a sort or hash operation — it can be slow on large tables with many unique values. Use indexes and test performance.

Interview Questions

What does DISTINCT do?
How does DISTINCT behave with multiple columns?
What is the difference between DISTINCT and GROUP BY?
How do you count distinct values?
Does DISTINCT affect performance?

FAQ

Can I use DISTINCT with SELECT *? +

Yes — SELECT DISTINCT * removes rows where every single column value is identical. In practice, if a table has a primary key (which is unique by definition), SELECT DISTINCT * will never remove any rows because no two rows can be identical when a unique ID column exists. DISTINCT with * is only useful on tables without a unique key, or on result sets derived from joins that produce repeated rows.

How does DISTINCT handle NULL values? +

DISTINCT treats all NULLs as equal to each other. If a column has multiple NULL values, SELECT DISTINCT column returns exactly one NULL in the result. This is an exception to the normal rule that NULL is not equal to NULL in comparison expressions — for DISTINCT deduplication purposes, NULLs are considered duplicates of each other.

Can I use DISTINCT on only one column when selecting multiple? +

Standard SQL does not allow SELECT DISTINCT(col1), col2 to mean "distinct col1 only while also selecting col2". The DISTINCT keyword applies to the entire SELECT list. However, some databases (like MySQL) support GROUP_CONCAT(DISTINCT col) or window functions like ROW_NUMBER() OVER (PARTITION BY col ORDER BY ...) for more nuanced deduplication patterns. For "first row per group" queries, window functions (covered later in this course) are the proper tool.

When should I prefer DISTINCT over GROUP BY? +

Use DISTINCT when your only goal is deduplication and you do not need any aggregation. SELECT DISTINCT department FROM employees is more readable and communicates intent more clearly than SELECT department FROM employees GROUP BY department when you just want the list of unique departments. Use GROUP BY when you need counts, sums, averages, or other aggregate calculations alongside the grouped column.