Introduction: The Problem They Solve
Suppose you want to create a list of squares of numbers from 1 to 10. The traditional approach uses a for loop:
# Traditional for loop
squares = []
for n in range(1, 11):
    squares.append(n ** 2)
print(squares)
# [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
# Same result with a list comprehension: one line!
squares = [n ** 2 for n in range(1, 11)]
print(squares)
# [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
The list comprehension is shorter, more expressive, and often faster than the equivalent loop.
The Basic Syntax
The general form of a list comprehension is:
# result = [expression for variable in iterable]
# Common patterns:
words = ["hello", "world", "python"]
# 1. Transform each element
upper = [w.upper() for w in words]
print(upper) # ['HELLO', 'WORLD', 'PYTHON']
# 2. Apply a function
lengths = [len(w) for w in words]
print(lengths) # [5, 5, 6]
# 3. From any iterable
chars = [c for c in "Python"]
print(chars) # ['P', 'y', 't', 'h', 'o', 'n']
# 4. From a range
evens = [x * 2 for x in range(10)]
print(evens) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
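Because the for clause accepts any iterable, built-ins such as enumerate() and zip() slot straight in, and tuple unpacking works in the loop target just as it does in a regular for loop. A small sketch (the prices list is made-up illustration data):

```python
words = ["hello", "world", "python"]
prices = [1.5, 2.0, 3.25]  # illustrative data, not from the examples above

# enumerate() yields (index, item) pairs; unpack them in the for clause
numbered = [f"{i}: {w}" for i, w in enumerate(words, start=1)]
print(numbered)  # ['1: hello', '2: world', '3: python']

# zip() walks two iterables in parallel
labelled = [f"{w}={p}" for w, p in zip(words, prices)]
print(labelled)  # ['hello=1.5', 'world=2.0', 'python=3.25']
```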
Comprehensions with Conditions
Add an if clause at the end to filter elements. Only items where the condition is True are included:
# [expression for variable in iterable if condition]
numbers = range(1, 21)
# Only even numbers
evens = [n for n in numbers if n % 2 == 0]
print(evens) # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
# Only odd numbers > 10
big_odds = [n for n in numbers if n % 2 != 0 and n > 10]
print(big_odds) # [11, 13, 15, 17, 19]
# Filter strings by length
words = ["cat", "elephant", "ant", "butterfly", "bee", "hippopotamus"]
long_words = [w for w in words if len(w) > 4]
print(long_words) # ['elephant', 'butterfly', 'hippopotamus']
# Filter, then aggregate the filtered list
scores = [45, 88, 92, 31, 70, 65, 99, 12]
passing_grades = [s for s in scores if s >= 60]
print(passing_grades) # [88, 92, 70, 65, 99]
print(f"Average of passing: {sum(passing_grades)/len(passing_grades):.1f}")  # Average of passing: 82.8
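A comprehension may also carry more than one if clause. Consecutive if clauses behave like nested if statements, so they are equivalent to joining the conditions with and. A quick check against the big_odds example above:

```python
numbers = range(1, 21)

# One if clause joined with `and`
with_and = [n for n in numbers if n % 2 != 0 and n > 10]

# Two if clauses: an element must pass both to be kept
stacked = [n for n in numbers if n % 2 != 0 if n > 10]

print(with_and)             # [11, 13, 15, 17, 19]
print(stacked == with_and)  # True
```

Both forms are valid; the and version is usually easier to read.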
if-else Inside the Expression
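You can use a conditional (ternary) expression in the expression part, before for, to transform elements differently depending on a condition. This is different from the filtering if that comes after for: with a ternary, every element still produces an output value, so the result has the same length as the input.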
# [value_if_true if condition else value_if_false for var in iterable]
numbers = range(-5, 6)
# Replace negatives with 0, keep positives
non_negative = [n if n >= 0 else 0 for n in numbers]
print(non_negative) # [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
# Label even/odd
labels = ["even" if n % 2 == 0 else "odd" for n in range(1, 8)]
print(labels) # ['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd']
# Clamp values to a [0, 100] range (min/max replaces a nested if-else here)
raw = [120, -10, 85, 101, 50, -5, 99]
clamped = [max(0, min(100, v)) for v in raw]
print(clamped) # [100, 0, 85, 100, 50, 0, 99]
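The two kinds of if can be combined, and mixing them up is a common source of syntax errors. This sketch (values chosen for illustration) puts the ternary and the filter side by side and checks that an else attached to the filtering if does not compile:

```python
numbers = [-3, -1, 0, 2, 5]

# Ternary in the expression: every element produces an output
clipped = [n if n > 0 else 0 for n in numbers]
print(clipped)    # [0, 0, 0, 2, 5]

# Filter after `for`: only elements that pass are kept
positives = [n for n in numbers if n > 0]
print(positives)  # [2, 5]

# Both together: filter first, then label each surviving element
labels = ["big" if n > 3 else "small" for n in numbers if n > 0]
print(labels)     # ['small', 'big']

# An `else` on the filtering `if` is rejected by the parser
try:
    compile("[n for n in numbers if n > 0 else 0]", "<example>", "eval")
    filter_else_compiles = True
except SyntaxError:
    filter_else_compiles = False
print(filter_else_compiles)  # False
```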
Nested Comprehensions
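You can nest one comprehension inside another, or chain several for clauses in a single comprehension. Multiple for clauses work like nested for loops, with the leftmost for as the outermost loop: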
# Flatten a 2D list (list of lists)
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
print(flat) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Equivalent loop:
# flat = []
# for row in matrix:
#     for num in row:
#         flat.append(num)
# Generate a multiplication table as a 2D list
table = [[i * j for j in range(1, 6)] for i in range(1, 6)]
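for row in table:
    print(row)
# [1, 2, 3, 4, 5]
# [2, 4, 6, 8, 10]
# [3, 6, 9, 12, 15]
# [4, 8, 12, 16, 20]
# [5, 10, 15, 20, 25]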
# Cartesian product of two lists
colors = ["red", "blue"]
sizes = ["S", "M", "L"]
products = [(color, size) for color in colors for size in sizes]
print(products)
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
# ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]
A comprehension with two for clauses can still be readable. Three or more nested for clauses usually produce code that is harder to understand than a plain loop. Readability always wins.
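The for clauses appear in the same order as the equivalent nested loops, and an if clause at the end filters at the innermost level. Writing the clauses in the opposite order, [num for num in row for row in matrix], does not work: row is evaluated before the clause that defines it, so you get a NameError or, if an earlier loop left a row variable behind, silently wrong results. A short sketch reusing the matrix from above:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# for clauses read left to right, outer loop first; the if filters innermost
even_nums = [num for row in matrix for num in row if num % 2 == 0]
print(even_nums)  # [2, 4, 6, 8]

# The same logic written as explicit nested loops
even_loop = []
for row in matrix:
    for num in row:
        if num % 2 == 0:
            even_loop.append(num)
print(even_loop == even_nums)  # True
```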
Generator Expressions
Replace the square brackets with parentheses and you get a generator expression: it produces values lazily, on demand, instead of building the entire list in memory. This matters most for large datasets:
import sys
# List comprehension: builds the entire list in RAM
squares_list = [n ** 2 for n in range(1_000_000)]
print(f"List size: {sys.getsizeof(squares_list):,} bytes")
# Generator expression: produces one value at a time
squares_gen = (n ** 2 for n in range(1_000_000))
print(f"Generator size: {sys.getsizeof(squares_gen):,} bytes")
# Use a generator when you only need to iterate once
total = sum(n ** 2 for n in range(1_000_000)) # no [] needed inside sum()!
print(f"Sum of squares: {total:,}")
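# Generator with a condition: squares of the multiples of 7 below 100
multiples_of_7_squared = (n ** 2 for n in range(100) if n % 7 == 0)
for val in multiples_of_7_squared:
    print(val, end=" ")
print()
# 0 49 196 441 784 1225 1764 2401 3136 3969 4900 5929 7056 8281 9604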
# Generators are exhausted after one pass
gen = (x * 2 for x in range(5))
print(list(gen)) # [0, 2, 4, 6, 8]
print(list(gen))  # [] (the generator is now exhausted)
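Because a generator expression does no work until something asks for the next value, pairing it with next() is a handy way to find the first match without scanning everything. The sketch below uses a hypothetical square() helper that records its calls so the laziness is visible:

```python
calls = []

def square(n):
    """Hypothetical helper: record each call, then return n squared."""
    calls.append(n)
    return n * n

gen = (square(n) for n in range(1_000_000))
print(len(calls))   # 0: nothing has been computed yet

# next() pulls values only until the first one above 50 appears
first_big = next(v for v in gen if v > 50)
print(first_big)    # 64
print(len(calls))   # 9: only n = 0..8 were squared

# A default value avoids StopIteration when nothing matches
missing = next((v for v in range(5) if v > 10), None)
print(missing)      # None
```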
Set and Dict Comprehensions
The same comprehension syntax works for sets ({expr for ...}) and dictionaries ({key: value for ...}):
# Set comprehension β automatically removes duplicates
words = ["apple", "banana", "apple", "cherry", "banana", "date"]
unique_lengths = {len(w) for w in words}
print(unique_lengths) # {4, 5, 6} (order may vary)
# Dict comprehension
# Build a word → length mapping (repeated words such as "apple" become a single key)
word_length = {w: len(w) for w in words}
print(word_length)
# {'apple': 5, 'banana': 6, 'cherry': 6, 'date': 4}
# Invert a dictionary (swap keys and values); this assumes the values are unique
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted) # {1: 'a', 2: 'b', 3: 'c'}
# Dict comprehension with condition
scores = {"Alice": 88, "Bob": 42, "Carol": 95, "Dave": 58}
passing = {name: score for name, score in scores.items() if score >= 60}
print(passing) # {'Alice': 88, 'Carol': 95}
# Square numbers as a dict
square_map = {n: n**2 for n in range(1, 8)}
print(square_map) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49}
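The inversion example above relies on the values being unique. When two keys share a value, later keys silently overwrite earlier ones; one way to keep them all is a nested comprehension that groups keys per value. A sketch with made-up data:

```python
original = {"a": 1, "b": 2, "c": 1}

# Duplicate values: the last key seen for each value wins
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'c', 2: 'b'}

# Group keys per value instead (fine for small dicts: it rescans
# the dict once per distinct value)
grouped = {
    value: [k for k, v in original.items() if v == value]
    for value in sorted(set(original.values()))
}
print(grouped)  # {1: ['a', 'c'], 2: ['b']}
```

For large dicts, a single pass with collections.defaultdict(list) avoids the repeated scans.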
Performance: Comprehension vs for Loop
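List comprehensions are generally faster than equivalent for loops with .append(), because CPython compiles them to a dedicated LIST_APPEND bytecode instead of looking up and calling result.append on every iteration. How much faster depends on the Python version and on how much work each element needs. Here is a comparison: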
import timeit
# Method 1: for loop with append
def using_loop(n):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result
# Method 2: list comprehension
def using_comprehension(n):
    return [i ** 2 for i in range(n)]
# Method 3: map() with lambda
def using_map(n):
    return list(map(lambda i: i ** 2, range(n)))
n = 100_000
t_loop = timeit.timeit(lambda: using_loop(n), number=50)
t_comp = timeit.timeit(lambda: using_comprehension(n), number=50)
t_map = timeit.timeit(lambda: using_map(n), number=50)
print(f"for loop: {t_loop:.3f}s")
print(f"comprehension: {t_comp:.3f}s")
print(f"map(lambda): {t_map:.3f}s")
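To see where the speed difference comes from, you can inspect the bytecode with the standard dis module. This sketch collects opcode names and recurses into nested code objects, because Python versions before 3.12 compile a comprehension into a separate code object. It assumes CPython, since bytecode is an implementation detail:

```python
import dis
import types

def using_loop(n):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

def using_comprehension(n):
    return [i ** 2 for i in range(n)]

def opnames(code):
    """Collect opcode names in `code` and in any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names

# The comprehension appends with the dedicated LIST_APPEND opcode...
print("LIST_APPEND" in opnames(using_comprehension.__code__))  # True
# ...while the loop looks up and calls result.append on each iteration
print("LIST_APPEND" in opnames(using_loop.__code__))  # False
```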
| Syntax | Result type | Memory | Use when… |
|---|---|---|---|
| [... for ...] | list | All in RAM | Need to index, sort, or reuse multiple times |
| (... for ...) | generator | One item at a time | Large data; only iterate once |
| {... for ...} | set | All in RAM | Need unique values |
| {k: v for ...} | dict | All in RAM | Build key-value mappings |
When NOT to Use List Comprehensions
List comprehensions are powerful but not always the right tool. Avoid them when:
# ❌ BAD: using a comprehension just for side effects
# This creates a list of None values that is thrown away immediately
[print(x) for x in range(5)]  # Wasteful!
# ✅ GOOD: use a regular loop for side effects
for x in range(5):
    print(x)
# ❌ BAD: comprehension that's too complex to read at a glance
# (f, g, pred, condition and data are placeholders)
result = [f(x) for x in [g(y) for y in data if pred(y)] if condition(x)]
# ✅ GOOD: break complex logic into named steps
filtered_y = [g(y) for y in data if pred(y)]
transformed = [f(x) for x in filtered_y if condition(x)]
# ❌ BAD: building a huge list you only iterate once
for item in [process(x) for x in huge_data]:  # builds the entire list first!
    do_something(item)
# ✅ GOOD: use a generator expression
for item in (process(x) for x in huge_data):  # one item at a time
    do_something(item)
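Named generator expressions also chain into a lazy pipeline: each stage stays short enough to read, and items still flow through one at a time, so memory use does not grow with the input. A minimal sketch with a small in-memory list standing in for a large file:

```python
lines = ["  10 ", "", "x", "25", " 7"]  # stand-in for a large file

stripped = (line.strip() for line in lines)           # trim whitespace
non_empty = (s for s in stripped if s)                # drop blank lines
numbers = (int(s) for s in non_empty if s.isdigit())  # keep digit-only strings

total = sum(numbers)  # the whole pipeline runs here, one item at a time
print(total)  # 42
```

With a real file, lines could be an open file object, which is also iterated lazily, line by line.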
Real-World Examples
from pathlib import Path
# 1. Get all Python files in the current directory
py_files = [f for f in Path(".").iterdir() if f.is_file() and f.suffix == ".py"]
# 2. Parse CSV rows into dicts (without the csv module; a naive split
#    breaks on quoted fields that contain commas)
csv_lines = [
    "Alice,30,Engineer",
    "Bob,25,Designer",
    "Carol,35,Manager",
]
headers = ["name", "age", "role"]
records = [
    dict(zip(headers, line.split(",")))
    for line in csv_lines
]
print(records)
# [{'name': 'Alice', 'age': '30', 'role': 'Engineer'}, ...]
# 3. Remove duplicates while preserving order
seen = set()
data = [1, 3, 2, 1, 4, 3, 5, 2]
unique_ordered = [x for x in data if not (x in seen or seen.add(x))]
print(unique_ordered) # [1, 3, 2, 4, 5]
# 4. Transpose a matrix
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
for row in transposed:
    print(row)
# [1, 4, 7]
# [2, 5, 8]
# [3, 6, 9]
# 5. Extract numbers from mixed list
mixed = [1, "hello", 3.14, True, "world", 42, None, 7]
numbers_only = [x for x in mixed if isinstance(x, (int, float)) and not isinstance(x, bool)]
print(numbers_only) # [1, 3.14, 42, 7]
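The standard library covers several of these patterns directly. This sketch shows alternatives for examples 2, 3 and 4: csv.DictReader handles quoted fields that a plain split(",") would break apart, dict.fromkeys() removes duplicates while keeping first-seen order (dicts preserve insertion order from Python 3.7), and zip(*matrix) transposes. The CSV text is made-up sample data:

```python
import csv
import io

# 2. csv.DictReader parses quoted fields that contain commas
csv_text = 'name,age,role\nAlice,30,"Engineer, Senior"\nBob,25,Designer\n'
records = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
print(records[0])  # {'name': 'Alice', 'age': '30', 'role': 'Engineer, Senior'}

# 3. dict keys are unique and keep insertion order
data = [1, 3, 2, 1, 4, 3, 5, 2]
unique_ordered = list(dict.fromkeys(data))
print(unique_ordered)  # [1, 3, 2, 4, 5]

# 4. zip(*matrix) groups the i-th element of every row (as tuples)
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [list(col) for col in zip(*matrix)]
print(transposed)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```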
Practical Exercise
Using only comprehensions (no explicit for loops), write solutions for:
- Generate all prime numbers between 2 and 50.
- From a list of sentences, extract only those that contain the word "Python" (case-insensitive).
- Flatten this 3D list: [[[1,2],[3,4]],[[5,6],[7,8]]] into a 1D list.
- Build a dict mapping each word in a sentence to the number of vowels it contains.
Challenge Exercise
Write a function parse_log(filepath) that reads a log file line by line using a generator expression and returns:
- A list of all ERROR level messages (list comprehension).
- A dict mapping log levels to the count of their occurrences (dict comprehension).
- A set of unique IP addresses mentioned in the log (set comprehension).
Process a file of 100,000+ lines memory-efficiently using generators.
Interview Questions
- What is a list comprehension and how does it differ from a regular for loop?
- What is the syntax for a list comprehension with a filter condition?
- How does a generator expression differ from a list comprehension?
- When should you use a generator expression instead of a list comprehension?
- How do you write a nested list comprehension? What is the equivalent for loop?
- What does [x for x in range(5) if x % 2 == 0] produce?
- What is the difference between [x if x > 0 else 0 for x in lst] and [x for x in lst if x > 0]?
- Why is using a list comprehension for side effects (like printing) considered bad practice?
- What is a set comprehension? When is it useful?
- How do you invert a dictionary using a dict comprehension?
Summary
- List comprehension syntax: [expression for variable in iterable].
- Add a filter: [expression for variable in iterable if condition].
- Transform with a condition: [a if cond else b for variable in iterable].
- Nested comprehension: [expr for x in outer for y in inner], equivalent to nested for loops.
- Generator expressions use () instead of [] and produce values lazily, saving memory.
- Set comprehensions use {}; dict comprehensions use {key: value for ...}.
- Comprehensions are typically faster than equivalent for loops with .append().
- Avoid comprehensions for side effects and deeply nested logic, and prefer generator expressions for very large datasets you only iterate once.
Frequently Asked Questions
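Are list comprehensions faster than for loops?
For simple transformations, yes. In CPython, list comprehensions are often 20 to 40% faster than for loops with .append(), because they use a specialised LIST_APPEND bytecode internally instead of looking up and calling .append on every iteration; the exact gain depends on the Python version and the work done per element. For complex logic with multiple conditions and function calls, the difference narrows, and readability should be your primary guide.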
Can I use the walrus operator (:=) inside a comprehension?
Yes. Python 3.8+ supports the walrus operator (assignment expression) inside comprehensions. This is useful when you want to both compute a value and filter on it without computing it twice: [y for x in data if (y := expensive(x)) > threshold].
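That one-liner relies on names the answer leaves undefined. A runnable sketch (Python 3.8+), with a hypothetical expensive() function that records each call so you can confirm it runs once per element:

```python
calls = []

def expensive(x):
    """Hypothetical costly computation; logs each call."""
    calls.append(x)
    return x * 10

data = [1, 5, 2, 8]
threshold = 30

# y is computed once per element and reused in both the filter and the output
results = [y for x in data if (y := expensive(x)) > threshold]
print(results)     # [50, 80]
print(len(calls))  # 4
```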
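Does the loop variable of a list comprehension leak into the surrounding scope?
No. In Python 3, list comprehensions have their own scope, so the loop variable does not leak. This differs from Python 2, where a list comprehension's loop variable remained accessible in the outer scope afterwards. Generator expressions and set/dict comprehensions have always had their own scope. The one exception is a name bound with the walrus operator (:=), which is assigned in the enclosing scope instead.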
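A quick check in Python 3.8 or later: the comprehension's own loop variable stays private, while a name bound with := (a documented exception from PEP 572) lands in the enclosing scope. Variable names are illustrative:

```python
x = "outer"
squares = [x * x for x in range(4)]
print(squares)  # [0, 1, 4, 9]
print(x)        # outer: the comprehension's own x did not leak

evens = [(last := n) for n in range(6) if n % 2 == 0]
print(evens)    # [0, 2, 4]
print(last)     # 4: a := target is bound in the enclosing scope
```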
How deeply can comprehensions be nested?
Python imposes no hard limit, but readability is the practical constraint. Two levels of nesting are common (e.g., flattening a 2D list). Three or more levels should almost always be refactored into named variables, helper functions, or explicit loops for maintainability.