What Are Generators?
A generator is a special kind of iterator that generates values lazily. "Lazily" means it computes the next value only when you request it, rather than computing all values upfront and storing them in memory.
Consider the difference between these two approaches to producing one million numbers:
import sys
# List: builds ALL 1,000,000 numbers in memory immediately
big_list = [x * 2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(big_list):,} bytes") # roughly 8 MB on 64-bit CPython (counts the list's pointer array only, not the int objects)
# Generator: holds virtually no data; computes on demand
big_gen = (x * 2 for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(big_gen):,} bytes") # a few hundred bytes at most; the exact figure varies by Python version
A generator is like a recipe, not a finished meal. The recipe describes how to produce a value, but no food is made until someone actually asks for it.
yield vs return
The yield keyword is what turns a regular function into a generator function. The difference between return and yield is fundamental:
| Feature | return | yield |
|---|---|---|
| Produces | One value, then exits | One value, then pauses |
| Function state | Destroyed after return | Preserved (local vars kept) |
| Resumable | No | Yes, resumes from the yield point |
| Calling the function returns | The value itself | A generator object |
| Memory | All values at once | One value at a time |
| Iteration | Must build full list first | Lazily evaluated |
# Regular function with return: builds the entire list
def count_up_list(n):
    result = []
    for i in range(n):
        result.append(i)
    return result # returns ALL at once
# Generator function with yield: produces one value at a time
def count_up_gen(n):
    for i in range(n):
        yield i # pauses here, hands i to the caller, resumes on the next call
# Using them
normal = count_up_list(5)
print(type(normal)) # <class 'list'>
print(normal) # [0, 1, 2, 3, 4]
gen = count_up_gen(5)
print(type(gen)) # <class 'generator'>
print(gen) # <generator object count_up_gen at 0x...>
Generator Functions in Depth
Any function containing at least one yield statement is a generator function. Calling it does not execute the body; it returns a generator object. The body runs only when you iterate over that object.
def my_generator():
    print("-- Step 1: Before first yield --")
    yield 10
    print("-- Step 2: Between yields --")
    yield 20
    print("-- Step 3: Before last yield --")
    yield 30
    print("-- Step 4: Generator exhausted --")
gen = my_generator() # Nothing printed yet!
print("Generator created")
value = next(gen) # Runs until first yield
print(f"Got: {value}")
value = next(gen) # Resumes from after first yield
print(f"Got: {value}")
value = next(gen) # Resumes from after second yield
print(f"Got: {value}")
Each time yield is hit, the function freezes in place (local variables, loop counters, everything) until next() is called again. This preserved state is what lets a generator resume exactly where it left off.
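To make the "frozen in place" idea visible, the standard inspect module can report a generator's state and its paused local variables. The following is a minimal sketch; countdown is a hypothetical example generator that does not appear elsewhere in this tutorial.

```python
import inspect

def countdown(n):
    # Hypothetical example generator: counts down from n to 1.
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
states = [inspect.getgeneratorstate(gen)]        # body has not started yet

first = next(gen)                                # runs up to the first yield
states.append(inspect.getgeneratorstate(gen))    # paused at the yield
local_n = inspect.getgeneratorlocals(gen)["n"]   # the preserved local variable

remaining = list(gen)                            # drain the rest
states.append(inspect.getgeneratorstate(gen))    # body has finished

print(states)     # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
print(first)      # 2
print(local_n)    # 2
print(remaining)  # [1]
```

Note that n is still 2 while the generator is paused: the line after the yield (n -= 1) has not run yet.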
next() and StopIteration
The built-in next() function advances a generator by one step. When the generator function returns (or falls off the end), Python raises a StopIteration exception to signal exhaustion.
def three_values():
    yield "a"
    yield "b"
    yield "c"
gen = three_values()
print(next(gen)) # a
print(next(gen)) # b
print(next(gen)) # c
try:
    print(next(gen)) # Generator is exhausted
except StopIteration:
    print("Generator exhausted: no more values!")
# Provide a default to avoid the exception
gen2 = three_values()
print(next(gen2, "default")) # a
print(next(gen2, "default")) # b
print(next(gen2, "default")) # c
print(next(gen2, "default")) # default (no exception)
In practice, you rarely call next() directly. A for loop calls it automatically and catches StopIteration for you:
def squares(n):
    for i in range(1, n + 1):
        yield i ** 2
# for loop handles next() and StopIteration automatically
for sq in squares(5):
    print(sq, end=" ") # 1 4 9 16 25
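As a rough illustration of what the for loop does on your behalf, the loop can be written out by hand with iter(), next(), and a try/except. This is only a sketch of the protocol, not how the interpreter is implemented; manual_for is a hypothetical helper name.

```python
def squares(n):
    for i in range(1, n + 1):
        yield i ** 2

def manual_for(iterable):
    """Hypothetical helper: collect items the way a for loop would visit them."""
    collected = []
    iterator = iter(iterable)      # a for loop starts by calling iter()
    while True:
        try:
            item = next(iterator)  # then calls next() once per pass
        except StopIteration:
            break                  # exhaustion ends the loop silently
        collected.append(item)     # this line stands in for the loop body
    return collected

print(manual_for(squares(5)))  # [1, 4, 9, 16, 25]
```

Because the helper calls iter() first, it works for any iterable (lists, strings, ranges), not only generators.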
Generator Expressions
Just as list comprehensions create lists, generator expressions create generators using a similar syntax, but with parentheses instead of square brackets. They are the most concise way to write a simple generator.
# List comprehension: eagerly builds the full list
squares_list = [x**2 for x in range(10)]
# Generator expression: lazy, computes on demand
squares_gen = (x**2 for x in range(10))
print(type(squares_list)) # <class 'list'>
print(type(squares_gen)) # <class 'generator'>
# Consume the generator
print(list(squares_gen)) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# Generator expressions work directly in function calls
total = sum(x**2 for x in range(10)) # No extra parentheses needed
print(total) # 285
# Filtering with a condition
evens = (x for x in range(20) if x % 2 == 0)
print(list(evens)) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Infinite Generators
One of the most compelling uses of generators is producing infinite sequences. Because values are computed on demand, you can define a generator that never ends, and simply stop iterating when you have enough values.
def integers_from(start=0):
    """Infinite counter starting from `start`."""
    n = start
    while True: # This loop never ends!
        yield n
        n += 1
def fibonacci():
    """Infinite Fibonacci sequence."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
# Take only the first 10 integers from 5
counter = integers_from(5)
first_ten = [next(counter) for _ in range(10)]
print(first_ten) # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
# First 10 Fibonacci numbers
fib = fibonacci()
fib_list = [next(fib) for _ in range(10)]
print(fib_list) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
# Use itertools.islice to slice infinite generators
from itertools import islice
first_15_fib = list(islice(fibonacci(), 15))
print(first_15_fib) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]
Doing list(fibonacci()) will run forever (or until your RAM is exhausted). Always use islice(), a limited loop, or a break condition when consuming infinite generators.
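The three safe ways of stopping mentioned above might look like this in practice. It is a small sketch that redefines fibonacci() so the block runs on its own; the thresholds 100 and 1000 are arbitrary values chosen for illustration.

```python
from itertools import islice, takewhile

def fibonacci():
    """Infinite Fibonacci sequence (same definition as earlier)."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# 1. islice: take a fixed window (here items at positions 5, 6 and 7)
window = list(islice(fibonacci(), 5, 8))

# 2. takewhile: stop at the first value that fails a condition
below_100 = list(takewhile(lambda x: x < 100, fibonacci()))

# 3. Plain loop with an explicit break
first_big = None
for value in fibonacci():
    if value > 1000:
        first_big = value
        break

print(window)     # [5, 8, 13]
print(below_100)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(first_big)  # 1597
```

takewhile only terminates here because the Fibonacci sequence keeps growing; on a sequence that never fails the condition it would run forever as well.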
Memory Efficiency: Why It Matters
The real power of generators shows up when processing large datasets. Here is a concrete comparison of reading a large log file:
import tracemalloc
# Memory-hungry approach: loads the ENTIRE file into RAM
def read_errors_list(filepath):
    with open(filepath) as f:
        lines = f.readlines() # All lines in memory at once
    return [line for line in lines if "ERROR" in line]
# Generator approach: processes one line at a time
def read_errors_gen(filepath):
    with open(filepath) as f:
        for line in f: # Python iterates lines lazily
            if "ERROR" in line:
                yield line.strip()
# Measure memory for each
tracemalloc.start()
errors = read_errors_list("server.log") # hypothetical 500 MB log
current, peak = tracemalloc.get_traced_memory()
print(f"List peak memory: {peak / 1024 / 1024:.1f} MB") # ~500 MB
tracemalloc.stop()
tracemalloc.start()
for error in read_errors_gen("server.log"):
    pass # process one line at a time
current, peak = tracemalloc.get_traced_memory()
print(f"Generator peak memory: {peak / 1024:.1f} KB") # ~a few KB
tracemalloc.stop()
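The comparison above depends on a log file you may not have to hand. A file-free variant of the same measurement can be sketched as follows; peak_bytes is a hypothetical helper and N is an arbitrary size. The exact numbers depend on your Python version and platform, so none are quoted here.

```python
import tracemalloc

def peak_bytes(build):
    """Hypothetical helper: peak traced memory (in bytes) while build() runs."""
    tracemalloc.start()
    build()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

N = 200_000  # arbitrary size, large enough to show the gap

# The list comprehension materialises every value before sum() sees any of them
list_peak = peak_bytes(lambda: sum([x * 2 for x in range(N)]))

# The generator expression hands sum() one value at a time
gen_peak = peak_bytes(lambda: sum(x * 2 for x in range(N)))

print(f"List peak:      {list_peak:,} bytes")
print(f"Generator peak: {gen_peak:,} bytes")
```

Both calls compute the same total; only the peak memory differs.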
Real-World Use Case: Processing Large Files
Generators are the standard way to process CSV, JSON-lines, or log files that are too large to fit in memory:
import csv
def csv_reader(filepath):
    """Yield one row at a time from a CSV file."""
    with open(filepath, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row
def filter_rows(rows, key, value):
    """Yield only rows where row[key] == value."""
    for row in rows:
        if row[key] == value:
            yield row
def transform_rows(rows):
    """Yield rows with additional computed fields."""
    for row in rows:
        row["full_name"] = f"{row['first_name']} {row['last_name']}"
        yield row
# Build a pipeline: nothing runs until you iterate!
pipeline = transform_rows(
    filter_rows(
        csv_reader("users.csv"),
        key="country",
        value="US"
    )
)
# Process without ever loading the whole file
for user in pipeline:
    print(user["full_name"])
Data Pipelines with Generators
Generators compose naturally into processing pipelines: each stage reads from the previous generator without buffering intermediate results. This is the generator equivalent of Unix pipes.
def source(data):
    """Stage 1: produce items."""
    for item in data:
        yield item
def double(items):
    """Stage 2: transform each item."""
    for item in items:
        yield item * 2
def only_above(items, threshold):
    """Stage 3: filter items."""
    for item in items:
        if item > threshold:
            yield item
def batch(items, size):
    """Stage 4: group items into batches."""
    batch_list = []
    for item in items:
        batch_list.append(item)
        if len(batch_list) == size:
            yield batch_list
            batch_list = []
    if batch_list:
        yield batch_list # yield remaining items
# Wire up the pipeline
numbers = range(1, 21)
pipeline = batch(only_above(double(source(numbers)), threshold=20), size=3)
for chunk in pipeline:
    print(chunk)
# Output:
# [22, 24, 26]
# [28, 30, 32]
# [34, 36, 38]
# [40]
Advanced: send() and close()
Generators also support two-way communication. The send() method resumes the generator and passes a value back in, which becomes the result of the yield expression. The close() method throws a GeneratorExit exception into the generator to shut it down cleanly.
def accumulator():
    """Running total that accepts new values via send()."""
    total = 0
    while True:
        value = yield total # yield current total, receive next value
        if value is None:
            break
        total += value
acc = accumulator()
next(acc) # Prime the generator (advance to first yield)
print(acc.send(10)) # 10
print(acc.send(25)) # 35
print(acc.send(5)) # 40
acc.close() # Clean shutdown
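The accumulator above shows send(); the effect of close() is easier to see with a try/finally inside the generator. In the sketch below, resource_reader and the events list are hypothetical names used only to make the order of events visible.

```python
events = []  # hypothetical log, used only to record what happens and when

def resource_reader():
    """Hypothetical generator that holds a resource while it is alive."""
    events.append("open")
    try:
        while True:
            yield "data"
    finally:
        # close() raises GeneratorExit at the paused yield, so this block runs
        events.append("cleanup")

reader = resource_reader()
first = next(reader)     # runs up to the first yield
reader.close()           # triggers the finally block
events.append("closed")

# A closed generator behaves like an exhausted one
try:
    next(reader)
    after_close = "yielded a value"
except StopIteration:
    after_close = "StopIteration"

print(events)       # ['open', 'cleanup', 'closed']
print(after_close)  # StopIteration
```

The same mechanism is what lets a with block inside a generator release its file when the consumer stops early and the generator is closed.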
itertools: The Generator Toolkit
The itertools module is Python's standard library of iterator building blocks. Its tools behave like generators: they are all lazy and composable.
from itertools import (
    islice, chain, count, cycle, repeat,
    takewhile, dropwhile, groupby, accumulate
)
# islice: slice any iterable lazily
first_5 = list(islice(count(100), 5))
print(first_5) # [100, 101, 102, 103, 104]
# chain: concatenate iterables without copying
combined = list(chain([1, 2], [3, 4], [5, 6]))
print(combined) # [1, 2, 3, 4, 5, 6]
# cycle: repeat an iterable forever
colors = cycle(["red", "green", "blue"])
palette = [next(colors) for _ in range(7)]
print(palette) # ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']
# takewhile / dropwhile: conditional slicing
nums = count(0)
small = list(takewhile(lambda x: x < 5, nums))
print(small) # [0, 1, 2, 3, 4]
# accumulate: running totals
data = [1, 2, 3, 4, 5]
running_sum = list(accumulate(data))
print(running_sum) # [1, 3, 6, 10, 15]
# groupby: group consecutive elements
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
for letter, group in groupby(words, key=lambda w: w[0]):
    print(f"{letter}: {list(group)}")
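Two details of the tools above are worth a short sketch: dropwhile is listed but not demonstrated, and groupby only merges neighbouring items, so unsorted input produces repeated groups. The sample data here is made up for illustration.

```python
from itertools import dropwhile, groupby

# dropwhile: skip items while the condition holds, then yield everything after
readings = [1, 3, 7, 2, 9]
after_small = list(dropwhile(lambda x: x < 5, readings))

# groupby on unsorted input: the same key shows up more than once
words = ["banana", "apple", "blueberry", "avocado"]
unsorted_groups = [(key, list(group))
                   for key, group in groupby(words, key=lambda w: w[0])]

# Sorting by the same key first gives one group per key
sorted_groups = [(key, list(group))
                 for key, group in groupby(sorted(words), key=lambda w: w[0])]

print(after_small)      # [7, 2, 9]
print(unsorted_groups)  # [('b', ['banana']), ('a', ['apple']), ('b', ['blueberry']), ('a', ['avocado'])]
print(sorted_groups)    # [('a', ['apple', 'avocado']), ('b', ['banana', 'blueberry'])]
```

Note that dropwhile stops testing after the first failure: the 2 is kept even though it is below 5. Also note that sorted() builds a full list, so sorting first gives up laziness; it is only needed when the input is not already ordered by the grouping key.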
yield from: Delegating to Sub-generators
Python 3.3 introduced yield from to cleanly delegate to another iterable or generator, avoiding a manual inner loop:
def flatten(nested):
    """Flatten arbitrarily nested lists using yield from."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item) # delegate to recursive call
        else:
            yield item
data = [1, [2, 3], [4, [5, 6]], 7, [8, [9, [10]]]]
print(list(flatten(data)))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# yield from with any iterable
def chained_ranges():
    yield from range(1, 4) # 1, 2, 3
    yield from range(10, 13) # 10, 11, 12
    yield from ["a", "b"] # a, b
print(list(chained_ranges()))
# [1, 2, 3, 10, 11, 12, 'a', 'b']
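For comparison, here is a sketch of the manual inner loop that yield from replaces. It also shows one thing the manual loop cannot do as neatly: yield from evaluates to the sub-generator's return value. All function names below are hypothetical.

```python
def chain_manual(*iterables):
    """Manual inner loop: the pre-3.3 way."""
    for iterable in iterables:
        for item in iterable:
            yield item

def chain_delegating(*iterables):
    """Same behaviour for plain iteration, using yield from."""
    for iterable in iterables:
        yield from iterable

def summed(values):
    """Yield each value, then return the running total when done."""
    total = 0
    for value in values:
        yield value
        total += value
    return total  # becomes the value of the `yield from` expression

def report(values):
    total = yield from summed(values)  # captures summed()'s return value
    yield f"total={total}"

manual = list(chain_manual(range(3), "ab"))
delegated = list(chain_delegating(range(3), "ab"))
report_items = list(report([1, 2, 3]))

print(manual)        # [0, 1, 2, 'a', 'b']
print(delegated)     # [0, 1, 2, 'a', 'b']
print(report_items)  # [1, 2, 3, 'total=6']
```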
Practical Exercises
- Write a generator primes() that yields prime numbers indefinitely. Use it to get the first 20 primes.
- Create a generator read_in_chunks(filepath, chunk_size) that yields chunks of chunk_size bytes from a file without loading the whole file.
- Build a pipeline: generate integers 1 to 100, keep only those divisible by 3, square each one, and sum the results using sum() with a generator expression.
- Rewrite Python's built-in enumerate() as a generator function.
Challenge: Log Analysis Pipeline
Build a complete log-analysis pipeline using generators only (no lists). Given a large log file with lines like "2024-01-15 ERROR: disk full": (1) yield lines lazily, (2) parse each line into a dict with keys date, level, message, (3) filter only ERROR lines, (4) group by date using itertools.groupby, and (5) count errors per day. No intermediate list should be created at any stage.
Generator vs Iterator vs Iterable
| Term | Definition | Example |
|---|---|---|
| Iterable | Any object you can loop over (has __iter__) | list, str, dict, range |
| Iterator | Object with __iter__ and __next__ | file object, zip object |
| Generator | An iterator created by a generator function or expression | (x for x in ...) |
All generators are iterators, and all iterators are iterables. A generator is just the most convenient way to create a custom iterator in Python.
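The three terms in the table can be checked against the abstract base classes in collections.abc. The sketch below also writes the same countdown twice, once as a hand-written iterator class and once as a generator function, to show how much boilerplate the generator saves. Countdown and countdown are hypothetical example names.

```python
from collections.abc import Generator, Iterable, Iterator

class Countdown:
    """Hypothetical hand-written iterator: counts down from start to 1."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def countdown(start):
    """The same behaviour written as a generator function."""
    while start > 0:
        yield start
        start -= 1

numbers = [1, 2, 3]
checks = {
    "list is Iterable": isinstance(numbers, Iterable),
    "list is Iterator": isinstance(numbers, Iterator),
    "class instance is Iterator": isinstance(Countdown(3), Iterator),
    "class instance is Generator": isinstance(Countdown(3), Generator),
    "generator is Iterator": isinstance(countdown(3), Iterator),
    "generator is Generator": isinstance(countdown(3), Generator),
}
for label, result in checks.items():
    print(f"{label}: {result}")

print(list(Countdown(3)), list(countdown(3)))  # [3, 2, 1] [3, 2, 1]
```

A list is iterable but not an iterator: you have to call iter() on it to get something that supports next().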
- What is the difference between yield and return in a generator function?
- What is lazy evaluation and how do generators implement it?
- What exception does a generator raise when it is exhausted?
- How do generator expressions differ from list comprehensions syntactically and in behaviour?
- When would you choose a generator over a list?
- What does yield from do and when is it useful?
- How do you send data into a generator?
- Name three functions from itertools and explain their use.
- Can a generator be reused after it is exhausted?
- What happens to local variables when a generator is paused at a yield?
Summary
- A generator produces values one at a time using yield, pausing execution between each value.
- yield differs from return in that it preserves function state and can be resumed.
- Call a generator function to get a generator object; advance it with next() or a for loop.
- When exhausted, generators raise StopIteration; the second argument to next() provides a default.
- Generator expressions use ( ) instead of [ ] and are lazy equivalents of list comprehensions.
- Infinite generators are safe because values are only computed when requested.
- Generators are ideal for large file processing, infinite sequences, and data pipelines.
- yield from delegates to a sub-iterable cleanly without a manual inner loop.
- The itertools module provides a rich library of composable generator utilities.
Frequently Asked Questions
Can a generator be reused after it is exhausted?
No. Once a generator is exhausted it stays empty; calling next() again just raises StopIteration. To iterate again, you must create a new generator by calling the generator function again.
Are generators always better than lists?
Not always. If you need to iterate over the results multiple times, or use indexing, a list is better. Use a generator when you process each item exactly once and especially when memory matters.
What is the difference between a generator and a coroutine?
Generators produce values (data flows out). Coroutines consume values (data flows in via send()). In modern Python, async def coroutines are preferred for async I/O, while generators remain the standard tool for lazy iteration.
How do I convert a generator to a list?
Use list(gen). Be careful with infinite generators: only do this when you are sure the generator will terminate.
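The exhaustion behaviour described in this section can be checked directly. The sketch below reuses the squares() generator from earlier and converts it to a list twice.

```python
def squares(n):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
first_pass = list(gen)    # consumes every value
second_pass = list(gen)   # the same object is now exhausted
fresh_pass = list(squares(3))  # calling the function again gives a new generator

print(first_pass)   # [1, 4, 9]
print(second_pass)  # []
print(fresh_pass)   # [1, 4, 9]
```

If you need several passes over the same values, keep the list from the first pass instead of the generator.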