Python Generators – yield, Generator Expressions & Lazy Evaluation

What Are Generators?

A generator is a special kind of iterator that generates values lazily. "Lazily" means it computes the next value only when you request it, rather than computing all values upfront and storing them in memory.

Consider the difference between these two approaches to producing one million numbers:

Python

import sys

# ❌ List — builds ALL 1,000,000 numbers in memory immediately
big_list = [x * 2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(big_list):,} bytes")  # ~8.4 MB on 64-bit CPython (the list's pointer array alone)

# ✅ Generator — holds virtually no data; computes on demand
big_gen = (x * 2 for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(big_gen):,} bytes")  # roughly 100-200 bytes, depending on the Python version

▶ Output

List size: 8,448,728 bytes
Generator size: 104 bytes

💡

The Core Idea

A generator is like a recipe, not a finished meal. The recipe describes how to produce a value, but no food is made until someone actually asks for it.

yield vs return

The yield keyword is what turns a regular function into a generator function. The difference between return and yield is fundamental:

Feature	`return`	`yield`
Produces	One value, then exits	One value, then pauses
Function state	Destroyed after return	Preserved (local vars kept)
Resumable	No	Yes — resumes from yield point
Calling the function returns	The value itself	A generator object
Memory	All values at once	One value at a time
Iteration	Must build full list first	Lazily evaluated

Python

# Regular function with return — builds entire list
def count_up_list(n):
    result = []
    for i in range(n):
        result.append(i)
    return result  # returns ALL at once

# Generator function with yield — produces one at a time
def count_up_gen(n):
    for i in range(n):
        yield i  # pauses here, returns i, resumes next call

# Using them
normal = count_up_list(5)
print(type(normal))  # <class 'list'>
print(normal)        # [0, 1, 2, 3, 4]

gen = count_up_gen(5)
print(type(gen))     # <class 'generator'>
print(gen)           # <generator object count_up_gen at 0x...>

Generator Functions in Depth

Any function containing at least one yield statement is a generator function. Calling it does not execute the body — it returns a generator object. The body runs only when you iterate over that object.

Python

def my_generator():
    print("-- Step 1: Before first yield --")
    yield 10
    print("-- Step 2: Between yields --")
    yield 20
    print("-- Step 3: Before last yield --")
    yield 30
    print("-- Step 4: Generator exhausted --")

gen = my_generator()          # Nothing printed yet!
print("Generator created")

value = next(gen)             # Runs until first yield
print(f"Got: {value}")

value = next(gen)             # Resumes from after first yield
print(f"Got: {value}")

value = next(gen)             # Resumes from after second yield
print(f"Got: {value}")

▶ Output

Generator created
-- Step 1: Before first yield --
Got: 10
-- Step 2: Between yields --
Got: 20
-- Step 3: Before last yield --
Got: 30

ℹ️

Execution Pauses at yield

Each time yield is hit, the function freezes in place — local variables, loop counters, everything — until next() is called again. This is the magic of generators.
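One way to see this frozen state directly is the standard inspect module. The sketch below uses a hypothetical running_total generator (it is not one of the examples above) and reads its local variables between next() calls. Treat it as an illustration of the pause-and-resume behaviour, not as something you would normally do in application code.

```python
import inspect

def running_total(values):
    """Hypothetical example: yield the cumulative sum after each value."""
    total = 0
    for v in values:
        total += v
        yield total

gen = running_total([5, 10, 20])
state_before = inspect.getgeneratorstate(gen)           # body has not started yet

first = next(gen)                                       # runs up to the first yield
locals_after_first = dict(inspect.getgeneratorlocals(gen))

second = next(gen)                                      # resumes with total still set to 5
locals_after_second = dict(inspect.getgeneratorlocals(gen))
state_paused = inspect.getgeneratorstate(gen)

print(state_before)                           # GEN_CREATED
print(first, locals_after_first["total"])     # 5 5
print(second, locals_after_second["total"])   # 15 15
print(state_paused)                           # GEN_SUSPENDED
```

The value of total survives between calls because the generator keeps its frame alive while it is suspended.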

next() and StopIteration

The built-in next() function advances a generator by one step. When the generator function returns (or falls off the end), Python raises a StopIteration exception to signal exhaustion.

Python

def three_values():
    yield "a"
    yield "b"
    yield "c"

gen = three_values()

print(next(gen))   # a
print(next(gen))   # b
print(next(gen))   # c

try:
    print(next(gen))  # Generator is exhausted
except StopIteration:
    print("Generator exhausted — no more values!")

# Provide a default to avoid the exception
gen2 = three_values()
print(next(gen2, "default"))  # a
print(next(gen2, "default"))  # b
print(next(gen2, "default"))  # c
print(next(gen2, "default"))  # default  (no exception)

▶ Output

a
b
c
Generator exhausted — no more values!
a
b
c
default

In practice, you rarely call next() directly. A for loop calls it automatically and catches StopIteration for you:

Python

def squares(n):
    for i in range(1, n + 1):
        yield i ** 2

# for loop handles next() and StopIteration automatically
for sq in squares(5):
    print(sq, end=" ")  # 1 4 9 16 25
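To make the "for loop does it for you" claim concrete, here is roughly what the loop above is doing behind the scenes. This is a simplified sketch of the iteration protocol, not the interpreter's actual implementation; the collected list is only there so the result can be inspected.

```python
def squares(n):
    for i in range(1, n + 1):
        yield i ** 2

# Manual equivalent of: for sq in squares(5): collected.append(sq)
iterator = iter(squares(5))      # a generator is its own iterator
collected = []
while True:
    try:
        sq = next(iterator)      # ask for the next value
    except StopIteration:        # the generator has finished
        break
    collected.append(sq)

print(collected)  # [1, 4, 9, 16, 25]
```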

Generator Expressions

Just as list comprehensions create lists, generator expressions create generators. The syntax is the same, but with parentheses instead of square brackets. They are the most concise way to write a simple generator.

Python

# List comprehension — eagerly builds the full list
squares_list = [x**2 for x in range(10)]

# Generator expression — lazy, computes on demand
squares_gen = (x**2 for x in range(10))

print(type(squares_list))  # <class 'list'>
print(type(squares_gen))   # <class 'generator'>

# Consume the generator
print(list(squares_gen))   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Generator expressions work directly in function calls
total = sum(x**2 for x in range(10))   # No extra parentheses needed
print(total)  # 285

# Filtering with a condition
evens = (x for x in range(20) if x % 2 == 0)
print(list(evens))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
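Laziness also means that a consumer which stops early never pays for the remaining items. The sketch below records which values were actually examined when any() short-circuits; the is_negative helper and the checked list are illustrative names, not part of the examples above.

```python
checked = []

def is_negative(x):
    """Record every value that gets tested, then test it."""
    checked.append(x)
    return x < 0

values = [3, -1, 4, -5]

# Generator expression: any() stops at the first True result
found = any(is_negative(v) for v in values)
lazy_checked = list(checked)

# List comprehension: every value is tested before any() even starts
checked.clear()
found_eager = any([is_negative(v) for v in values])
eager_checked = list(checked)

print(found, lazy_checked)         # True [3, -1]
print(found_eager, eager_checked)  # True [3, -1, 4, -5]
```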

Infinite Generators

One of the most compelling uses of generators is producing infinite sequences. Because values are computed on demand, you can define a generator that never ends — and simply stop iterating when you have enough values.

Python

def integers_from(start=0):
    """Infinite counter starting from `start`."""
    n = start
    while True:         # This loop never ends!
        yield n
        n += 1

def fibonacci():
    """Infinite Fibonacci sequence."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take only the first 10 integers from 5
counter = integers_from(5)
first_ten = [next(counter) for _ in range(10)]
print(first_ten)  # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

# First 10 Fibonacci numbers
fib = fibonacci()
fib_list = [next(fib) for _ in range(10)]
print(fib_list)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Use itertools.islice to slice infinite generators
from itertools import islice
first_15_fib = list(islice(fibonacci(), 15))
print(first_15_fib)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]

⚠️

Never call list() on an infinite generator

Doing list(fibonacci()) will run forever (or until your RAM is exhausted). Always use islice(), a limited loop, or a break condition when consuming infinite generators.
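The three safe options named above can be sketched side by side. The fibonacci generator is repeated so the block runs on its own; the thresholds (8 items, values below 100, first value over 1000) are arbitrary choices for illustration.

```python
from itertools import islice, takewhile

def fibonacci():
    """Infinite Fibonacci sequence."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# 1. islice: take a fixed number of items
first_eight = list(islice(fibonacci(), 8))

# 2. takewhile: take items until a condition fails
below_100 = list(takewhile(lambda x: x < 100, fibonacci()))

# 3. Plain loop with a break condition
first_over_1000 = None
for value in fibonacci():
    if value > 1000:
        first_over_1000 = value
        break

print(first_eight)      # [0, 1, 1, 2, 3, 5, 8, 13]
print(below_100)        # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(first_over_1000)  # 1597
```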

Memory Efficiency – Why It Matters

The real power of generators shows up when processing large datasets. Here is a concrete comparison of reading a large log file:

Python

import tracemalloc

# ❌ Memory-hungry approach — loads ENTIRE file into RAM
def read_errors_list(filepath):
    with open(filepath) as f:
        lines = f.readlines()          # All lines in memory at once
    return [line.strip() for line in lines if "ERROR" in line]  # strip() so both versions return the same data

# ✅ Generator approach — processes one line at a time
def read_errors_gen(filepath):
    with open(filepath) as f:
        for line in f:                 # Python iterates lines lazily
            if "ERROR" in line:
                yield line.strip()

# Measure memory for each
tracemalloc.start()
errors = read_errors_list("server.log")   # hypothetical 500 MB log
current, peak = tracemalloc.get_traced_memory()
print(f"List peak memory: {peak / 1024 / 1024:.1f} MB")  # roughly the file size or more
tracemalloc.stop()

tracemalloc.start()
for error in read_errors_gen("server.log"):
    pass  # process one line at a time
current, peak = tracemalloc.get_traced_memory()
print(f"Generator peak memory: {peak / 1024:.1f} KB")    # ~a few KB
tracemalloc.stop()
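The comparison above depends on a log file you may not have. The following is a self-contained variant that writes a small temporary log first, so it can be run as is. The file contents, the 50,000-line size, and the peak_bytes helper are assumptions made for this sketch; actual numbers will vary by machine and Python version, so none are shown.

```python
import os
import tempfile
import tracemalloc

def read_errors_list(filepath):
    with open(filepath) as f:
        lines = f.readlines()                  # every line held in memory
    return [line.strip() for line in lines if "ERROR" in line]

def read_errors_gen(filepath):
    with open(filepath) as f:
        for line in f:                         # one line at a time
            if "ERROR" in line:
                yield line.strip()

def peak_bytes(func):
    """Hypothetical helper: run func() and return peak traced memory in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

# Build a throwaway log file with one ERROR line in every ten
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    for i in range(50_000):
        level = "ERROR" if i % 10 == 0 else "INFO"
        tmp.write(f"2024-01-15 {level}: message number {i}\n")
    log_path = tmp.name

try:
    list_peak = peak_bytes(lambda: len(read_errors_list(log_path)))
    gen_peak = peak_bytes(lambda: sum(1 for _ in read_errors_gen(log_path)))
finally:
    os.remove(log_path)                        # clean up the temporary file

print(f"List peak:      {list_peak / 1024:.1f} KB")
print(f"Generator peak: {gen_peak / 1024:.1f} KB")
```

The generator version should report a much smaller peak, because it never holds more than the current line plus the file's read buffer.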

Real-World Use Case: Processing Large Files

Generators are a common way to process CSV, JSON Lines, or log files that are too large to fit in memory:

Python

import csv

def csv_reader(filepath):
    """Yield one row at a time from a CSV file."""
    with open(filepath, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row

def filter_rows(rows, key, value):
    """Yield only rows where row[key] == value."""
    for row in rows:
        if row[key] == value:
            yield row

def transform_rows(rows):
    """Yield rows with additional computed fields."""
    for row in rows:
        row["full_name"] = f"{row['first_name']} {row['last_name']}"
        yield row

# Build a pipeline — nothing runs until you iterate!
pipeline = transform_rows(
    filter_rows(
        csv_reader("users.csv"),
        key="country",
        value="US"
    )
)

# Process without ever loading the whole file
for user in pipeline:
    print(user["full_name"])

Data Pipelines with Generators

Generators compose naturally into processing pipelines — each stage reads from the previous generator without buffering intermediate results. This is the generator equivalent of Unix pipes.

Python

def source(data):
    """Stage 1: produce items."""
    for item in data:
        yield item

def double(items):
    """Stage 2: transform each item."""
    for item in items:
        yield item * 2

def only_above(items, threshold):
    """Stage 3: filter items."""
    for item in items:
        if item > threshold:
            yield item

def batch(items, size):
    """Stage 4: group items into batches."""
    batch_list = []
    for item in items:
        batch_list.append(item)
        if len(batch_list) == size:
            yield batch_list
            batch_list = []
    if batch_list:
        yield batch_list  # yield remaining items

# Wire up the pipeline
numbers = range(1, 21)
pipeline = batch(only_above(double(source(numbers)), threshold=20), size=3)

for chunk in pipeline:
    print(chunk)

# Output:
# [22, 24, 26]
# [28, 30, 32]
# [34, 36, 38]
# [40]
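To see that stages really do hand items along one at a time, without buffering, the sketch below adds a shared trace list to cut-down versions of source and double. The trace list and its messages are additions for illustration only.

```python
trace = []

def source(data):
    for item in data:
        trace.append(f"source produced {item}")
        yield item

def double(items):
    for item in items:
        trace.append(f"double produced {item * 2}")
        yield item * 2

results = list(double(source([1, 2, 3])))

print(results)  # [2, 4, 6]
for entry in trace:
    print(entry)
# source produced 1
# double produced 2
# source produced 2
# double produced 4
# source produced 3
# double produced 6
```

The interleaved trace shows that each item travels through the whole pipeline before the next one is pulled from the source.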

Advanced: send() and close()

Generators also support two-way communication. The send() method resumes the generator and passes a value back in, which becomes the result of the yield expression. The close() method throws a GeneratorExit exception into the generator to shut it down cleanly.

Python

def accumulator():
    """Running total that accepts new values via send()."""
    total = 0
    while True:
        value = yield total   # yield current total, receive next value
        if value is None:
            break
        total += value

acc = accumulator()
next(acc)         # Prime the generator (advance to first yield)

print(acc.send(10))   # 10
print(acc.send(25))   # 35
print(acc.send(5))    # 40
acc.close()           # Clean shutdown
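A common reason to care about close() is cleanup. In the sketch below, a hypothetical logger generator wraps its loop in try/finally, so the finally block runs when close() raises GeneratorExit at the paused yield. The events list stands in for a real resource such as a file or connection.

```python
events = []

def logger():
    """Hypothetical example: receive messages via send(), clean up on close()."""
    try:
        while True:
            message = yield                 # wait for the next send()
            events.append(f"received {message}")
    finally:
        events.append("cleanup")            # runs when close() is called

log = logger()
next(log)            # prime: advance to the first yield
log.send("hello")
log.close()          # raises GeneratorExit inside the generator

after_close = next(log, "closed")   # a closed generator is exhausted

print(events)        # ['received hello', 'cleanup']
print(after_close)   # closed
```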

itertools – The Generator Toolkit

The itertools module is Python's standard-library toolkit of iterator building blocks. Its tools are lazy and composable, and they work with generators as well as any other iterable:

Python

from itertools import (
    islice, chain, count, cycle, repeat,
    takewhile, dropwhile, groupby, accumulate
)

# islice — slice any iterable lazily
first_5 = list(islice(count(100), 5))
print(first_5)   # [100, 101, 102, 103, 104]

# chain — concatenate iterables without copying
combined = list(chain([1, 2], [3, 4], [5, 6]))
print(combined)  # [1, 2, 3, 4, 5, 6]

# cycle — repeat an iterable forever
colors = cycle(["red", "green", "blue"])
palette = [next(colors) for _ in range(7)]
print(palette)   # ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']

# takewhile / dropwhile — conditional slicing
nums = count(0)
small = list(takewhile(lambda x: x < 5, nums))
print(small)     # [0, 1, 2, 3, 4]

# accumulate — running totals
data = [1, 2, 3, 4, 5]
running_sum = list(accumulate(data))
print(running_sum)  # [1, 3, 6, 10, 15]

# groupby — group consecutive elements
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
for letter, group in groupby(words, key=lambda w: w[0]):
    print(f"{letter}: {list(group)}")

▶ Output

a: ['apple', 'avocado']
b: ['banana', 'blueberry']
c: ['cherry']
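Two details are easy to miss here: dropwhile is the mirror image of takewhile, and groupby only groups consecutive items, so unsorted input produces fragmented groups. The sketch below illustrates both with made-up sample data.

```python
from itertools import dropwhile, groupby

# dropwhile: skip items while the predicate holds, then yield everything after
readings = [1, 3, 7, 2, 9]
after_first_big = list(dropwhile(lambda x: x < 5, readings))
print(after_first_big)  # [7, 2, 9]  (the 2 is kept: dropping stops at 7)

def first_letter(word):
    return word[0]

words = ["banana", "apple", "blueberry", "avocado"]

# Unsorted input: each run of equal keys becomes its own group
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=first_letter)]
print(unsorted_groups)
# [('b', ['banana']), ('a', ['apple']), ('b', ['blueberry']), ('a', ['avocado'])]

# Sorting by the same key first gives one group per key
sorted_groups = [
    (k, list(g))
    for k, g in groupby(sorted(words, key=first_letter), key=first_letter)
]
print(sorted_groups)
# [('a', ['apple', 'avocado']), ('b', ['banana', 'blueberry'])]
```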

yield from — Delegating to Sub-generators

Python 3.3 introduced yield from to cleanly delegate to another iterable or generator, avoiding a manual inner loop:

Python

def flatten(nested):
    """Flatten arbitrarily nested lists using yield from."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to recursive call
        else:
            yield item

data = [1, [2, 3], [4, [5, 6]], 7, [8, [9, [10]]]]
print(list(flatten(data)))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# yield from with any iterable
def chained_ranges():
    yield from range(1, 4)   # 1, 2, 3
    yield from range(10, 13) # 10, 11, 12
    yield from ["a", "b"]    # a, b

print(list(chained_ranges()))
# [1, 2, 3, 10, 11, 12, 'a', 'b']
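For plain iteration, yield from is shorthand for the inner loop it replaces. The two functions below (hypothetical names chain_manual and chain_delegating) are a minimal side-by-side sketch and produce the same values.

```python
def chain_manual(*iterables):
    """Before Python 3.3: re-yield each item with an explicit inner loop."""
    for iterable in iterables:
        for item in iterable:
            yield item

def chain_delegating(*iterables):
    """With yield from: delegate iteration to each iterable."""
    for iterable in iterables:
        yield from iterable

manual = list(chain_manual([1, 2], "ab", range(3)))
delegated = list(chain_delegating([1, 2], "ab", range(3)))

print(manual)               # [1, 2, 'a', 'b', 0, 1, 2]
print(manual == delegated)  # True
```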

Lazy Evaluation: Why Generators Save Memory

A list builds every element up front and holds them all in RAM. A generator produces one value at a time, on demand, remembering only where it paused. For big or infinite streams that's the difference between running and crashing.

Aspect	List `[x for x in ...]`	Generator `(x for x in ...)`
Memory	All items at once	One item at a time
Reusable	Yes, iterate many times	No — exhausted after one pass
Infinite data	Impossible	Fine (`while True: yield`)

Python

def read_big(path):
    with open(path) as f:
        for line in f:      # yields lazily — file never fully loaded
            yield line.strip()

# sum a billion numbers using ~no memory
total = sum(n for n in range(1_000_000_000))

The exhaustion gotcha: a generator runs once. After the first full loop it's empty — looping again yields nothing, no error. Need the data twice? Materialize with list(gen) first.
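A short sketch of the gotcha and two ways around it, using a small squares generator defined here for illustration:

```python
def squares(n):
    for i in range(n):
        yield i * i

gen = squares(4)
first_pass = list(gen)     # consumes every value
second_pass = list(gen)    # same object, already exhausted

print(first_pass)    # [0, 1, 4, 9]
print(second_pass)   # []  (no error, just empty)

# Option 1: materialize once, then reuse the list as often as needed
cached = list(squares(4))
total = sum(cached)
largest = max(cached)
print(total, largest)      # 14 9

# Option 2: call the generator function again for a fresh generator
fresh_total = sum(squares(4))
print(fresh_total)         # 14
```

Option 1 trades memory for reuse; option 2 keeps memory flat but repeats the computation.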

🏋️ Practical Exercises

Write a generator primes() that yields prime numbers indefinitely. Use it to get the first 20 primes.
Create a generator read_in_chunks(filepath, chunk_size) that yields chunks of chunk_size bytes from a file without loading the whole file.
Build a pipeline: generate integers 1–100, keep only those divisible by 3, square each one, and sum the results using sum() with a generator expression.
Rewrite Python's built-in enumerate() as a generator function.

🔥 Challenge: Log Analysis Pipeline

Build a complete log-analysis pipeline using generators only (no lists). Given a large log file with lines like "2024-01-15 ERROR: disk full": (1) yield lines lazily, (2) parse each line into a dict with keys date, level, message, (3) filter only ERROR lines, (4) group by date using itertools.groupby, and (5) count errors per day. No intermediate list should be created at any stage.

Generator vs Iterator vs Iterable

Term	Definition	Example
Iterable	Any object you can loop over (has `__iter__`)	list, str, dict, range
Iterator	Object with `__iter__` and `__next__`	file object, zip object
Generator	An iterator created by a generator function or expression	`(x for x in ...)`

All generators are iterators, and all iterators are iterables. A generator is just the most convenient way to create a custom iterator in Python.
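To illustrate why a generator is the convenient option, the sketch below writes the same countdown twice: once as a hand-rolled iterator class and once as a generator function. Both names (Countdown and countdown) are hypothetical examples, and the isinstance checks use the abstract base classes from collections.abc.

```python
from collections.abc import Generator, Iterable, Iterator

class Countdown:
    """Hand-written iterator: must track its own state and raise StopIteration."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def countdown(start):
    """Generator version: state lives in ordinary local variables."""
    while start > 0:
        yield start
        start -= 1

print(list(Countdown(3)))   # [3, 2, 1]
print(list(countdown(3)))   # [3, 2, 1]

numbers = [1, 2, 3]
print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))            # True False
print(isinstance(Countdown(3), Iterator), isinstance(Countdown(3), Generator))  # True False
print(isinstance(countdown(3), Iterator), isinstance(countdown(3), Generator))  # True True
```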

📋 Summary

A generator produces values one at a time using yield, pausing execution between each value.
yield differs from return in that it preserves function state and can be resumed.
Call a generator function to get a generator object; advance it with next() or a for loop.
When exhausted, generators raise StopIteration; the second argument to next() provides a default.
Generator expressions use ( ) instead of [ ] and are lazy equivalents of list comprehensions.
Infinite generators are safe because values are only computed when requested, provided the consumer stops at some point (islice(), a bounded loop, or a break).
Generators are ideal for large file processing, infinite sequences, and data pipelines.
yield from delegates to a sub-iterable cleanly without a manual inner loop.
The itertools module provides a rich library of composable, lazy iterator utilities.

Interview Questions

What is the difference between yield and return in a generator function?
What is lazy evaluation and how do generators implement it?
What exception does a generator raise when it is exhausted?
How do generator expressions differ from list comprehensions syntactically and in behaviour?
When would you choose a generator over a list?
What does yield from do and when is it useful?
How do you send data into a generator?
Name three functions from itertools and explain their use.
Can a generator be reused after it is exhausted?
What happens to local variables when a generator is paused at a yield?

FAQ

Can I iterate over a generator more than once?

No. Once a generator is exhausted it stays empty — calling next() again just raises StopIteration. To iterate again, you must create a new generator by calling the generator function again.

Is a generator expression always better than a list comprehension?

Not always. If you need to iterate over the results multiple times, or use indexing, a list is better. Use a generator when you process each item exactly once and especially when memory matters.

What is the difference between a generator and a coroutine?

Generators produce values (data flows out). Coroutines consume values (data flows in via send()). In modern Python, async def coroutines are preferred for async I/O, while generators remain the standard tool for lazy iteration.

How do I convert a generator to a list?

Use list(gen). Be careful with infinite generators — only do this when you are sure the generator will terminate.
