The yield keyword is Python’s core mechanism for creating generators — functions that return lazy iterators, producing values one at a time without loading the entire sequence into memory at once. When a function contains yield, calling it returns a generator object instead of executing the body immediately. Each call to next() (or each step of a for loop) resumes execution from the last yield, runs until the next yield (or the end of the function), produces the yielded value, and pauses again — preserving local state between yields. In 2026, yield remains essential — powering memory-efficient data processing, infinite sequences, coroutines (from the pre-async/await era), custom iterators, and streaming pipelines in pandas/Polars, ETL jobs, web scraping, ML data loaders, and any code that handles large or unbounded data.
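The pause/resume behavior described above is easiest to see by driving a generator manually with next(). A minimal sketch (countdown is an illustrative name, not a standard function):

```python
def countdown(n):
    """Yield n, n-1, ..., 1, pausing between values."""
    while n > 0:
        yield n      # pause here; resume on the next next() call
        n -= 1

gen = countdown(3)        # no body code runs yet -- just creates the generator
print(next(gen))          # 3 -- runs until the first yield
print(next(gen))          # 2 -- resumes right after the yield, loops once
print(next(gen))          # 1
print(next(gen, "done"))  # "done" -- exhausted; the default avoids StopIteration
```

Note that the local variable n survives between calls — that preserved state is exactly what distinguishes a generator from a plain function.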
Here’s a complete, practical guide to the yield keyword in Python: how generators work, basic to advanced patterns, generator expressions, send/throw/close, real-world use cases, and modern best practices with type hints, performance, error handling, and pandas/Polars integration.
Basic generator function — yield pauses and returns a value; state is saved for resumption.
def even_numbers(n: int):
    """Generate even numbers from 0 to n-1."""
    for i in range(n):
        if i % 2 == 0:
            yield i

# Usage
for num in even_numbers(10):
    print(num)  # 0 2 4 6 8

# Or collect into a list (defeats laziness for large n)
evens = list(even_numbers(1_000_000))  # memory-efficient only until list() materializes everything
Generator expressions — concise, lazy alternative to list comprehensions — use parentheses instead of brackets.
squares = (x**2 for x in range(10)) # generator, not computed yet
print(next(squares)) # 0
print(next(squares)) # 1
# Memory-efficient sum of large sequence
total = sum(x**2 for x in range(1_000_000)) # no list created
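The send/throw/close protocol promised above turns a generator into a simple coroutine: send(value) resumes the paused yield expression with that value, throw() injects an exception at the same point, and close() raises GeneratorExit inside the generator. A minimal sketch (running_average is an illustrative name):

```python
def running_average():
    """Coroutine-style generator: receive numbers via send(), yield the running mean."""
    total = 0.0
    count = 0
    average = None
    while True:
        try:
            value = yield average   # send(x) resumes here with value = x
        except GeneratorExit:
            break                   # raised by close(); stop cleanly
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the generator: advance to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
avg.close()          # triggers GeneratorExit at the paused yield
```

Priming with next() before the first send() is required — sending a non-None value to a generator that has not yet reached its first yield raises a TypeError.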
Real-world pattern: streaming data processing with generators — yield items one at a time for memory efficiency in ETL, file reading, web scraping, or ML data loading.
def read_large_csv(file_path: str):
    """Yield rows from CSV one at a time — memory safe."""
    with open(file_path) as f:
        header = next(f).strip().split(',')
        for line in f:
            row = line.strip().split(',')
            yield dict(zip(header, row))

# Process without loading the entire file
for row in read_large_csv('huge_data.csv'):
    # process each row (a dict)
    if int(row['age']) > 30:
        print(row['name'])
Best practices make generator usage safe, readable, and performant:
- Prefer generators over lists for large/unbounded data — yield avoids memory spikes.
- Use generator expressions (...) for simple lazy sequences — like list comprehensions but without the memory cost.
- Modern tip: use Polars for streaming data — pl.scan_csv(...).collect() or .sink_parquet() for lazy/eager processing.
- Add type hints — def gen() -> Generator[int, None, None] — improves clarity and mypy checks.
- Handle exhaustion — generators raise StopIteration once done; use next(gen, default) for safe single access.
- Use yield from to delegate to sub-generators — cleaner delegation.
- Avoid side effects in generators — keep them pure where possible.
- Combine with itertools — chain, islice, groupby — for composable pipelines.
- Test generators — consume them fully in tests, checking list(gen) or next(gen).
- Use @contextmanager with yield for resource management.
- Close generators explicitly if needed — gen.close() runs cleanup when yield sits inside try/finally.
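Several of these practices — type hints via Generator, yield from delegation, itertools.islice for capping infinite sequences, and next()'s default for safe exhaustion — can be sketched together. fibonacci and labeled are illustrative names, not from the examples above:

```python
from collections.abc import Generator
from itertools import islice

def fibonacci() -> Generator[int, None, None]:
    """Infinite Fibonacci sequence -- safe because callers consume it lazily."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def labeled(prefix: str, numbers) -> Generator[str, None, None]:
    """Delegate iteration to a sub-iterable with yield from."""
    yield from (f"{prefix}{n}" for n in numbers)

# islice caps an infinite generator without materializing it
first_five = list(islice(fibonacci(), 5))   # [0, 1, 1, 2, 3]
print(list(labeled("fib:", first_five)))    # ['fib:0', 'fib:1', 'fib:1', 'fib:2', 'fib:3']
print(next(iter([]), "empty"))              # 'empty' -- default avoids StopIteration
```

Note the return type: Generator[int, None, None] means the generator yields ints, accepts nothing via send(), and returns nothing; for simple yield-only generators, Iterator[int] is an equally valid and shorter annotation.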
The yield keyword creates lazy, memory-efficient generators — pause/resume execution, yield values one by one, preserve state. In 2026, use yield for streaming, large data, custom iterators; prefer generator expressions for simple cases; integrate with Polars for scalable pipelines. Master yield, and you’ll handle massive datasets, infinite sequences, and iterative processing elegantly and efficiently.
Next time you need a sequence without loading everything — reach for yield. It’s Python’s cleanest way to say: “Give me values one at a time, when I ask.”