Building Custom Generator Functions in Python – Advanced Memory-Efficient Patterns 2026
When generator expressions are not enough, you can create powerful custom generator functions using the `yield` keyword. These functions are the most flexible way to implement memory-efficient data processing pipelines in data science.
TL;DR — How to Build a Generator Function
- Use `def` and `yield` instead of `return`
- The function becomes a generator as soon as its body contains `yield`
- Perfect for streaming, chunked, or complex multi-step processing
1. Basic Custom Generator Function
```python
def high_value_generator(data):
    for row in data:
        if row["amount"] > 1500:
            yield {
                "customer_id": row["customer_id"],
                "amount": row["amount"],
                "region": row["region"],
            }

# Usage
for record in high_value_generator(sales_records):
    print(f"High value: {record['customer_id']} - ${record['amount']:.2f}")
```
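A detail worth knowing before building on this: calling a generator function runs none of its body. Iteration is what drives it, one item at a time, and an exhausted generator is spent. A small self-contained sketch (the sample rows are invented here for illustration):

```python
def high_value_generator(data):
    for row in data:
        if row["amount"] > 1500:
            yield row

sample = [{"amount": 2000}, {"amount": 100}, {"amount": 1800}]

gen = high_value_generator(sample)  # nothing runs yet
first = next(gen)                   # pulls rows only until the first match
print(first["amount"])              # 2000

# A generator is single-use: the second pass resumes where the first stopped
remaining = list(gen)
print(len(remaining))               # 1
```

If you need to iterate the results twice, either call the generator function again or materialize it with `list()`.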
2. Real-World Data Science Examples
```python
import pandas as pd
import csv

# Example 1: Chunked CSV reader generator
def read_csv_in_chunks(file_path, chunk_size=10000):
    with open(file_path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
        if chunk:  # don't lose the final partial chunk
            yield chunk
```
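One way this reader might be consumed is to stream an aggregate without ever holding the whole file in memory. The sketch below writes a tiny CSV first so it is self-contained; the file path, column names, and chunk size are all made up for illustration:

```python
import csv
import os
import tempfile

def read_csv_in_chunks(file_path, chunk_size=10000):
    with open(file_path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

# Write a tiny CSV so the example runs as-is
path = os.path.join(tempfile.mkdtemp(), "sales.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer_id", "amount"])
    for i in range(7):
        writer.writerow([i, 100 * (i + 1)])

# Stream a running total; only one chunk is in memory at a time
total = 0.0
rows = 0
for chunk in read_csv_in_chunks(path, chunk_size=3):
    total += sum(float(row["amount"]) for row in chunk)
    rows += len(chunk)
print(rows, total)  # 7 2800.0 (chunks of 3, 3, and 1 rows)
```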
```python
import math

# Example 2: Advanced feature engineering generator
def process_sales_with_features(df):
    for row in df.itertuples():
        profit = row.amount * 0.25
        category = "Premium" if profit > 500 else "Standard"
        yield {
            "customer_id": row.customer_id,
            "amount": row.amount,
            "profit": round(profit, 2),
            "category": category,
            # natural log of amount, guarding against non-positive values
            "log_amount": round(math.log(row.amount), 2) if row.amount > 0 else 0,
        }

# Usage
for processed_row in process_sales_with_features(df):
    print(processed_row)
```
3. Best Practices in 2026
- Use generator functions when you need complex logic, multiple yields, or stateful processing
- Keep generators pure and stateless when possible
- Combine with `itertuples()` for fast DataFrame iteration
- Use `yield from` to delegate to another generator
- Always document what the generator yields
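The `yield from` delegation mentioned above lets one generator hand off to others without an explicit loop. A minimal sketch, with two invented source generators:

```python
def premium_rows(rows):
    for row in rows:
        if row["amount"] > 1500:
            yield {**row, "tier": "premium"}

def standard_rows(rows):
    for row in rows:
        if row["amount"] <= 1500:
            yield {**row, "tier": "standard"}

def all_tiered_rows(rows):
    """Yields every row tagged with a tier, premium rows first."""
    yield from premium_rows(rows)
    yield from standard_rows(rows)

data = [{"amount": 2000}, {"amount": 500}]
for row in all_tiered_rows(data):
    print(row["tier"], row["amount"])
# premium 2000
# standard 500
```

Note that `all_tiered_rows` iterates `rows` twice, so it needs a re-iterable sequence such as a list, not a single-use generator.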
Conclusion
Custom generator functions using `yield` give you maximum control while maintaining excellent memory efficiency. In 2026 data science projects, they are the go-to solution for building reusable, streaming data pipelines, chunked readers, and complex transformation workflows that would otherwise consume too much memory.
Next steps:
- Convert one of your complex data processing loops into a custom generator function and experience the memory savings
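As a starting point, such a conversion is often mechanical: replace `result.append(x)` with `yield x` and drop the accumulator. A before/after sketch, where the list-building function is an invented example:

```python
# Before: builds the whole result list in memory
def clean_amounts_list(rows):
    result = []
    for row in rows:
        if row["amount"] > 0:
            result.append(round(row["amount"] * 1.1, 2))
    return result

# After: same logic, but yields one value at a time
def clean_amounts_gen(rows):
    for row in rows:
        if row["amount"] > 0:
            yield round(row["amount"] * 1.1, 2)

rows = [{"amount": 10}, {"amount": -5}, {"amount": 20}]
print(clean_amounts_list(rows))       # [11.0, 22.0]
print(list(clean_amounts_gen(rows)))  # [11.0, 22.0]
```

Both produce the same values; only the generator version keeps at most one result in memory at a time.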