Iterating with .iterrows() is pandas’ built-in way to loop over DataFrame rows one by one, yielding each row as a (index, Series) pair — giving you access to row data via column names. While simple and intuitive, iterrows() is generally the slowest iteration method in pandas due to Series creation overhead per row, making it unsuitable for large DataFrames (millions of rows). In 2026, iterrows() is best reserved for small DataFrames, debugging, or complex row-wise logic that can’t be vectorized — for performance on real data, prefer vectorized operations, itertuples(), apply(), or Polars equivalents.
Here’s a complete, practical guide to using .iterrows(): syntax, row-by-row calculation patterns, performance pitfalls, real-world use cases, and modern best practices for when to use it (and when to avoid it).
iterrows() returns an iterator over (index, row Series) pairs — row data is accessed like a dict (row['column']) or with .at for assignment.
import pandas as pd
df = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Wins': [20, 15, 10],
    'Games': [30, 25, 20]
})
for index, row in df.iterrows():
    print(f"Index {index}: {row['Team']} won {row['Wins']} out of {row['Games']} games")
# Index 0: A won 20 out of 30 games
# Index 1: B won 15 out of 25 games
# Index 2: C won 10 out of 20 games
Row-by-row calculation and assignment with iterrows() — compute win percentage and add it back to the DataFrame using .at — works but is slow for large DataFrames.
# Inefficient: iterrows() + .at assignment
for index, row in df.iterrows():
    win_pct = row['Wins'] / row['Games'] if row['Games'] > 0 else 0
    df.at[index, 'Win Percentage'] = win_pct
print(df)
# Team Wins Games Win Percentage
# 0 A 20 30 0.666667
# 1 B 15 25 0.600000
# 2 C 10 20 0.500000
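The same result comes from a single vectorized expression, with no loop at all. A minimal sketch on the same DataFrame; the .where() call mirrors the loop's division-by-zero guard:

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Wins': [20, 15, 10],
    'Games': [30, 25, 20]
})

# One column-wise division replaces the whole loop; .where() keeps the
# quotient where Games > 0 and substitutes 0 elsewhere.
df['Win Percentage'] = (df['Wins'] / df['Games']).where(df['Games'] > 0, 0)
print(df)
```

On large DataFrames this runs orders of magnitude faster than the .at-assignment loop, because the division happens once over whole NumPy arrays instead of once per row.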
Real-world pattern: row-wise processing with state or complex logic — iterrows() is useful when vectorization is difficult (e.g., cumulative calculations, conditional branching per row).
# Running win percentage (cumulative)
# Note: this pattern assumes a default RangeIndex (0, 1, 2, ...); with a
# non-integer index, track running totals in plain variables instead.
df['Cumulative Wins'] = 0
df['Cumulative Games'] = 0
for index, row in df.iterrows():
    if index == 0:
        df.at[index, 'Cumulative Wins'] = row['Wins']
        df.at[index, 'Cumulative Games'] = row['Games']
    else:
        prev = df.iloc[index - 1]
        df.at[index, 'Cumulative Wins'] = prev['Cumulative Wins'] + row['Wins']
        df.at[index, 'Cumulative Games'] = prev['Cumulative Games'] + row['Games']
    df.at[index, 'Running Win %'] = df.at[index, 'Cumulative Wins'] / df.at[index, 'Cumulative Games']
print(df)
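Worth noting: this particular "stateful" case is actually vectorizable, since cumsum() computes the running totals directly. A sketch of the loop-free equivalent on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Wins': [20, 15, 10],
    'Games': [30, 25, 20]
})

# cumsum() produces the running totals in one pass, no per-row state needed.
df['Cumulative Wins'] = df['Wins'].cumsum()
df['Cumulative Games'] = df['Games'].cumsum()
df['Running Win %'] = df['Cumulative Wins'] / df['Cumulative Games']
print(df)
```

Truly loop-bound cases are the ones where each row's result depends on the previous row's result in a way no built-in accumulator covers, and those remain fair game for iterrows().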
Best practices make .iterrows() iteration safe and efficient:
- Avoid iterrows() on large DataFrames (>10,000 rows) — it’s 10–100× slower than vectorized ops or itertuples() due to Series creation per row.
- Prefer itertuples() for simple row iteration — it returns namedtuples (faster access, less overhead).
- Use vectorized operations whenever possible — df['Win %'] = df['Wins'] / df['Games'] is fastest and cleanest.
- Modern tip: switch to Polars for large data — df.with_columns((pl.col("Wins") / pl.col("Games")).alias("Win %")) is 10–100× faster than pandas iteration.
- Add type hints — pd.DataFrame with column types — to improve static analysis.
- In production, profile with timeit or cProfile — iteration is often the bottleneck.
- Use chunking for huge files — pd.read_csv(chunksize=...) or Polars streaming — keeps memory flat.
- Combine with shift() for lag/lead comparisons — df['Prev Wins'] = df['Wins'].shift(1) is vectorized and fast.
- Avoid modifying the DataFrame’s structure inside the loop — pre-allocate columns to prevent fragmentation.
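When you do need to iterate, itertuples() is the faster drop-in. A minimal sketch on the earlier DataFrame; note that only columns whose names are valid Python identifiers (no spaces) are reachable as attributes, which is why this example sticks to Team/Wins/Games:

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Wins': [20, 15, 10],
    'Games': [30, 25, 20]
})

# itertuples() yields one namedtuple per row: no per-row Series is built,
# so it is substantially faster than iterrows(). The index arrives as .Index.
results = []
for row in df.itertuples():
    results.append((row.Index, row.Team, row.Wins / row.Games))

print(results)
```

The trade-off: attribute access (row.Wins) instead of dict-style access (row['Wins']), and columns with invalid identifiers get positional names like _1 unless you fall back to index-based access.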
Iterating with .iterrows() gives easy row access when vectorization isn’t feasible — but it’s slow and should be a last resort. In 2026, prefer vectorized ops, itertuples(), Polars, and chunking for speed and memory safety. Master when to iterate vs. vectorize, and you’ll process tabular data efficiently — fast, clean, and at scale.
Next time you need to process rows one by one — reach for .iterrows() only if necessary. It’s pandas’ way to say: “Here’s each row as a Series — but consider vectorizing first.”