Iterating with .itertuples()

Iterating with .itertuples() is pandas’ fastest built-in way to loop over DataFrame rows. It yields namedtuples containing the row’s index and column values, allowing attribute access (row.ColumnName) instead of label lookups on a Series (row['ColumnName']). Unlike iterrows(), itertuples() does not create a Series object per row, which typically makes it an order of magnitude or more faster for simple row processing; the exact speedup depends on the number of columns and their dtypes. In 2026, itertuples() is the go-to iteration method when vectorization isn’t possible: it’s ideal for row-wise calculations, stateful processing, or when you need named or positional access with minimal overhead. For large DataFrames, it’s still slower than vectorized operations, but far better than iterrows() or .iloc loops.

Here’s a complete, practical guide to using .itertuples(): syntax, row-by-row calculation patterns, performance advantages, real-world use cases, and modern best practices for when to use it (and when to avoid it).

itertuples() returns an iterator of namedtuples. Each tuple has an Index field followed by one field per column; column names that aren’t valid Python identifiers (for example, names containing spaces) are replaced with positional names such as _1. Access values with dot notation for speed and readability.


import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Wins': [20, 15, 10],
    'Games': [30, 25, 20]
})

for row in df.itertuples():
    print(f"Index {row.Index}: {row.Team} won {row.Wins} out of {row.Games} games")
# Index 0: A won 20 out of 30 games
# Index 1: B won 15 out of 25 games
# Index 2: C won 10 out of 20 games
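
The renaming of invalid column names is easy to miss, so here is a minimal sketch of what it looks like in practice. The stats frame and its 'Win Percentage' column are made up for illustration; the point is only that a name containing a space cannot become a namedtuple field, so pandas substitutes a positional name.

```python
import pandas as pd

# Hypothetical frame with one column name that is not a valid identifier
stats = pd.DataFrame({
    'Team': ['A', 'B'],
    'Win Percentage': [0.667, 0.600],
})

first = next(stats.itertuples())
print(first._fields)
# ('Index', 'Team', '_2')

# Positional access works regardless of the column name
print(first[2])

# Renaming columns up front restores readable dot access
clean = stats.rename(columns=lambda c: c.replace(' ', '_'))
for row in clean.itertuples():
    print(row.Team, row.Win_Percentage)
```

Renaming once before the loop is usually the more readable choice; positional access is a fallback when the column names cannot be changed.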

Row-by-row calculation with assignment: compute the win percentage and write it back with .at, pre-allocating the column so the DataFrame isn’t enlarged inside the loop.


# Pre-allocate column for faster assignment
df['Win Percentage'] = 0.0

for row in df.itertuples():
    win_pct = row.Wins / row.Games if row.Games > 0 else 0
    df.at[row.Index, 'Win Percentage'] = win_pct

print(df)
#   Team  Wins  Games  Win Percentage
# 0    A    20     30        0.666667
# 1    B    15     25        0.600000
# 2    C    10     20        0.500000
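
For a calculation this simple, the per-row .at writes are not needed. The sketch below rebuilds the same example frame and shows two alternatives: collecting results in a list and assigning once, and a fully vectorized version. The names win_pcts and vectorized are illustrative only, and using mask() to handle zero-game rows is one possible choice, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Wins': [20, 15, 10],
    'Games': [30, 25, 20],
})

# Option 1: iterate once, collect results, assign the whole column in one step
win_pcts = [
    row.Wins / row.Games if row.Games > 0 else 0.0
    for row in df.itertuples(index=False)
]
df['Win Percentage'] = win_pcts

# Option 2: vectorized equivalent with no Python-level loop.
# Dividing by zero yields inf/NaN rather than raising, so mask()
# overwrites rows with no games played with 0.0.
vectorized = (df['Wins'] / df['Games']).mask(df['Games'] <= 0, 0.0)

print(df)
print(vectorized)
```

Both options produce the same values as the loop above; the vectorized form is the one to reach for whenever the logic fits in column expressions.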

Real-world pattern: stateful or conditional row processing — itertuples() is useful when vectorization is hard (e.g., cumulative stats, per-row decisions, or external lookups).


# Running win percentage (cumulative)
# Pre-allocate every output column, including the float result
df['Cumulative Wins'] = 0
df['Cumulative Games'] = 0
df['Running Win %'] = 0.0

# Keep running totals in local variables instead of re-reading the previous
# row with df.iloc[row.Index - 1], which mixes index labels with positions
# and only works for a default 0..n-1 index
cum_wins = 0
cum_games = 0
for row in df.itertuples():
    cum_wins += row.Wins
    cum_games += row.Games
    df.at[row.Index, 'Cumulative Wins'] = cum_wins
    df.at[row.Index, 'Cumulative Games'] = cum_games
    df.at[row.Index, 'Running Win %'] = cum_wins / cum_games if cum_games > 0 else 0.0

print(df)
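
This particular running total is also a case where vectorization is possible after all. As a point of comparison, here is a sketch of the same result using cumsum() on a fresh copy of the example frame; the loop form remains useful when each step depends on logic that cumulative functions cannot express.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Wins': [20, 15, 10],
    'Games': [30, 25, 20],
})

# cumsum() produces the running totals the loop builds row by row
df['Cumulative Wins'] = df['Wins'].cumsum()
df['Cumulative Games'] = df['Games'].cumsum()
df['Running Win %'] = df['Cumulative Wins'] / df['Cumulative Games']

print(df[['Team', 'Cumulative Wins', 'Cumulative Games', 'Running Win %']])
```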

A few best practices keep .itertuples() loops fast and safe. Prefer itertuples() over iterrows(): it skips building a Series for every row and preserves each column’s dtype, whereas iterrows() upcasts mixed-type rows to a common dtype. Use the name parameter to control the tuple type: the default is name='Pandas', and name=None returns plain tuples, which are slightly cheaper to build but support positional access only. Pass index=False if you don’t need the index. Avoid adding rows or columns inside the loop; pre-allocate columns first, or collect results in a list and assign them once after the loop. Combine with shift() for lag/lead comparisons: df['Prev Wins'] = df['Wins'].shift(1) is vectorized and often removes the need for a loop altogether. Add type hints to functions that accept or return a pd.DataFrame to help static analysis. In production, profile with timeit or cProfile, since row iteration is often the bottleneck. Use chunking for huge files, via pd.read_csv(chunksize=...) or Polars streaming, to keep memory bounded. For large data, consider Polars: an expression such as df.with_columns(pl.col("Wins") / pl.col("Games")) runs in native code and is typically far faster than any pandas row loop (in recent Polars versions, with_row_index() replaces the older with_row_count() if you need a row number). Avoid itertuples() on very large DataFrames where you can vectorize or use Polars instead.
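
To check these performance claims on your own machine, a small timeit comparison is usually enough. The sketch below is an assumption-laden example rather than a benchmark suite: the bench frame is synthetic, the 10,000-row size and the three repetitions are arbitrary, and the function names are made up for illustration. Absolute timings vary by hardware and pandas version, so no expected output is shown.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 10,000 rows is an arbitrary size chosen for illustration
rng = np.random.default_rng(0)
bench = pd.DataFrame({
    'Wins': rng.integers(0, 50, size=10_000),
    'Games': rng.integers(50, 100, size=10_000),  # never zero
})

def with_iterrows(frame):
    # One Series is built per row, which is where the overhead comes from
    return sum(row['Wins'] / row['Games'] for _, row in frame.iterrows())

def with_itertuples(frame):
    # Namedtuples with attribute access
    return sum(row.Wins / row.Games for row in frame.itertuples(index=False))

def with_plain_tuples(frame):
    # name=None yields regular tuples, so fields are unpacked by position
    return sum(w / g for w, g in frame.itertuples(index=False, name=None))

def vectorized(frame):
    # Column arithmetic with no Python-level loop
    return (frame['Wins'] / frame['Games']).sum()

for fn in (with_iterrows, with_itertuples, with_plain_tuples, vectorized):
    seconds = timeit.timeit(lambda: fn(bench), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

All four functions compute the same sum, so any difference in the printed timings comes from how the rows are traversed, not from the arithmetic.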

Iterating with .itertuples() gives fast, named row access when vectorization isn’t feasible — better than iterrows(), but still slower than vectorized ops. In 2026, use it for complex row logic, prefer vectorization/Polars for scale, and profile to confirm. Master when to iterate vs. vectorize, and you’ll process tabular data efficiently — fast, clean, and at scale.

Next time you need row-by-row access with speed — reach for .itertuples(). It’s pandas’ cleanest way to say: “Here’s each row as a fast namedtuple — use dot access.”
