Pivot tables

Pivot tables are one of the most powerful tools in data analysis: they let you reshape, summarize, and cross-tabulate data in seconds, turning long raw datasets into concise, multi-dimensional views (just like Excel pivot tables, but programmatic and reproducible). In Pandas, the pivot_table() function is the go-to method for this. Unlike the basic pivot(), which only reshapes and fails when an index/column pair occurs more than once, pivot_table() aggregates duplicate entries with a function you choose.
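To make the difference between pivot() and pivot_table() concrete, here is a minimal sketch. The small `sales` frame and its `Product` column are made up for illustration and are not part of the sample data used in the rest of this guide.

```python
import pandas as pd

# Hypothetical data: the pair (North, A) appears twice, so the
# index/column combinations are not unique.
sales = pd.DataFrame({
    "Region": ["North", "North", "South"],
    "Product": ["A", "A", "B"],
    "Sales": [100, 200, 150],
})

# pivot() only reshapes, so duplicate index/column pairs raise ValueError.
try:
    sales.pivot(index="Region", columns="Product", values="Sales")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead (here with a sum).
summary = pd.pivot_table(
    sales, values="Sales", index="Region", columns="Product", aggfunc="sum"
)

print(pivot_failed)
print(summary)
```

If your index/column pairs are guaranteed unique, pivot() is enough; as soon as duplicates can occur, pivot_table() with an explicit aggfunc is the safer choice.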

In 2026, pivot tables remain essential for dashboards, cohort analysis, sales breakdowns, A/B test results, and any time you need to see metrics across categories. Here’s a practical guide with real examples you can copy and adapt.

1. Basic Setup & Sample Data


import pandas as pd

data = {
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Salesperson': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve', 'Frank'],
    'Sales': [100, 200, 150, 50, 75, 125]
}

df = pd.DataFrame(data)
print(df)

2. Simple Pivot Table: One Index, One Column, One Value

Summarize sales by region and salesperson, the classic use case.


pivot_basic = pd.pivot_table(
    df,
    values='Sales',           # what to summarize
    index='Region',           # rows
    columns='Salesperson',    # columns
    aggfunc='sum'             # aggregation function (default is mean)
)

print(pivot_basic)

Output (NaN where no data exists; because every column contains a NaN, pandas shows the values as floats):


Salesperson  Alice    Bob  Charlie  Dave   Eve  Frank
Region
North        100.0  200.0      NaN   NaN   NaN    NaN
South          NaN    NaN    150.0  50.0   NaN    NaN
West           NaN    NaN      NaN   NaN  75.0  125.0

3. Multiple Aggregations & Multiple Values

Pass a list to aggfunc to calculate sum, mean, and count in one call, or pass a dict to use different functions for different metrics.


pivot_multi = pd.pivot_table(
    df,
    values=['Sales'],
    index='Region',
    columns='Salesperson',
    aggfunc=['sum', 'mean', 'count'],
    fill_value=0,             # replace NaN with 0
    margins=True              # add grand totals
)

print(pivot_multi)
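The "different functions for different metrics" case can be sketched as follows. This is an illustration only: the `Units` column and its numbers are hypothetical, added so that there are two metrics to aggregate.

```python
import pandas as pd

df_units = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Sales": [100, 200, 150, 50, 75, 125],
    "Units": [10, 20, 15, 5, 5, 15],  # hypothetical column, not in the original data
})

# A dict maps each value column to its own aggregation:
# total sales per region, but average units per region.
per_metric = pd.pivot_table(
    df_units,
    values=["Sales", "Units"],
    index="Region",
    aggfunc={"Sales": "sum", "Units": "mean"},
)

print(per_metric)
```

Each key in the dict must be one of the value columns; columns left out of the dict are not aggregated.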

4. Group by Multiple Indexes + Custom Aggregations

Add another dimension (e.g., year or product) to the index, and pass aggfunc as a dict to choose the functions applied to each value column.


# Add a 'Year' column for realism
df['Year'] = [2025, 2025, 2026, 2026, 2025, 2026]

pivot_advanced = pd.pivot_table(
    df,
    values='Sales',
    index=['Region', 'Year'],
    columns='Salesperson',
    aggfunc={'Sales': ['sum', 'mean']},
    fill_value=0,
    margins=True
)

print(pivot_advanced)
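A pivot with two index fields returns a MultiIndex on the rows, which can be awkward to query at first. The sketch below rebuilds the sample data with the Year column and uses a simpler pivot (no columns argument, a single sum) to show two common ways of selecting from such a result; the variable names are illustrative.

```python
import pandas as pd

df_year = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Year": [2025, 2025, 2026, 2026, 2025, 2026],
    "Sales": [100, 200, 150, 50, 75, 125],
})

# Two index fields produce a (Region, Year) MultiIndex on the rows.
by_region_year = pd.pivot_table(
    df_year, values="Sales", index=["Region", "Year"], aggfunc="sum"
)

# Select one cell with a tuple of index labels.
west_2026 = by_region_year.loc[("West", 2026), "Sales"]

# Take a cross-section: every region that has sales in 2025.
only_2025 = by_region_year.xs(2025, level="Year")

print(by_region_year)
print(west_2026)
print(only_2025)
```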

5. Modern Alternative in 2026: Polars

For large datasets, Polars is often faster and more memory-efficient with similar syntax.


import polars as pl

df_pl = pl.DataFrame(data)
pivot_pl = df_pl.pivot(
    on="Salesperson",         # values of this column become the new columns (named `columns` before Polars 1.0)
    index="Region",
    values="Sales",
    aggregate_function="sum",
    sort_columns=True
)
print(pivot_pl)

Best Practices & Common Pitfalls

Always specify aggfunc: the default is mean, which surprises many
Use fill_value=0 to fill missing combinations; dropna=False keeps columns whose entries are all NaN
Add margins=True for grand totals, which is great for reports
Convert index/columns back to regular columns with .reset_index() if needed
For huge data, prefer Polars or chunked processing
Visualize: pivot_basic.plot(kind='bar', stacked=True) for instant insights
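Several of these tips can be seen side by side in a short sketch. It rebuilds the sample data with the Year column and pivots Region against Year; the variable names are illustrative.

```python
import pandas as pd

df_tips = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Year": [2025, 2025, 2026, 2026, 2025, 2026],
    "Sales": [100, 200, 150, 50, 75, 125],
})

# No aggfunc given: pandas averages, so North/2025 is 150, not 300.
by_mean = pd.pivot_table(df_tips, values="Sales", index="Region", columns="Year")

# Explicit sum, with missing Region/Year combinations filled with 0.
by_sum = pd.pivot_table(
    df_tips, values="Sales", index="Region", columns="Year",
    aggfunc="sum", fill_value=0,
)

# margins=True appends an "All" row and an "All" column holding the totals.
with_totals = pd.pivot_table(
    df_tips, values="Sales", index="Region", columns="Year",
    aggfunc="sum", fill_value=0, margins=True,
)

# reset_index() turns the Region index back into an ordinary column.
flat = by_sum.reset_index()

print(by_mean)
print(with_totals)
print(flat)
```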

Conclusion

Pivot tables built with pivot_table() turn long, raw data into concise, cross-tabulated summaries, which makes them a good fit for multi-dimensional analysis. In 2026, use Pandas for readability and flexibility on small-to-medium data, and Polars for speed on large datasets. Master index, columns, values, aggfunc, margins, and fill_value, and you'll build powerful reports in minutes.

Next time you need to break down metrics by category and subcategory, reach for pivot_table first.