Group by to pivot table

Converting grouped data into a pivot table is a common Pandas workflow: first aggregate the raw data with groupby(), then reshape the result into a cross-tabulated table for reporting, dashboards, or further analysis. The combination gives you the flexibility of groupby with the readability of a pivot table (similar to Excel, but fully programmable).

In 2026, the pattern is still a staple of sales breakdowns, cohort reports, A/B test summaries, and other multi-dimensional views. Here’s a practical guide with runnable examples you can copy and adapt.

1. Basic Setup & Sample Data


import pandas as pd

data = {
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Salesperson': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve', 'Frank'],
    'Sales': [100, 200, 150, 50, 75, 125]
}

df = pd.DataFrame(data)
print(df)

2. Step 1: Groupby Aggregation

First, group by the dimensions you care about and aggregate the metric(s).


# Group by Region and Salesperson -> sum of Sales
grouped = df.groupby(['Region', 'Salesperson'])['Sales'].sum()
print(grouped)

Output (Series with multi-index):


Region  Salesperson
North   Alice          100
        Bob            200
South   Charlie        150
        Dave            50
West    Eve             75
        Frank          125
Name: Sales, dtype: int64
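If you would rather keep the aggregated result as a flat DataFrame before reshaping, one possible route is groupby(..., as_index=False) followed by DataFrame.pivot(). This is a minimal sketch using the sample data above; the names flat and wide are illustrative and not part of the original walkthrough.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Salesperson': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve', 'Frank'],
    'Sales': [100, 200, 150, 50, 75, 125],
})

# as_index=False keeps the group keys as ordinary columns
# instead of moving them into a MultiIndex.
flat = df.groupby(['Region', 'Salesperson'], as_index=False)['Sales'].sum()

# After aggregation every (Region, Salesperson) pair is unique,
# so pivot() (which does not aggregate) can reshape it safely.
wide = flat.pivot(index='Region', columns='Salesperson', values='Sales')

print(flat)
print(wide)
```

pivot() raises an error on duplicate index/column pairs, which is why the groupby step has to come first in this variant.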

3. Step 2: Convert Grouped Result to Pivot Table

Call unstack() on the grouped Series to move the inner index level (Salesperson) into columns. Calling pivot_table() directly on the original df produces the same table in a single step.


# Reshape into pivot table
pivot_from_grouped = grouped.unstack()  # or use pivot_table directly
print(pivot_from_grouped)

Output (missing combinations become NaN, which converts every column to float):


Salesperson  Alice    Bob  Charlie  Dave   Eve  Frank
Region                                               
North        100.0  200.0      NaN   NaN   NaN    NaN
South          NaN    NaN    150.0  50.0   NaN    NaN
West           NaN    NaN      NaN   NaN  75.0  125.0
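To avoid NaN in the unstacked table, unstack() accepts a fill_value argument. The sketch below rebuilds the sample data and checks that the filled result matches a direct pivot_table() call with aggfunc='sum' and fill_value=0; the names filled and direct are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Salesperson': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve', 'Frank'],
    'Sales': [100, 200, 150, 50, 75, 125],
})

grouped = df.groupby(['Region', 'Salesperson'])['Sales'].sum()

# fill_value=0 fills the missing (Region, Salesperson) combinations,
# so no NaN is introduced into the reshaped table.
filled = grouped.unstack(fill_value=0)

# The same table built in one step, for comparison.
direct = pd.pivot_table(
    df,
    values='Sales',
    index='Region',
    columns='Salesperson',
    aggfunc='sum',
    fill_value=0,
)

print(filled)
print((filled.values == direct.values).all())
```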

4. Preferred Way: Use pivot_table() Directly (Simpler & More Flexible)

Skip the intermediate groupby step — pivot_table() does grouping and aggregation internally.


pivot_direct = pd.pivot_table(
    df,
    values='Sales',
    index='Region',
    columns='Salesperson',
    aggfunc='sum',
    fill_value=0,          # replace NaN with 0
    margins=True           # add grand totals
)

print(pivot_direct)

Output (with totals):


Salesperson  Alice  Bob  Charlie  Dave  Eve  Frank  All
Region                                                   
North          100  200        0     0    0      0  300
South            0    0      150    50    0      0  200
West             0    0        0     0   75    125  200
All            100  200      150    50   75    125  700

5. Advanced: Multiple Aggregations & Multiple Indexes

Pass a list of columns to index to group by several dimensions, and a list to aggfunc to compute more than one aggregation of the same metric.


# Add 'Year' for multi-index example
df['Year'] = [2025, 2025, 2026, 2026, 2025, 2026]

pivot_advanced = pd.pivot_table(
    df,
    values='Sales',
    index=['Region', 'Year'],
    columns='Salesperson',
    aggfunc=['sum', 'mean'],
    fill_value=0,
    margins=True
)

print(pivot_advanced)
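A list of aggregation functions produces two-level columns (aggregation, Salesperson), which can be awkward to export. The following sketch, built without margins to keep it short, shows one way to select a single aggregation and to flatten the column names; pivot_multi, sums, and flat are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'Salesperson': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve', 'Frank'],
    'Sales': [100, 200, 150, 50, 75, 125],
    'Year': [2025, 2025, 2026, 2026, 2025, 2026],
})

pivot_multi = pd.pivot_table(
    df,
    values='Sales',
    index=['Region', 'Year'],
    columns='Salesperson',
    aggfunc=['sum', 'mean'],
    fill_value=0,
)

# Selecting the top column level returns the sub-table for one aggregation.
sums = pivot_multi['sum']
print(sums.loc[('North', 2025), 'Alice'])

# Join the two column levels into single names such as 'sum_Alice',
# then turn Region and Year back into regular columns.
flat = pivot_multi.copy()
flat.columns = [f'{agg}_{person}' for agg, person in flat.columns]
flat = flat.reset_index()
print(flat.columns.tolist())
```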

6. Modern Alternative in 2026: Polars

For large datasets, Polars is often faster and more memory-efficient with a similar pivot API.


import polars as pl

df_pl = pl.DataFrame(data)
pivot_pl = df_pl.pivot(
    on="Salesperson",            # this parameter was named `columns` before Polars 1.0
    index="Region",
    values="Sales",
    aggregate_function="sum",
    sort_columns=True
)
print(pivot_pl)

Best Practices & Common Pitfalls

Prefer pivot_table() over groupby + unstack: one call covers aggregation, fill_value, and margins
Always specify aggfunc: the default is mean, which surprises many people who expect a sum
Use fill_value=0 to replace NaN in the result; dropna=False keeps columns whose entries are all NaN
Add margins=True for grand totals, which is handy for reports
Call reset_index() after the pivot if you need Region/Year as regular columns
For huge data, prefer Polars or chunked processing
Visualize with .plot(kind='bar', stacked=True) for a quick look; build the pivot without margins=True (or drop the All row and column) so the totals are not plotted as data
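Two of the points above are easiest to see on data with repeated groups. The sketch below uses a small made-up frame (the Product column and its values are hypothetical, not from the sample data) to show the default mean versus an explicit sum, and how to strip the All totals before plotting.

```python
import pandas as pd

# Hypothetical data: North has two rows for the same product.
sales = pd.DataFrame({
    'Region': ['North', 'North', 'South'],
    'Product': ['A', 'A', 'A'],
    'Sales': [100, 200, 50],
})

# No aggfunc given: pivot_table averages the duplicates.
default = pd.pivot_table(sales, values='Sales', index='Region', columns='Product')
print(default.loc['North', 'A'])   # mean of 100 and 200

# Explicit sum: usually what a sales report needs.
summed = pd.pivot_table(sales, values='Sales', index='Region',
                        columns='Product', aggfunc='sum')
print(summed.loc['North', 'A'])    # sum of 100 and 200

# Totals are useful in a report but distort a chart, so drop them first.
with_totals = pd.pivot_table(sales, values='Sales', index='Region',
                             columns='Product', aggfunc='sum', margins=True)
body = with_totals.drop(index='All', columns='All')
print(body)
```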

Conclusion

Grouping by multiple variables and reshaping into pivot tables with pivot_table() turns raw data into clean, multi-dimensional summaries — perfect for cross-tab reports, cohort views, and breakdowns by category. In 2026, use Pandas for readability and flexibility on small-to-medium data, and Polars for speed on large datasets. Master values, index, columns, aggfunc, margins, and fill_value, and you'll build powerful reports in minutes.

Next time you need to break down metrics by multiple dimensions — reach for pivot_table first.