Grouping by multiple variables is one of the most powerful techniques in data analysis — it lets you slice data across several dimensions at once (e.g., sales by region and product, users by cohort and device, experiments by variant and country). In Pandas, you pass a list of column names to groupby() to create a multi-level grouping, then apply aggregations like mean, sum, count, min/max, or custom functions.
In 2026, this pattern is essential for dashboards, cohort reports, segmentation, and cross-tab analysis. Here’s a practical guide with real examples you can copy and adapt immediately.
1. Basic Setup & Sample Data
import pandas as pd
data = {
'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
'Value': [1, 2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)
print(df)
2. Group by Multiple Columns + Single Aggregation
Group by two (or more) columns and apply one function — results in a multi-index Series.
# Group by Group1 and Group2 -> mean of Value
multi_group = df.groupby(['Group1', 'Group2'])['Value'].mean()
print(multi_group)
Output (hierarchical index):
Group1 Group2
A X 1.0
Y 4.0
B X 2.0
Z 5.0
C Y 3.0
Z 6.0
Name: Value, dtype: float64
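Since the result has a hierarchical index, you can pivot the inner level into columns with .unstack() for an instant cross-tab view. A minimal sketch using the same sample data:

```python
import pandas as pd

data = {
    'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Value': [1, 2, 3, 4, 5, 6],
}
df = pd.DataFrame(data)

# Pivot the inner index level (Group2) into columns;
# Group1/Group2 combinations with no data become NaN
cross_tab = df.groupby(['Group1', 'Group2'])['Value'].mean().unstack()
print(cross_tab)
```

This turns the multi-index Series into a Group1-by-Group2 grid, which is often what a report actually needs.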
3. Multiple Aggregations on Multiple Columns
Use .agg() with a dictionary to apply different functions to different columns — very common in real reports.
# Add more columns for realism
df['Sales'] = [100, 200, 150, 300, 250, 400]
df['Quantity'] = [10, 20, 15, 30, 25, 40]
# Group by two columns + different aggregations per metric
grouped_multi = df.groupby(['Group1', 'Group2']).agg({
'Value': 'mean',
'Sales': ['sum', 'mean'],
'Quantity': 'sum'
})
print(grouped_multi)
Output (multi-level columns):
Value Sales Quantity
mean sum mean sum
Group1 Group2
A X 1.0 100 50.0 10
Y 4.0 300 150.0 30
B X 2.0 200 100.0 20
Z 5.0 250 125.0 25
C Y 3.0 150 75.0 15
Z 6.0 400 200.0 40
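The dictionary form above produces multi-level columns, which can be awkward to export or plot. One common follow-up (a sketch, not the only approach) is to join the two column levels into flat names:

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Sales': [100, 200, 150, 300, 250, 400],
    'Quantity': [10, 20, 15, 30, 25, 40],
})

grouped_multi = df.groupby(['Group1', 'Group2']).agg({
    'Value': 'mean',
    'Sales': ['sum', 'mean'],
    'Quantity': 'sum',
})

# Each column is a (metric, function) tuple; join the levels into flat names
grouped_multi.columns = ['_'.join(col) for col in grouped_multi.columns]
print(grouped_multi.columns.tolist())
# ['Value_mean', 'Sales_sum', 'Sales_mean', 'Quantity_sum']
```

Flat names like 'Sales_sum' play much better with to_csv, plotting, and downstream merges.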
4. Clean Output with Named Aggregations
Use named aggregation, the keyword=('column', 'func') tuple syntax (shorthand for pd.NamedAgg), to get flat, readable column names. Highly recommended.
named_summary = df.groupby(['Group1', 'Group2']).agg(
avg_value=('Value', 'mean'),
total_sales=('Sales', 'sum'),
avg_sales=('Sales', 'mean'),
total_qty=('Quantity', 'sum')
)
print(named_summary)
Output (clean & flat):
avg_value total_sales avg_sales total_qty
Group1 Group2
A X 1.0 100 50.0 10
Y 4.0 300 150.0 30
B X 2.0 200 100.0 20
Z 5.0 250 125.0 25
C Y 3.0 150 75.0 15
Z 6.0 400 200.0 40
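If you want the grouping columns back as ordinary columns instead of an index (e.g. for export or merging), combine named aggregation with as_index=False. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Sales': [100, 200, 150, 300, 250, 400],
})

# as_index=False keeps Group1/Group2 as regular columns, not an index
flat = df.groupby(['Group1', 'Group2'], as_index=False).agg(
    total_sales=('Sales', 'sum'),
)
print(flat)
```

The result is a plain, flat DataFrame: three columns, one row per group.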
5. Modern Alternative in 2026: Polars
For large datasets, Polars is often faster and more memory-efficient with similar syntax.
import polars as pl
# Convert the enriched Pandas DataFrame (which has Sales and Quantity) to Polars
df_pl = pl.from_pandas(df)
grouped_pl = df_pl.group_by(["Group1", "Group2"]).agg(
avg_value=pl.col("Value").mean(),
total_sales=pl.col("Sales").sum(),
total_qty=pl.col("Quantity").sum()
)
print(grouped_pl)
Best Practices & Common Pitfalls
- Always sort or reset the index after a multi-column groupby if order matters
- Use as_index=False in Pandas groupby if you want the grouping columns as regular columns
- Handle missing data before aggregation (fillna or dropna)
- For huge data, prefer Polars or chunked processing
- Visualize results: grouped_multi.plot(kind='bar') for instant insights
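The checklist above can be sketched in one small pipeline, using toy data with a deliberate missing value (the column names here are illustrative, not from the earlier examples):

```python
import pandas as pd
import numpy as np

# Toy data with one missing Sales value
df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Product': ['A', 'B', 'A', 'B'],
    'Sales': [100.0, np.nan, 150.0, 400.0],
})

# 1) Handle missing data before aggregating
df['Sales'] = df['Sales'].fillna(0)

# 2) Keep grouping columns as regular columns; 3) sort for a stable order
summary = (
    df.groupby(['Region', 'Product'], as_index=False)['Sales'].sum()
      .sort_values(['Region', 'Product'])
      .reset_index(drop=True)
)
print(summary)
```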
Conclusion
Grouping by multiple variables + multi-column aggregations with groupby() + .agg() turns raw data into rich, multi-dimensional insights — averages by region and product, totals by cohort and channel, counts by category and time. In 2026, use Pandas for readability on small-to-medium data, and Polars for speed on large datasets. Master dictionary aggregations, NamedAgg, and custom functions, and you'll write concise, powerful summaries that scale from exploration to production reporting.
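Custom functions slot straight into named aggregation: any callable that takes a group's values as a Series works alongside the built-in names. A minimal sketch (value_range is a made-up helper for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['A', 'B', 'A', 'B'],
    'Value': [1, 2, 4, 8],
})

# A custom aggregation receives each group's values as a Series
def value_range(s: pd.Series) -> int:
    return s.max() - s.min()

custom = df.groupby('Group1').agg(
    value_range=('Value', value_range),
    mean_value=('Value', 'mean'),
)
print(custom)
```

Built-in string names ('mean', 'sum') run optimized paths, so prefer them where they exist and reserve custom callables for logic pandas doesn't ship.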
Next time you need cross-dimensional metrics — reach for multi-column groupby + agg first.