Grouping by multiple variables is one of the most powerful techniques in data analysis — it lets you slice data across several dimensions at once (e.g., sales by region and product, users by cohort and device, experiments by variant and country). In Pandas, you pass a list of column names to groupby() to create a multi-level grouping, then apply aggregations like mean, sum, count, min/max, or custom functions.
In 2026, this pattern is essential for dashboards, cohort reports, segmentation, and cross-tab analysis. Here’s a practical guide with real examples you can copy and adapt immediately.
1. Basic Setup & Sample Data
import pandas as pd
data = {
'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
'Value': [1, 2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)
print(df)
2. Group by Multiple Columns + Single Aggregation
Group by two (or more) columns and apply one function — results in a multi-index Series.
# Group by Group1 and Group2 -> mean of Value
multi_group = df.groupby(['Group1', 'Group2'])['Value'].mean()
print(multi_group)
Output (hierarchical index):
Group1 Group2
A X 1.0
Y 4.0
B X 2.0
Z 5.0
C Y 3.0
Z 6.0
Name: Value, dtype: float64
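Since the result has a hierarchical index, you can pivot the inner level into columns with .unstack() for an instant cross-tab view. A minimal sketch using the same sample data:

```python
import pandas as pd

data = {
    'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Value': [1, 2, 3, 4, 5, 6],
}
df = pd.DataFrame(data)

# Pivot the inner index level (Group2) into columns;
# Group1/Group2 combinations with no data become NaN
cross_tab = df.groupby(['Group1', 'Group2'])['Value'].mean().unstack()
print(cross_tab)
```

This turns the multi-index Series into a Group1-by-Group2 grid, which is often what a report actually needs.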
3. Multiple Aggregations on Multiple Columns
Use .agg() with a dictionary to apply different functions to different columns — very common in real reports.
# Add more columns for realism
df['Sales'] = [100, 200, 150, 300, 250, 400]
df['Quantity'] = [10, 20, 15, 30, 25, 40]
# Group by two columns + different aggregations per metric
grouped_multi = df.groupby(['Group1', 'Group2']).agg({
'Value': 'mean',
'Sales': ['sum', 'mean'],
'Quantity': 'sum'
})
print(grouped_multi)
Output (multi-level columns):
Value Sales Quantity
mean sum mean sum
Group1 Group2
A X 1.0 100 50.0 10
Y 4.0 300 150.0 30
B X 2.0 200 100.0 20
Z 5.0 250 125.0 25
C Y 3.0 150 75.0 15
Z 6.0 400 200.0 40
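The dictionary form above produces multi-level columns, which can be awkward to export or plot. One common follow-up (a sketch, not the only approach) is to join the two column levels into flat names:

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Sales': [100, 200, 150, 300, 250, 400],
    'Quantity': [10, 20, 15, 30, 25, 40],
})

grouped_multi = df.groupby(['Group1', 'Group2']).agg({
    'Value': 'mean',
    'Sales': ['sum', 'mean'],
    'Quantity': 'sum',
})

# Each column is a (metric, function) tuple; join the levels into flat names
grouped_multi.columns = ['_'.join(col) for col in grouped_multi.columns]
print(grouped_multi.columns.tolist())
# ['Value_mean', 'Sales_sum', 'Sales_mean', 'Quantity_sum']
```

Flat names like 'Sales_sum' play much better with to_csv, plotting, and downstream merges.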
4. Clean Output with Named Aggregations
Use named aggregation, the keyword=('column', 'func') tuple syntax (shorthand for pd.NamedAgg), to get flat, readable column names. Highly recommended.
named_summary = df.groupby(['Group1', 'Group2']).agg(
avg_value=('Value', 'mean'),
total_sales=('Sales', 'sum'),
avg_sales=('Sales', 'mean'),
total_qty=('Quantity', 'sum')
)
print(named_summary)
Output (clean & flat):
avg_value total_sales avg_sales total_qty
Group1 Group2
A X 1.0 100 50.0 10
Y 4.0 300 150.0 30
B X 2.0 200 100.0 20
Z 5.0 250 125.0 25
C Y 3.0 150 75.0 15
Z 6.0 400 200.0 40
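If you want the grouping columns back as ordinary columns instead of an index (e.g. for export or merging), combine named aggregation with as_index=False. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Sales': [100, 200, 150, 300, 250, 400],
})

# as_index=False keeps Group1/Group2 as regular columns, not an index
flat = df.groupby(['Group1', 'Group2'], as_index=False).agg(
    total_sales=('Sales', 'sum'),
)
print(flat)
```

The result is a plain, flat DataFrame: three columns, one row per group.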
5. Modern Alternative in 2026: Polars
For large datasets, Polars is often faster and more memory-efficient with similar syntax.
import polars as pl
# Convert the enriched Pandas DataFrame (which has Sales and Quantity) to Polars
df_pl = pl.from_pandas(df)
grouped_pl = df_pl.group_by(["Group1", "Group2"]).agg(
avg_value=pl.col("Value").mean(),
total_sales=pl.col("Sales").sum(),
total_qty=pl.col("Quantity").sum()
)
print(grouped_pl)
Best Practices & Common Pitfalls
- Always sort or reset the index after a multi-column groupby if order matters
- Use as_index=False in Pandas groupby if you want the grouping columns as regular columns
- Handle missing data before aggregation (fillna or dropna)
- For huge data, prefer Polars or chunked processing
- Visualize results: grouped_multi.plot(kind='bar') for instant insights
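The checklist above can be sketched in one small pipeline, using toy data with a deliberate missing value (the column names here are illustrative, not from the earlier examples):

```python
import pandas as pd
import numpy as np

# Toy data with one missing Sales value
df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Product': ['A', 'B', 'A', 'B'],
    'Sales': [100.0, np.nan, 150.0, 400.0],
})

# 1) Handle missing data before aggregating
df['Sales'] = df['Sales'].fillna(0)

# 2) Keep grouping columns as regular columns; 3) sort for a stable order
summary = (
    df.groupby(['Region', 'Product'], as_index=False)['Sales'].sum()
      .sort_values(['Region', 'Product'])
      .reset_index(drop=True)
)
print(summary)
```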
Conclusion
Grouping by multiple variables + multi-column aggregations with groupby() + .agg() turns raw data into rich, multi-dimensional insights — averages by region and product, totals by cohort and channel, counts by category and time. In 2026, use Pandas for readability on small-to-medium data, and Polars for speed on large datasets. Master dictionary aggregations, NamedAgg, and custom functions, and you'll write concise, powerful summaries that scale from exploration to production reporting.
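Custom functions slot straight into named aggregation: any callable that takes a group's values as a Series works alongside the built-in names. A minimal sketch (value_range is a made-up helper for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Group1': ['A', 'B', 'A', 'B'],
    'Value': [1, 2, 4, 8],
})

# A custom aggregation receives each group's values as a Series
def value_range(s: pd.Series) -> int:
    return s.max() - s.min()

custom = df.groupby('Group1').agg(
    value_range=('Value', value_range),
    mean_value=('Value', 'mean'),
)
print(custom)
```

Built-in string names ('mean', 'sum') run optimized paths, so prefer them where they exist and reserve custom callables for logic pandas doesn't ship.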
Next time you need cross-dimensional metrics — reach for multi-column groupby + agg first.