Summarizing data by group is one of the most powerful and common operations in data analysis — think sales by region, user metrics by cohort, sensor readings by device, or experiment results by category. In Pandas, the groupby() method combined with aggregation functions (mean, sum, count, etc.) makes this clean, fast, and expressive.
In 2026, this pattern remains essential for EDA, reporting, and feature engineering. Here’s a practical guide with real examples you can copy and adapt.
1. Basic Grouped Summary (Single Function)
Group by one column and apply a single aggregation — the simplest and most common case.
import pandas as pd
data = {
'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
'Value': [1, 2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)
# Group by 'Group' and calculate mean of 'Value'
grouped_mean = df.groupby('Group')['Value'].mean()
print(grouped_mean)
Output:
Group
A 2.5
B 3.5
C 4.5
Name: Value, dtype: float64
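Under the hood, groupby follows a split-apply-combine pattern. The sketch below reproduces the mean above by hand to make those three steps visible. It is an illustration only; in practice the vectorized groupby call is both shorter and faster.

```python
import pandas as pd

# Same toy data as above
df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Value': [1, 2, 3, 4, 5, 6],
})

manual = {}
# Split: iterating a GroupBy yields (key, sub-DataFrame) pairs, sorted by key by default
for key, sub_df in df.groupby('Group'):
    # Apply: compute the statistic on each group's slice of rows
    manual[key] = sub_df['Value'].mean()

# Combine: collect the per-group results into one Series indexed by group
manual_mean = pd.Series(manual, name='Value')
manual_mean.index.name = 'Group'

print(manual_mean.tolist())  # [2.5, 3.5, 4.5]
print(manual_mean.equals(df.groupby('Group')['Value'].mean()))
```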
2. Multiple Aggregations (Using .agg())
Apply several functions at once — mean, sum, count, min, max, etc.
multi_agg = df.groupby('Group')['Value'].agg(['mean', 'sum', 'count', 'min', 'max'])
print(multi_agg)
Output:
       mean  sum  count  min  max
Group
A       2.5    5      2    1    4
B       3.5    7      2    2    5
C       4.5    9      2    3    6
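The list passed to .agg() is not limited to built-in names. Plain Python functions can be mixed in, and pandas labels each result column with the function's name. In the sketch below, value_range is a hypothetical helper written for illustration, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Value': [1, 2, 3, 4, 5, 6],
})

def value_range(s: pd.Series):
    """Hypothetical helper: spread of a group's values (max minus min)."""
    return s.max() - s.min()

# String names and callables can be combined in one list;
# the callable's column label comes from its __name__
summary = df.groupby('Group')['Value'].agg(['mean', 'count', value_range])
print(summary)
```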
3. Different Aggregations per Column
Use a dictionary to assign specific functions to specific columns — very powerful for multi-metric summaries.
# Add more columns for realism
df['Sales'] = [100, 200, 150, 300, 250, 400]
df['Quantity'] = [10, 20, 15, 30, 25, 40]
grouped_multi = df.groupby('Group').agg({
'Value': 'mean',
'Sales': ['sum', 'mean'],
'Quantity': 'sum'
})
print(grouped_multi)
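Because 'Sales' maps to a list of functions, the result above has two-level (MultiIndex) column labels such as ('Sales', 'sum'). Every other column gets a second level too. One common way to flatten them is to join each pair with an underscore. The "column_function" naming below is just a convention chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Sales': [100, 200, 150, 300, 250, 400],
    'Quantity': [10, 20, 15, 30, 25, 40],
})

grouped_multi = df.groupby('Group').agg({
    'Value': 'mean',
    'Sales': ['sum', 'mean'],
    'Quantity': 'sum',
})

# Each column label is a (column, function) tuple; join it into one flat string
grouped_multi.columns = ['_'.join(col) for col in grouped_multi.columns]
print(grouped_multi.columns.tolist())
# ['Value_mean', 'Sales_sum', 'Sales_mean', 'Quantity_sum']
```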
4. Named Aggregations (Clean, Readable Output)
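Use named aggregation (a pandas feature available since version 0.25) to get meaningful column names. Each keyword argument becomes an output column, and its value is a (column, function) tuple. No extra import is needed.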
named_agg = df.groupby('Group').agg(
avg_value=('Value', 'mean'),
total_sales=('Sales', 'sum'),
avg_sales=('Sales', 'mean'),
total_qty=('Quantity', 'sum')
)
print(named_agg)
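The (column, function) tuples above are shorthand for pd.NamedAgg, a named tuple with column and aggfunc fields. The sketch below shows that both spellings produce the same flat, single-level columns. The explicit form can be easier to read when the function is a longer callable.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Sales': [100, 200, 150, 300, 250, 400],
    'Quantity': [10, 20, 15, 30, 25, 40],
})

# Explicit spelling with pd.NamedAgg
explicit = df.groupby('Group').agg(
    avg_value=pd.NamedAgg(column='Value', aggfunc='mean'),
    total_sales=pd.NamedAgg(column='Sales', aggfunc='sum'),
)
# Equivalent tuple shorthand
shorthand = df.groupby('Group').agg(
    avg_value=('Value', 'mean'),
    total_sales=('Sales', 'sum'),
)

print(explicit.equals(shorthand))  # True
print(explicit.columns.tolist())   # ['avg_value', 'total_sales']
```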
5. Modern Alternative in 2026: Polars
For large datasets, Polars is often faster and more memory-efficient.
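import polars as pl
# Build the Polars frame from the pandas df so it includes the Sales column added in step 3
# (the original data dict only has Group and Value, so pl.col("Sales") would fail)
df_pl = pl.DataFrame(df.to_dict(orient='list'))
# maintain_order=True keeps groups in first-appearance order; otherwise output order is not guaranteed
grouped_pl = df_pl.group_by("Group", maintain_order=True).agg(
    mean_value=pl.col("Value").mean(),
    sum_sales=pl.col("Sales").sum()
)
print(grouped_pl)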
Best Practices & Common Pitfalls
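- Pandas groupby sorts results by the group keys by default (sort=True). Use sort_values() when you need groups ordered by an aggregated value, and reset_index() to turn the keys back into a regular column.
- Pass as_index=False to groupby if you want 'Group' as a regular column instead of the index.
- Handle missing data deliberately. Aggregations such as mean and sum skip NaN values, and rows whose group key is NaN are dropped by default (dropna=True). Use fillna or dropna before aggregating if that is not what you want.
- For huge data, prefer Polars or chunked processing.
- Visualize results: grouped_mean.plot(kind='bar') gives instant insights (requires matplotlib).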
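Several of the practices above fit into one small sketch. It keeps the key as a column with as_index=False, re-sorts by the aggregated value, and shows how NaN values and NaN keys are treated. The data, including the missing values, is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing Sales value and one row with a missing Group key
df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C', np.nan],
    'Sales': [100, 200, 150, 300, np.nan, 400, 999],
})

# as_index=False keeps 'Group' as an ordinary column instead of the index
totals = df.groupby('Group', as_index=False)['Sales'].sum()

# Groups come back sorted by key (A, B, C); re-sort by the aggregated value instead
totals = totals.sort_values('Sales', ascending=False)
print(totals)

# sum skipped the NaN in B's Sales, and the NaN-key row was dropped above;
# dropna=False keeps that row as its own group
with_nan_key = df.groupby('Group', dropna=False)['Sales'].sum()
print(with_nan_key)
```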
Conclusion
Grouped summaries with groupby() + .agg() turn raw data into actionable insights — averages by category, totals by group, counts per segment. In 2026, use Pandas for readability on small-to-medium data, and Polars for speed on large datasets. Master this pattern, and you'll spend less time calculating and more time understanding what the numbers really mean.
Next time you need group-level metrics — sales by region, users by cohort, errors by device — reach for groupby + agg first.