Summaries by group

Summarizing data by group is one of the most powerful and common operations in data analysis — think sales by region, user metrics by cohort, sensor readings by device, or experiment results by category. In Pandas, the groupby() method combined with aggregation functions (mean, sum, count, etc.) makes this clean, fast, and expressive.

In 2026, this pattern remains essential for EDA, reporting, and feature engineering. Here’s a practical guide with real examples you can copy and adapt.

1. Basic Grouped Summary (Single Function)

Group by one column and apply a single aggregation — the simplest and most common case.


import pandas as pd

data = {
    'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Value': [1, 2, 3, 4, 5, 6]
}

df = pd.DataFrame(data)

# Group by 'Group' and calculate mean of 'Value'
grouped_mean = df.groupby('Group')['Value'].mean()
print(grouped_mean)

Output:


Group
A    2.5
B    3.5
C    4.5
Name: Value, dtype: float64

2. Multiple Aggregations (Using .agg())

Apply several functions at once — mean, sum, count, min, max, etc.


multi_agg = df.groupby('Group')['Value'].agg(['mean', 'sum', 'count', 'min', 'max'])
print(multi_agg)

Output:


       mean  sum  count  min  max
Group                              
A       2.5    5      2    1    4
B       3.5    7      2    2    5
C       4.5    9      2    3    6
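
The list passed to .agg() can also mix built-in names with your own functions. The following is a minimal sketch, assuming the same Group/Value data as above; value_range is a hypothetical helper introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Value': [1, 2, 3, 4, 5, 6],
})

def value_range(s):
    """Spread between the largest and smallest value in one group."""
    return s.max() - s.min()

# Mix a built-in aggregation name with a custom callable;
# the callable's function name becomes the column label.
summary = df.groupby('Group')['Value'].agg(['mean', value_range])
print(summary)
```

Custom Python callables are usually slower than the built-in string names, so they are best kept for metrics that pandas does not already provide.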

3. Different Aggregations per Column

Pass a dictionary that maps each column to one function or to a list of functions. When any column receives a list, the result has two-level (MultiIndex) columns such as ('Sales', 'sum').


# Add more columns for realism
df['Sales'] = [100, 200, 150, 300, 250, 400]
df['Quantity'] = [10, 20, 15, 30, 25, 40]

grouped_multi = df.groupby('Group').agg({
    'Value': 'mean',
    'Sales': ['sum', 'mean'],
    'Quantity': 'sum'
})

print(grouped_multi)
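
The dictionary form returns two-level column labels, which can be awkward to export or join on. One possible way to flatten them, sketched here with the same columns as the example above (the underscore-joined names are just one naming choice):

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Sales': [100, 200, 150, 300, 250, 400],
    'Quantity': [10, 20, 15, 30, 25, 40],
})

grouped_multi = df.groupby('Group').agg({
    'Value': 'mean',
    'Sales': ['sum', 'mean'],
    'Quantity': 'sum',
})

# Each column label is a tuple such as ('Sales', 'sum');
# join the two levels into a single flat name.
flat = grouped_multi.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat)
```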

4. Named Aggregations (Clean, Readable Output)

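Pass keyword arguments of the form new_name=(column, function) to .agg() to get flat, meaningful column names (requires pandas 0.25 or later). Each tuple is shorthand for pd.NamedAgg(column=..., aggfunc=...), so no extra import is needed.
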

named_agg = df.groupby('Group').agg(
    avg_value=('Value', 'mean'),
    total_sales=('Sales', 'sum'),
    avg_sales=('Sales', 'mean'),
    total_qty=('Quantity', 'sum')
)

print(named_agg)
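
For comparison, the same idea can be written with explicit pd.NamedAgg objects, which some readers find easier to scan when a custom function is involved. This is a sketch under the same Sales data as above; the sales_range metric is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Sales': [100, 200, 150, 300, 250, 400],
})

named = df.groupby('Group').agg(
    # Explicit form of total_sales=('Sales', 'sum')
    total_sales=pd.NamedAgg(column='Sales', aggfunc='sum'),
    # A custom callable works too; the keyword sets the output column name
    sales_range=pd.NamedAgg(column='Sales', aggfunc=lambda s: s.max() - s.min()),
)
print(named)
```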

5. Modern Alternative in 2026: Polars

For large datasets, Polars is often faster and more memory-efficient.


import polars as pl

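# 'data' only holds Group and Value, so add the Sales column used below
df_pl = pl.DataFrame({**data, "Sales": [100, 200, 150, 300, 250, 400]})

grouped_pl = (
    df_pl.group_by("Group")
    .agg(
        mean_value=pl.col("Value").mean(),
        sum_sales=pl.col("Sales").sum(),
    )
    .sort("Group")  # group_by does not guarantee row order
)
print(grouped_pl)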

Best Practices & Common Pitfalls

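pandas sorts group keys by default (sort=True); call sort_values() on the result if you need a different order, and in Polars pass maintain_order=True or sort afterwards, because group_by does not preserve order
Use as_index=False in pandas groupby (or reset_index() on the result) if you want 'Group' as a regular column
Decide how to treat missing data before aggregating: pandas skips NaN values in mean, sum, and count, and drops rows whose group key is NaN unless you pass dropna=False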
For huge data, prefer Polars or chunked processing
Visualize results: grouped_mean.plot(kind='bar') for instant insights
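
The as_index and missing-data points above can be sketched as follows. The frame is a small hypothetical example with one missing value and one missing group key, and dropna=False assumes pandas 1.1 or later.

```python
import pandas as pd

nan = float('nan')
df = pd.DataFrame({
    'Group': ['A', 'B', None, 'A', 'B', None],
    'Value': [1.0, nan, 3.0, 4.0, 5.0, 6.0],
})

# as_index=False keeps 'Group' as a regular column instead of the index.
# Rows with a missing key are dropped, and NaN values are skipped by mean().
flat = df.groupby('Group', as_index=False)['Value'].mean()
print(flat)

# dropna=False keeps rows with a missing key as their own group.
with_missing = df.groupby('Group', dropna=False)['Value'].mean()
print(with_missing)

# 'count' ignores NaN values, while 'size' counts every row in the group.
counts = df.groupby('Group')['Value'].agg(['count', 'size'])
print(counts)
```

Comparing count with size per group is a quick way to see how many values were silently skipped.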

Conclusion

Grouped summaries with groupby() + .agg() turn raw data into actionable insights — averages by category, totals by group, counts per segment. In 2026, use Pandas for readability on small-to-medium data, and Polars for speed on large datasets. Master this pattern, and you'll spend less time calculating and more time understanding what the numbers really mean.

Next time you need group-level metrics — sales by region, users by cohort, errors by device — reach for groupby + agg first.