Cumulative Statistics in Pandas – cumsum, cummax, cummin, expanding() & More in Python 2026

Cumulative statistics are running totals, running maximums, running averages, and similar metrics that update row by row as you move through your data. They are useful for trend analysis, growth tracking, and building time-aware features in data manipulation workflows.

TL;DR — Key Cumulative Functions

.cumsum() – Running total
.cummax() – Running maximum
.cummin() – Running minimum
.cumprod() – Running product
.expanding().mean() – Running (expanding window) average
.expanding().std() – Running standard deviation
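
As a quick illustration of the functions listed above, here is a minimal sketch that applies each one to a small made-up Series. The values are hypothetical and chosen only so the running results are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data, chosen only to make the running values easy to follow
s = pd.Series([3, 1, 4, 1, 5])

running = pd.DataFrame({
    "value": s,
    "cumsum": s.cumsum(),                # 3, 4, 8, 9, 14
    "cummax": s.cummax(),                # 3, 3, 4, 4, 5
    "cummin": s.cummin(),                # 3, 1, 1, 1, 1
    "cumprod": s.cumprod(),              # 3, 3, 12, 12, 60
    "exp_mean": s.expanding().mean(),    # 3.0, 2.0, 2.667, 2.25, 2.8
    "exp_std": s.expanding().std(),      # NaN in the first row (needs 2+ values)
})
print(running)
```

Every result has the same length and index as the input, which is why these methods can be assigned straight back as new columns.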

1. Basic Cumulative Statistics

import pandas as pd

df = pd.read_csv("sales_data.csv", parse_dates=["order_date"])
df = df.sort_values("order_date")

# Running totals
df["cum_sales"] = df["amount"].cumsum()
df["cum_quantity"] = df["quantity"].cumsum()

# Running maximum and minimum
df["running_max_sale"] = df["amount"].cummax()
df["running_min_sale"] = df["amount"].cummin()
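
The sort step matters because cumulative methods follow row order, not date order. The sketch below uses a small hypothetical frame (not the article's sales_data.csv) to show how the running total differs before and after sorting.

```python
import pandas as pd

# Hypothetical rows arriving out of date order
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2026-01-03", "2026-01-01", "2026-01-02"]),
    "amount": [30.0, 10.0, 20.0],
})

# Unsorted: the running total follows file order, not time order
unsorted_total = df["amount"].cumsum().tolist()    # [30.0, 40.0, 60.0]

# Sorted: the running total reads as "sales to date"
ordered = df.sort_values("order_date", kind="stable").reset_index(drop=True)
ordered["cum_sales"] = ordered["amount"].cumsum()
print(ordered["cum_sales"].tolist())                # [10.0, 30.0, 60.0]
```

The final total is the same either way; only the intermediate values change, which is exactly what a running metric is meant to report.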

2. Running Averages and Statistics with expanding()

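# Running (expanding window) statistics: each row uses every row from the start up to itself
df["running_avg_sale"] = df["amount"].expanding().mean()
# Sample standard deviation (ddof=1), so the first row is NaN
df["running_std_sale"] = df["amount"].expanding().std()
# Matches cummin() when there are no missing values, but returns floats
df["running_min"] = df["amount"].expanding().min()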
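
To make the expanding window concrete: with no missing values, the expanding mean at each row equals the running total divided by the number of rows seen so far. The sketch below checks that on hypothetical amounts and also shows the min_periods argument, which delays output until enough observations are available.

```python
import numpy as np
import pandas as pd

# Hypothetical amounts, used only for illustration
amount = pd.Series([10.0, 20.0, 60.0, 30.0])

exp_mean = amount.expanding().mean()

# Equivalent manual computation (valid when there are no NaNs):
# running total / number of rows seen so far
manual_mean = amount.cumsum() / np.arange(1, len(amount) + 1)

# min_periods=3 leaves the first two rows as NaN
stable_mean = amount.expanding(min_periods=3).mean()

print(exp_mean.tolist())     # [10.0, 15.0, 30.0, 30.0]
print(stable_mean.tolist())  # [nan, nan, 30.0, 30.0]
```

Setting min_periods can be a reasonable choice when early running averages based on one or two rows would be too noisy to use as features.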

3. Grouped Cumulative Statistics (Most Powerful Pattern)

# Cumulative sales per region
df["cum_sales_by_region"] = df.groupby("region")["amount"].cumsum()

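# Running average per customer. groupby().expanding() adds customer_id as an index level,
# so drop it to realign the result with df's original (unique) row index
df["customer_running_avg"] = df.groupby("customer_id")["amount"].expanding().mean().reset_index(level=0, drop=True)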
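
The two grouped patterns return differently shaped results, which is why only the second needs reset_index. The following sketch uses a small hypothetical frame with the article's column names. The avg_manual column is an alternative added here for comparison; it assumes there are no missing amounts.

```python
import pandas as pd

# Hypothetical orders, already in date order; column names follow the article
df = pd.DataFrame({
    "customer_id": ["a", "b", "a", "b", "a"],
    "amount": [10.0, 100.0, 30.0, 300.0, 20.0],
})

grouped = df.groupby("customer_id")["amount"]

# groupby().cumsum() keeps the original row index, so it can be assigned directly
cum_by_customer = grouped.cumsum()

# groupby().expanding() returns a (customer_id, original index) MultiIndex;
# dropping level 0 restores the original index so assignment aligns by row
avg_expanding = grouped.expanding().mean().reset_index(level=0, drop=True)

# Alternative without the MultiIndex: running total / running row count
avg_manual = grouped.cumsum() / (grouped.cumcount() + 1)

df = df.assign(
    cum_by_customer=cum_by_customer,
    avg_expanding=avg_expanding,
    avg_manual=avg_manual,
)
print(df)
```

For customer "a" the running totals are 10, 40, 60 and the running averages are 10, 20, 20, interleaved with customer "b" in the original row order.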

4. Real-World Example: Monthly Cumulative Growth

monthly = (
    df
    .groupby([df["order_date"].dt.to_period("M"), "region"])
    .agg(total_sales=("amount", "sum"))
    .reset_index()
)

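# Running total of monthly sales within each region
monthly["cumulative_sales"] = monthly.groupby("region")["total_sales"].cumsum()
# Month-over-month growth of that running total; the first month in each region is NaN
monthly["cumulative_growth"] = monthly.groupby("region")["cumulative_sales"].pct_change()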
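
To see what these columns contain, the sketch below runs the same steps on a handful of hypothetical rows. The dates, regions, and amounts are invented for illustration.

```python
import pandas as pd

# Hypothetical sales rows; column names mirror the article's dataset
df = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2026-01-05", "2026-01-20", "2026-02-10", "2026-03-15", "2026-01-08", "2026-02-11"]
    ),
    "region": ["East", "East", "East", "East", "West", "West"],
    "amount": [100.0, 100.0, 100.0, 300.0, 50.0, 150.0],
})

monthly = (
    df.groupby([df["order_date"].dt.to_period("M"), "region"])
    .agg(total_sales=("amount", "sum"))
    .reset_index()
)
monthly["cumulative_sales"] = monthly.groupby("region")["total_sales"].cumsum()
monthly["cumulative_growth"] = monthly.groupby("region")["cumulative_sales"].pct_change()

east = monthly[monthly["region"] == "East"]
print(east["cumulative_sales"].tolist())   # [200.0, 300.0, 600.0]
print(east["cumulative_growth"].tolist())  # [nan, 0.5, 1.0]
```

Note that a month with no sales in a region produces no row in monthly, so pct_change would then compare across the gap. If that matters for your data, you would need to fill in the missing months first.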

Best Practices in 2026

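Always sort your data by the time column before applying cumulative functions, since the results depend on row order
Use groupby() with cumulative methods for running statistics per segment, such as per region or per customer
Combine cumsum() with expanding() to build running totals and running averages as features
Decide how to handle NaNs before cumulative calculations: by default cumsum(), cummax(), and cummin() skip NaNs and leave NaN in those rows; .fillna(0) is reasonable for sums but changes running minimums and maximums
cumsum(), cummax(), cummin(), and cumprod() are vectorized and make a single pass over the data, so they scale well to large datasets; groupby().expanding() can be noticeably slower when there are many groups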
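
How missing values should be handled depends on which cumulative function follows. A minimal sketch on a hypothetical Series with one NaN shows the default behavior and the effect of filling with 0:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value
s = pd.Series([5.0, np.nan, 2.0, 4.0])

# Default (skipna=True): the NaN row stays NaN and the running total continues
print(s.cumsum().tolist())             # [5.0, nan, 7.0, 11.0]

# Filling with 0 only fills the gap for a sum ...
print(s.fillna(0).cumsum().tolist())   # [5.0, 5.0, 7.0, 11.0]

# ... but it changes a running minimum, because 0 becomes a real observation
print(s.cummin().tolist())             # [5.0, nan, 2.0, 2.0]
print(s.fillna(0).cummin().tolist())   # [5.0, 0.0, 0.0, 0.0]
```

For running minimums and maximums, leaving the NaNs in place or forward-filling the result is usually a safer choice than filling the input with 0.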

Conclusion

Cumulative statistics are a useful addition to any data manipulation toolkit. Combining cumsum(), cummax(), expanding(), and groupby() lets you build running metrics, growth trends, and historical comparisons in a few lines of code, turning static transactional data into time-aware features.

Next steps:

Add cumulative sales and running average columns to one of your current datasets and explore what the running values reveal.
