Cumulative sum

Cumulative sum (cumsum) is one of the most useful operations in data analysis: it shows the running total of values as you move through a series or column. In Pandas, the .cumsum() method does this in a single call, and it is especially useful for time-series data, financial tracking, inventory levels, or any scenario where you need to see how totals accumulate over time or across rows.

Here’s a practical guide to using cumulative sum in Pandas, with worked examples, common applications, and tips for 2026.

1. Basic Cumulative Sum on a Single Column

Call .cumsum() on a Series (for example, a single DataFrame column). Each element of the result is the sum of all values from the first row down to and including that row.


import pandas as pd

data = {
    'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
    'Age': [25, 32, 18, 47, 23],
    'Salary': [50000, 80000, 35000, 65000, 45000]
}

df = pd.DataFrame(data)

# Cumulative sum of Age
cum_age = df['Age'].cumsum()
print(cum_age)

Output (running total):


0     25
1     57
2     75
3    122
4    145
Name: Age, dtype: int64

2. Cumulative Sum on Multiple Columns

Select several numeric columns and call .cumsum() on the selection. The result is a new DataFrame with an independent running total for each column.


cum_multi = df[['Age', 'Salary']].cumsum()
print(cum_multi)

Output:


   Age  Salary
0   25   50000
1   57  130000
2   75  165000
3  122  230000
4  145  275000

3. Real-World Use Cases (2026 Examples)

Running Total Sales Over Time


# Assume df has 'date' and 'daily_sales'
df = df.sort_values('date')
df['cum_sales'] = df['daily_sales'].cumsum()
print(df[['date', 'daily_sales', 'cum_sales']])
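The snippet above assumes a DataFrame that already has 'date' and 'daily_sales' columns. As a minimal, self-contained sketch, the following builds a small hypothetical table (the values and the name `sales` are made up for illustration) with rows deliberately out of order, to show why sorting comes first:

```python
import pandas as pd

# Hypothetical daily sales, deliberately out of date order
sales = pd.DataFrame({
    "date": pd.to_datetime(["2026-01-03", "2026-01-01", "2026-01-02"]),
    "daily_sales": [300, 100, 200],
})

# Sort chronologically first; cumsum follows row order, not date order
sales = sales.sort_values("date").reset_index(drop=True)
sales["cum_sales"] = sales["daily_sales"].cumsum()

print(sales)
```

Without the sort, the running total would start from the 3 January row and would not represent sales to date.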

Cumulative Inventory or Balance


# Assume df has 'inflow' and 'outflow' columns
# Net change per row, then its running total
df['net_flow'] = df['inflow'] - df['outflow']
df['cum_balance'] = df['net_flow'].cumsum()
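A running balance usually starts from some opening amount rather than zero. The sketch below assumes a hypothetical `ledger` table and an `opening_balance` of 1000 (both invented for illustration) and adds the opening amount to the cumulative net flow:

```python
import pandas as pd

# Hypothetical ledger; the figures are illustrative only
ledger = pd.DataFrame({
    "inflow": [500, 0, 200, 0],
    "outflow": [0, 120, 50, 300],
})
opening_balance = 1000  # assumed starting balance

# Net movement per row, then the running balance after each row
ledger["net_flow"] = ledger["inflow"] - ledger["outflow"]
ledger["cum_balance"] = opening_balance + ledger["net_flow"].cumsum()

print(ledger)
```

Keeping the per-row net flow in its own column makes it easier to audit how each balance was reached.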

Cumulative Sum in Time-Series with Groupby


# Cumulative sales per customer
df['cum_sales_per_customer'] = df.groupby('customer_id')['sales'].cumsum()
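To see the grouped running total in action, here is a small sketch with a hypothetical `orders` table (customer IDs, dates, and amounts are made up). The running total restarts for each customer, and sorting by customer and date beforehand ensures each total accumulates in chronological order:

```python
import pandas as pd

# Hypothetical orders for two customers, in arbitrary order
orders = pd.DataFrame({
    "customer_id": ["A", "B", "A", "B", "A"],
    "date": pd.to_datetime(
        ["2026-01-02", "2026-01-01", "2026-01-01", "2026-01-03", "2026-01-05"]
    ),
    "sales": [20, 5, 10, 15, 30],
})

# Sort so each customer's rows are in date order
orders = orders.sort_values(["customer_id", "date"]).reset_index(drop=True)

# groupby(...).cumsum() returns a Series aligned to the original rows
orders["cum_sales_per_customer"] = orders.groupby("customer_id")["sales"].cumsum()

print(orders)
```

Because the grouped result keeps the original index, it can be assigned straight back as a new column without a merge.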

4. Modern Alternative in 2026: Polars

For large datasets, Polars is often faster and more memory-efficient.


import polars as pl

df_pl = pl.DataFrame(data)
df_pl = df_pl.with_columns(
    # Recent Polars releases name this method cum_sum; the older cumsum spelling was deprecated
    pl.col("Age").cum_sum().alias("cum_age"),
    pl.col("Salary").cum_sum().alias("cum_salary")
)
print(df_pl)

5. Best Practices & Common Pitfalls

Sort data first if order matters (e.g., time-series): df = df.sort_values('date')
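Handle missing values deliberately: by default (skipna=True), .cumsum() ignores NaN when accumulating but still leaves NaN at those positions in the result. Use df.fillna(0).cumsum() if missing values should count as zero, or skipna=False if a gap should make every later total NaN.
Use axis=1 for a row-wise cumulative sum that accumulates across columns (rare but useful, e.g. period columns turned into to-date totals)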
For huge data, switch to Polars or process in chunks
Visualize: df['cum_sales'].plot() to see trends instantly
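The missing-value and axis=1 points above are easy to check on a tiny example. This sketch uses made-up numbers and hypothetical column names (q1, q2, q3) to compare the three ways of treating NaN and to show a row-wise cumulative sum:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 5.0, 20.0])

# Default (skipna=True): NaN stays in place, the total carries on past it
default = s.cumsum()

# skipna=False: every value from the first NaN onward becomes NaN
strict = s.cumsum(skipna=False)

# Treat missing values as zero before accumulating
filled = s.fillna(0).cumsum()

print(pd.DataFrame({"default": default, "strict": strict, "filled": filled}))

# Row-wise cumulative sum across hypothetical quarterly columns
wide = pd.DataFrame({"q1": [1, 10], "q2": [2, 20], "q3": [3, 30]})
row_wise = wide.cumsum(axis=1)
print(row_wise)
```

Which NaN treatment is right depends on what a gap means in the data: a true zero, an unknown value, or a record that should invalidate later totals.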

Conclusion

Cumulative sum is a simple but powerful operation: it turns raw values into running totals that reveal trends, growth, balances, and progress over time. In 2026, use Pandas .cumsum() for quick insights on small-to-medium data, and consider Polars when speed on large datasets matters.

Next time you’re tracking totals such as sales, inventory, scores, or balances, reach for cumsum(). It’s one of the quickest ways to turn sequential data into a running picture of where things stand.