Cumulative sum (cumsum) is one of the most useful operations in data analysis — it shows the running total of values as you move through a series or column. In Pandas, the .cumsum() method makes this trivial, and it becomes especially powerful for time-series data, financial tracking, inventory levels, or any scenario where you need to see how totals accumulate over time or rows.
Here’s a practical guide to using cumulative sum in Pandas — with real examples, common applications, and 2026 tips.
1. Basic Cumulative Sum on a Single Column
Apply .cumsum() to a Series or column — it adds each value to the running total from the top.
import pandas as pd
data = {
'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
'Age': [25, 32, 18, 47, 23],
'Salary': [50000, 80000, 35000, 65000, 45000]
}
df = pd.DataFrame(data)
# Cumulative sum of Age
cum_age = df['Age'].cumsum()
print(cum_age)
Output (running total):
0 25
1 57
2 75
3 122
4 145
Name: Age, dtype: int64
2. Cumulative Sum on Multiple Columns
Apply to several columns at once — returns a new DataFrame with running totals for each.
cum_multi = df[['Age', 'Salary']].cumsum()
print(cum_multi)
Output:
Age Salary
0 25 50000
1 57 130000
2 75 165000
3 122 230000
4 145 275000
3. Real-World Use Cases (2026 Examples)
Running Total Sales Over Time
# Assume df has 'date' and 'daily_sales'
df = df.sort_values('date')
df['cum_sales'] = df['daily_sales'].cumsum()
print(df[['date', 'daily_sales', 'cum_sales']])
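To make the snippet above concrete, here is a minimal, self-contained sketch. The `sales` frame and its values are invented for illustration (the article only assumes 'date' and 'daily_sales' columns), and the rows are deliberately out of order to show why sorting comes first.

```python
import pandas as pd

# Hypothetical daily sales, deliberately NOT in chronological order
sales = pd.DataFrame({
    "date": pd.to_datetime(["2026-01-03", "2026-01-01", "2026-01-02"]),
    "daily_sales": [300, 100, 200],
})

# Sort chronologically, then reset the index so row labels follow the new order
sales = sales.sort_values("date").reset_index(drop=True)

# The running total now accumulates in date order
sales["cum_sales"] = sales["daily_sales"].cumsum()
print(sales)
```

Without the sort, the running total would follow the original row order and the final column would not represent sales to date.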
Cumulative Inventory or Balance
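# Assume df has 'inflow' and 'outflow'
# Net change per row, kept in its own column so it is not overwritten
df['net_flow'] = df['inflow'] - df['outflow']
df['cum_balance'] = df['net_flow'].cumsum()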
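The balance calculation can be tried on a small, made-up ledger. The `ledger` frame, its values, and the `net_flow` column name are assumptions for illustration only; the sketch also assumes the balance starts at zero.

```python
import pandas as pd

# Hypothetical ledger: money in and money out per row
ledger = pd.DataFrame({
    "inflow": [100, 0, 50, 0],
    "outflow": [0, 30, 0, 80],
})

# Net change per row, then the running balance (starting from zero)
ledger["net_flow"] = ledger["inflow"] - ledger["outflow"]
ledger["cum_balance"] = ledger["net_flow"].cumsum()
print(ledger)
```

If there is an opening balance, adding it to the cumulative column afterwards is one simple way to account for it.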
Cumulative Sum in Time-Series with Groupby
# Cumulative sales per customer
df['cum_sales_per_customer'] = df.groupby('customer_id')['sales'].cumsum()
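As a rough illustration of the grouped version, the sketch below uses an invented `orders` frame with two customers. The running total restarts for each customer_id while the rows keep their original order.

```python
import pandas as pd

# Hypothetical orders for two customers, interleaved
orders = pd.DataFrame({
    "customer_id": ["A", "B", "A", "B", "A"],
    "sales": [10, 5, 20, 15, 30],
})

# cumsum() runs separately inside each customer group;
# the result is aligned back to the original row index
orders["cum_sales_per_customer"] = (
    orders.groupby("customer_id")["sales"].cumsum()
)
print(orders)
```

For a true time-series, sorting by date before grouping keeps each customer's total in chronological order.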
4. Modern Alternative in 2026: Polars
For large datasets, Polars is often faster and more memory-efficient.
import polars as pl
df_pl = pl.DataFrame(data)
df_pl = df_pl.with_columns(
# Recent Polars releases spell this cum_sum(); the older cumsum() name was removed
pl.col("Age").cum_sum().alias("cum_age"),
pl.col("Salary").cum_sum().alias("cum_salary")
)
print(df_pl)
5. Best Practices & Common Pitfalls
- Sort data first if order matters (e.g., time-series): df = df.sort_values('date')
- Handle missing values early: df.fillna(0).cumsum() treats gaps as zero, while the default df.cumsum(skipna=True) skips NaN when accumulating but still leaves NaN at those positions in the result
- Use axis=1 for row-wise cumulative sum (rare but useful)
- For huge data, switch to Polars or process in chunks
- Visualize: df['cum_sales'].plot() to see trends instantly
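The missing-value and axis=1 points are easier to see on a tiny example. The series `s` and the `quarters` frame below are invented for illustration; the sketch only compares the behaviors and does not recommend one over another.

```python
import pandas as pd

# Hypothetical series with one missing value
s = pd.Series([10.0, float("nan"), 5.0])

default = s.cumsum()             # skipna=True: NaN stays NaN, total continues after it
filled = s.fillna(0).cumsum()    # gap treated as zero, so no NaN in the result
strict = s.cumsum(skipna=False)  # everything from the first NaN onward becomes NaN

print(pd.DataFrame({"default": default, "filled": filled, "strict": strict}))

# Row-wise cumulative sum: accumulate across columns within each row
quarters = pd.DataFrame({"q1": [1, 10], "q2": [2, 20], "q3": [3, 30]})
row_wise = quarters.cumsum(axis=1)
print(row_wise)
```

Which variant is appropriate depends on whether a missing value means "nothing happened" (fill with zero) or "unknown" (keep the NaN visible).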
Conclusion
Cumulative sum is a simple but incredibly powerful operation — it turns raw values into running totals that reveal trends, growth, balances, and progress over time. In 2026, use Pandas .cumsum() for quick insights on small-to-medium data, and Polars for speed on large datasets.
Next time you’re tracking totals such as sales, inventory, scores, or balances, reach for cumsum(): it’s one of the fastest ways to turn sequential data into meaningful stories.