Calculating summary statistics across columns is one of the fastest ways to get a high-level view of your data in pandas — mean, median, standard deviation, min/max, sum, and more, all computed column-wise in a single call. These operations help you spot trends, outliers, distributions, and scale differences between variables before diving deeper into modeling or visualization.
In 2026, this remains a core EDA step — quick, readable, and essential for understanding wide datasets. Here’s a practical guide with real examples you can copy and adapt.
1. Basic Setup & Sample Data
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [6, 7, 8, 9, 10],
'C': [11, 12, 13, 14, 15]
})
print(df)
Output:
A B C
0 1 6 11
1 2 7 12
2 3 8 13
3 4 9 14
4 5 10 15
2. Single-Statistic Summaries Across Columns
Most common methods default to axis=0 (column-wise).
print("Mean:\n", df.mean())
print("\nMedian:\n", df.median())
print("\nStandard Deviation:\n", df.std())
print("\nSum:\n", df.sum())
print("\nMin:\n", df.min())
print("\nMax:\n", df.max())
Output:
Mean:
A 3.0
B 8.0
C 13.0
dtype: float64
Median:
A 3.0
B 8.0
C 13.0
dtype: float64
Standard Deviation:
A 1.581139
B 1.581139
C 1.581139
dtype: float64
Sum:
A 15
B 40
C 65
dtype: int64
Min:
A 1
B 6
C 11
dtype: int64
Max:
A 5
B 10
C 15
dtype: int64
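If you want several of these statistics in one call without printing each separately, you can pass a list of method names to .agg(). A minimal sketch on the same sample DataFrame (the stat names become the row index):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]
})

# One call, several column-wise statistics; rows are the stat names
stats = df.agg(['mean', 'median', 'std', 'sum'])
print(stats)
```

Selecting a single cell afterwards works like any DataFrame lookup, e.g. stats.loc['mean', 'A'].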
3. All Summary Statistics at Once with .describe()
The quickest way to get a full overview — count, mean, std, min, 25%, 50%, 75%, max.
print(df.describe())
Output:
A B C
count 5.000000 5.000000 5.000000
mean 3.000000 8.000000 13.000000
std 1.581139 1.581139 1.581139
min 1.000000 6.000000 11.000000
25% 2.000000 7.000000 12.000000
50% 3.000000 8.000000 13.000000
75% 4.000000 9.000000 14.000000
max 5.000000 10.000000 15.000000
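The default quartiles are not fixed: .describe() accepts a percentiles argument when you want different cut points. A quick sketch on the same data (the 50% row is always included):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]
})

# Request the 10th and 90th percentiles instead of the default quartiles
print(df.describe(percentiles=[0.1, 0.9]))
```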
4. Row-wise Summaries (axis=1)
Pass axis=1 to compute across the columns of each row (e.g., a total per observation).
# Sum across columns for each row
row_sums = df.sum(axis=1)
print(row_sums)
# Mean across columns for each row
row_means = df.mean(axis=1)
print(row_means)
Output:
0 18
1 21
2 24
3 27
4 30
dtype: int64
0 6.0
1 7.0
2 8.0
3 9.0
4 10.0
dtype: float64
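Row-wise results are ordinary Series, so you can attach them straight back to the DataFrame. A small sketch on the same sample data, adding a per-row total and (via idxmax) the name of the column holding each row's largest value:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]
})

df['row_total'] = df[['A', 'B', 'C']].sum(axis=1)       # total per observation
df['largest_col'] = df[['A', 'B', 'C']].idxmax(axis=1)  # column with the row's max
print(df)
```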
5. Modern Alternative in 2026: Polars
For large datasets, Polars is often faster and more memory-efficient — summary stats are computed column-wise by default.
import polars as pl
df_pl = pl.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15]})
print(df_pl.describe())
Best Practices & Common Pitfalls
- Always check axis — most methods default to axis=0 (columns), but it's easy to mix up
- Use .describe() first — it gives a quick full picture of numeric columns
- Handle non-numeric columns — df.describe(include='all') covers categorical columns too
- Watch for NaN — summaries skip NaN by default; use skipna=False if needed
- For huge data, prefer Polars — it's faster and more memory-efficient for stats across columns
- Visualize: df.boxplot() or df.hist() after summary stats
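The NaN point is worth seeing in action: by default mean() silently skips missing values, while skipna=False propagates them. A small sketch with deliberately missing data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, 6.0]})

print(df.mean())               # NaN skipped: A averages 1 and 3 -> 2.0
print(df.mean(skipna=False))   # A becomes NaN because one value is missing
```

The skipped-NaN default is convenient but can hide data quality problems, which is why checking counts (e.g. via .describe() or .isna().sum()) alongside the means is a good habit.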
Conclusion
Calculating summary statistics across columns with mean(), median(), std(), describe(), and friends gives you instant insights into scale, spread, and central tendency for every variable. In 2026, start every EDA session with these methods — they reveal outliers, imbalances, and data quality issues before you invest time in deeper analysis. Master axis direction, include/exclude options, and pair with visualization, and you'll understand your data faster and more deeply than ever.
Next time you load a new dataset — run df.describe() and df.mean(axis=0) first. It's the fastest way to know what you're dealing with.