Calculating summary statistics across columns is one of the fastest ways to get a high-level view of your data in pandas — mean, median, standard deviation, min/max, sum, and more, all computed column-wise in a single call. These operations help you spot trends, outliers, distributions, and scale differences between variables before diving deeper into modeling or visualization.
In 2026, this remains a core EDA step — quick, readable, and essential for understanding wide datasets. Here’s a practical guide with real examples you can copy and adapt.
1. Basic Setup & Sample Data
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [6, 7, 8, 9, 10],
'C': [11, 12, 13, 14, 15]
})
print(df)
Output:
A B C
0 1 6 11
1 2 7 12
2 3 8 13
3 4 9 14
4 5 10 15
2. Single-Statistic Summaries Across Columns
Most common methods default to axis=0 (column-wise).
print("Mean:\n", df.mean())
print("\nMedian:\n", df.median())
print("\nStandard Deviation:\n", df.std())
print("\nSum:\n", df.sum())
print("\nMin:\n", df.min())
print("\nMax:\n", df.max())
Output:
Mean:
A 3.0
B 8.0
C 13.0
dtype: float64
Median:
A 3.0
B 8.0
C 13.0
dtype: float64
Standard Deviation:
A 1.581139
B 1.581139
C 1.581139
dtype: float64
Sum:
A 15
B 40
C 65
dtype: int64
Min:
A 1
B 6
C 11
dtype: int64
Max:
A 5
B 10
C 15
dtype: int64
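If you want several of these statistics in one call without printing each separately, you can pass a list of method names to .agg(). A minimal sketch on the same sample DataFrame (the stat names become the row index):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]
})

# One call, several column-wise statistics; rows are the stat names
stats = df.agg(['mean', 'median', 'std', 'sum'])
print(stats)
```

Selecting a single cell afterwards works like any DataFrame lookup, e.g. stats.loc['mean', 'A'].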
3. All Summary Statistics at Once with .describe()
The quickest way to get a full overview — count, mean, std, min, 25%, 50%, 75%, max.
print(df.describe())
Output:
A B C
count 5.000000 5.000000 5.000000
mean 3.000000 8.000000 13.000000
std 1.581139 1.581139 1.581139
min 1.000000 6.000000 11.000000
25% 2.000000 7.000000 12.000000
50% 3.000000 8.000000 13.000000
75% 4.000000 9.000000 14.000000
max 5.000000 10.000000 15.000000
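The default quartiles are not fixed: .describe() accepts a percentiles argument when you want different cut points. A quick sketch on the same data (the 50% row is always included):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]
})

# Request the 10th and 90th percentiles instead of the default quartiles
print(df.describe(percentiles=[0.1, 0.9]))
```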
4. Row-wise Summaries (axis=1)
Pass axis=1 to compute across the columns of each row (e.g., a total per observation).
# Sum across columns for each row
row_sums = df.sum(axis=1)
print(row_sums)
# Mean across columns for each row
row_means = df.mean(axis=1)
print(row_means)
Output:
0 18
1 21
2 24
3 27
4 30
dtype: int64
0 6.0
1 7.0
2 8.0
3 9.0
4 10.0
dtype: float64
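Row-wise results are ordinary Series, so you can attach them straight back to the DataFrame. A small sketch on the same sample data, adding a per-row total and (via idxmax) the name of the column holding each row's largest value:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10],
    'C': [11, 12, 13, 14, 15]
})

df['row_total'] = df[['A', 'B', 'C']].sum(axis=1)       # total per observation
df['largest_col'] = df[['A', 'B', 'C']].idxmax(axis=1)  # column with the row's max
print(df)
```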
5. Modern Alternative in 2026: Polars
For large datasets, Polars is often faster and more memory-efficient — summary stats are computed column-wise by default.
import polars as pl
df_pl = pl.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15]})
print(df_pl.describe())
Best Practices & Common Pitfalls
- Always check axis — most methods default to axis=0 (columns), but it's easy to mix up
- Use .describe() first — it gives a quick full picture of numeric columns
- Handle non-numeric columns — df.describe(include='all') covers categorical columns too
- Watch for NaN — summaries skip NaN by default; use skipna=False if needed
- For huge data, prefer Polars — it's faster and more memory-efficient for stats across columns
- Visualize: df.boxplot() or df.hist() after summary stats
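The NaN point is worth seeing in action: by default mean() silently skips missing values, while skipna=False propagates them. A small sketch with deliberately missing data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, 6.0]})

print(df.mean())               # NaN skipped: A averages 1 and 3 -> 2.0
print(df.mean(skipna=False))   # A becomes NaN because one value is missing
```

The skipped-NaN default is convenient but can hide data quality problems, which is why checking counts (e.g. via .describe() or .isna().sum()) alongside the means is a good habit.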
Conclusion
Calculating summary statistics across columns with mean(), median(), std(), describe(), and friends gives you instant insights into scale, spread, and central tendency for every variable. In 2026, start every EDA session with these methods — they reveal outliers, imbalances, and data quality issues before you invest time in deeper analysis. Master axis direction, include/exclude options, and pair with visualization, and you'll understand your data faster and more deeply than ever.
Next time you load a new dataset — run df.describe() and df.mean(axis=0) first. It's the fastest way to know what you're dealing with.