Counting missing values is one of the most important early steps in data quality assessment. After detecting whether missing values exist (.isna().any()), the next question is: how many are there, and where? Accurate counting helps you decide whether to drop rows/columns, impute, or investigate why data is missing.
In 2026, you still rely on .isna().sum() in pandas for most work, but Polars offers faster alternatives for large datasets. Here’s a complete, practical guide with real examples, visuals, and best practices.
1. Basic Counting in Pandas
import pandas as pd
# Realistic example: survey data with various missingness
data = {
'respondent_id': [101, 102, 103, 104, 105],
'age': [28, None, 45, 33, None],
'income': [55000, 72000, None, 61000, 48000],
'city': ['New York', 'Chicago', 'Los Angeles', None, 'Seattle'],
'satisfaction': [8, 7, None, 9, 6]
}
df = pd.DataFrame(data)
# Count missing values per column
print("Missing count per column:")
print(df.isna().sum())
# Total missing values in the entire DataFrame
total_missing = df.isna().sum().sum()
print(f"\nTotal missing values: {total_missing} out of {df.size} cells "
f"({total_missing / df.size * 100:.1f}%)")
**Typical output:**
Missing count per column:
respondent_id    0
age              2
income           1
city             1
satisfaction     1
dtype: int64
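Per-row counts are just as useful as per-column counts: a respondent with many blanks may need to be dropped entirely. A quick sketch using the same survey data and `.isna().sum(axis=1)`:

```python
import pandas as pd

df = pd.DataFrame({
    'respondent_id': [101, 102, 103, 104, 105],
    'age': [28, None, 45, 33, None],
    'income': [55000, 72000, None, 61000, 48000],
    'city': ['New York', 'Chicago', 'Los Angeles', None, 'Seattle'],
    'satisfaction': [8, 7, None, 9, 6],
})

# Missing values per row (axis=1 sums across columns)
row_missing = df.isna().sum(axis=1)
print(row_missing)

# Rows missing more than one value: candidates for closer inspection
print(df[row_missing > 1])
```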
2. Percentage Missing (Most Useful View)
# Missing percentage per column (sorted descending)
missing_pct = (df.isna().mean() * 100).round(2).sort_values(ascending=False)
print("Missing percentage per column:")
print(missing_pct[missing_pct > 0])
# Quick bar plot of missing percentages
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8, 5))
sns.barplot(x=missing_pct[missing_pct > 0].values, y=missing_pct[missing_pct > 0].index, palette='viridis')
plt.title('Columns with Missing Values (%)', fontsize=14)
plt.xlabel('Missing Percentage')
for i, v in enumerate(missing_pct[missing_pct > 0].values):
    plt.text(v + 0.5, i, f"{v}%", va='center')
plt.tight_layout()
plt.show()
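A heatmap of the boolean missingness mask complements the bar plot by showing where the gaps cluster across rows. A minimal sketch using matplotlib's imshow directly, so no extra dependency is needed (the missingno library's matrix plot is a richer alternative):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'age': [28, None, 45, 33, None],
    'income': [55000, 72000, None, 61000, 48000],
    'city': ['New York', 'Chicago', 'Los Angeles', None, 'Seattle'],
    'satisfaction': [8, 7, None, 9, 6],
})

# Missing cells render as dark blocks; clustering jumps out visually
fig, ax = plt.subplots(figsize=(6, 4))
ax.imshow(df.isna(), aspect='auto', cmap='gray_r', interpolation='none')
ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns, rotation=45, ha='right')
ax.set_ylabel('Row index')
ax.set_title('Missing-value heatmap')
plt.tight_layout()
plt.show()
```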
3. Fast Counting with Polars (2026 Speed Choice for Large Data)
import polars as pl
df_pl = pl.from_pandas(df)
# Missing count & percentage per column (very fast)
missing_summary = df_pl.null_count().transpose(include_header=True, header_name="column", column_names=["count"])
missing_summary = missing_summary.with_columns(
(pl.col("count") / df_pl.height * 100).round(2).alias("pct")
).filter(pl.col("count") > 0).sort("count", descending=True)
print("Missing summary (Polars):")
print(missing_summary)
4. Counting Missing Values by Group (Real-World Insight)
# Example: missing age by gender (if we had that column)
# df.groupby('gender')['age'].apply(lambda x: x.isna().sum())
# Or with Polars (cleaner syntax)
print(df_pl.group_by('city').agg(pl.col('age').null_count().alias('missing_age')))
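The same group-wise view works in pandas too. One detail worth knowing: groupby drops rows whose group key is itself missing unless you pass dropna=False. A sketch on the survey columns:

```python
import pandas as pd

df = pd.DataFrame({
    'age': [28, None, 45, 33, None],
    'city': ['New York', 'Chicago', 'Los Angeles', None, 'Seattle'],
})

# dropna=False keeps the group whose key (city) is itself missing
missing_by_city = df.groupby('city', dropna=False)['age'].apply(lambda s: s.isna().sum())
print(missing_by_city)
```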
5. Quick Decision Guide: What to Do After Counting
- 0–2% missing → usually safe to drop rows (dropna())
- 5–20% missing → simple imputation (median, mode, forward-fill)
- >30% missing in a column → consider dropping the column unless critical
- Missingness varies strongly by group → imputation should be group-aware or use advanced methods (KNN/MICE)
- Time series → never use global mean/median; prefer interpolation or forward/backward fill
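The decision guide above can be turned into a small helper that suggests an action per column from its missing percentage. The thresholds below follow the list; the function name and labels are illustrative, and real pipelines should tune the cutoffs:

```python
import pandas as pd

def missing_action(df: pd.DataFrame) -> pd.Series:
    """Suggest a handling strategy per column based on its missing percentage."""
    pct = df.isna().mean() * 100

    def classify(p: float) -> str:
        if p == 0:
            return 'ok'
        if p <= 2:
            return 'drop rows'
        if p <= 20:
            return 'simple imputation'
        if p > 30:
            return 'consider dropping column'
        return 'investigate'  # 20-30%: no clear-cut rule

    return pct.apply(classify)

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],           # 0% missing
    'b': [1, None, 3, 4, 5],        # 20% missing
    'c': [None, None, 1, None, 2],  # 60% missing
})
print(missing_action(df))
```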
Best Practices & Common Pitfalls (2026 Edition)
- Always report both **count** and **percentage** — raw counts mislead on large datasets
- Sort by missingness descending — focus effort on worst columns first
- Visualize after counting — bar plot or heatmap reveals clustering
- Pitfall: .isna().sum().sum() counts total missing cells; use .isna().any().sum() to count affected columns
- Pitfall: forgetting to handle missings before modeling; many sklearn models crash or give poor results
- Large data? Use Polars: null_count() is significantly faster than pandas .isna().sum()
- Production pipelines: log missing percentages and alert if they exceed a threshold
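For the production-pipeline point, here is a minimal sketch of a check that logs every column's missing percentage and flags those over a limit. The logger name, function name, and 20% default threshold are illustrative assumptions:

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('data_quality')  # illustrative logger name

def check_missingness(df: pd.DataFrame, threshold_pct: float = 20.0) -> list:
    """Log missing percentage per column and return columns over the threshold."""
    pct = (df.isna().mean() * 100).round(2)
    flagged = []
    for col, p in pct.items():
        logger.info('%s: %.2f%% missing', col, p)
        if p > threshold_pct:
            logger.warning('%s exceeds %.1f%% missing threshold', col, threshold_pct)
            flagged.append(col)
    return flagged

df = pd.DataFrame({'x': [1, None, None, 4], 'y': [1, 2, 3, 4]})
print(check_missingness(df))  # x is 50% missing
```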
Conclusion
Counting missing values with .isna().sum() (or Polars null_count()) is the bridge from “there’s a problem” to “how big is the problem and where?” In 2026, do it immediately after loading data, always report percentages, visualize the results, and use the numbers to decide on drop vs fill vs advanced imputation. Master this step, and you’ll catch data quality issues early — saving time, avoiding bias, and building more reliable models.
Next time you load a DataFrame — count missing values first. The numbers you get will guide every decision that follows.