Missing values (NaN, None, null) are one of the most common — and most dangerous — realities in real-world data. They appear from sensor failures, non-responses in surveys, data entry errors, filtering bugs, or intentional non-collection. Ignoring them leads to biased models, crashed algorithms, or misleading insights. In 2026, handling missing data intelligently is still a core skill for any data scientist or analyst.
Here’s a practical, up-to-date guide to detecting, understanding, and treating missing values using pandas (classic), Polars (fast modern alternative), and visualization tools.
1. Quick Detection & Summary
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Example: Titanic-like dataset with realistic missingness
df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
# Fast missing overview
print("Missing counts:\n", df.isna().sum())
print("\nMissing %:\n", df.isna().mean() * 100)
# Visual heatmap (very useful!)
plt.figure(figsize=(10, 6))
sns.heatmap(df.isna(), cbar=False, cmap='viridis', yticklabels=False)
plt.title('Missing Values Heatmap (Yellow = Missing)', fontsize=14)
plt.tight_layout()
plt.show()
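Beyond counts and heatmaps, a small reusable helper keeps the overview handy across notebooks. A minimal sketch: the missing_report function and the toy DataFrame below are illustrative stand-ins for the Titanic data, not part of the original code.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing counts and percentages, worst columns first."""
    report = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean() * 100,
    })
    return report.sort_values("pct_missing", ascending=False)

# Toy frame standing in for the Titanic dataset
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, np.nan, np.nan, "C85"],
    "Fare": [7.25, 71.28, 8.05, 53.1],
})
print(missing_report(toy))
```

Sorting by percentage surfaces the worst-affected columns first, which is usually where a drop-the-column decision gets made.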
2. Common Strategies: When to Use What (Decision Guide 2026)
| Scenario | Best Method | When to Avoid | Code Example |
|---|---|---|---|
| Very few missing (<1–2%) | Drop rows | Small dataset or important rows | df = df.dropna() |
| Numeric, random missing | Mean / Median imputation | Strong skew or outliers | df['Age'] = df['Age'].fillna(df['Age'].median()) |
| Time series / ordered data | Forward-fill, backward-fill, linear interpolation | Long gaps | df['value'] = df['value'].interpolate(method='linear') |
| Categorical / low cardinality | Mode or new 'missing' category | High cardinality | df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0]) |
| Predictive power matters (modeling) | KNN / Iterative imputation (sklearn) | Very large data (slow) | See advanced section |
| Large data & speed critical | Polars + simple fill / forward fill | Need complex logic | df.with_columns(pl.col('col').fill_null(strategy='forward')) |
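The pandas strategies from the table can be sketched end to end on a toy frame. The column names mirror the table's examples; the data itself is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],    # numeric, random missing
    "Embarked": ["S", None, "C", "S"],    # low-cardinality categorical
    "value": [1.0, np.nan, np.nan, 4.0],  # ordered / time-series-like
})

# Median imputation: robust to skew and outliers
df["Age"] = df["Age"].fillna(df["Age"].median())

# Mode imputation for a categorical column
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Linear interpolation for ordered data
df["value"] = df["value"].interpolate(method="linear")

print(df)
```

Assigning the result back (rather than using inplace=True on a selected column) avoids chained-assignment warnings in modern pandas.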
3. Advanced Imputation: When Simple Isn’t Enough
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 -- must run before importing IterativeImputer
from sklearn.impute import KNNImputer, IterativeImputer
# KNN imputation (uses nearest neighbors based on other features)
imputer = KNNImputer(n_neighbors=5)
df_numeric = df.select_dtypes(include='number')
df_imputed = pd.DataFrame(imputer.fit_transform(df_numeric), columns=df_numeric.columns)
# Iterative (model-based, like MICE)
iter_imputer = IterativeImputer(max_iter=10, random_state=42)
df_iter = pd.DataFrame(iter_imputer.fit_transform(df_numeric), columns=df_numeric.columns)
4. Visualizing Missingness Patterns
# Missingness correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(df.isna().corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation of Missingness Across Columns', fontsize=14)
plt.tight_layout()
plt.show()
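A groupby view complements the correlation heatmap: if missingness rates differ sharply across groups, the data is likely MAR rather than MCAR. A sketch on a toy frame (Pclass and Age mirror the Titanic columns; the values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3, 3],
    "Age": [38.0, 35.0, np.nan, np.nan, np.nan, 27.0],
})

# Share of missing Age per passenger class: a large gap between
# groups suggests the missingness depends on observed data (MAR)
rates = df.groupby("Pclass")["Age"].apply(lambda s: s.isna().mean())
print(rates)
```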
Best Practices & Common Pitfalls (2026 Edition)
- Always explore why values are missing: MCAR (missing completely at random), MAR (missing at random, dependent on observed data), or MNAR (missing not at random)? The mechanism determines the right strategy
- Never blindly fill before understanding patterns — use heatmaps and groupby
- Create a missing indicator column — helps models learn from missingness itself
- For time series — prefer interpolation / forward-fill over mean
- Compare model performance before/after imputation — sometimes dropping is better
- Large data? Use Polars: its fill_null strategies are blazing fast
- Production? Use sklearn Pipeline with SimpleImputer or custom transformers
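The indicator-column and pipeline advice combine naturally: SimpleImputer's add_indicator=True appends a binary missingness flag for each column that had missing values, so downstream models can learn from the missingness itself. A minimal sketch on toy data:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

X = np.array([[1.0, np.nan],
              [2.0, 10.0],
              [np.nan, 30.0]])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median", add_indicator=True)),
])

# Result: 2 imputed columns followed by 2 indicator columns
# (one per input column that contained missing values)
Xt = pipe.fit_transform(X)
print(Xt)
```

In production, the fitted pipeline can be persisted and reused so training-time medians are applied consistently at inference time.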
Conclusion
Missing values are not just noise — they are information. In 2026, start every dataset with detection (heatmap + counts), understand mechanisms, then choose the right treatment: drop for tiny missingness, simple fill for speed, interpolation for time series, or advanced (KNN/Iterative) when modeling performance matters. Visualize patterns, create indicators, compare strategies, and never assume “mean fill and move on” is enough. Master missing data handling, and your models will be more robust, less biased, and far more trustworthy.
Next time you see NaNs — don’t panic. Plot them, understand them, treat them thoughtfully. Your future self (and your models) will thank you.