Replacing missing values (imputation via fillna() in pandas or fill_null() in Polars) is often the best compromise when dropping rows/columns would destroy too much data. The key is choosing the right strategy: simple fills preserve row count but can distort distributions; advanced methods preserve realism but take more compute.
In 2026, start simple (mean/median/mode/ffill), then move to model-based imputation (KNN/MICE) when accuracy matters. Always visualize before and after — never assume the fill is harmless.
1. Basic Replacement in Pandas
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Realistic example: sales data with common missing patterns
data = {
'date': pd.date_range('2025-01-01', periods=6),
'sales': [120, None, 180, 210, None, 150],
'region': ['North', 'South', 'East', None, 'West', 'North'],
'price': [15.5, 14.9, None, 16.2, 15.0, 14.8]
}
df = pd.DataFrame(data)
print("Original missing count:\n", df.isna().sum())
# Option 1: Fill numeric columns with median
df_median = df.copy()
df_median['sales'] = df_median['sales'].fillna(df_median['sales'].median())
df_median['price'] = df_median['price'].fillna(df_median['price'].median())
# Option 2: Fill categorical with mode (most frequent)
df_median['region'] = df_median['region'].fillna(df_median['region'].mode()[0])
print("\nAfter median/mode fill:\n", df_median)
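**Typical output (after fill):** the two missing sales values become 165.0 (the median of 120, 180, 210, 150), the missing price becomes 15.0, and the missing region becomes 'North' (the only value that appears twice).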
2. Time-Series Friendly: Forward-Fill & Backward-Fill
# Forward-fill (carry last valid observation forward): great for time series
# Note: fillna(method='ffill') is deprecated in recent pandas; use .ffill() / .bfill()
df_ffill = df.copy()
df_ffill = df_ffill.ffill()
# Backward-fill (carry next valid observation backward)
df_bfill = df.copy()
df_bfill = df_bfill.bfill()
print("After forward-fill:\n", df_ffill)
print("\nAfter backward-fill:\n", df_bfill)
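Forward-fill repeats the last value, which creates flat steps. For a numeric series that changes gradually, linear interpolation is the alternative listed in the decision table. The following is a minimal sketch on a hypothetical `sales` series that mirrors the example data; it is an illustration, not a recommendation for every time series.

```python
import pandas as pd

# Hypothetical daily sales series with two gaps (mirrors the example data)
sales = pd.Series(
    [120, None, 180, 210, None, 150],
    index=pd.date_range("2025-01-01", periods=6),
    name="sales",
)

# Linear interpolation: each gap is filled on the straight line
# between its neighbouring valid observations
sales_linear = sales.interpolate(method="linear")

# Forward-fill for comparison: each gap repeats the previous valid value
sales_ffill = sales.ffill()

print(pd.DataFrame({"original": sales, "linear": sales_linear, "ffill": sales_ffill}))
```

Here the gap between 120 and 180 becomes 150 under interpolation but stays at 120 under forward-fill. Which is more plausible depends on whether the quantity drifts smoothly or holds its level between observations.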
3. Fast & Modern: Filling in Polars (2026 Large-Data Choice)
import polars as pl
df_pl = pl.from_pandas(df)
# Fill numeric with median, categorical with mode
df_pl_filled = df_pl.with_columns([
pl.col('sales').fill_null(pl.col('sales').median()),
pl.col('price').fill_null(pl.col('price').median()),
pl.col('region').fill_null(pl.col('region').mode().first())
])
print("After median/mode fill (Polars):\n", df_pl_filled)
4. Before & After Visual Check (Critical Step)
import missingno as msno
# msno.bar() applies its own default figure size, so pass figsize to it directly
# Before
msno.bar(df, color='teal', figsize=(10, 4))
plt.title('Missing Values BEFORE Replacement', fontsize=14)
plt.show()
# After
msno.bar(df_median, color='darkorange', figsize=(10, 4))
plt.title('Missing Values AFTER Median/Mode Fill', fontsize=14)
plt.show()
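A missingness bar chart confirms that the gaps are gone, but it does not show whether the fill changed the distribution. A numeric comparison of summary statistics is one way to check that. This is a minimal sketch on a hypothetical `sales` series with the same values as the example.

```python
import pandas as pd

# Hypothetical series with the same values as the sales column above
sales = pd.Series([120, None, 180, 210, None, 150], name="sales")
sales_filled = sales.fillna(sales.median())

# describe() ignores NaN, so "before" summarises only the observed values
summary = pd.DataFrame({
    "before": sales.describe(),
    "after": sales_filled.describe(),
})
print(summary.loc[["count", "mean", "std", "50%"]])
```

In this example the count rises from 4 to 6 and the standard deviation falls from about 38.7 to 30.0, because the two filled rows sit exactly at the centre. Any constant fill narrows the spread in this way, which matters if downstream steps rely on variance.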
When to Use Each Replacement Strategy (2026 Decision Framework)
| Scenario | Best Method | Why / Risk |
|---|---|---|
| Numeric, low skew | Mean fill | Preserves mean; distorts if outliers |
| Numeric, skewed/outliers | Median fill | Robust to outliers |
| Categorical | Mode or 'Unknown' category | 'Unknown' preserves missingness info |
| Time-series / ordered | ffill / bfill / linear interpolate | Preserves temporal continuity |
| Modeling performance critical | KNNImputer or IterativeImputer | Uses other features for realistic fill |
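For the last row of the table, a model-based fill with scikit-learn's `KNNImputer` can be sketched as follows. The frame `df_num` and the choice `n_neighbors=2` are assumptions for illustration; on six rows the result is only a demonstration of the API, not a meaningful estimate.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical numeric frame (KNNImputer works on numeric features only)
df_num = pd.DataFrame({
    "sales": [120, np.nan, 180, 210, np.nan, 150],
    "price": [15.5, 14.9, np.nan, 16.2, 15.0, 14.8],
})

# Each missing cell is filled with the mean of that feature
# over the n_neighbors most similar rows
imputer = KNNImputer(n_neighbors=2)
df_knn = pd.DataFrame(
    imputer.fit_transform(df_num),
    columns=df_num.columns,
    index=df_num.index,
)
print(df_knn)
```

KNN distances depend on feature scale, so scaling the columns before imputing is usually worth considering. `IterativeImputer` follows the same fit/transform pattern but requires `from sklearn.experimental import enable_iterative_imputer` before it can be imported.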
Best Practices & Common Pitfalls
- Always visualize before/after with missingno.bar(): it confirms you filled what you intended
- Fill numeric columns with the median (not the mean) unless the data is symmetric: this protects against outliers
- For time-series: prefer ffill() or interpolate(method='linear') over global stats
- Pitfall: filling before EDA → can hide real patterns (e.g., missing salary for unemployed group)
- Pitfall: global fill on grouped data → use groupby + transform for group-aware imputation
- Large data? Use Polars fill_null(): often much faster than pandas fillna()
- Production: log filled values & strategy to keep an audit trail for reproducibility
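The group-aware and audit-trail points above can be combined in a few lines. The following is a sketch under assumed data: the frame `df_grp` and the column names `sales_was_missing` and `sales_filled` are hypothetical.

```python
import pandas as pd

# Hypothetical data where the two regions have very different sales levels
df_grp = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South", "South"],
    "sales": [100.0, None, 120.0, 300.0, 340.0, None],
})

# Audit trail: record which cells were missing before anything is filled
df_grp["sales_was_missing"] = df_grp["sales"].isna()

# transform() returns the per-region median aligned to the original rows
group_median = df_grp.groupby("region")["sales"].transform("median")
df_grp["sales_filled"] = df_grp["sales"].fillna(group_median)

print(df_grp)
print("Filled cells:", int(df_grp["sales_was_missing"].sum()))
```

A global median would put 210 into both gaps, a value that fits neither region. The per-region fill gives 110 for North and 320 for South. The boolean flag column can also be passed to a model as a feature when missingness itself may carry signal.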
Conclusion
Replacing missing values is an art: mean/median/mode/ffill for speed and simplicity, group-aware or model-based (KNN/Iterative) when realism matters. In 2026, always visualize before and after, choose the method based on data type and the mechanism behind the missingness, and test the impact on model performance. Done right, imputation preserves data volume and signal; done wrong, it introduces noise or bias. Master replacement strategies, and your datasets stay powerful and trustworthy.
Next time you see missing values — don’t just drop or mean-fill. Choose thoughtfully, visualize the change, and let the data guide you.