Removing Missing Values in Pandas – When and How to Use dropna() in 2026
Removing missing values using dropna() is one of the simplest and fastest ways to clean your dataset. While not always the best strategy, it is often appropriate when missing values are few or when complete cases are required for analysis.
TL;DR — dropna() Parameters
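- axis=0 → Drop rows (default)
- axis=1 → Drop columns
- how="any" → Drop if any value is missing (default)
- how="all" → Drop only if all values are missing
- thresh=n → Keep rows (or columns, with axis=1) that have at least n non-missing values
- subset=[...] → Only look for missing values in these columns when dropping rows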
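To see how these parameters differ, here is a minimal sketch on a small hypothetical DataFrame. Row 1 is entirely empty, row 2 is complete, and the other rows are partially filled. The column names a, b and c are purely illustrative.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): row 1 is entirely empty, row 2 is complete
df = pd.DataFrame({
    "a": [1.0, np.nan, 4.0, np.nan],
    "b": [np.nan, np.nan, 5.0, np.nan],
    "c": [3.0, np.nan, 6.0, 9.0],
})

any_rows = df.dropna()                     # default: axis=0, how="any"
all_rows = df.dropna(how="all")            # only the fully empty row is dropped
thresh_rows = df.dropna(thresh=2)          # keep rows with >= 2 non-missing values
subset_rows = df.dropna(subset=["c"])      # only column "c" is checked
thresh_cols = df.dropna(axis=1, thresh=2)  # keep columns with >= 2 non-missing values

print(any_rows.index.tolist())       # [2]
print(all_rows.index.tolist())       # [0, 2, 3]
print(thresh_rows.index.tolist())    # [0, 2]
print(subset_rows.index.tolist())    # [0, 2, 3]
print(thresh_cols.columns.tolist())  # ['a', 'c']
```

Note that thresh counts values that are present, not values that are missing. Column "b" has only one non-missing value, so it is the one dropped when thresh=2 is applied along axis=1.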
1. Basic Usage of dropna()
import pandas as pd
df = pd.read_csv("sales_data.csv", parse_dates=["order_date"])
print(f"Original shape: {df.shape}")
# Drop rows with any missing values
df_clean = df.dropna()
print(f"After dropna(): {df_clean.shape}")
print(f"Rows removed: {len(df) - len(df_clean)}")
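Because sales_data.csv is not available here, the sketch below uses a tiny in-memory frame with hypothetical values. It illustrates two behaviours of the basic call. First, dropna() returns a new DataFrame and leaves the original untouched. Second, the surviving rows keep their original index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sales data
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "amount": [250.0, np.nan, 99.5, 40.0],
})

df_clean = df.dropna()

# The original frame is unchanged: dropna() returns a new DataFrame by default
print(df.shape, df_clean.shape)  # (4, 2) (3, 2)

# Surviving rows keep their original labels, leaving a gap where row 1 was
print(df_clean.index.tolist())   # [0, 2, 3]

# Renumber the index if downstream code expects 0..n-1
df_clean = df_clean.reset_index(drop=True)
print(df_clean.index.tolist())   # [0, 1, 2]
```

In pandas 2.0 and later, dropna() also accepts ignore_index=True, which should perform the renumbering in the same call.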
2. Common dropna() Strategies
# 1. Drop rows only if ALL values are missing
df_clean = df.dropna(how="all")
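# 2. Drop columns that have too many missing values
# thresh expects an integer count of non-missing values, so convert 70% of the row count
import math
df_clean = df.dropna(axis=1, thresh=math.ceil(len(df) * 0.7))  # Keep columns with at least 70% non-missing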
# 3. Drop rows based on specific columns only
df_clean = df.dropna(subset=["amount", "region", "customer_id"])
# 4. Same idea, with the critical columns defined once so they can be reused and documented
critical_cols = ["order_date", "amount", "customer_id"]
df_clean = df.dropna(subset=critical_cols)
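The 70% rule in strategy 2 only works if the percentage is turned into a count. The sketch below makes that conversion explicit and cross-checks the result against each column's non-missing fraction. The data and the coupon_code column are hypothetical. The percentage is kept as an integer so that the division stays exact.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data: coupon_code is mostly empty
df = pd.DataFrame({
    "amount": [10.0, 20.0, np.nan, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "region": ["N", "S", "E", "W", np.nan, "N", "S", np.nan, "E", "W"],
    "coupon_code": [np.nan] * 8 + ["A1", "B2"],
})

min_pct = 70                                  # keep columns that are at least 70% non-missing
min_count = math.ceil(len(df) * min_pct / 100)  # thresh wants a count: here 7 of 10 rows

df_cols = df.dropna(axis=1, thresh=min_count)

# Cross-check: fraction of non-missing values per column
print(df.notna().mean())
print(df_cols.columns.tolist())  # ['amount', 'region']
```

Rounding up with math.ceil keeps the rule strict: a column must reach the full 70%, not just fall slightly short of it.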
3. Real-World Example
# Before cleaning
print("Missing values before:")
missing = df.isna().sum()
print(missing[missing > 0])
# Clean strategy: Keep only complete records for key business columns
df_clean = df.dropna(subset=["order_date", "amount", "region", "customer_id"])
print(f"\nRows before: {len(df)}")
print(f"Rows after: {len(df_clean)}")
print(f"Percentage kept: {(len(df_clean)/len(df)*100):.1f}%")
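The same workflow can be run end to end on a small hypothetical sales table. The column names follow the example above, and notes is an extra, mostly empty column added for illustration. The sketch reports missing counts alongside percentages, then shows why subset matters. A plain dropna() would discard every row because of the notes column, while restricting the check to the key business columns keeps the usable records.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; "notes" is an optional, mostly empty column
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2026-01-05", None, "2026-01-07", "2026-01-08", "2026-01-09"]),
    "amount": [120.0, 80.0, np.nan, 45.0, 300.0],
    "region": ["North", "South", "East", None, "West"],
    "customer_id": [1, 2, 3, 4, 5],
    "notes": [None, None, "gift", None, None],
})

# Missing counts and percentages per column
missing = df.isna().sum()
report = pd.DataFrame({"missing": missing, "pct": (missing / len(df) * 100).round(1)})
print(report[report["missing"] > 0])

# Dropping on every column loses all rows because of "notes"
print(len(df.dropna()))  # 0

# Dropping only on the key business columns keeps the complete records
key_cols = ["order_date", "amount", "region", "customer_id"]
df_clean = df.dropna(subset=key_cols)
print(f"Percentage kept: {len(df_clean) / len(df) * 100:.1f}%")  # Percentage kept: 40.0%
```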
4. Best Practices in 2026
- Always check how many rows/columns will be removed before dropping
- Use subset to drop based only on important business columns
- Use thresh to keep columns that are mostly complete
- Consider imputation instead of dropping when missingness is high (as a rough rule of thumb, more than 10-20% of rows)
- Document your dropping strategy and the percentage of data lost
- Never drop rows blindly without understanding the business impact
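The first and fourth practices (check before dropping, and record what was lost) can be combined in a small helper. This is a minimal sketch. The name preview_dropna is hypothetical, not a pandas API. It simply runs dropna() on the frame and compares shapes, without modifying the original. It assumes you do not pass inplace=True.

```python
import numpy as np
import pandas as pd


def preview_dropna(df: pd.DataFrame, **dropna_kwargs) -> dict:
    """Hypothetical helper: report what df.dropna(**dropna_kwargs) would remove.

    Assumes inplace=True is not passed, so df itself is never modified.
    """
    result = df.dropna(**dropna_kwargs)
    rows_lost = len(df) - len(result)
    cols_lost = df.shape[1] - result.shape[1]
    pct_rows_lost = round(rows_lost / len(df) * 100, 1) if len(df) else 0.0
    return {"rows_lost": rows_lost, "cols_lost": cols_lost, "pct_rows_lost": pct_rows_lost}


# Hypothetical data for comparing strategies before committing to one
df = pd.DataFrame({
    "amount": [10.0, np.nan, 30.0, 40.0],
    "region": ["N", "S", None, "W"],
    "notes": [None, None, None, "vip"],
})

strategies = {
    "any missing value": {},
    "subset=amount": {"subset": ["amount"]},
    "columns >= 75% complete": {"axis": 1, "thresh": 3},
}
for label, kwargs in strategies.items():
    print(label, preview_dropna(df, **kwargs))
```

Printing the comparison before dropping gives you the "percentage of data lost" figure to document alongside the chosen strategy.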
Conclusion
Removing missing values with dropna() is fast and simple, but it should be used thoughtfully. In 2026, the best practice is to first understand the pattern of missingness, then decide whether to drop rows, drop columns, or use imputation. Use the subset and thresh parameters to make your dropping strategy more targeted and better aligned with business needs.
Next steps:
- Analyze the missing values in your current dataset and decide which strategy (drop rows, drop columns, or impute) is most appropriate