Boolean indexing is one of NumPy's most powerful features: it lets you select, filter, mask, and modify array elements based on conditions expressed as boolean arrays (masks). Instead of slow Python loops or list comprehensions, boolean indexing runs in optimized C code and is typically much faster on large arrays (speedups of 10–100× are common). In 2026, boolean indexing remains essential for data cleaning, feature selection, anomaly detection, conditional updates, and exploratory analysis in workflows built around pandas, Polars, scikit-learn, and production pipelines.
Here’s a complete, practical guide to NumPy boolean indexing: how masks work, filtering, assignment, combining conditions, real-world patterns, and modern best practices for fast, readable, and scalable code.
A boolean mask is an array of True/False values, usually with the same shape as the target array. When you index an array with such a mask, NumPy returns only the elements where the mask is True. The result is a new 1D array, even when the source array is multidimensional. A 1D mask whose length matches the first axis selects whole rows instead.
import numpy as np
arr = np.array([10, 20, 30, 40, 50, 60])
# Create a boolean mask
mask = arr > 30
print(mask) # [False False False True True True]
# Filter: select elements where mask is True
print(arr[mask]) # [40 50 60]
# Boolean indexing with condition directly
print(arr[arr > 30]) # Same as above — [40 50 60]
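The filtering above uses a 1D array. As a small sketch of how the same indexing behaves in two dimensions (the names `grid` and `row_mask` are illustrative and not part of the original example):

```python
import numpy as np

# Illustrative 2D array; the names here are made up for this sketch.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# A mask with the same shape as the array selects individual elements.
# The result is 1D, in row-major order.
big = grid[grid > 2]
print(big)        # [3 4 5 6]
print(big.shape)  # (4,)

# A 1D mask whose length matches the first axis selects whole rows instead.
row_mask = np.array([False, True])
rows = grid[row_mask]
print(rows)        # [[4 5 6]]
print(rows.shape)  # (1, 3)
```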
Boolean indexing for assignment is just as powerful — set values where the condition is True without loops.
# Clip negative values to 0
prices = np.array([19.99, -5.0, 29.99, -10.0, 150.0])
prices[prices < 0] = 0
print(prices) # [ 19.99 0. 29.99 0. 150. ]
# Conditional replacement with where()
clean_prices = np.where(prices > 100, 100, prices) # Cap at 100
print(clean_prices) # [ 19.99 0. 29.99 0. 100. ]
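One detail worth checking for yourself is which of these operations change the original array. The sketch below uses a made-up `readings` array to contrast mask selection (a copy), mask assignment (in place), and np.where (a new array):

```python
import numpy as np

readings = np.array([5, -3, 8, -1])

# Selecting with a mask returns a copy, so changing it leaves the source alone.
negatives = readings[readings < 0]
negatives[:] = 0
after_copy_edit = readings.copy()  # snapshot for comparison
print(after_copy_edit)  # [ 5 -3  8 -1]

# Assigning through a mask writes into the original array in place.
readings[readings < 0] = 0
print(readings)  # [5 0 8 0]

# np.where builds a new array and does not touch its inputs.
capped = np.where(readings > 6, 6, readings)
print(capped)    # [5 0 6 0]
print(readings)  # [5 0 8 0]
```

If the original values must be preserved, copy the array before assigning through a mask, or use np.where.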
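Combine multiple conditions with the element-wise operators & (and), | (or), and ~ (not). Wrap each comparison in parentheses, because these operators bind more tightly than >, <, >=, and <=. Python's and, or, and not keywords do not work on arrays.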
# Select values between 20 and 50
mask = (arr >= 20) & (arr <= 50)
print(arr[mask]) # [20 30 40 50]
# Invert: exclude values in range
not_in_range = ~((arr >= 20) & (arr <= 50))
print(arr[not_in_range]) # [10 60]
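To see why the parentheses matter, here is a minimal sketch of what happens without them. It reuses the same `arr` values; the variable names `unparenthesized_failed`, `and_failed`, and `in_range` are illustrative:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Without parentheses, & binds tighter than >= and <=, so this parses as
# arr >= (20 & arr) <= 50. That chained comparison needs a single
# True/False from a whole array, which raises ValueError.
try:
    arr[arr >= 20 & arr <= 50]
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True
print(unparenthesized_failed)  # True

# The Python keyword `and` fails for the same reason.
try:
    (arr >= 20) and (arr <= 50)
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)  # True

# np.logical_and is the function form of & for boolean arrays.
in_range = np.logical_and(arr >= 20, arr <= 50)
print(arr[in_range])  # [20 30 40 50]
```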
Real-world pattern: data cleaning and feature engineering — boolean indexing filters invalid data, masks outliers, or selects subsets for analysis.
# Clean temperature data: replace invalid values
temps = np.array([23.5, -999, 25.0, 9999, 22.8, np.nan])
# Mask invalid: -999, 9999, NaN
invalid_mask = (temps < -50) | (temps > 60) | np.isnan(temps)
temps[invalid_mask] = np.mean(temps[~invalid_mask]) # Replace with mean
print(temps) # approximately [23.5 23.77 25. 23.77 22.8 23.77] (the NaN entry is replaced too)
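The cleaning step above overwrites the input array. As one possible variation, the sketch below wraps the same logic in a hypothetical helper, `replace_invalid`, that returns a cleaned copy together with the mask. The function name, its parameters, and the choice to leave the data unchanged when nothing is valid are assumptions made for illustration:

```python
import numpy as np

def replace_invalid(values, low, high):
    """Return (cleaned_copy, invalid_mask).

    Hypothetical helper for illustration: entries that are NaN or fall
    outside [low, high] are replaced by the mean of the remaining entries.
    """
    values = np.asarray(values, dtype=float)
    invalid = np.isnan(values) | (values < low) | (values > high)
    cleaned = values.copy()  # keep the caller's array untouched
    if invalid.all():
        return cleaned, invalid  # nothing valid to average
    cleaned[invalid] = values[~invalid].mean()
    return cleaned, invalid

temps = np.array([23.5, -999, 25.0, 9999, 22.8, np.nan])
cleaned, invalid = replace_invalid(temps, low=-50, high=60)

print(invalid.sum())         # 3 entries were replaced
print(np.round(cleaned, 2))  # valid readings kept, the rest set to about 23.77
```

Returning the mask alongside the data makes it possible to report or audit which readings were imputed.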
A few best practices get the most out of boolean indexing. Prefer boolean masks over loops: arr[arr > 0] is faster and clearer than [x for x in arr if x > 0]. Use np.where(condition, x, y) for conditional replacement; it is vectorized and returns a new array. Combine masks with &, |, and ~, and always parenthesize each comparison: (arr > 0) & (arr < 10). Avoid chained indexing: arr[arr > 0][arr[arr > 0] < 10] evaluates the condition twice and creates an intermediate copy, so build a single combined mask instead. For more than two branches, np.select() or np.piecewise() is usually more readable than nested where() calls. In production, profile memory: every comparison allocates a temporary mask array and every mask selection allocates a copy, so reuse masks when possible. The same idiom carries over to DataFrame libraries: pandas uses df[df['col'] > 0], and Polars offers the equivalent df.filter(pl.col("col") > 0) on its own engine.
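Two of these recommendations can be sketched briefly: a single combined mask in place of chained indexing, and np.select for several conditions at once. The `values` array and the clamp limits of 0 and 10 are made up for this example:

```python
import numpy as np

values = np.array([-5, 3, 12, 7, 0, 25])

# Chained indexing filters twice and builds an intermediate copy.
positive = values[values > 0]
chained = positive[positive < 10]

# A single combined mask gives the same result in one indexing step.
small_positive = (values > 0) & (values < 10)
single = values[small_positive]
print(single)  # [3 7]

# The stored mask can be reused instead of recomputing the condition.
print(small_positive.sum())  # 2

# np.select picks from several choices; the first matching condition wins,
# and `default` covers elements that match none of them.
clamped = np.select([values < 0, values > 10], [0, 10], default=values)
print(clamped)  # [ 0  3 10  7  0 10]
```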
NumPy boolean indexing turns conditional selection, filtering, and assignment into fast, vectorized operations with no explicit loops. Selection with a mask returns a copy, while assignment through a mask updates the array in place. In 2026, master boolean masks, np.where(), and the logical operators, and you’ll clean, filter, and transform arrays at near-C speed while writing clean, readable Python.
Next time you need to select or modify parts of an array based on a condition — reach for boolean indexing. It’s NumPy’s cleanest way to say: “Take only what passes the test — fast.”