Aggregating while ignoring NaNs is crucial when analyzing real-world seismic catalogs. NaNs (missing values) frequently appear due to incomplete reports, sensor failures, or data gaps, and standard aggregations (mean, sum, std) propagate NaNs, corrupting results. NumPy and Dask provide nan-aware functions (nanmean, nansum, nanstd, nanmax, nanmin) that skip NaNs automatically, while pandas defaults to skipna=True in most methods. In 2026, robust NaN handling ensures accurate magnitude statistics, depth averages, event counts, and spatial/temporal trends, which is essential for USGS/IRIS data analysis, risk assessment, and ML feature engineering on large or incomplete catalogs.
Here’s a complete, practical guide to aggregating earthquake data while ignoring NaNs: NumPy/Dask nan-functions, pandas skipna, real-world patterns (magnitude means, depth stats, country counts), and modern best practices with chunking, lazy evaluation, visualization, Polars equivalents, and performance tips.
NumPy nan-safe aggregation — dedicated functions ignore NaNs automatically.
import numpy as np
# Example magnitude array with NaNs (missing events)
mags = np.array([6.1, 7.2, np.nan, 5.8, np.nan, 8.0, 6.5, 7.9])
# Standard mean propagates NaN
print(np.mean(mags)) # nan
# Nan-safe versions
print(np.nanmean(mags)) # 6.916666666666667 (ignores NaNs)
print(np.nansum(mags)) # 41.5
print(np.nanstd(mags)) # ≈0.8474 (population std, ddof=0, NaNs ignored)
print(np.nanmax(mags)) # 8.0
print(np.nanmin(mags)) # 5.8
print(np.nanargmax(mags)) # 5 (index of max non-NaN)
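A nan-aware function is equivalent to masking out NaNs by hand. The sketch below, reusing the same toy magnitude array, confirms that equivalence and shows the all-NaN edge case, where nanmean returns nan (and emits a RuntimeWarning):

```python
import numpy as np

mags = np.array([6.1, 7.2, np.nan, 5.8, np.nan, 8.0, 6.5, 7.9])

# Manual masking gives the same result as np.nanmean
mask = ~np.isnan(mags)
print(np.isclose(mags[mask].mean(), np.nanmean(mags)))  # True

# Edge case: an all-NaN input yields nan (with a RuntimeWarning)
empty = np.array([np.nan, np.nan])
print(np.nanmean(empty))  # nan
```

The nan-aware functions are usually preferable to manual masks: they are shorter, harder to get wrong, and work unchanged along any axis of a multidimensional array.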
Pandas aggregation ignoring NaNs — skipna=True is default in most methods.
import pandas as pd
df = pd.DataFrame({
'mag': [6.1, 7.2, np.nan, 5.8, np.nan, 8.0, 6.5, 7.9],
'depth': [10.0, 35.0, np.nan, 15.0, 20.0, np.nan, 8.0, 12.0],
'country': ['Japan', 'Chile', 'USA', 'Indonesia', 'Mexico', 'Japan', 'Peru', 'Chile']
})
# Mean magnitude & depth (skips NaNs by default)
print(df.mean(numeric_only=True))
# mag 6.916667
# depth 16.666667
# Explicit skipna=False propagates NaN
print(df.mean(skipna=False, numeric_only=True))
# mag NaN
# depth NaN (both columns contain NaNs, so both propagate)
# Grouped stats: mean mag per country
print(df.groupby('country')['mag'].mean())
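Grouped means silently return NaN for groups whose values are all missing. Aggregating count alongside mean, on the same toy DataFrame, makes those groups visible before you trust the statistic:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mag': [6.1, 7.2, np.nan, 5.8, np.nan, 8.0, 6.5, 7.9],
    'country': ['Japan', 'Chile', 'USA', 'Indonesia', 'Mexico', 'Japan', 'Peru', 'Chile']
})

# count excludes NaNs, so an all-NaN group shows count 0 and mean NaN
stats = df.groupby('country')['mag'].agg(['count', 'mean'])
print(stats)
# 'USA' and 'Mexico' get count 0 and mean NaN: their only values were missing
```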
Dask aggregation ignoring NaNs — use da.nanmean, da.nansum, etc., on chunked arrays; Dask DataFrames follow pandas semantics, so mean() skips NaNs by default.
import h5py
import dask.array as da
import dask.dataframe as dd
# Dask array example (lazy read from HDF5)
with h5py.File('earthquakes.h5', 'r') as f:
    mag_dask = da.from_array(f['magnitude'], chunks='auto')
    # Lazy nan-aware mean; compute while the file is still open
    mean_mag = da.nanmean(mag_dask).compute()
print(f"Mean magnitude: {mean_mag:.2f}")
# Dask DataFrame example (large CSV)
ddf = dd.read_csv('earthquakes.csv')
mean_mag_ddf = ddf['mag'].mean().compute() # pandas semantics: skipna=True by default
count_by_country = ddf.groupby('country')['mag'].count().compute() # count skips NaNs
print(count_by_country.nlargest(10))
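Under the hood, nan-aware means over chunked data reduce to combining per-chunk partial sums and valid-value counts, which is what lets Dask parallelize them. The pure-NumPy sketch below, with two hypothetical magnitude chunks, mimics that tree reduction:

```python
import numpy as np

# Hypothetical magnitude chunks, as Dask would split a large array
chunks = [np.array([6.1, np.nan, 7.2]), np.array([np.nan, 5.8, 8.0])]

# Per-chunk partial aggregates: nan-safe sum and count of valid values
total = sum(np.nansum(c) for c in chunks)
count = sum(np.count_nonzero(~np.isnan(c)) for c in chunks)

print(total / count)  # 6.775, same as np.nanmean over the concatenated array
```

Because sum and count combine associatively, chunks can be reduced in any order and in parallel, which is why nanmean scales to catalogs far larger than memory.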
Real-world pattern: robust earthquake statistics from large catalogs — mean magnitude, depth distribution, event counts per region.
# Load large catalog (Dask for scale)
ddf = dd.read_csv('all_earthquakes.csv', assume_missing=True)
# Mean magnitude ignoring NaNs (parallel; skipna=True is the pandas default)
global_mean_mag = ddf['mag'].mean().compute()
print(f"Global mean magnitude: {global_mean_mag:.2f}")
# Mean depth by magnitude bin
ddf['mag_bin'] = ddf['mag'].map_partitions(lambda s: pd.cut(s, bins=[0,5,6,7,8,10]))
depth_by_bin = ddf.groupby('mag_bin')['depth'].mean().compute() # NaNs skipped per group
print(depth_by_bin)
# Count events per place; value_counts drops missing values by default
top_countries = ddf['place'].value_counts().nlargest(10).compute()
print(top_countries)
Best practices for NaN-ignoring aggregation in earthquake analysis:
- Prefer nan-aware functions (np.nanmean, da.nanmean) over manual np.isnan masks.
- Modern tip: Polars skips nulls by default in aggregations (pl.col('mag').mean()), with very fast columnar execution.
- Use skipna=True in pandas; it is the default, but being explicit aids clarity.
- Handle all-NaN groups: check count() before trusting mean().
- Add type hints, e.g. def agg_nan(ddf: dd.DataFrame) -> float.
- Monitor memory; use Dask for files larger than about 1 GB.
- Use assume_missing=True in dd.read_csv to handle mixed types and NaNs.
- Test with NaN patterns: all-NaN columns and other edge cases.
- Use ddf.map_partitions for custom NaN handling per chunk.
- Use xarray's .mean(skipna=True) for labeled, dimension-aware aggregation.
- Use fillna only when a default is meaningful (e.g. depth=0).
- Profile with timeit: nanmean vs. mask-then-mean.
- Use dask.diagnostics.ProgressBar() for progress during compute.
- Use the Dask client dashboard to monitor tasks and memory for large aggregations.
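One subtlety behind the all-NaN-groups advice: pandas sum() returns 0.0 for an all-NaN series, which can masquerade as real data. The min_count parameter forces NaN instead, as this small sketch shows:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])
print(s.sum())             # 0.0: an all-NaN sum defaults to zero
print(s.sum(min_count=1))  # nan: require at least one valid value
```

The same min_count parameter works in grouped sums, so an all-NaN region shows up as NaN rather than a spurious zero total.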
Aggregating while ignoring NaNs in earthquake data uses nan-aware functions in NumPy/Dask and skipna in pandas — compute robust magnitude means, depth averages, and event counts. In 2026, prefer nanmean/nansum, Polars' default null skipping, xarray's skipna, and test edge cases. Master NaN-ignoring aggregation, and you'll derive accurate insights from noisy or incomplete seismic datasets reliably and efficiently.
Next time your earthquake catalog has gaps — aggregate without NaN interference. It’s Python’s cleanest way to say: “Give me the true average magnitude — skip the missing shakes.”