Analyzing Earthquake Data is a classic real-world task in data science and geophysics — exploring seismic events to understand patterns, magnitudes, locations, depths, and risks. The USGS provides open, real-time earthquake catalogs via their FDSN web services, downloadable as CSV, making it easy to fetch, process, and visualize recent or historical data. In 2026, this workflow is standard for students, researchers, journalists, and disaster analysts — using pandas/Dask for loading and aggregation, matplotlib/seaborn/hvplot for visualization, and geospatial tools (geopandas, cartopy, folium) for mapping epicenters. With Dask, you can scale to years of global data without memory limits; with Polars, you get blazing-fast columnar queries on large files.
Here’s a complete, practical guide to analyzing earthquake data in Python: downloading USGS data, loading with pandas/Dask/Polars, cleaning & exploration, aggregation (by country, magnitude, depth), visualization (histograms, time series, maps), and modern best practices with type hints, lazy loading, geospatial integration, and performance tips.
Downloading USGS earthquake data — use FDSN query for CSV (customizable by time, magnitude, etc.).
import requests
import pandas as pd
from datetime import datetime
# Customize query: magnitude ≥ 6, since 2024, ordered by time (ascending)
base_url = "https://earthquake.usgs.gov/fdsnws/event/1/query"
params = {
"format": "csv",
"starttime": "2024-01-01",
"endtime": datetime.now().strftime("%Y-%m-%d"),
"minmagnitude": "6",
"orderby": "time-asc"
}
response = requests.get(base_url, params=params)
response.raise_for_status() # raise if download fails
# Save to file or load directly
with open("earthquakes_recent.csv", "w") as f:
f.write(response.text)
# Or parse the response body in memory (avoids a second download)
import io
df = pd.read_csv(io.StringIO(response.text))
print(df.head())
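To see the in-memory parsing pattern in isolation, here is a minimal, self-contained sketch; the two sample rows are hypothetical stand-ins for `response.text`, with columns mirroring a subset of the real USGS CSV header:

```python
import io

import pandas as pd

# Hypothetical sample standing in for response.text (subset of USGS columns).
csv_text = (
    "time,latitude,longitude,depth,mag,place\n"
    '2024-01-02T10:15:30.000Z,35.68,139.69,10.0,6.1,"near the coast of Honshu, Japan"\n'
    '2024-02-10T04:05:06.000Z,-15.50,-173.00,35.0,6.4,"Tonga region"\n'
)

# parse_dates converts the ISO-8601 "time" column to timezone-aware datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
print(df.shape)
```

The same `io.StringIO` wrapper works for any string payload, so the pattern carries over unchanged to the live download.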
Loading & initial exploration — use pandas for small data, Dask/Polars for large catalogs.
# Pandas (simple, in-memory)
df = pd.read_csv("earthquakes_recent.csv")
print(df.info())
print(df.describe())
# Dask (lazy, scales to huge files)
import dask.dataframe as dd
ddf = dd.read_csv("earthquakes_recent.csv")
print(ddf.head()) # computes small preview
print(ddf['mag'].mean().compute()) # parallel mean
# Polars (fast columnar, memory-efficient)
import polars as pl
pl_df = pl.read_csv("earthquakes_recent.csv")
print(pl_df.describe())
print(pl_df.group_by("magType").len().sort("len", descending=True).head(10))  # "country" is derived later during cleaning, not a raw USGS column
Cleaning & feature engineering — handle missing values, convert time, extract useful columns.
# Pandas cleaning
df['time'] = pd.to_datetime(df['time'])
df = df.dropna(subset=['mag', 'latitude', 'longitude', 'depth'])
df['country'] = df['place'].str.split(',').str[-1].str.strip() # rough country extraction
# Dask equivalent (lazy)
ddf['time'] = dd.to_datetime(ddf['time'])
ddf = ddf.dropna(subset=['mag', 'latitude', 'longitude', 'depth'])
# Polars (fast)
pl_df = pl_df.with_columns(pl.col("time").str.to_datetime())  # ISO-8601 timestamps parse without an explicit format
pl_df = pl_df.drop_nulls(subset=["mag", "latitude", "longitude", "depth"])
pl_df = pl_df.with_columns(
    pl.col("place").str.split(",").list.last().str.strip_chars().alias("country")
)  # rough country extraction, mirroring the pandas step
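The "rough country extraction" above breaks on places without a comma (e.g. "southern Mid-Atlantic Ridge") and on missing values. A small, hedged pandas helper that handles both cases; the function name and sample places are illustrative, not part of the USGS schema:

```python
import pandas as pd

def extract_region(place) -> str:
    # USGS "place" strings look like "42 km SW of Tokyo, Japan",
    # but mid-ocean events often have no comma at all.
    if pd.isna(place):
        return "Unknown"
    return place.rsplit(",", 1)[-1].strip()

places = pd.Series([
    "42 km SW of Tokyo, Japan",
    "southern Mid-Atlantic Ridge",
    None,
])
regions = places.map(extract_region)
print(regions.tolist())
```

For truly accurate country assignment, a geopandas spatial join of epicenters against country polygons is the robust route, since "place" is free text.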
Aggregation examples — count by country, magnitude distribution, depth stats.
# Top countries by count (Polars fastest)
top_countries = pl_df.group_by("country").len().sort("len", descending=True).head(10)
print(top_countries)
# Magnitude histogram (Dask)
import matplotlib.pyplot as plt
mag_hist = ddf['mag'].value_counts().compute().sort_index()
mag_hist.plot(kind='bar')
plt.title("Magnitude Distribution")
plt.show()
# Average depth by magnitude bin (pandas)
df['mag_bin'] = pd.cut(df['mag'], bins=[6, 7, 8, 10])  # bins start at the query's minimum magnitude
depth_by_mag = df.groupby('mag_bin', observed=True)['depth'].mean()
print(depth_by_mag)
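The binning pattern can be tried without downloading anything; a self-contained sketch on synthetic magnitudes and depths (the values are random stand-ins, not real catalog data):

```python
import numpy as np
import pandas as pd

# Synthetic events: 500 magnitudes in [6, 8.5), depths in [5, 600) km.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "mag": rng.uniform(6.0, 8.5, 500),
    "depth": rng.uniform(5.0, 600.0, 500),
})

# Bin by magnitude, then summarize depth per bin.
df["mag_bin"] = pd.cut(df["mag"], bins=[6, 7, 8, 10])
stats = df.groupby("mag_bin", observed=True)["depth"].agg(["count", "mean"])
print(stats)
```

`observed=True` keeps only bins that actually contain events, which avoids empty rows (and a pandas deprecation warning) when grouping on a categorical.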
Visualization patterns — magnitude histogram, time series count, geospatial map.
import matplotlib.pyplot as plt
import seaborn as sns
import cartopy.crs as ccrs
# Magnitude histogram
sns.histplot(df['mag'], bins=30)
plt.title("Earthquake Magnitude Distribution")
plt.xlabel("Magnitude")
plt.show()
# Time series of daily counts
daily_counts = df.resample('D', on='time').size()
daily_counts.plot()
plt.title("Daily Earthquake Count")
plt.show()
# Geospatial map (cartopy)
fig = plt.figure(figsize=(12, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
scatter = ax.scatter(df['longitude'], df['latitude'],
c=df['mag'], s=df['mag']**2, cmap='Reds', alpha=0.6,
transform=ccrs.PlateCarree())
plt.colorbar(scatter, label='Magnitude')
plt.title("Global Earthquakes (M ≥ 6)")
plt.show()
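cartopy pulls in heavy native dependencies, so a plain-matplotlib fallback is handy for quick looks or headless environments. A sketch using the non-interactive Agg backend and synthetic epicenters (random stand-ins, not real events), saved to a file instead of shown:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render to files, no display needed
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

# Synthetic epicenters for illustration.
rng = np.random.default_rng(0)
lons = rng.uniform(-180, 180, 200)
lats = rng.uniform(-60, 60, 200)
mags = rng.uniform(6.0, 8.0, 200)

fig, ax = plt.subplots(figsize=(10, 5))
sc = ax.scatter(lons, lats, c=mags, s=mags**2, cmap="Reds", alpha=0.6)
fig.colorbar(sc, label="Magnitude")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Epicenters (no basemap)")

out = Path("quake_map.png")
fig.savefig(out, dpi=100)
print(out.exists())
```

You lose the coastlines, but the longitude/latitude scatter still makes plate boundaries visible once real data is plotted.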
Best practices for earthquake data analysis:
- Use the USGS FDSN query — customize time/magnitude/location for targeted downloads.
- Modern tip: use Polars for fast aggregation/counts — columnar speed beats pandas/Dask on large CSVs.
- Handle time correctly — pd.to_datetime or pl.Datetime.
- Clean place/country — use regex or geopandas for accurate mapping.
- Visualize with cartopy/holoviews — interactive maps for exploration.
- Add type hints — e.g. def analyze_eq(df: pd.DataFrame) -> None.
- Monitor memory — use Dask for files larger than ~1 GB.
- Use dd.read_csv(assume_missing=True) — handles columns with mixed types.
- Test on subsets — df.head(1000) for quick iteration.
- Use xarray — for gridded earthquake data (if available).
- Use hvplot — interactive time series/maps.
- Use geopandas — accurate spatial joins/country assignment.
- Profile with timeit — compare pandas vs Polars vs Dask.
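Two of these tips — type hints and timeit profiling — combine naturally. A hedged sketch (the function name and the synthetic magnitudes are illustrative) showing how to time a typed helper before swapping in Polars or Dask to compare:

```python
import timeit

import numpy as np
import pandas as pd

def mean_mag_pandas(df: pd.DataFrame) -> float:
    """Mean magnitude of a catalog; the type hints document the contract."""
    return float(df["mag"].mean())

# Synthetic catalog: 100k magnitudes in [6, 9).
rng = np.random.default_rng(0)
df = pd.DataFrame({"mag": rng.uniform(6.0, 9.0, 100_000)})

elapsed = timeit.timeit(lambda: mean_mag_pandas(df), number=100)
print(f"pandas mean, 100k rows x 100 runs: {elapsed:.3f}s")
```

Writing the equivalent function against a Polars or Dask frame and timing it the same way gives an apples-to-apples comparison on your own data sizes.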
Analyzing earthquake data with USGS CSV downloads, pandas/Dask/Polars loading, cleaning, aggregation (by country/magnitude), and visualization (histograms, time series, maps) reveals seismic patterns and risks. In 2026, use Polars for speed, Dask for scale, cartopy/hvplot for maps, and always handle time & missing values correctly. Master earthquake analysis, and you’ll turn raw seismic data into meaningful insights efficiently and reliably.
Next time you want to study earthquakes — download, load, aggregate, and map them. It’s Python’s cleanest way to say: “Let’s understand the shaking world — one data point at a time.”