Using the json module is fundamental for working with JSON data in Python: the built-in json library provides simple, reliable methods to serialize (encode) Python objects to JSON strings/files and deserialize (decode) JSON back into Python data structures (dicts, lists, etc.). In 2026, json remains the standard for API responses, configuration files, data exchange, logs, earthquake metadata, web scraping, and NoSQL exports: fast enough for most use cases, human-readable, and universally compatible. For higher performance on large JSON/JSONL files, pair it with orjson or ujson; for parallel processing, use Dask Bags; for tabular JSON, use pandas/Polars; for labeled multi-dimensional JSON-derived data, use xarray.
Here’s a complete, practical guide to using the json module in Python: loading/dumping single files, handling JSONL/NDJSON, parsing errors/safe loading, real-world patterns (earthquake metadata, API responses), and modern best practices with type hints, performance, error handling, and integration with Dask/Polars/pandas/xarray.
Basic JSON loading & dumping — read/write single JSON objects or arrays.
import json

# Load from file (object or array)
with open('earthquake.json', 'r') as f:
    data = json.load(f)  # dict or list
print(data['features'][0]['properties']['mag'])  # access nested magnitude

# Dump to file (write Python dict/list to JSON)
events = [
    {'id': 1, 'mag': 6.2, 'place': 'California'},
    {'id': 2, 'mag': 7.1, 'place': 'Japan'}
]
with open('events_output.json', 'w') as f:
    json.dump(events, f, indent=2)  # pretty-print with indent

# String operations
json_str = json.dumps(events, indent=2)
print(json_str)
loaded_back = json.loads(json_str)
print(loaded_back == events)  # True
Handling JSONL/NDJSON — line-delimited JSON, common for large datasets/logs.
# Read JSONL line-by-line (memory-efficient)
records = []
with open('quakes.jsonl', 'r') as f:
    for line in f:
        if line.strip():  # skip empty lines
            records.append(json.loads(line))
print(len(records))  # number of events

# Or as a list comprehension (with a context manager so the file is closed)
with open('quakes.jsonl', 'r') as f:
    records = [json.loads(line) for line in f if line.strip()]

# Write JSONL
with open('filtered_quakes.jsonl', 'w') as f:
    for event in records:
        if event.get('mag', 0) >= 6.0:
            f.write(json.dumps(event) + '\n')
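Real JSONL files often contain the occasional malformed line, so it pays to "validate early" rather than let one bad record crash the pipeline. A minimal sketch, using an in-memory list of lines in place of a file (the sample records are illustrative):

```python
import json

# Sample lines standing in for a JSONL file; the second is deliberately broken
lines = [
    '{"id": 1, "mag": 6.2}',
    'not valid json',
    '{"id": 2, "mag": 7.1}',
]

records, bad = [], 0
for lineno, line in enumerate(lines, start=1):
    if not line.strip():
        continue  # skip blank lines
    try:
        records.append(json.loads(line))
    except json.JSONDecodeError as e:
        bad += 1
        print(f"Skipping line {lineno}: {e.msg}")

print(len(records), bad)  # 2 1
```

The same try/except wraps naturally around the `for line in f:` loop when reading a real file.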
Real-world pattern: processing USGS-style earthquake JSON/JSONL — parse, filter, aggregate.
# Load multi-event JSON array
with open('usgs_batch.json', 'r') as f:
    events = json.load(f)['features']  # USGS format: {'features': [...]}

# Filter strong events (M >= 7)
strong_events = [e for e in events if e['properties']['mag'] >= 7.0]

# Extract key fields
extracted = [{
    'id': e['id'],
    'time': e['properties']['time'],
    'mag': e['properties']['mag'],
    'lat': e['geometry']['coordinates'][1],
    'lon': e['geometry']['coordinates'][0],
    'depth': e['geometry']['coordinates'][2],
    'place': e['properties']['place']
} for e in strong_events]

# Save filtered events to JSONL
with open('strong_quakes.jsonl', 'w') as f:
    for e in extracted:
        f.write(json.dumps(e) + '\n')

# Quick stats
print(f"Strong events: {len(extracted)}")
print(f"Max magnitude: {max(e['mag'] for e in extracted):.1f}")
Best practices for using the json module:
- Prefer orjson or ujson for large JSON: typically 2–5× faster parsing/dumping.
- Modern tip: use Polars pl.read_ndjson('file.jsonl') for fast tabular JSONL; use Dask Bags db.read_text(...).map(json.loads) for parallel processing.
- Validate early: wrap json.loads in try/except for robust pipelines.
- Use indent=2 for human-readable output during debugging.
- Use ensure_ascii=False to preserve Unicode characters in output.
- Handle large files by processing line-by-line (JSONL); avoid loading an entire JSON array into memory.
- Use json.tool for command-line validation: python -m json.tool file.json.
- Add type hints, e.g. def load_json(path: str) -> dict | list.
- Catch json.JSONDecodeError (also available as json.decoder.JSONDecodeError) for specific error handling.
- Use default= for custom serialization of non-JSON types and object_hook= for custom deserialization.
- Profile with timeit to compare json vs orjson.
- Use db.from_sequence(lines).map(json.loads) for parallel parsing with Dask.
- Use pandas.read_json('file.jsonl', lines=True) for quick tabular JSONL.
- Use xarray for labeled multi-dimensional JSON-derived data.
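The default= and object_hook= tips can be sketched together: encode datetime values as ISO strings on the way out and rebuild them on the way back in. The 'time'-key convention used here is an assumption for illustration, not part of the json API:

```python
import json
from datetime import datetime, timezone

# default= handler: called for objects json can't serialize natively
def encode_extra(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Not JSON serializable: {type(obj).__name__}")

# object_hook= handler: called for every decoded dict; here we assume
# (illustratively) that a 'time' key holds an ISO-8601 string
def decode_event(d):
    if isinstance(d.get('time'), str):
        try:
            d['time'] = datetime.fromisoformat(d['time'])
        except ValueError:
            pass  # leave non-timestamp strings untouched
    return d

event = {'id': 1, 'mag': 6.2,
         'time': datetime(2026, 1, 5, 12, 30, tzinfo=timezone.utc)}
s = json.dumps(event, default=encode_extra, ensure_ascii=False)
back = json.loads(s, object_hook=decode_event)
print(back['time'] == event['time'])  # True
```

For heavier cases (dataclasses, Decimal, NumPy types), the same default= hook pattern extends with more isinstance branches.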
Using the json module handles JSON/JSONL loading, dumping, parsing, and validation — read single files or line-by-line, filter/transform safely, and integrate with pandas/Polars/Dask/xarray. In 2026, use fast parsers like orjson, process line-by-line for large files, validate early, and prefer Polars/Dask for scale. Master json, and you’ll work with structured data exchange reliably, efficiently, and at any scale.
Next time you encounter JSON or JSONL — use the json module. It’s Python’s cleanest way to say: “Parse and generate structured data — simple, fast, and everywhere.”