Creating a dictionary from a file in Python is a fundamental skill for data ingestion, configuration loading, metadata parsing, and ETL workflows. Files like CSV, JSON, YAML, text (key=value), or even custom formats are common sources of structured data, and converting them into dictionaries enables fast lookups, dynamic access, and easy manipulation. In 2026, Python’s ecosystem has matured: Polars dominates for high-speed CSV/JSON/Parquet, pydantic validates and structures dicts, ruamel.yaml handles modern YAML safely, and tomllib (stdlib) parses TOML natively. This guide covers every practical technique — from basics to high-performance patterns — with real-world earthquake metadata examples.
Here’s a complete, practical guide to loading files into dictionaries in Python: CSV, JSON, YAML, text (key=value), TOML, real-world patterns (earthquake station config, event metadata mapping, multi-format parsing), and modern best practices with type hints, validation, performance, and integration with Polars/pandas/Dask/pydantic.
1. CSV → Dictionary (Row-by-Row or Column-Mapped)
# csv.DictReader — classic, row-by-row as dicts
import csv

events = []
with open('earthquakes.csv', 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        events.append({
            'time': row['time'],
            'mag': float(row['mag']),
            'place': row['place']
        })
print(events[:2])
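When you need O(1) lookups rather than a flat list, key each row dict by an identifier column in a single comprehension. A minimal sketch, using an inline sample (the `id` column and values are hypothetical stand-ins for earthquakes.csv):

```python
import csv
import io

# Hypothetical inline sample standing in for earthquakes.csv
sample = "id,time,mag,place\nev1,2026-01-01,5.1,Chile\nev2,2026-01-02,6.3,Japan\n"

# Key each row dict by its id for O(1) lookup
by_id = {row["id"]: row for row in csv.DictReader(io.StringIO(sample))}
print(by_id["ev2"]["mag"])  # '6.3' — still a string until cast
```

Note that csv.DictReader leaves every value as a string; cast to float/int at access time or during the comprehension.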
# Polars: fastest columnar → dict of lists or list of dicts
import polars as pl

df = pl.read_csv('earthquakes.csv')
data = df.to_dicts()  # list of dicts
print(data[:2])

# Or transpose to dict of columns
col_dict = df.to_dict(as_series=False)
print(col_dict['mag'][:5])  # list of magnitudes
2. JSON → Dictionary (Native & Safe)
import json

# Single JSON object
with open('event.json', 'r', encoding='utf-8') as f:
    event = json.load(f)
print(event['mag'])

# JSON Lines (ndjson) — common for large datasets
events = []
with open('events.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        events.append(json.loads(line))

# Polars: read_ndjson — fast & memory-efficient
df_pl = pl.read_ndjson('events.jsonl')
print(df_pl.head())
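Once JSON is loaded into a raw dict, pydantic (mentioned in the intro) can validate and coerce it into a typed object. A minimal sketch, assuming pydantic v2's model_validate; the Event fields here are hypothetical:

```python
from pydantic import BaseModel

# Hypothetical schema for one earthquake event
class Event(BaseModel):
    time: str
    mag: float
    place: str

raw = {"time": "2026-01-05T12:00:00", "mag": "6.1", "place": "Alaska"}  # mag arrives as a string
event = Event.model_validate(raw)  # validates and coerces "6.1" -> 6.1
print(event.mag)  # 6.1, as a float
```

Invalid input raises pydantic.ValidationError with a per-field report, which is far easier to debug than a KeyError deep in a pipeline.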
3. Text (key=value) → Dictionary
config = {}
with open('config.txt', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith('#'):
            key, value = line.split('=', 1)
            config[key.strip()] = value.strip()
print(config)
# {'threshold': '7.0', 'alert': 'yellow', ...}
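Every value parsed this way is a string. If you want typed values ('7.0' as a float, 'true' as a bool), a small best-effort coercion helper can post-process the dict. A sketch with a hypothetical coerce() helper and sample values:

```python
def coerce(value: str):
    """Best-effort conversion of a raw string to int, float, or bool; falls back to str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    return value

# Raw strings as produced by the key=value loop above (sample values)
raw = {"threshold": "7.0", "alert": "yellow", "retries": "3", "enabled": "true"}
config = {k: coerce(v) for k, v in raw.items()}
print(config)  # {'threshold': 7.0, 'alert': 'yellow', 'retries': 3, 'enabled': True}
```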
4. YAML → Dictionary (Safe Loading)
from ruamel.yaml import YAML

yaml = YAML(typ='safe')
with open('config.yaml', 'r', encoding='utf-8') as f:
    config = yaml.load(f)
print(config['database']['host'])
5. TOML → Dictionary (stdlib since 3.11)
import tomllib

with open('pyproject.toml', 'rb') as f:  # tomllib requires binary mode
    config = tomllib.load(f)
print(config['tool']['black']['line-length'])
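For legacy INI-style files, the stdlib configparser fills the same role: each section behaves like a dict of string values. A minimal sketch with a hypothetical [station] section:

```python
import configparser

# Hypothetical INI content; values always come back as strings
ini_text = """[station]
id = STA001
latitude = -33.45
"""
parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Sections behave like mappings; dict() flattens them into plain nested dicts
config = {section: dict(parser[section]) for section in parser.sections()}
print(config)  # {'station': {'id': 'STA001', 'latitude': '-33.45'}}
```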
Real-world pattern: earthquake station config & metadata mapping — multi-format loading.
# Unified loader for different formats
def load_config(path: str) -> dict:
    ext = path.rsplit('.', 1)[-1].lower()
    if ext == 'csv':
        import csv
        data = {}
        with open(path, 'r', encoding='utf-8') as f:
            reader = csv.DictReader(f)
            for row in reader:
                data[row['station_id']] = row
        return data
    elif ext in ('yaml', 'yml'):
        from ruamel.yaml import YAML
        yaml = YAML(typ='safe')
        with open(path, 'r', encoding='utf-8') as f:
            return yaml.load(f)
    elif ext == 'json':
        import json
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    elif ext == 'toml':
        import tomllib
        with open(path, 'rb') as f:
            return tomllib.load(f)
    else:
        raise ValueError(f"Unsupported format: {ext}")

# Usage
stations = load_config('stations.csv')
config = load_config('analysis_config.yaml')
print(stations['STA001']['location'])
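In production, wrap loaders like this in try/except so a missing or malformed file fails gracefully rather than crashing the pipeline. A minimal sketch with a hypothetical safe_load_json() helper for the JSON branch:

```python
import json

def safe_load_json(path: str) -> dict:
    """Load JSON with explicit error handling; returns {} on failure.
    (Hypothetical helper illustrating the try/except pattern.)"""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"missing file: {path}")
        return {}
    except json.JSONDecodeError as exc:
        print(f"bad JSON in {path}: {exc}")
        return {}

# A path that does not exist falls through to the FileNotFoundError branch
result = safe_load_json("no_such_file.json")
print(result)  # {}
```

Whether to return an empty dict or re-raise is a design choice: returning {} keeps batch jobs running, while re-raising surfaces problems immediately.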
Best practices for loading files into dictionaries in 2026 Python:

- Prefer Polars read_csv()/read_ndjson() — fastest & memory-efficient for tabular data.
- Use pandas read_csv() — for legacy compatibility & a rich ecosystem.
- Use Dask read_csv() — only for truly massive files.
- Always pass encoding='utf-8' — avoid encoding errors.
- Use parse_dates (pandas) / str.to_datetime (Polars) — for time columns.
- Specify dtypes explicitly — prevent type-inference issues.
- Use a context manager — with open(...) as f: — for resource safety.
- Use try/except — catch FileNotFoundError, json.JSONDecodeError, etc.
- Add type hints — def load_data(path: str) -> dict[str, Any]: ....
- Use Pydantic models — for validated, typed output: Model.model_validate(data).
- Use json.loads() with strict=False — to permit control characters inside strings.
- Use ruamel.yaml — for safe, round-trip YAML.
- Use tomllib — for TOML (stdlib).
- Use configparser — for INI-style files.
- Use csv.DictReader — for CSV rows as dicts; csv.reader — for raw rows.
- Use pd.read_csv(low_memory=False) — in pandas for mixed-type columns.
- Use pl.scan_csv() — for lazy Polars queries.
- Use dd.read_csv(blocksize='64MB') — for Dask partitioning.
- Use df.to_dict('records') — pandas list-of-dicts; df.to_dict('list') — dict-of-lists.
- Use pl.DataFrame.to_dicts() — Polars list-of-dicts; pl.DataFrame.to_dict(as_series=False) — dict-of-lists.
- Writing back: json.dump() for JSON, yaml.dump() for YAML, tomli_w.dump() for TOML, csv.DictWriter for dict rows.
- Exporting DataFrames: df.to_csv() (pandas), df.write_csv() (Polars), ddf.to_csv() (Dask).
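The write-back side mirrors the loading side: csv.DictWriter serializes a list of dicts and csv.DictReader recovers it. A minimal round-trip sketch using an in-memory buffer and hypothetical station rows:

```python
import csv
import io

# Hypothetical station metadata rows
rows = [
    {"station_id": "STA001", "location": "Valparaiso"},
    {"station_id": "STA002", "location": "Sendai"},
]

# DictWriter writes dict rows; fieldnames fixes the column order
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["station_id", "location"])
writer.writeheader()
writer.writerows(rows)

# DictReader reads them straight back into dicts
buf.seek(0)
round_tripped = list(csv.DictReader(buf))
print(round_tripped == rows)  # True
```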
Creating dictionaries from files is foundational for data ingestion — master csv.DictReader, json.load, safe YAML loading with ruamel.yaml, tomllib, Polars read_*, and Pydantic validation. In 2026, lean on Polars for speed, Pydantic for safety, and choose the right loader for format & scale. These patterns simplify mapping file data to accessible, queryable structures.
Next time you have structured data in a file — reach for the right loader. It’s Python’s cleanest way to say: “Turn this file into a dictionary — fast, safe, and ready for use.”