Creating a dictionary from a file in Python is a fundamental skill for data ingestion, configuration loading, metadata parsing, and ETL workflows. Files like CSV, JSON, YAML, text (key=value), or even custom formats are common sources of structured data, and converting them into dictionaries enables fast lookups, dynamic access, and easy manipulation. In 2026, Python’s ecosystem has matured: Polars dominates for high-speed CSV/JSON/Parquet, pydantic validates and structures dicts, ruamel.yaml handles modern YAML safely, and tomllib (stdlib) parses TOML natively. This guide covers every practical technique — from basics to high-performance patterns — with real-world earthquake metadata examples.
Here’s a complete, practical guide to loading files into dictionaries in Python: CSV, JSON, YAML, text (key=value), TOML, real-world patterns (earthquake station config, event metadata mapping, multi-format parsing), and modern best practices with type hints, validation, performance, and integration with Polars/pandas/Dask/pydantic.
1. CSV → Dictionary (Row-by-Row or Column-Mapped)
# csv.DictReader — classic, row-by-row as dicts
import csv

events = []
with open('earthquakes.csv', 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        events.append({
            'time': row['time'],
            'mag': float(row['mag']),
            'place': row['place']
        })
print(events[:2])
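When you need O(1) lookups rather than a flat list, key each row dict by an identifier column in a single comprehension. A minimal sketch, using an inline sample (the `id` column and values are hypothetical stand-ins for earthquakes.csv):

```python
import csv
import io

# Hypothetical inline sample standing in for earthquakes.csv
sample = "id,time,mag,place\nev1,2026-01-01,5.1,Chile\nev2,2026-01-02,6.3,Japan\n"

# Key each row dict by its id for O(1) lookup
by_id = {row["id"]: row for row in csv.DictReader(io.StringIO(sample))}
print(by_id["ev2"]["mag"])  # '6.3' — still a string until cast
```

Note that csv.DictReader leaves every value as a string; cast to float/int at access time or during the comprehension.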
# Polars: fastest columnar → dict of lists or list of dicts
import polars as pl

df = pl.read_csv('earthquakes.csv')
data = df.to_dicts()  # list of dicts
print(data[:2])

# Or transpose to dict of columns
col_dict = df.to_dict(as_series=False)
print(col_dict['mag'][:5])  # list of magnitudes
2. JSON → Dictionary (Native & Safe)
import json

# Single JSON object
with open('event.json', 'r', encoding='utf-8') as f:
    event = json.load(f)
print(event['mag'])

# JSON Lines (ndjson) — common for large datasets
events = []
with open('events.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        events.append(json.loads(line))

# Polars: read_ndjson — fast & memory-efficient
df_pl = pl.read_ndjson('events.jsonl')
print(df_pl.head())
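Once JSON is loaded into a raw dict, pydantic (mentioned in the intro) can validate and coerce it into a typed object. A minimal sketch, assuming pydantic v2's model_validate; the Event fields here are hypothetical:

```python
from pydantic import BaseModel

# Hypothetical schema for one earthquake event
class Event(BaseModel):
    time: str
    mag: float
    place: str

raw = {"time": "2026-01-05T12:00:00", "mag": "6.1", "place": "Alaska"}  # mag arrives as a string
event = Event.model_validate(raw)  # validates and coerces "6.1" -> 6.1
print(event.mag)  # 6.1, as a float
```

Invalid input raises pydantic.ValidationError with a per-field report, which is far easier to debug than a KeyError deep in a pipeline.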
3. Text (key=value) → Dictionary
config = {}
with open('config.txt', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith('#'):
            key, value = line.split('=', 1)
            config[key.strip()] = value.strip()
print(config)
# {'threshold': '7.0', 'alert': 'yellow', ...}
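Every value parsed this way is a string. If you want typed values ('7.0' as a float, 'true' as a bool), a small best-effort coercion helper can post-process the dict. A sketch with a hypothetical coerce() helper and sample values:

```python
def coerce(value: str):
    """Best-effort conversion of a raw string to int, float, or bool; falls back to str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    return value

# Raw strings as produced by the key=value loop above (sample values)
raw = {"threshold": "7.0", "alert": "yellow", "retries": "3", "enabled": "true"}
config = {k: coerce(v) for k, v in raw.items()}
print(config)  # {'threshold': 7.0, 'alert': 'yellow', 'retries': 3, 'enabled': True}
```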
4. YAML → Dictionary (Safe Loading)
from ruamel.yaml import YAML

yaml = YAML(typ='safe')
with open('config.yaml', 'r', encoding='utf-8') as f:
    config = yaml.load(f)
print(config['database']['host'])
5. TOML → Dictionary (stdlib since 3.11)
import tomllib

with open('pyproject.toml', 'rb') as f:  # tomllib requires binary mode
    config = tomllib.load(f)
print(config['tool']['black']['line-length'])
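For legacy INI-style files, the stdlib configparser fills the same role: each section behaves like a dict of string values. A minimal sketch with a hypothetical [station] section:

```python
import configparser

# Hypothetical INI content; values always come back as strings
ini_text = """[station]
id = STA001
latitude = -33.45
"""
parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Sections behave like mappings; dict() flattens them into plain nested dicts
config = {section: dict(parser[section]) for section in parser.sections()}
print(config)  # {'station': {'id': 'STA001', 'latitude': '-33.45'}}
```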
Real-world pattern: earthquake station config & metadata mapping — multi-format loading.
# Unified loader for different formats
def load_config(path: str) -> dict:
    ext = path.rsplit('.', 1)[-1].lower()
    if ext == 'csv':
        import csv
        data = {}
        with open(path, 'r', encoding='utf-8') as f:
            reader = csv.DictReader(f)
            for row in reader:
                data[row['station_id']] = row
        return data
    elif ext in ('yaml', 'yml'):
        from ruamel.yaml import YAML
        yaml = YAML(typ='safe')
        with open(path, 'r', encoding='utf-8') as f:
            return yaml.load(f)
    elif ext == 'json':
        import json
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    elif ext == 'toml':
        import tomllib
        with open(path, 'rb') as f:
            return tomllib.load(f)
    else:
        raise ValueError(f"Unsupported format: {ext}")

# Usage
stations = load_config('stations.csv')
config = load_config('analysis_config.yaml')
print(stations['STA001']['location'])
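In production, wrap loaders like this in try/except so a missing or malformed file fails gracefully rather than crashing the pipeline. A minimal sketch with a hypothetical safe_load_json() helper for the JSON branch:

```python
import json

def safe_load_json(path: str) -> dict:
    """Load JSON with explicit error handling; returns {} on failure.
    (Hypothetical helper illustrating the try/except pattern.)"""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"missing file: {path}")
        return {}
    except json.JSONDecodeError as exc:
        print(f"bad JSON in {path}: {exc}")
        return {}

# A path that does not exist falls through to the FileNotFoundError branch
result = safe_load_json("no_such_file.json")
print(result)  # {}
```

Whether to return an empty dict or re-raise is a design choice: returning {} keeps batch jobs running, while re-raising surfaces problems immediately.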
Best practices for loading files into dictionaries in 2026 Python:

- Prefer Polars read_csv()/read_ndjson() — fastest & memory-efficient for tabular data.
- Use pandas read_csv() — for legacy compatibility & a rich ecosystem.
- Use Dask read_csv() — only for truly massive files.
- Always pass encoding='utf-8' — avoid encoding errors.
- Use parse_dates (pandas) / str.to_datetime (Polars) — for time columns.
- Specify dtypes explicitly — prevent type-inference issues.
- Use a context manager — with open(...) as f: — for resource safety.
- Use try/except — catch FileNotFoundError, json.JSONDecodeError, etc.
- Add type hints — def load_data(path: str) -> dict[str, Any]: ....
- Use Pydantic models — for validated, typed output: Model.model_validate(data).
- Use json.loads() with strict=False — to permit control characters inside strings.
- Use ruamel.yaml — for safe, round-trip YAML.
- Use tomllib — for TOML (stdlib).
- Use configparser — for INI-style files.
- Use csv.DictReader — for CSV rows as dicts; csv.reader — for raw rows.
- Use pd.read_csv(low_memory=False) — in pandas for mixed-type columns.
- Use pl.scan_csv() — for lazy Polars queries.
- Use dd.read_csv(blocksize='64MB') — for Dask partitioning.
- Use df.to_dict('records') — pandas list-of-dicts; df.to_dict('list') — dict-of-lists.
- Use pl.DataFrame.to_dicts() — Polars list-of-dicts; pl.DataFrame.to_dict(as_series=False) — dict-of-lists.
- Writing back: json.dump() for JSON, yaml.dump() for YAML, tomli_w.dump() for TOML, csv.DictWriter for dict rows.
- Exporting DataFrames: df.to_csv() (pandas), df.write_csv() (Polars), ddf.to_csv() (Dask).
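The write-back side mirrors the loading side: csv.DictWriter serializes a list of dicts and csv.DictReader recovers it. A minimal round-trip sketch using an in-memory buffer and hypothetical station rows:

```python
import csv
import io

# Hypothetical station metadata rows
rows = [
    {"station_id": "STA001", "location": "Valparaiso"},
    {"station_id": "STA002", "location": "Sendai"},
]

# DictWriter writes dict rows; fieldnames fixes the column order
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["station_id", "location"])
writer.writeheader()
writer.writerows(rows)

# DictReader reads them straight back into dicts
buf.seek(0)
round_tripped = list(csv.DictReader(buf))
print(round_tripped == rows)  # True
```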
Creating dictionaries from files is foundational for data ingestion — master csv.DictReader, json.load, safe YAML loading with ruamel.yaml, tomllib, Polars read_*, and Pydantic validation. In 2026, lean on Polars for speed, Pydantic for safety, and choose the right loader for format & scale. These patterns simplify mapping file data to accessible, queryable structures.
Next time you have structured data in a file — reach for the right loader. It’s Python’s cleanest way to say: “Turn this file into a dictionary — fast, safe, and ready for use.”