A dictionary of lists, organized by column, is one of the most common and efficient ways to represent tabular data in Python before converting it to a DataFrame. Each key is a column name, and each value is the list of that column's values, so row i is the i-th element of every list. This structure appears constantly: columnar JSON APIs, database query results, CSV parsing without pandas, and data built column by column in loops.
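For the CSV case, here is a minimal sketch using the standard library's `csv.DictReader`, which yields one dict per row that can be pivoted into a dict of lists. The inline sample text and the `csv_to_columns` name are illustrative assumptions. Every value stays a string because `csv` does no type conversion.

```python
import csv
import io


def csv_to_columns(text: str) -> dict[str, list[str]]:
    """Hypothetical helper: parse CSV text into a dict of lists keyed by header."""
    reader = csv.DictReader(io.StringIO(text))
    # One empty list per header column, in header order
    columns: dict[str, list[str]] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])  # csv yields strings only
    return columns


sample = "name,age\nJohn,30\nJane,25\n"
print(csv_to_columns(sample))  # {'name': ['John', 'Jane'], 'age': ['30', '25']}
```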
In 2026, dict-of-lists remains a go-to intermediate format, especially when column order matters or when you are streaming or accumulating data. Here’s a complete, practical guide to creating, modifying, converting, and using dicts of lists efficiently.
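For the accumulating case, the following sketch keeps every column the same length by appending exactly one value per column for each incoming record. `stream_records` and `accumulate` are hypothetical names. Filling gaps with `None` is an assumption, and you may prefer a different sentinel.

```python
from typing import Iterable, Iterator


def stream_records() -> Iterator[dict]:
    """Stand-in for a streaming source (API pages, a DB cursor, a log tail)."""
    yield {'name': 'John', 'age': 30}
    yield {'name': 'Jane'}  # 'age' missing in this record
    yield {'name': 'Mike', 'age': 35}


def accumulate(records: Iterable[dict], columns: list[str]) -> dict[str, list]:
    """Hypothetical helper: append one value per column for every record."""
    data = {col: [] for col in columns}  # fixed column order
    for rec in records:
        for col in columns:
            data[col].append(rec.get(col))  # None keeps the columns aligned
    return data


print(accumulate(stream_records(), ['name', 'age']))
# {'name': ['John', 'Jane', 'Mike'], 'age': [30, None, 35]}
```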
1. Creating a Dictionary of Lists (By Column)
```python
# Classic by-column style: keys are columns, values are full lists
data = {
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
    'city': ['New York', 'Chicago', 'Los Angeles', 'Seattle'],
}

# Quick preview
for col, values in data.items():
    print(f"{col}: {values}")
```
**Typical output:**

```text
name: ['John', 'Jane', 'Mike', 'Susan']
age: [30, 25, 35, 40]
gender: ['M', 'F', 'M', 'F']
city: ['New York', 'Chicago', 'Los Angeles', 'Seattle']
```
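Because each column is a plain list, a "row" is simply the same index taken across every column. The sketch below uses a trimmed copy of `data` and a hypothetical `get_row` helper to show both row lookup and transposition into the row-oriented list-of-dicts form with `zip`. Note that `zip` stops at the shortest list, so misaligned columns are truncated silently. On Python 3.10+, `zip(..., strict=True)` raises instead.

```python
data = {
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
}

# Column access is a plain dict lookup
ages = data['age']


def get_row(columns: dict[str, list], i: int) -> dict:
    """Hypothetical helper: row i is index i taken from every column."""
    return {col: values[i] for col, values in columns.items()}


# zip(*columns) transposes the lists into row tuples; iterating data yields keys
records = [dict(zip(data, row)) for row in zip(*data.values())]

print(get_row(data, 1))  # {'name': 'Jane', 'age': 25}
print(records[0])        # {'name': 'John', 'age': 30}
```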
2. Adding & Modifying Columns Dynamically
```python
# Add a new column (entire list at once; its length must match the other columns)
data['salary'] = [50000, 60000, 70000, 80000]

# Modify values in an existing column
data['age'][1] = 26          # Jane's age updated
data['city'][-1] = 'Denver'  # Susan moved

# Add a new row: append exactly one value to EVERY column so lengths stay equal
new_row = {'age': 29, 'salary': 75000}  # columns not given default to None
for col, lst in data.items():
    lst.append(new_row.get(col))

print("Updated columns:")
for col, values in data.items():
    print(f"{col}: {values}")
```
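To make row appends safer still, you can wrap the loop in a small helper that rejects unknown column names before touching any list. This is a sketch. The `append_row` name, the choice of `KeyError`, and `None` as the fill value are assumptions rather than a standard API.

```python
def append_row(columns: dict[str, list], row: dict) -> None:
    """Hypothetical helper: append one row, rejecting unknown column names."""
    unknown = set(row) - set(columns)
    if unknown:
        # Fail before mutating anything, so the columns never drift apart
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    for col, values in columns.items():
        values.append(row.get(col))  # missing columns become None


data = {'name': ['John', 'Jane'], 'age': [30, 26]}
append_row(data, {'name': 'Mike', 'age': 35})
append_row(data, {'name': 'Susan'})  # age filled with None
print(data)  # {'name': ['John', 'Jane', 'Mike', 'Susan'], 'age': [30, 26, 35, None]}
```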
3. Converting to DataFrame (Most Common Next Step)
```python
import pandas as pd

# One call: each key becomes a column (all lists must have the same length)
df = pd.DataFrame(data)
print(df.head())

# Or with Polars, which accepts the same dict of lists and is often faster on large data
import polars as pl

df_pl = pl.DataFrame(data)
print(df_pl.head())
```
4. From Dict of Lists to Other Formats (JSON, CSV, etc.)
```python
import json

import pandas as pd

# To JSON (columnar style, common for APIs)
json_str = json.dumps(data, indent=2)
print("JSON output (columnar):\n", json_str[:300], "...")

# To CSV string (no file needed)
csv_str = pd.DataFrame(data).to_csv(index=False)
print("\nCSV preview:\n", '\n'.join(csv_str.splitlines()[:5]))
```
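If pandas is not available, the standard library can produce the same columnar JSON and CSV. Below is a minimal sketch using a small stand-in `data`. JSON only round-trips cleanly when keys are strings and values are JSON-serializable: tuples come back as lists, and `datetime` objects fail to serialize.

```python
import csv
import io
import json

data = {'name': ['John', 'Jane'], 'age': [30, 26]}

# CSV: header row first, then zip(*columns) turns the columns into rows
buf = io.StringIO()
writer = csv.writer(buf, lineterminator='\n')
writer.writerow(data.keys())
writer.writerows(zip(*data.values()))
csv_str = buf.getvalue()
print(csv_str)  # name,age / John,30 / Jane,26 (one per line)

# JSON round trip keeps the columnar shape
restored = json.loads(json.dumps(data))
print(restored == data)  # True
```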
5. Common Gotchas & Best Practices (2026 Edition)
- Unequal list lengths: `pd.DataFrame` and `pl.DataFrame` raise an error rather than padding the short columns, so validate first:
  `all(len(lst) == len(data['name']) for lst in data.values())`
- Performance: `list.append` is cheap (amortized O(1)), but per-element Python loops add interpreter overhead. When a column can be computed in one pass, build it with a list comprehension.
- Order preservation: Python 3.7+ dicts preserve key insertion order, so DataFrame columns follow the order in which you added them.
- Nested data: if list values are themselves dicts or lists, flatten them first, for example with `pd.json_normalize()`.
- Production tip: log column lengths before conversion to catch silent misalignment early.
- Modern alternative: for streaming or very large columnar data, consider building a Polars `pl.DataFrame` directly or a `pyarrow` `RecordBatch`.
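The validation and performance tips above can be combined in a short sketch. The hypothetical `check_columns` helper logs every column's length, raises `ValueError` on misalignment, and returns the shared row count. A derived column is then built in one pass with a list comprehension. Here `print` stands in for whatever logger you actually use.

```python
def check_columns(columns: dict[str, list]) -> int:
    """Hypothetical helper: return the shared row count or raise on misalignment."""
    lengths = {col: len(values) for col, values in columns.items()}
    print("column lengths:", lengths)  # log before converting
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have unequal lengths: {lengths}")
    return next(iter(lengths.values()), 0)  # 0 rows for an empty dict


data = {'name': ['John', 'Jane', 'Mike'], 'salary': [50000, 60000, 70000]}
n_rows = check_columns(data)

# Derived column built in one pass instead of appending inside a loop
data['salary_k'] = [s / 1000 for s in data['salary']]
print(n_rows, data['salary_k'])  # 3 [50.0, 60.0, 70.0]
```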
Conclusion
A dictionary of lists, organized by column, is a simple and efficient columnar intermediate format in Python. It suits building data incrementally, parsing columnar JSON, and preparing inputs for pandas or Polars. In 2026 the workflow is unchanged: create columns directly, validate their lengths, append one value to every column per row, and convert to a DataFrame in a single call. Master this structure, and you'll handle structured data flows with confidence, from raw lists to full analysis.
Next time you’re parsing API results column by column or building data incrementally, use a dictionary of lists. It’s the clean bridge between Python primitives and full tabular power.