
A dictionary of lists, organized by column, is one of the most common and convenient ways to represent tabular data in Python before converting it to a DataFrame. Each key is a column name, and each value is the list of values for that column. This structure appears constantly: in columnar JSON APIs, database query results, CSV parsing without pandas, and when building data column by column in loops.

In 2026, dict-of-lists remains the go-to intermediate format — especially when column order matters or you're streaming/accumulating data. Here’s a complete, practical guide: creating, modifying, converting, and using dict-of-lists efficiently.

1. Creating a Dictionary of Lists (By Column)


# Classic by-column style — keys are columns, values are full lists
data = {
    'name':   ['John',   'Jane',   'Mike',   'Susan'],
    'age':    [30,       25,       35,       40],
    'gender': ['M',      'F',      'M',      'F'],
    'city':   ['New York', 'Chicago', 'Los Angeles', 'Seattle']
}

# Quick preview
for col, values in data.items():
    print(f"{col}: {values}")


**Typical output:**
name: ['John', 'Jane', 'Mike', 'Susan']
age: [30, 25, 35, 40]
gender: ['M', 'F', 'M', 'F']
city: ['New York', 'Chicago', 'Los Angeles', 'Seattle']
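
The same structure can also be built up from row-oriented input, such as CSV parsed without pandas. The following is a minimal sketch using only the standard library; the `raw` string is hypothetical sample text standing in for a real file or API payload.

```python
import csv
import io

# Hypothetical CSV text standing in for a file or API payload
raw = "name,age,city\nJohn,30,New York\nJane,25,Chicago\n"

reader = csv.DictReader(io.StringIO(raw))

# Start with one empty list per column, in header order
columns = {field: [] for field in reader.fieldnames}

# Distribute each row's values into the matching column lists
for row in reader:
    for field, value in row.items():
        columns[field].append(value)

# The csv module yields strings, so numeric columns are converted explicitly
columns["age"] = [int(v) for v in columns["age"]]

print(columns)
```

Creating every column list up front keeps the column order equal to the header order, even if later rows were to present their fields differently.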

2. Adding & Modifying Columns Dynamically


# Add a new column (entire list at once)
data['salary'] = [50000, 60000, 70000, 80000]

# Modify values in an existing column
data['age'][1] = 26          # Jane's age updated
data['city'][-1] = 'Denver'  # Susan moved

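# Add a new row: append exactly one value to every column so lengths stay equal
new_row = {'salary': 75000, 'age': 29}
for col, values in data.items():
    values.append(new_row.get(col))  # None where the row has no value

# Avoid appending to only some columns (e.g. data['salary'].append(...) alone):
# the lists would end up with different lengths and DataFrame conversion would fail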

print("Updated columns:")
for col, values in data.items():
    print(f"{col}: {values}")
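
One way to keep row appends safe is to wrap them in a small helper that checks the row before touching any list. This is only a sketch: `append_row` is a hypothetical helper name, not part of pandas, Polars, or the standard library, and the `people` table is sample data.

```python
def append_row(table, row):
    """Append one row to a dict of lists, keeping all columns the same length.

    Hypothetical helper: the row must supply exactly the table's columns.
    """
    missing = table.keys() - row.keys()
    extra = row.keys() - table.keys()
    # Validate before mutating, so a bad row never leaves a partial append
    if missing or extra:
        raise KeyError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    for col, values in table.items():
        values.append(row[col])


people = {"name": ["John", "Jane"], "age": [30, 25]}
append_row(people, {"name": "Mike", "age": 35})
print(people)
```

Because the check runs before any `append`, a row with a missing or unexpected key raises an error and leaves every column unchanged.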

3. Converting to DataFrame (Most Common Next Step)


import pandas as pd

# Easiest & fastest way
df = pd.DataFrame(data)
print(df.head())

# Or with Polars (2026 speed favorite for large dicts)
import polars as pl
df_pl = pl.DataFrame(data)
print(df_pl.head())

4. From Dict of Lists to Other Formats (JSON, CSV, etc.)


import json
import pandas as pd  # needed for the CSV conversion below

# To JSON (columnar style — common for APIs)
json_str = json.dumps(data, indent=2)
print("JSON output (columnar):\n", json_str[:300], "...")

# To CSV string (no file needed)
csv_str = pd.DataFrame(data).to_csv(index=False)
print("\nCSV preview:\n", '\n'.join(csv_str.splitlines()[:5]))
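
When pandas is not available, the CSV step can be approximated with the standard library. The sketch below assumes a small sample `table` and uses `zip` to transpose the columns into rows; it writes to an in-memory buffer rather than a file.

```python
import csv
import io

# Sample dict of lists (illustrative values only)
table = {"name": ["John", "Jane"], "age": [30, 25]}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(table.keys())           # header row from the column names
writer.writerows(zip(*table.values()))  # zip transposes columns into rows

csv_text = buffer.getvalue()
print(csv_text)
```

Note that `zip` stops at the shortest column, so mismatched lengths would silently drop rows here. On Python 3.10+, `zip(*table.values(), strict=True)` raises a `ValueError` instead.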

5. Common Gotchas & Best Practices (2026 Edition)

Unequal list lengths: pandas raises a ValueError when plain lists differ in length (it does not pad them with NaN), and Polars generally rejects them too. Always validate first: all(len(lst) == len(data['name']) for lst in data.values())
Performance: list.append is cheap per call, but appending to many columns inside a Python loop adds per-row interpreter overhead on large data. Where possible, build each column in one pass, for example with a list comprehension.
Order preservation: Python 3.7+ dicts preserve key insertion order, so the column order carries over to the DataFrame.
Nested data: if the lists contain dicts or lists, flatten them first or use pd.json_normalize().
Production tip: log column lengths before conversion to catch misalignment early.
Modern alternative: for streaming or very large columnar data, consider building a Polars pl.DataFrame directly or using pyarrow RecordBatch objects.
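
The length check and the logging tip above can be combined into one small function. This is a minimal sketch; `column_lengths` and `validate_lengths` are hypothetical helper names introduced here for illustration.

```python
def column_lengths(table):
    """Return {column name: number of values}, useful for logging."""
    return {col: len(values) for col, values in table.items()}


def validate_lengths(table):
    """Return the shared row count, or raise if the columns disagree."""
    lengths = column_lengths(table)
    if len(set(lengths.values())) > 1:
        # Include every column's length so the offending one is easy to spot
        raise ValueError(f"Unequal column lengths: {lengths}")
    # An empty table has zero rows
    return next(iter(lengths.values()), 0)


good = {"name": ["John", "Jane"], "age": [30, 25]}
print(validate_lengths(good))
```

Calling `validate_lengths` just before the DataFrame conversion gives an error message that names each column's length, which is usually easier to act on than the library's own exception.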

Conclusion

A dictionary of lists, organized by column, is a simple and efficient columnar intermediate format in Python. It’s well suited to building data incrementally, parsing columnar JSON, and preparing inputs for pandas or Polars. In 2026, the workflow stays the same: create columns naturally, validate lengths, append whole rows rather than single cells, and convert to a DataFrame once the data is complete. Master this structure, and you’ll handle structured data flows with confidence, from raw lists to full analysis.

Next time you’re parsing API results column-by-column or building data incrementally — use a dictionary of lists. It’s the clean bridge between Python primitives and full tabular power.