Querying DataFrame memory usage in Python is essential for optimizing performance, debugging memory issues, and ensuring scalability — especially with large datasets in pandas or Polars pipelines. Pandas provides memory_usage() for per-column and total usage (including deep object memory with deep=True), while Polars offers estimated_size() on both DataFrames and Series for fast estimates of its Arrow-backed columnar buffers. Accurate memory profiling helps you choose dtypes (int32/float32 vs int64/float64), detect leaks, compare pandas vs Polars efficiency, and prevent OOM errors in production. Combine it with psutil for process-level RSS/USS and tracemalloc for allocation tracking to get the full picture.
Here’s a complete, practical guide to querying DataFrame memory usage in Python: pandas memory_usage(), Polars estimated_size(), deep vs shallow accounting, real-world patterns, and modern best practices with type hints, profiling, and comparison across libraries.
Pandas memory_usage() — returns Series with per-column usage; deep=True includes object memory (strings, lists, etc.).
import pandas as pd
df = pd.DataFrame({
    'A': range(100_000),
    'B': [f"text_{i}" for i in range(100_000)],
    'C': [1.0] * 100_000
})
# Shallow usage (ignores object internals)
print(df.memory_usage(deep=False))
# Index 128
# A 800000
# B 800000
# C 800000
# dtype: int64
# Deep usage (includes strings, etc.)
print(df.memory_usage(deep=True))
# Index 128
# A 800000
# B 5900000 # strings take far more (exact value varies by Python version)
# C 800000
# dtype: int64
total_deep = df.memory_usage(deep=True).sum() / (1024 ** 2)
print(f"Total deep memory: {total_deep:.2f} MiB")
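Pandas can also surface the same totals through info(). A minimal sketch (using a smaller row count for brevity) showing that deep accounting is strictly larger whenever a column holds Python objects:

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(1_000),
    "B": [f"text_{i}" for i in range(1_000)],
})

# info() prints dtypes plus a memory summary; deep accounting on request
df.info(memory_usage="deep")

shallow = df.memory_usage(deep=False).sum()
deep = df.memory_usage(deep=True).sum()
print(deep > shallow)  # True — deep counts the Python string objects in B
```

Shallow accounting only counts the 8-byte object pointers in column B, so the gap between the two numbers is itself a quick signal that a column is a candidate for the category dtype or for Arrow-backed strings.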
Polars estimated_size() — fast columnar estimate of a DataFrame's Arrow buffers; call estimated_size() on individual Series for a per-column breakdown.
import polars as pl
pl_df = pl.DataFrame({
    'A': range(100_000),
    'B': [f"text_{i}" for i in range(100_000)],
    'C': [1.0] * 100_000
})
print(pl_df.estimated_size() / (1024 ** 2), "MiB")  # far smaller than pandas deep usage — compact Arrow string storage, no per-string Python objects (exact value varies by Polars version)
# Detailed per-column estimates (Polars has no memory_usage() method; query each Series)
for s in pl_df:
    print(s.name, s.estimated_size())
# A 800000
# B ...  (compact Arrow string buffers, far smaller than pandas object strings)
# C 800000
Real-world pattern: memory profiling in pandas/Polars pipelines — track before/after heavy operations to detect leaks or optimize dtypes.
def log_memory(df: pd.DataFrame, label: str = "DataFrame") -> None:
    mem_mb = df.memory_usage(deep=True).sum() / (1024 ** 2)
    print(f"{label} memory: {mem_mb:.2f} MiB")
df = pd.read_csv('large.csv') # 1 GB
log_memory(df, "Raw")
cleaned = df.dropna().groupby('category').sum()
log_memory(cleaned, "Cleaned")
del df # release original
log_memory(cleaned, "After del")
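Process-level numbers complement these per-frame queries. The standard-library tracemalloc module tracks Python-level allocations (NumPy reports its buffers to it as well), which helps spot leaks around pipeline steps; a minimal sketch:

```python
import tracemalloc

import pandas as pd

tracemalloc.start()

df = pd.DataFrame({"A": range(100_000)})  # allocate something measurable
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")

tracemalloc.stop()
```

If the "current" figure keeps climbing across iterations of a loop that should release its intermediates, you likely have a lingering reference to a temporary frame.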
Best practices for querying DataFrame memory usage:
- Prefer deep=True in pandas — captures object memory (strings, lists).
- Use Polars estimated_size() — a fast estimate of its Arrow-backed columnar buffers.
- Modern tip: use Polars — lower memory than pandas for large data (columnar layout, lazy execution).
- Profile before/after ops — detect temporary spikes or leaks.
- Use psutil.Process().memory_full_info().uss — process-level unique memory (memory_info() only reports RSS/VMS).
- Monitor in production — log memory periodically with psutil.
- Downcast dtypes — df.astype({'A': 'int32', 'B': 'category'}) — can halve memory or better.
- Use del + gc.collect() — free memory early.
- Use memory_profiler — @profile decorator for line-by-line memory.
- Use tracemalloc — track allocations for leaks.
- Compare pandas vs Polars — Polars often uses 2–5× less memory.
- Add type hints — def func(df: pd.DataFrame) -> pd.DataFrame.
- Use df.memory_usage(index=True, deep=True) — include the index.
- Use pl.Config.set_tbl_rows() — limit how many rows Polars prints when inspecting large frames.
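The downcasting tip is easy to verify. A sketch with a hypothetical low-cardinality string column, comparing deep memory before and after astype:

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(99_999),
    "B": ["red", "green", "blue"] * 33_333,  # only 3 unique values
})

before = df.memory_usage(deep=True).sum()
# int32 halves column A; category replaces 99,999 strings with small integer codes
small = df.astype({"A": "int32", "B": "category"})
after = small.memory_usage(deep=True).sum()
print(f"{before / 1024:.0f} KiB -> {after / 1024:.0f} KiB")
```

Category pays off only when cardinality is low relative to row count; for near-unique strings it can cost more than it saves, so always re-measure after the cast.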
Querying DataFrame memory usage with memory_usage(deep=True) (pandas), estimated_size() (Polars), and psutil's memory_full_info().uss gives accurate insights — optimize dtypes, detect leaks, compare libraries. In practice, prefer Polars for a lower footprint, profile before/after operations, use memory_profiler, and downcast aggressively. Master DataFrame memory queries, and you’ll build efficient, scalable Python code that handles massive datasets without OOM errors or waste.
Next time you load a large DataFrame — query its memory. It’s Python’s cleanest way to say: “How much RAM is this really taking?”