Querying DataFrame memory usage in Python is essential for optimizing performance, debugging memory issues, and ensuring scalability — especially with large datasets in pandas or Polars pipelines. Pandas provides memory_usage() for per-column and total usage (including deep object memory with deep=True), while Polars offers estimated_size() on both DataFrames and Series for fast estimates of its Arrow-backed columnar buffers. Accurate memory profiling helps you choose dtypes (int32/float32 vs int64/float64), detect leaks, compare pandas vs Polars efficiency, and prevent OOM errors in production. Combine it with psutil for process-level RSS/USS and tracemalloc for allocation tracking to get the full picture.
Here’s a complete, practical guide to querying DataFrame memory usage in Python: pandas memory_usage(), Polars estimated_size(), deep vs shallow accounting, real-world patterns, and modern best practices with type hints, profiling, and comparison across libraries.
Pandas memory_usage() — returns Series with per-column usage; deep=True includes object memory (strings, lists, etc.).
import pandas as pd
df = pd.DataFrame({
    'A': range(100_000),
    'B': [f"text_{i}" for i in range(100_000)],
    'C': [1.0] * 100_000
})
# Shallow usage (ignores object internals)
print(df.memory_usage(deep=False))
# Index 128
# A 800000
# B 800000
# C 800000
# dtype: int64
# Deep usage (includes strings, etc.)
print(df.memory_usage(deep=True))
# Index 128
# A 800000
# B 5900000 # strings take far more (exact value varies by Python version)
# C 800000
# dtype: int64
total_deep = df.memory_usage(deep=True).sum() / (1024 ** 2)
print(f"Total deep memory: {total_deep:.2f} MiB")
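Pandas can also surface the same totals through info(). A minimal sketch (using a smaller row count for brevity) showing that deep accounting is strictly larger whenever a column holds Python objects:

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(1_000),
    "B": [f"text_{i}" for i in range(1_000)],
})

# info() prints dtypes plus a memory summary; deep accounting on request
df.info(memory_usage="deep")

shallow = df.memory_usage(deep=False).sum()
deep = df.memory_usage(deep=True).sum()
print(deep > shallow)  # True — deep counts the Python string objects in B
```

Shallow accounting only counts the 8-byte object pointers in column B, so the gap between the two numbers is itself a quick signal that a column is a candidate for the category dtype or for Arrow-backed strings.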
Polars estimated_size() — fast columnar estimate of a DataFrame's Arrow buffers; call estimated_size() on individual Series for a per-column breakdown.
import polars as pl
pl_df = pl.DataFrame({
    'A': range(100_000),
    'B': [f"text_{i}" for i in range(100_000)],
    'C': [1.0] * 100_000
})
print(pl_df.estimated_size() / (1024 ** 2), "MiB")  # far smaller than pandas deep usage — compact Arrow string storage, no per-string Python objects (exact value varies by Polars version)
# Detailed per-column estimates (Polars has no memory_usage() method; query each Series)
for s in pl_df:
    print(s.name, s.estimated_size())
# A 800000
# B ...  (compact Arrow string buffers, far smaller than pandas object strings)
# C 800000
Real-world pattern: memory profiling in pandas/Polars pipelines — track before/after heavy operations to detect leaks or optimize dtypes.
def log_memory(df: pd.DataFrame, label: str = "DataFrame") -> None:
    mem_mb = df.memory_usage(deep=True).sum() / (1024 ** 2)
    print(f"{label} memory: {mem_mb:.2f} MiB")
df = pd.read_csv('large.csv') # 1 GB
log_memory(df, "Raw")
cleaned = df.dropna().groupby('category').sum()
log_memory(cleaned, "Cleaned")
del df # release original
log_memory(cleaned, "After del")
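Process-level numbers complement these per-frame queries. The standard-library tracemalloc module tracks Python-level allocations (NumPy reports its buffers to it as well), which helps spot leaks around pipeline steps; a minimal sketch:

```python
import tracemalloc

import pandas as pd

tracemalloc.start()

df = pd.DataFrame({"A": range(100_000)})  # allocate something measurable
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")

tracemalloc.stop()
```

If the "current" figure keeps climbing across iterations of a loop that should release its intermediates, you likely have a lingering reference to a temporary frame.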
Best practices for querying DataFrame memory usage:
- Prefer deep=True in pandas — captures object memory (strings, lists).
- Use Polars estimated_size() — a fast estimate of its Arrow-backed columnar buffers.
- Modern tip: use Polars — lower memory than pandas for large data (columnar layout, lazy execution).
- Profile before/after ops — detect temporary spikes or leaks.
- Use psutil.Process().memory_full_info().uss — process-level unique memory (memory_info() only reports RSS/VMS).
- Monitor in production — log memory periodically with psutil.
- Downcast dtypes — df.astype({'A': 'int32', 'B': 'category'}) — can halve memory or better.
- Use del + gc.collect() — free memory early.
- Use memory_profiler — @profile decorator for line-by-line memory.
- Use tracemalloc — track allocations for leaks.
- Compare pandas vs Polars — Polars often uses 2–5× less memory.
- Add type hints — def func(df: pd.DataFrame) -> pd.DataFrame.
- Use df.memory_usage(index=True, deep=True) — include the index.
- Use pl.Config.set_tbl_rows() — limit how many rows Polars prints when inspecting large frames.
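The downcasting tip is easy to verify. A sketch with a hypothetical low-cardinality string column, comparing deep memory before and after astype:

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(99_999),
    "B": ["red", "green", "blue"] * 33_333,  # only 3 unique values
})

before = df.memory_usage(deep=True).sum()
# int32 halves column A; category replaces 99,999 strings with small integer codes
small = df.astype({"A": "int32", "B": "category"})
after = small.memory_usage(deep=True).sum()
print(f"{before / 1024:.0f} KiB -> {after / 1024:.0f} KiB")
```

Category pays off only when cardinality is low relative to row count; for near-unique strings it can cost more than it saves, so always re-measure after the cast.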
Querying DataFrame memory usage with memory_usage(deep=True) (pandas), estimated_size() (Polars), and psutil's memory_full_info().uss gives accurate insights — optimize dtypes, detect leaks, compare libraries. In practice, prefer Polars for a lower footprint, profile before/after operations, use memory_profiler, and downcast aggressively. Master DataFrame memory queries, and you’ll build efficient, scalable Python code that handles massive datasets without OOM errors or waste.
Next time you load a large DataFrame — query its memory. It’s Python’s cleanest way to say: “How much RAM is this really taking?”