Querying array memory usage in Python is crucial for optimizing performance, debugging memory issues, and ensuring scalability, especially with large numerical data in pandas/Polars, ML arrays, or scientific computing. Different tools give different views: nbytes (NumPy) and estimated_size() (Polars) for raw data size, sys.getsizeof() for what the Python object itself reports, psutil for process-level RSS/USS, and tracemalloc for allocation tracking. The most accurate picture combines these: nbytes for the array payload, psutil's memory_full_info().uss for memory unique to the process, and memory_profiler or tracemalloc for leaks. This helps you choose dtypes (e.g., int32 vs int64), use views instead of copies, and monitor peak usage in pipelines.
Here’s a complete, practical guide to querying memory usage of arrays in Python: NumPy/Polars nbytes, sys.getsizeof(), psutil process stats, tracemalloc tracking, real-world patterns, and modern best practices with type hints, profiling, and comparison across methods.
NumPy array memory — nbytes gives raw data size (elements × dtype.itemsize), excluding object overhead.
import numpy as np
arr = np.zeros(10_000_000, dtype=np.int32) # 10M × 4 bytes = 40 MB
print(arr.nbytes / (1024 ** 2), "MiB") # 38.15 MiB (raw data only)
# Compare dtypes
arr64 = np.zeros(10_000_000, dtype=np.int64) # 80 MB
print(arr64.nbytes / (1024 ** 2), "MiB") # 76.29 MiB
Polars Series/DataFrame — estimated_size() returns an estimate of the heap memory the columnar (Arrow-backed) buffers occupy; pandas offers the analogous DataFrame.memory_usage().
import polars as pl
s = pl.Series("zeros", [0] * 10_000_000) # columnar; Python ints default to Int64
print(s.estimated_size() / (1024 ** 2), "MiB") # ~76.29 MiB (10M × 8 bytes)
s32 = s.cast(pl.Int32) # narrower dtype halves the footprint
print(s32.estimated_size() / (1024 ** 2), "MiB") # ~38.15 MiB
df = pl.DataFrame({"A": range(10_000_000)})
print(df.estimated_size() / (1024 ** 2), "MiB") # ~76.29 MiB (one Int64 column)
sys.getsizeof() — reports whatever the object's __sizeof__ hook returns. For a NumPy array that owns its data this includes the buffer; for views, and for thin wrappers like a Polars Series, it covers only the small Python object.
import sys
print(sys.getsizeof(arr) / (1024 ** 2), "MiB") # ~38.15 MiB (header + owned 40 MB buffer)
print(sys.getsizeof(s), "bytes") # tens of bytes (Python wrapper only; the buffer lives in Rust)
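A quick way to see the owning-array vs. view distinction in practice (a sketch; exact header byte counts vary by NumPy version and platform):

```python
import sys
import numpy as np

owned = np.zeros(1_000_000, dtype=np.int32)  # owns its ~4 MB buffer
view = owned[::2]                            # view: shares owned's buffer

# getsizeof counts the buffer only when the array owns its data
print(sys.getsizeof(owned))  # ~4,000,000+ bytes (header + buffer)
print(sys.getsizeof(view))   # only the small array header, no buffer
print(view.base is owned)    # True: the view borrows owned's memory
```

This is why getsizeof alone is a poor measure across libraries: the number depends entirely on whether (and how) the object's __sizeof__ accounts for external buffers.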
psutil — process-level memory (RSS, USS), the most accurate view of what the whole Python process actually consumes.
import psutil
process = psutil.Process()
print(f"RSS: {process.memory_info().rss / (1024 ** 2):.2f} MiB") # physical memory
print(f"USS: {process.memory_full_info().uss / (1024 ** 2):.2f} MiB") # unique to this process; uss lives on memory_full_info(), not memory_info()
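A common pattern is sampling RSS before and after a heavy allocation to estimate its real cost. A minimal sketch (the delta is approximate: allocator behavior and page caching add noise, and np.ones is used because it touches every page, which np.zeros may defer):

```python
import numpy as np
import psutil

process = psutil.Process()

rss_before = process.memory_info().rss
big = np.ones(10_000_000, dtype=np.float64)  # ~80 MB; writing forces page commits
rss_after = process.memory_info().rss
delta = rss_after - rss_before

# The delta should roughly match big.nbytes, give or take allocator overhead
print(f"RSS delta: {delta / (1024 ** 2):.1f} MiB")
print(f"nbytes:    {big.nbytes / (1024 ** 2):.1f} MiB")
```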
Best practices for querying array memory usage:
- Prefer nbytes / estimated_size() for raw data size in NumPy/Polars.
- Use psutil's memory_full_info().uss for the most accurate process footprint (memory_info().rss as a cheaper fallback).
- Modern tip: use Polars — lower memory than pandas for large data (columnar, Arrow-backed).
- Profile with memory_profiler: the @profile decorator gives line-by-line memory.
- Monitor peak usage: sample psutil before/after heavy ops.
- Use tracemalloc to track allocations for leaks: tracemalloc.start(), then snapshot = tracemalloc.take_snapshot().
- Choose dtypes wisely: int32/float32 halve memory versus the int64/float64 defaults.
- Use views, e.g. arr[::2], instead of copies (no new buffer).
- Pre-allocate with np.zeros(n) rather than growing via append.
- Use del plus gc.collect() to free memory early.
- Add type hints, e.g. def func(arr: npt.NDArray[np.int32]) -> None (with import numpy.typing as npt).
- Benchmark allocation with timeit or asv for speed/memory tradeoffs.
- Use numpy.empty for the fastest (uninitialized) allocation.
- Use polars.scan_* for lazy evaluation over large files.
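The tracemalloc tip above can be sketched as follows. NumPy reports its buffer allocations to tracemalloc, so the traced totals include array data, not just Python objects (a minimal sketch):

```python
import tracemalloc
import numpy as np

tracemalloc.start()

# Allocate ~40 MB across five arrays while tracing is active
data = [np.zeros(1_000_000, dtype=np.float64) for _ in range(5)]

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / (1024 ** 2):.1f} MiB, peak: {peak / (1024 ** 2):.1f} MiB")

# Top allocation sites, grouped by source line — useful for hunting leaks
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```

Because tracemalloc only sees allocations made while tracing is on, call tracemalloc.start() as early as possible when hunting a leak, and compare two snapshots with snapshot.compare_to() to find what grew between them.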
Querying array memory usage combines nbytes (raw data), USS via psutil (process footprint), and tracemalloc (allocations); choose based on need: data size, total usage, or leaks. Prefer Polars/NumPy with narrow dtypes, views, lazy mode, psutil monitoring, and memory_profiler. Master array memory queries, and you'll build efficient, scalable Python code that handles massive datasets without OOM or waste.
Next time you allocate an array — query its memory. It’s Python’s cleanest way to say: “How much space does this really take?”