Querying array memory usage in Python is crucial for optimizing performance, debugging memory issues, and ensuring scalability, especially with large numerical data in pandas/Polars, ML arrays, or scientific computing. Different tools give different views: nbytes (NumPy) and estimated_size() (Polars) for raw data size, sys.getsizeof() for what the Python object itself reports, psutil for process-level RSS/USS, and tracemalloc for allocation tracking. The most accurate picture combines these: nbytes for the array payload, psutil's memory_full_info().uss for memory unique to the process, and memory_profiler or tracemalloc for leaks. This helps you choose dtypes (e.g., int32 vs int64), use views instead of copies, and monitor peak usage in pipelines.
Here’s a complete, practical guide to querying memory usage of arrays in Python: NumPy/Polars nbytes, sys.getsizeof(), psutil process stats, tracemalloc tracking, real-world patterns, and modern best practices with type hints, profiling, and comparison across methods.
NumPy array memory — nbytes gives raw data size (elements × dtype.itemsize), excluding object overhead.
import numpy as np
arr = np.zeros(10_000_000, dtype=np.int32) # 10M × 4 bytes = 40 MB
print(arr.nbytes / (1024 ** 2), "MiB") # 38.15 MiB (raw data only)
# Compare dtypes
arr64 = np.zeros(10_000_000, dtype=np.int64) # 80 MB
print(arr64.nbytes / (1024 ** 2), "MiB") # 76.29 MiB
Polars Series/DataFrame — estimated_size() returns an estimate of the heap memory the columnar (Arrow-backed) buffers occupy; pandas offers the analogous DataFrame.memory_usage().
import polars as pl
s = pl.Series("zeros", [0] * 10_000_000) # columnar; Python ints default to Int64
print(s.estimated_size() / (1024 ** 2), "MiB") # ~76.29 MiB (10M × 8 bytes)
s32 = s.cast(pl.Int32) # narrower dtype halves the footprint
print(s32.estimated_size() / (1024 ** 2), "MiB") # ~38.15 MiB
df = pl.DataFrame({"A": range(10_000_000)})
print(df.estimated_size() / (1024 ** 2), "MiB") # ~76.29 MiB (one Int64 column)
sys.getsizeof() — reports whatever the object's __sizeof__ hook returns. For a NumPy array that owns its data this includes the buffer; for views, and for thin wrappers like a Polars Series, it covers only the small Python object.
import sys
print(sys.getsizeof(arr) / (1024 ** 2), "MiB") # ~38.15 MiB (header + owned 40 MB buffer)
print(sys.getsizeof(s), "bytes") # tens of bytes (Python wrapper only; the buffer lives in Rust)
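A quick way to see the owning-array vs. view distinction in practice (a sketch; exact header byte counts vary by NumPy version and platform):

```python
import sys
import numpy as np

owned = np.zeros(1_000_000, dtype=np.int32)  # owns its ~4 MB buffer
view = owned[::2]                            # view: shares owned's buffer

# getsizeof counts the buffer only when the array owns its data
print(sys.getsizeof(owned))  # ~4,000,000+ bytes (header + buffer)
print(sys.getsizeof(view))   # only the small array header, no buffer
print(view.base is owned)    # True: the view borrows owned's memory
```

This is why getsizeof alone is a poor measure across libraries: the number depends entirely on whether (and how) the object's __sizeof__ accounts for external buffers.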
psutil — process-level memory (RSS, USS), the most accurate view of what the whole Python process actually consumes.
import psutil
process = psutil.Process()
print(f"RSS: {process.memory_info().rss / (1024 ** 2):.2f} MiB") # physical memory
print(f"USS: {process.memory_full_info().uss / (1024 ** 2):.2f} MiB") # unique to this process; uss lives on memory_full_info(), not memory_info()
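A common pattern is sampling RSS before and after a heavy allocation to estimate its real cost. A minimal sketch (the delta is approximate: allocator behavior and page caching add noise, and np.ones is used because it touches every page, which np.zeros may defer):

```python
import numpy as np
import psutil

process = psutil.Process()

rss_before = process.memory_info().rss
big = np.ones(10_000_000, dtype=np.float64)  # ~80 MB; writing forces page commits
rss_after = process.memory_info().rss
delta = rss_after - rss_before

# The delta should roughly match big.nbytes, give or take allocator overhead
print(f"RSS delta: {delta / (1024 ** 2):.1f} MiB")
print(f"nbytes:    {big.nbytes / (1024 ** 2):.1f} MiB")
```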
Best practices for querying array memory usage:
- Prefer nbytes / estimated_size() for raw data size in NumPy/Polars.
- Use psutil's memory_full_info().uss for the most accurate process footprint (memory_info().rss as a cheaper fallback).
- Modern tip: use Polars — lower memory than pandas for large data (columnar, Arrow-backed).
- Profile with memory_profiler: the @profile decorator gives line-by-line memory.
- Monitor peak usage: sample psutil before/after heavy ops.
- Use tracemalloc to track allocations for leaks: tracemalloc.start(), then snapshot = tracemalloc.take_snapshot().
- Choose dtypes wisely: int32/float32 halve memory versus the int64/float64 defaults.
- Use views, e.g. arr[::2], instead of copies (no new buffer).
- Pre-allocate with np.zeros(n) rather than growing via append.
- Use del plus gc.collect() to free memory early.
- Add type hints, e.g. def func(arr: npt.NDArray[np.int32]) -> None (with import numpy.typing as npt).
- Benchmark allocation with timeit or asv for speed/memory tradeoffs.
- Use numpy.empty for the fastest (uninitialized) allocation.
- Use polars.scan_* for lazy evaluation over large files.
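The tracemalloc tip above can be sketched as follows. NumPy reports its buffer allocations to tracemalloc, so the traced totals include array data, not just Python objects (a minimal sketch):

```python
import tracemalloc
import numpy as np

tracemalloc.start()

# Allocate ~40 MB across five arrays while tracing is active
data = [np.zeros(1_000_000, dtype=np.float64) for _ in range(5)]

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / (1024 ** 2):.1f} MiB, peak: {peak / (1024 ** 2):.1f} MiB")

# Top allocation sites, grouped by source line — useful for hunting leaks
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```

Because tracemalloc only sees allocations made while tracing is on, call tracemalloc.start() as early as possible when hunting a leak, and compare two snapshots with snapshot.compare_to() to find what grew between them.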
Querying array memory usage combines nbytes (raw data), USS via psutil (process footprint), and tracemalloc (allocations); choose based on need: data size, total usage, or leaks. Prefer Polars/NumPy with narrow dtypes, views, lazy mode, psutil monitoring, and memory_profiler. Master array memory queries, and you'll build efficient, scalable Python code that handles massive datasets without OOM or waste.
Next time you allocate an array — query its memory. It’s Python’s cleanest way to say: “How much space does this really take?”