len() is one of Python’s most frequently used built-in functions. It returns the number of items in any object that implements the length protocol (__len__()), such as strings, lists, tuples, dictionaries, sets, ranges, NumPy arrays, pandas Series/DataFrames, Polars DataFrames, and Dask collections. In 2026, len() remains a cornerstone of data science (checking DataFrame row counts and array sizes), software engineering (input validation, loop bounds), and performance-critical code: it is O(1) for the built-in containers, readable, and supported across Python’s data ecosystem. The main exception is lazy collections such as Dask DataFrames, where len() has to trigger a computation to count rows.
Here’s a complete, practical guide to using len() in Python: basic length checks, common types & behaviors, real-world patterns (earthquake DataFrame inspection, chunk sizing, validation), and modern best practices with type hints, performance, edge cases, and integration with pandas/Polars/Dask/NumPy/xarray.
Basic len() usage — length of strings, lists, tuples, dicts, sets.
print(len("Hello, World!")) # 13 (characters)
print(len([1, 2, 3, 4])) # 4 (items)
print(len((10, 20))) # 2
print(len({"a": 1, "b": 2})) # 2 (key count)
print(len({1, 2, 3, 3})) # 3 (unique items)
print(len(range(100))) # 100
print(len("")) # 0 (empty string)
print(len([])) # 0 (empty list)
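Any class can take part in the length protocol by defining __len__(). The following is a minimal sketch; EventCatalog is a hypothetical container invented for illustration and does not come from any library.

```python
class EventCatalog:
    """Hypothetical container used only to illustrate the length protocol."""

    def __init__(self, magnitudes):
        self._magnitudes = list(magnitudes)

    def __len__(self):
        # len(catalog) calls this method; it must return a non-negative int.
        return len(self._magnitudes)


catalog = EventCatalog([7.2, 6.8, 5.9])
print(len(catalog))            # 3
# With no __bool__ defined, truthiness falls back to __len__.
print(bool(EventCatalog([])))  # False
```

Because truthiness falls back to __len__(), an empty container of this kind is falsy, which is why "if catalog:" works as an emptiness check.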
len() with data science objects — pandas, Polars, Dask, NumPy, xarray.
import pandas as pd
import polars as pl
import dask.dataframe as dd
import numpy as np
import xarray as xr
df_pd = pd.DataFrame({"mag": [7.2, 6.8, 5.9]})
print(len(df_pd)) # 3 (rows)
print(len(df_pd.columns)) # 1 (columns)
df_pl = pl.DataFrame({"mag": [7.2, 6.8, 5.9]})
print(len(df_pl)) # 3 (rows)
print(len(df_pl.columns)) # 1 (columns)
ddf = dd.from_pandas(df_pd, npartitions=2)
print(len(ddf)) # 3 (rows; triggers a computation over all partitions)
arr = np.array([[1, 2], [3, 4], [5, 6]])
print(len(arr)) # 3 (first axis length)
ds = xr.Dataset({"mag": (("time",), [7.2, 6.8])})
print(len(ds["mag"])) # 2 (length along 'time')
print(len(ds.dims)) # 1 (number of dimensions)
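For multi-dimensional arrays, len() reports only the first axis, which is easy to confuse with the total element count. The sketch below, assuming NumPy is installed, contrasts len() with shape, size, and ndim, and shows that a zero-dimensional array has no length at all.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

print(len(arr))    # 3: length of the first axis only
print(arr.shape)   # (3, 2)
print(arr.size)    # 6: total number of elements
print(arr.ndim)    # 2: number of dimensions

scalar = np.array(5.0)  # zero-dimensional array
try:
    len(scalar)
except TypeError as exc:
    # Zero-dimensional arrays are unsized, so len() raises TypeError.
    print(f"len() failed: {exc}")
```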
Real-world pattern: earthquake data inspection & chunking — use len() for validation & sizing.
import dask.dataframe as dd
ddf = dd.read_csv('earthquakes/*.csv', blocksize='64MB')
# Basic inspection
print(f"Total events: {len(ddf)}") # computes row count
print(f"Columns: {len(ddf.columns)}") # column count (fast)
print(f"Partitions: {ddf.npartitions}") # Dask-specific
# Validate required columns
required = ['time', 'mag', 'latitude', 'longitude', 'depth']
missing = [col for col in required if col not in ddf.columns]
if missing:
    print(f"Missing columns: {missing}")
else:
    print("All required columns present")
# Chunk-aware processing
for i, chunk in enumerate(ddf.to_delayed()):
    df_chunk = chunk.compute()
    print(f"Chunk {i+1}: {len(df_chunk)} rows")
    strong = df_chunk[df_chunk['mag'] >= 7.0]
    if len(strong) > 0:
        print(f" Strong events in chunk {i+1}: {len(strong)}")
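The same sizing logic can be tried without Dask or any CSV files. This is a pure-Python sketch in which the events list and the chunk_size value are made-up stand-ins for real earthquake rows; it uses len() both to derive the number of chunks and to count matches per chunk.

```python
import math

# Hypothetical in-memory records standing in for rows of an earthquake table.
events = [
    {"mag": 7.2}, {"mag": 6.8}, {"mag": 5.9}, {"mag": 7.5}, {"mag": 4.1},
]
chunk_size = 2  # assumed value, chosen only for illustration

# len() gives the total row count, from which the number of chunks follows.
n_chunks = math.ceil(len(events) / chunk_size)

strong_counts = []
for i in range(n_chunks):
    chunk = events[i * chunk_size:(i + 1) * chunk_size]
    strong = [e for e in chunk if e["mag"] >= 7.0]
    strong_counts.append(len(strong))
    print(f"Chunk {i + 1}: {len(chunk)} rows, {len(strong)} strong")

print(n_chunks)       # 3
print(strong_counts)  # [1, 1, 0]
```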
Best practices for len() in Python & data workflows.
Prefer len(obj) over obj.__len__(): it is cleaner, and the built-in checks that the returned value is a non-negative integer.
Row counts: len(df) and df.shape[0] are equivalent in pandas, so pick one and use it consistently. In Polars, use df.height or df.shape[0]. In Dask, len(ddf) and ddf.shape[0].compute() both trigger a computation over every partition, so call them sparingly on large data.
Column counts: use len(df.columns) in pandas, Polars, and Dask; it reads metadata only and is fast.
Avoid len(ddf.compute()) on large Dask objects: it materializes the whole dataset in memory just to count rows.
Dimensions: use arr.ndim (equivalent to len(arr.shape)) for the number of dimensions in NumPy and xarray, arr.shape[0] for the first-axis length, and arr.size or np.size(arr) for the total number of elements. In xarray, len(ds.dims) gives the number of dimensions of a Dataset.
Type hints: annotate with collections.abc.Sized rather than Iterable, because not every iterable has a length, e.g. def check_length(seq: Sized) -> int: return len(seq).
Generators have no length: len(gen) raises TypeError. sum(1 for _ in gen) counts the items but consumes the generator, so convert it to a list first if the items are needed afterwards.
Use len() in assertions: assert len(df) > 0.
Use len() with enumerate() to detect the last item: for i, item in enumerate(seq): ... if i == len(seq) - 1.
Use len(set(seq)) for a count of unique items (the items must be hashable).
Use len(df.dropna()) for the number of rows without missing values, and len(df.query('mag >= 7.0')) for a filtered count in pandas.
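Two of these points are worth seeing in running code: which abstract type describes "has a length", and what actually happens with generators. The sketch below uses only the standard library; count_items is a hypothetical helper name introduced here for illustration.

```python
from collections.abc import Iterable, Sized


def check_length(seq: Sized) -> int:
    """Return the length of any object that implements __len__."""
    return len(seq)


def count_items(items: Iterable[object]) -> int:
    """Count items in any iterable; this exhausts one-shot iterators."""
    return sum(1 for _ in items)


gen = (m for m in [7.2, 6.8, 5.9] if m >= 6.0)

try:
    len(gen)
except TypeError as exc:
    # Generators do not implement __len__, so len() raises TypeError.
    print(f"len() failed: {exc}")

n = count_items(gen)
print(n)                        # 2
print(list(gen))                # []: counting consumed the generator
print(check_length([1, 2, 3]))  # 3
```

If the items are needed after counting, materializing the generator once with list(gen) and calling len() on the result avoids the double pass.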
len(obj) returns the number of items in an object: characters for strings, elements for lists and tuples, keys for dicts, items for sets, rows for DataFrames, and the first-axis length for arrays. In 2026, use it for validation, sizing, and chunking, and combine it with pandas, Polars, Dask, and NumPy for data inspection. Master len(), and you’ll write concise, efficient code for any collection or data structure.
Next time you need to know “how many?” — use len(). It’s Python’s cleanest way to say: “Tell me the size of this thing — fast and reliable.”