Using pd.concat() is the go-to method in pandas for combining multiple DataFrames (or Series) — either vertically (stacking rows, axis=0, the default) or horizontally (joining columns, axis=1). It’s essential when working with chunked data (from pd.read_csv(chunksize=...)), appending filtered chunks, merging partial results, or combining datasets from different sources. In 2026, pd.concat() remains the standard, reliable way to concatenate — especially with ignore_index=True for clean row numbering, join='outer'/'inner' for alignment along the other axis, and keys for multi-level indexing when combining many sources. It pairs well with chunked processing, Polars concatenation, and memory-efficient workflows.
Here’s a complete, practical guide to using pd.concat() in Python: basic row/column concatenation, handling chunked data, alignment options, multi-DataFrame combining, real-world patterns, and modern best practices with type hints, memory optimization, Polars comparison, and pandas/Polars integration.
Basic vertical concatenation (axis=0) — stack DataFrames row-wise, most common for chunked reading or appending.
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})
# Concatenate vertically, reset index
result = pd.concat([df1, df2], ignore_index=True)
print(result)
#    A  B
# 0  1  4
# 1  2  5
# 2  3  6
# 3  4  7
# 4  5  8
# 5  6  9
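When stacking frames from different sources, the keys argument mentioned later is handy here too: it adds an outer index level so each row remembers where it came from. A short sketch, reusing the same df1/df2 shapes as above:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})

# keys builds a MultiIndex: level 0 identifies the source frame
labeled = pd.concat([df1, df2], keys=['source1', 'source2'])

print(labeled.loc['source2'])            # only the rows that came from df2
print(labeled.loc[('source1', 0), 'A'])  # scalar lookup into one source
```

Note that keys and ignore_index are mutually exclusive in practice — ignore_index=True discards the index, including the keys level.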
Horizontal concatenation (axis=1) — join DataFrames column-wise, align on index.
df3 = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]})
# Concatenate horizontally
result_h = pd.concat([df1, df3], axis=1)
print(result_h)
#    A  B  C   D
# 0  1  4  7  10
# 1  2  5  8  11
# 2  3  6  9  12
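The example above works cleanly because both frames share the same index. When indices only partially overlap, the join parameter decides what happens: 'outer' (the default) keeps the union of index labels and fills gaps with NaN, while 'inner' keeps only the intersection. A minimal sketch with made-up data:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'B': [10, 20]}, index=[1, 2])  # missing index 0

outer = pd.concat([left, right], axis=1)               # union: 3 rows, NaN in B at index 0
inner = pd.concat([left, right], axis=1, join='inner')  # intersection: 2 rows

print(outer)
print(inner)
```

Watch for the dtype side effect: introducing NaN promotes an integer column like B to float in the outer result.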
Real-world pattern: concatenating filtered chunks from large CSV — memory-efficient big data processing.
file_path = 'very_large.csv'
chunksize = 100_000
filtered_chunks = []
for chunk in pd.read_csv(file_path, chunksize=chunksize):
    # Filter example
    filtered = chunk[(chunk['category'] == 'A') & (chunk['value'] > 100)]
    if not filtered.empty:
        filtered_chunks.append(filtered)

# Final concatenation — once, outside the loop
if filtered_chunks:
    df_final = pd.concat(filtered_chunks, ignore_index=True)
    print(f"Total rows after filtering: {len(df_final)}")
else:
    df_final = pd.DataFrame()
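The collect-then-concat pattern above can be wrapped in a small type-hinted helper, which keeps the empty-list edge case in one place. A sketch (the name concat_chunks and the sample frames are illustrative, not from any library):

```python
import pandas as pd

def concat_chunks(chunks: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate collected chunks once; return an empty frame if none survived filtering."""
    if not chunks:
        return pd.DataFrame()
    return pd.concat(chunks, ignore_index=True)

# Usage with toy stand-ins for filtered chunks
parts = [pd.DataFrame({'value': [101, 250]}), pd.DataFrame({'value': [300]})]
combined = concat_chunks(parts)
print(len(combined))  # 3
```

The list[pd.DataFrame] hint requires Python 3.9+; on older versions use typing.List.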
Best practices make pd.concat safe, efficient, and scalable:
- Prefer ignore_index=True for clean row numbering after vertical concat.
- Use axis=0 (default) for row stacking, axis=1 for columns.
- Use join='outer' (default) or join='inner' to control alignment along the other axis; these are the only two values pd.concat accepts (unlike pd.merge, there is no 'left' or 'right').
- Use keys to add a multi-level index identifying sources: pd.concat([df1, df2], keys=['source1', 'source2']).
- Use verify_integrity=True to raise an error on duplicate indices.
- Use sort=False to preserve column order instead of sorting non-aligned columns.
- Avoid repeated concat in a loop; collect frames in a list and concat once at the end (repeated concat is quadratic in time and memory).
- copy=False can avoid unnecessary copies on older pandas versions; under copy-on-write (pandas 2.x+) the parameter is effectively obsolete.
- Monitor memory with psutil.Process().memory_info().rss before and after large concats.
- Add type hints: def concat_chunks(chunks: list[pd.DataFrame]) -> pd.DataFrame.
- pd.concat accepts any iterable, so pd.concat(chunk for chunk in reader) works with a generator and avoids building the list yourself.
- Modern tip: for very large concatenations, Polars pl.concat() is often faster and lighter on memory — pl.concat([df1, df2]), or lazily with pl.concat([pl.scan_csv(f) for f in files]).
pd.concat() combines DataFrames row-wise (axis=0) or column-wise (axis=1) — essential for chunked processing, appending, or merging. In 2026, use ignore_index=True, collect chunks in list then concat once, prefer Polars pl.concat() for speed/memory, and monitor usage with psutil. Master pd.concat(), and you’ll assemble large datasets efficiently, scalably, and without memory surprises.
Next time you have multiple DataFrames or chunks — concat them. It’s Python’s cleanest way to say: “Stack or join these tables — easily and safely.”