Using pd.concat() is the go-to method in pandas for combining multiple DataFrames (or Series) — either vertically (stacking rows, axis=0, the default) or horizontally (joining columns, axis=1). It’s essential when working with chunked data (from pd.read_csv(chunksize=...)), appending filtered chunks, merging partial results, or combining datasets from different sources. In 2026, pd.concat() remains the standard, reliable way to concatenate — especially with ignore_index=True for clean row numbering, join='outer'/'inner' for alignment along the other axis, and keys for multi-level indexing when combining many sources. It pairs well with chunked processing, Polars concatenation, and memory-efficient workflows.
Here’s a complete, practical guide to using pd.concat() in Python: basic row/column concatenation, handling chunked data, alignment options, multi-DataFrame combining, real-world patterns, and modern best practices with type hints, memory optimization, Polars comparison, and pandas/Polars integration.
Basic vertical concatenation (axis=0) — stack DataFrames row-wise, most common for chunked reading or appending.
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})
# Concatenate vertically, reset index
result = pd.concat([df1, df2], ignore_index=True)
print(result)
#    A  B
# 0  1  4
# 1  2  5
# 2  3  6
# 3  4  7
# 4  5  8
# 5  6  9
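When stacking frames from different sources, the keys argument mentioned later is handy here too: it adds an outer index level so each row remembers where it came from. A short sketch, reusing the same df1/df2 shapes as above:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})

# keys builds a MultiIndex: level 0 identifies the source frame
labeled = pd.concat([df1, df2], keys=['source1', 'source2'])

print(labeled.loc['source2'])            # only the rows that came from df2
print(labeled.loc[('source1', 0), 'A'])  # scalar lookup into one source
```

Note that keys and ignore_index are mutually exclusive in practice — ignore_index=True discards the index, including the keys level.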
Horizontal concatenation (axis=1) — join DataFrames column-wise, align on index.
df3 = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]})
# Concatenate horizontally
result_h = pd.concat([df1, df3], axis=1)
print(result_h)
#    A  B  C   D
# 0  1  4  7  10
# 1  2  5  8  11
# 2  3  6  9  12
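The example above works cleanly because both frames share the same index. When indices only partially overlap, the join parameter decides what happens: 'outer' (the default) keeps the union of index labels and fills gaps with NaN, while 'inner' keeps only the intersection. A minimal sketch with made-up data:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'B': [10, 20]}, index=[1, 2])  # missing index 0

outer = pd.concat([left, right], axis=1)               # union: 3 rows, NaN in B at index 0
inner = pd.concat([left, right], axis=1, join='inner')  # intersection: 2 rows

print(outer)
print(inner)
```

Watch for the dtype side effect: introducing NaN promotes an integer column like B to float in the outer result.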
Real-world pattern: concatenating filtered chunks from large CSV — memory-efficient big data processing.
file_path = 'very_large.csv'
chunksize = 100_000
filtered_chunks = []
for chunk in pd.read_csv(file_path, chunksize=chunksize):
    # Filter example
    filtered = chunk[(chunk['category'] == 'A') & (chunk['value'] > 100)]
    if not filtered.empty:
        filtered_chunks.append(filtered)

# Final concatenation — once, outside the loop
if filtered_chunks:
    df_final = pd.concat(filtered_chunks, ignore_index=True)
    print(f"Total rows after filtering: {len(df_final)}")
else:
    df_final = pd.DataFrame()
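The collect-then-concat pattern above can be wrapped in a small type-hinted helper, which keeps the empty-list edge case in one place. A sketch (the name concat_chunks and the sample frames are illustrative, not from any library):

```python
import pandas as pd

def concat_chunks(chunks: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate collected chunks once; return an empty frame if none survived filtering."""
    if not chunks:
        return pd.DataFrame()
    return pd.concat(chunks, ignore_index=True)

# Usage with toy stand-ins for filtered chunks
parts = [pd.DataFrame({'value': [101, 250]}), pd.DataFrame({'value': [300]})]
combined = concat_chunks(parts)
print(len(combined))  # 3
```

The list[pd.DataFrame] hint requires Python 3.9+; on older versions use typing.List.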
Best practices make pd.concat safe, efficient, and scalable:
- Prefer ignore_index=True for clean row numbering after vertical concat.
- Use axis=0 (default) for row stacking, axis=1 for columns.
- Use join='outer' (default) or join='inner' to control alignment along the other axis; these are the only two values pd.concat accepts (unlike pd.merge, there is no 'left' or 'right').
- Use keys to add a multi-level index identifying sources: pd.concat([df1, df2], keys=['source1', 'source2']).
- Use verify_integrity=True to raise an error on duplicate indices.
- Use sort=False to preserve column order instead of sorting non-aligned columns.
- Avoid repeated concat in a loop; collect frames in a list and concat once at the end (repeated concat is quadratic in time and memory).
- copy=False can avoid unnecessary copies on older pandas versions; under copy-on-write (pandas 2.x+) the parameter is effectively obsolete.
- Monitor memory with psutil.Process().memory_info().rss before and after large concats.
- Add type hints: def concat_chunks(chunks: list[pd.DataFrame]) -> pd.DataFrame.
- pd.concat accepts any iterable, so pd.concat(chunk for chunk in reader) works with a generator and avoids building the list yourself.
- Modern tip: for very large concatenations, Polars pl.concat() is often faster and lighter on memory — pl.concat([df1, df2]), or lazily with pl.concat([pl.scan_csv(f) for f in files]).
pd.concat() combines DataFrames row-wise (axis=0) or column-wise (axis=1) — essential for chunked processing, appending, or merging. In 2026, use ignore_index=True, collect chunks in list then concat once, prefer Polars pl.concat() for speed/memory, and monitor usage with psutil. Master pd.concat(), and you’ll assemble large datasets efficiently, scalably, and without memory surprises.
Next time you have multiple DataFrames or chunks — concat them. It’s Python’s cleanest way to say: “Stack or join these tables — easily and safely.”