Numbered groups

Numbered groups in Python’s re module are created with parentheses () — each opening parenthesis starts a numbered capturing group, beginning from 1 and counting left to right. These groups capture matched substrings for later use: extraction via .group(n), .groups(), or findall() returning tuples; backreferences in patterns (\1, \2); and reuse in replacements (\1 or \g<1>). In 2026, numbered groups remain a core regex feature — essential for extracting structured fields (names, dates, IDs), reformatting text, detecting duplicates, and vectorized pandas/Polars operations where capturing specific parts of matches scales efficiently across large datasets.

Here’s a complete, practical guide to numbered groups in Python regex: creating and accessing groups, backreferences, replacements, real-world patterns, and modern best practices with raw strings, flags, compilation, and pandas/Polars integration.

Basic numbered groups — parentheses capture text; group(0) is the full match, group(1) is the first captured group, etc.


import re

text = "My favorite numbers are 42 and 123."

pattern = r"My favorite numbers are (\d+) and (\d+)\."
match = re.search(pattern, text)

if match:
    print(match.group(0))   # full match: My favorite numbers are 42 and 123.
    print(match.group(1))   # first group: 42
    print(match.group(2))   # second group: 123
    print(match.groups())   # ('42', '123')

findall() with groups returns list of tuples — each tuple contains the captured groups from each match (no full match included).


matches = re.findall(r"(\w+) (\w+)", "John Doe, Jane Smith, Bob Johnson")
print(matches)
# [('John', 'Doe'), ('Jane', 'Smith'), ('Bob', 'Johnson')]

Backreferences — reuse captured groups in the same pattern with \1, \2, etc.


# Match repeated words
repeated = re.findall(r'\b(\w+)\b\s+\1\b', "hello hello world world")
print(repeated)   # ['hello', 'world']

# Enforce consistency (e.g., same word twice)
pattern_repeat = r'\b(\w+)\b\s+\1\b'
print(re.sub(pattern_repeat, r'\1', "hello hello world world"))   # hello world world

Real-world pattern: extracting structured data in pandas — vectorized .str.extract() uses numbered groups to pull fields into new columns efficiently.


import pandas as pd

df = pd.DataFrame({
    'log': [
        "ERROR: connection failed at 2023-03-15",
        "INFO: data loaded successfully",
        "WARNING: low memory at 14:30"
    ]
})

# Extract level and message using numbered groups
df[['level', 'message']] = df['log'].str.extract(r'^(ERROR|INFO|WARNING):\s+(.*)')

print(df)
#                           log    level                      message
# 0     ERROR: connection failed    ERROR     connection failed
# 1           INFO: data loaded   INFO     data loaded successfully
# 2      WARNING: low memory   WARNING     low memory at 14:30

Best practices make numbered group usage safe, readable, and performant. Use numbered groups when order is natural and extraction is simple — (\w+) (\w+) for name parsing. Prefer named groups (?Ppattern) for clarity and maintainability — especially in complex patterns. Modern tip: use Polars for large text columns — pl.col("text").str.extract(r'(?P\w+) (?P\w+)') is 10–100× faster than pandas .str.extract(). Add type hints — str or pd.Series[str] — improves static analysis. Use raw strings r'pattern' — avoids double-escaping backslashes. Compile patterns with re.compile() for repeated use — faster and clearer. Use backreferences \1 or \g<1> in replacements — avoids ambiguity with digits. Handle no-match cases — check if match is not None or use matches or []. Combine with pandas.str — df['col'].str.extract(r'(\w+) (\w+)') for vectorized extraction. Use re.escape() for literal substrings in patterns.

Numbered groups with () capture matched text for extraction, backreferences, and replacements — simple and powerful for structured pattern matching. In 2026, use numbered groups for order-based capture, prefer named groups for clarity, compile patterns, vectorize in pandas/Polars, and use raw strings. Master numbered groups, and you’ll extract, reformat, and validate text patterns with precision and efficiency.

Next time you need to pull specific parts from a match — use numbered groups. It’s Python’s cleanest way to say: “Capture this and reference it by number.”

Generating content...