Numbered groups in Python’s re module are created with parentheses () — each opening parenthesis starts a numbered capturing group, beginning from 1 and counting left to right. These groups capture matched substrings for later use: extraction via .group(n), .groups(), or findall() returning tuples; backreferences in patterns (\1, \2); and reuse in replacements (\1 or \g<1>). In 2026, numbered groups remain a core regex feature — essential for extracting structured fields (names, dates, IDs), reformatting text, detecting duplicates, and vectorized pandas/Polars operations where capturing specific parts of matches scales efficiently across large datasets.
Here’s a complete, practical guide to numbered groups in Python regex: creating and accessing groups, backreferences, replacements, real-world patterns, and modern best practices with raw strings, flags, compilation, and pandas/Polars integration.
Basic numbered groups — parentheses capture text; group(0) is the full match, group(1) is the first captured group, etc.
import re
text = "My favorite numbers are 42 and 123."
pattern = r"My favorite numbers are (\d+) and (\d+)\."
match = re.search(pattern, text)
if match:
print(match.group(0)) # full match: My favorite numbers are 42 and 123.
print(match.group(1)) # first group: 42
print(match.group(2)) # second group: 123
print(match.groups()) # ('42', '123')
findall() with groups returns list of tuples — each tuple contains the captured groups from each match (no full match included).
matches = re.findall(r"(\w+) (\w+)", "John Doe, Jane Smith, Bob Johnson")
print(matches)
# [('John', 'Doe'), ('Jane', 'Smith'), ('Bob', 'Johnson')]
Backreferences — reuse captured groups in the same pattern with \1, \2, etc.
# Match repeated words
repeated = re.findall(r'\b(\w+)\b\s+\1\b', "hello hello world world")
print(repeated) # ['hello', 'world']
# Enforce consistency (e.g., same word twice)
pattern_repeat = r'\b(\w+)\b\s+\1\b'
print(re.sub(pattern_repeat, r'\1', "hello hello world world")) # hello world world
Real-world pattern: extracting structured data in pandas — vectorized .str.extract() uses numbered groups to pull fields into new columns efficiently.
import pandas as pd
df = pd.DataFrame({
'log': [
"ERROR: connection failed at 2023-03-15",
"INFO: data loaded successfully",
"WARNING: low memory at 14:30"
]
})
# Extract level and message using numbered groups
df[['level', 'message']] = df['log'].str.extract(r'^(ERROR|INFO|WARNING):\s+(.*)')
print(df)
# log level message
# 0 ERROR: connection failed ERROR connection failed
# 1 INFO: data loaded INFO data loaded successfully
# 2 WARNING: low memory WARNING low memory at 14:30
Best practices make numbered group usage safe, readable, and performant. Use numbered groups when order is natural and extraction is simple — (\w+) (\w+) for name parsing. Prefer named groups (?P for clarity and maintainability — especially in complex patterns. Modern tip: use Polars for large text columns — pl.col("text").str.extract(r'(?P is 10–100× faster than pandas .str.extract(). Add type hints — str or pd.Series[str] — improves static analysis. Use raw strings r'pattern' — avoids double-escaping backslashes. Compile patterns with re.compile() for repeated use — faster and clearer. Use backreferences \1 or \g<1> in replacements — avoids ambiguity with digits. Handle no-match cases — check if match is not None or use matches or []. Combine with pandas.str — df['col'].str.extract(r'(\w+) (\w+)') for vectorized extraction. Use re.escape() for literal substrings in patterns.
Numbered groups with () capture matched text for extraction, backreferences, and replacements — simple and powerful for structured pattern matching. In 2026, use numbered groups for order-based capture, prefer named groups for clarity, compile patterns, vectorize in pandas/Polars, and use raw strings. Master numbered groups, and you’ll extract, reformat, and validate text patterns with precision and efficiency.
Next time you need to pull specific parts from a match — use numbered groups. It’s Python’s cleanest way to say: “Capture this and reference it by number.”