The re module is Python's built-in library for regular expressions (regex): a toolset for searching, matching, extracting, replacing, splitting, and validating patterns in text. Regex lets you describe complex text patterns (e.g., emails, dates, phone numbers, HTML tags, log formats) concisely and operate on them precisely. In 2026, re remains essential. It is used constantly in data cleaning, log parsing, input validation, NLP preprocessing, web scraping, and pandas/Polars string column transformations, where a single .str.extract() or .str.replace() call applies one pattern to every row of a column. While simple string methods suffice for basic tasks, re handles pattern-based text processing that would be cumbersome or impossible otherwise.
Here’s a complete, practical guide to the re module: core functions with examples, flags for customization, compilation for performance, real-world patterns, pandas/Polars equivalents, and modern best practices with raw strings, error handling, and scalability.
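re.compile(pattern, flags=0) pre-compiles a pattern into a Pattern object. The re module already caches recently used patterns, so the speed gain is usually modest; the main benefits are skipping the cache lookup in hot loops and giving a complex pattern a reusable name.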
import re
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' # email regex
email_re = re.compile(pattern, re.IGNORECASE)
text = "Contact alice@example.com or bob@company.org"
matches = email_re.findall(text)
print(matches)
# ['alice@example.com', 'bob@company.org']
re.search(pattern, string, flags=0) finds the first match — returns Match object or None; use .group(), .start(), .end() for details.
text = "The quick brown fox jumps over the lazy dog"
match = re.search(r'quick brown fox', text)
if match:
    print(match.group())  # quick brown fox
    print(match.start(), match.end())  # 4 19
else:
    print("No match")
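re.findall(pattern, string, flags=0) returns all non-overlapping matches as a list of strings. If the pattern has one capture group, the list holds that group's text; with several groups, it holds one tuple per match.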
dates = re.findall(r'\d{4}-\d{2}-\d{2}', "Log: 2026-02-10 and 2025-12-31")
print(dates) # ['2026-02-10', '2025-12-31']
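How capture groups change the shape of the findall() result is easiest to see side by side. The following is a minimal sketch that reuses the log string from the example above; the variable names are only illustrative.

```python
import re

log = "Log: 2026-02-10 and 2025-12-31"

# No capture groups: each item is the full matched string.
full = re.findall(r'\d{4}-\d{2}-\d{2}', log)

# One capture group: each item is just that group's text.
years = re.findall(r'(\d{4})-\d{2}-\d{2}', log)

# Several capture groups: each item is a tuple with one entry per group.
parts = re.findall(r'(\d{4})-(\d{2})-(\d{2})', log)

print(full)   # ['2026-02-10', '2025-12-31']
print(years)  # ['2026', '2025']
print(parts)  # [('2026', '02', '10'), ('2025', '12', '31')]
```

If you need the full match as well as the groups, re.finditer() yields Match objects instead of strings or tuples.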
re.sub(pattern, repl, string, count=0, flags=0) replaces matches — repl can be string or callable (function) that receives Match objects.
cleaned = re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', "Phone: 123-456-7890")
print(cleaned) # Phone: XXX-XXX-XXXX
# Dynamic replacement with function
def censor(match):
    # Replace every character of the matched text with an asterisk
    return '*' * len(match.group())
redacted = re.sub(r'\b\w+@\w+\.\w+\b', censor, "Email: alice@example.com")
print(redacted) # Email: *****************
Real-world pattern: extracting and cleaning data in pandas, where the .str accessor applies a regex to every value in a column with a single call.
import pandas as pd
df = pd.DataFrame({
    'log': [
        "ERROR: connection failed at 2023-03-15",
        "INFO: data loaded successfully",
        "WARNING: low memory at 14:30"
    ]
})
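# Extract the date and the log level (expand=False returns a Series; rows without a match become NaN)
df['timestamp'] = df['log'].str.extract(r'(\d{4}-\d{2}-\d{2})', expand=False)
df['level'] = df['log'].str.extract(r'^(ERROR|INFO|WARNING)', expand=False)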
# Replace words case-insensitively
df['clean'] = df['log'].str.replace(r'\berror\b', 'ERROR', regex=True, case=False)
print(df)
Best practices make re usage safe, readable, and performant. Always use raw strings (r'pattern') so backslashes do not need double-escaping. Compile patterns with re.compile() when they are reused; it names the pattern and skips repeated cache lookups. Use flags such as re.IGNORECASE, re.MULTILINE, and re.DOTALL, passed either to the function call or to re.compile(). For complex patterns, use re.VERBOSE, which allows comments and whitespace inside the pattern. Handle no-match cases: re.search() and re.match() return None, so check that the match is not None before calling .group(), while re.findall() simply returns an empty list.
Modern tip: for large text columns, consider Polars, where pl.col("text").str.extract(r'(pattern)') or .str.replace_all(...) is often substantially faster than the pandas .str equivalents, though the speedup depends on the data and the pattern. In pandas, df['col'].str.contains(r'pattern', regex=True) gives a column-wise boolean check. Add type hints (str for scalars, pd.Series for columns) to help static analysis. Avoid overusing regex: simple string methods (split(), replace()) are faster when they suffice.
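As a minimal sketch of the advice on flags, verbose mode, and no-match handling, the example below compiles one commented pattern and guards against None. The names LOG_LINE_RE and first_level, the sample text, and the log-line format itself are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical log-line pattern, written in verbose mode so it can carry comments.
LOG_LINE_RE = re.compile(
    r"""
    ^(?P<level>error|info|warning)   # severity at the start of a line
    :\s+                             # colon followed by whitespace
    (?P<message>.+)$                 # rest of the line
    """,
    re.VERBOSE | re.IGNORECASE | re.MULTILINE,
)

def first_level(text: str) -> Optional[str]:
    """Return the severity of the first log line in text, or None if nothing matches."""
    match = LOG_LINE_RE.search(text)
    if match is None:  # search() returns None when there is no match
        return None
    return match.group("level").upper()

sample = "startup banner\nerror: connection failed\nInfo: data loaded"

print(first_level(sample))          # ERROR
print(first_level("no log lines"))  # None

# finditer() yields one Match object per matching line.
levels = [m.group("level").upper() for m in LOG_LINE_RE.finditer(sample)]
print(levels)                       # ['ERROR', 'INFO']
```

Here re.MULTILINE lets ^ and $ anchor at each line rather than only at the ends of the whole string, and re.IGNORECASE accepts "error" and "Info" alike. Combining flags with | works the same way in re.compile() and in the module-level functions.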
The re module unlocks powerful pattern matching and text transformation — compile patterns, use flags, vectorize in pandas/Polars, and prefer raw strings. Master regex calls, and you’ll search, extract, replace, and validate text data with precision and efficiency.
Next time you need to find or manipulate patterns in text, reach for the re module. It’s Python’s cleanest way to say: “Match this pattern and do something with it.”