Lookaround (also called zero-width assertions) in Python’s re module lets you check for the presence or absence of a pattern before or after the current position without consuming any characters or including them in the match. This makes lookaround well suited to conditional matching: match something only if it’s followed/preceded by (or not followed/preceded by) another pattern. There are four types: positive lookahead (?=...), negative lookahead (?!...), positive lookbehind (?<=...), and negative lookbehind (?<!...). In 2026, lookaround remains a key regex feature. It is essential in data validation (e.g., passwords, emails), text extraction (words before/after keywords), cleaning (remove only if followed by something), and vectorized pandas string operations that apply one conditional pattern across an entire column.
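To see the four types side by side, the sketch below runs one illustrative pattern of each kind against the same string. It also shows the "zero-width" property: a bare lookahead matches an empty span, and a lookahead attached to "cat" leaves the match covering only "cat". The sample string and patterns are made up for illustration.

```python
import re

sample = "$5 and 10 USD and 20 EUR"

# One illustrative pattern per lookaround type (examples, not canonical forms).
patterns = {
    "positive lookahead": r"\d+(?= USD)",      # digits followed by " USD"
    "negative lookahead": r"\b\d+\b(?! USD)",  # whole numbers NOT followed by " USD"
    "positive lookbehind": r"(?<=\$)\d+",      # digits preceded by "$"
    "negative lookbehind": r"(?<![$\d])\d+",   # digits NOT preceded by "$" or another digit
}

results = {name: re.findall(p, sample) for name, p in patterns.items()}
for name, found in results.items():
    print(f"{name:20} {found}")
# Expected: ['10'], ['5', '20'], ['5'], ['10', '20']

# Zero width: a bare lookahead matches an empty span where " is" begins.
empty = re.search(r"(?= is)", "The cat is in the hat")
print(empty.span())  # (7, 7)

# The lookahead is checked but contributes no characters to the match.
m = re.search(r"cat(?= is)", "The cat is in the hat")
print(m.group(), m.span())  # cat (4, 7)
```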
Here’s a complete, practical guide to lookaround in Python regex: positive/negative lookahead and lookbehind, examples, real-world patterns, and modern best practices with raw strings, flags, compilation, and pandas/Polars integration.
Positive lookahead (?=pattern) — matches the position only if followed by pattern (doesn’t consume it).
import re
text = "The cat is in the hat"
# Match "cat" only if followed by " is"
print(re.findall(r'cat(?= is)', text)) # ['cat']
# Match words followed by punctuation
print(re.findall(r'\w+(?=[.,!?])', "Hello, world! How are you?")) # ['Hello', 'world', 'you']
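Because a lookahead does not consume anything, several of them can be stacked at the same position, which is the usual way to express password rules. A minimal sketch, assuming a hypothetical policy of at least 8 characters with one lowercase letter, one uppercase letter, and one digit; STRONG_PASSWORD and is_strong are illustrative names, not a standard API.

```python
import re

# Hypothetical policy for illustration: 8+ characters, with at least
# one lowercase letter, one uppercase letter, and one digit.
STRONG_PASSWORD = re.compile(
    r"(?=.*[a-z])"  # somewhere ahead: a lowercase letter
    r"(?=.*[A-Z])"  # somewhere ahead: an uppercase letter
    r"(?=.*\d)"     # somewhere ahead: a digit
    r".{8,}"        # then consume the whole string: 8 or more characters
)


def is_strong(password: str) -> bool:
    """Return True if the password satisfies every lookahead condition."""
    # fullmatch anchors both ends, so all lookaheads are checked from position 0.
    return STRONG_PASSWORD.fullmatch(password) is not None


for pw in ["Passw0rd", "password1", "SHORT1a", "NoDigitsHere"]:
    print(pw, is_strong(pw))
```

Each lookahead rescans the string from the start, so the rule order does not matter; adding a rule means adding one more (?=...) group.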
Negative lookahead (?!pattern) — matches the position only if NOT followed by pattern.
# Match "cat" only if NOT followed by " is"
print(re.findall(r'cat(?! is)', text)) # [] (the only "cat" in text is followed by " is")
print(re.findall(r'cat(?! is)', "The cat sat; the cat is in the hat")) # ['cat'] (only the first "cat" matches)
# Match words NOT followed by punctuation
# \b on both sides keeps \w+ from backtracking into a partial word such as "Hell"
print(re.findall(r'\b\w+\b(?![.,!?])', "Hello, world! How are you?")) # ['How', 'are']
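Negative lookahead interacts with backtracking: if the check fails, the engine shortens the preceding match and tries again. The sketch below shows that pitfall and a common idiom, a negative lookahead anchored at the start of a string to exclude whole inputs. The file names are made up for illustration.

```python
import re

s = "Hello, world! How are you?"

# Without word boundaries, \w+ backtracks: "Hello" fails the check, but "Hell"
# is followed by "o" (not punctuation), so the shorter match succeeds.
naive = re.findall(r"\w+(?![.,!?])", s)

# \b on both sides forces whole words, so backtracking cannot rescue a partial word.
bounded = re.findall(r"\b\w+\b(?![.,!?])", s)

print(naive)    # ['Hell', 'worl', 'How', 'are', 'yo']
print(bounded)  # ['How', 'are']

# Negative lookahead at the start: keep names that do not end in ".tmp".
files = ["report.csv", "cache.tmp", "notes.txt", "old.tmp.bak"]
not_tmp = re.compile(r"(?!.*\.tmp$).+")
kept = [f for f in files if not_tmp.fullmatch(f)]
print(kept)  # ['report.csv', 'notes.txt', 'old.tmp.bak']
```

For a single suffix check, str.endswith() is simpler; the lookahead form pays off when the exclusion has to be combined with other conditions inside one pattern.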
Positive lookbehind (?<=pattern) — matches the position only if preceded by pattern (doesn’t consume it).
# Match numbers preceded by "$"
print(re.findall(r'(?<=\$)\d+', "$100 $200 €300")) # ['100', '200']
# Match words preceded by "the "
print(re.findall(r'(?<=the )\w+', "the cat in the hat")) # ['cat', 'hat']
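Python’s re module only accepts fixed-width lookbehind, so alternatives of different lengths inside one lookbehind are rejected at compile time. A minimal sketch of the error and one common workaround (alternating between separate fixed-width lookbehinds); the "$"/"USD " prefixes and the AMOUNT name are illustrative assumptions. The third-party regex module is an alternative that allows variable-width lookbehind.

```python
import re

text = "$100, USD 200, EUR 300"

# "$" is 1 character and "USD " is 4, so this lookbehind is not fixed-width
# and re.compile raises re.error.
try:
    re.compile(r"(?<=\$|USD )\d+")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("re.error:", exc)

# Workaround: a non-capturing group that alternates two fixed-width lookbehinds.
AMOUNT = re.compile(r"(?:(?<=\$)|(?<=USD ))\d+")
print(AMOUNT.findall(text))  # ['100', '200']
```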
Negative lookbehind (?<!pattern): matches the position only if NOT preceded by pattern.
# Match numbers NOT preceded by "$"
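# [$\d] also rules out digits, so a match can't start mid-number (e.g. the "00" in "$100")
print(re.findall(r'(?<![$\d])\d+', "$100 $200 €300")) # ['300']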
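A practical use of negative lookbehind is splitting on a delimiter unless it is escaped. A minimal sketch that splits on commas not preceded by a backslash; it assumes the input never contains an escaped backslash right before a comma (a "\\," sequence), which this simple pattern does not handle.

```python
import re

# Split on commas, but not on commas escaped with a backslash.
# Assumption: escaped backslashes before a comma ("\\,") do not occur.
line = r"alpha,beta\,gamma,delta"
fields = re.split(r"(?<!\\),", line)
print(fields)  # ['alpha', 'beta\\,gamma', 'delta']

# Optional cleanup: turn the escape sequence "\," back into a plain comma.
unescaped = [f.replace("\\,", ",") for f in fields]
print(unescaped)  # ['alpha', 'beta,gamma', 'delta']
```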
Real-world pattern: conditional extraction in pandas. Vectorized .str methods run on Python’s re engine, so lookaround works in them directly and the surrounding context never ends up in the extracted values.
import pandas as pd
df = pd.DataFrame({
'text': [
"Price: $99.99",
"Value: 50",
"Total: €200.00",
"Discount: $10"
]
})
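# Extract dollar amounts only (positive lookbehind for $).
# str.extract() requires a capture group; expand=False returns a Series.
df['dollar_amount'] = df['text'].str.extract(r'(?<=\$)(\d+(?:\.\d{2})?)', expand=False)
# Extract amounts NOT in dollars (negative lookbehind).
# Excluding digits and "." stops matches from starting mid-number (e.g. "9.99" inside "$99.99").
df['non_dollar'] = df['text'].str.extract(r'(?<![$\d.])(\d+(?:\.\d{2})?)', expand=False)
print(df)
# dollar_amount: 99.99, NaN, NaN, 10
# non_dollar:    NaN, 50, 200.00, NaN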
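Lookaround also works in other pandas .str methods that accept regular expressions, such as boolean filtering with str.contains and conditional cleaning with str.replace. A minimal sketch on the same sample strings; the ".00"-stripping rule is a hypothetical cleaning step chosen for illustration.

```python
import pandas as pd

s = pd.Series(["Price: $99.99", "Value: 50", "Total: €200.00", "Discount: $10"])

# Boolean mask: rows containing a digit immediately preceded by "$".
has_dollar = s.str.contains(r"(?<=\$)\d", regex=True)
print(has_dollar.tolist())  # [True, False, False, True]

# Cleaning: drop a ".00" suffix only when it directly follows a digit.
cleaned = s.str.replace(r"(?<=\d)\.00\b", "", regex=True)
print(cleaned.tolist())  # ['Price: $99.99', 'Value: 50', 'Total: €200', 'Discount: $10']
```

Since lookaround groups are not capturing groups, str.contains treats these patterns as plain searches and returns one boolean per row.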
Best practices make lookaround safe, readable, and performant. Use positive lookahead for “followed by” conditions: cat(?= is) matches “cat” without including “ is” in the match. Use negative lookahead for exclusions: cat(?! is) matches only when the condition fails. Use positive/negative lookbehind for “preceded by”/“not preceded by” conditions, but note that Python’s built-in re module requires lookbehind to be fixed-width (e.g., (?<=\$|USD ) raises re.error because its alternatives differ in length); the third-party regex module supports variable-length lookbehind. Polars caveat: Polars string expressions use the Rust regex engine, which does not support lookaround, so rewrite the condition as a capture group instead, e.g. pl.col("text").str.extract(r'\$(\d+\.\d{2})', group_index=1). Add type hints (str, pd.Series) to improve static analysis. Use raw strings r'pattern' so backslashes reach the regex engine unchanged; inside a raw string write \d, not \\d (which matches a literal backslash followed by “d”). Compile patterns with re.compile() for repeated use: it names the pattern once and skips the re module’s internal cache lookup on every call. Combine with pandas .str methods, e.g. df['col'].str.contains(r'pattern(?=condition)', regex=True), for vectorized conditional checks. Use lookaround to avoid capturing context, which keeps findall() results clean. Avoid overuse: each lookaround is re-evaluated at every candidate position, and patterns like (?=.*x) rescan the rest of the string, so test performance on realistic data.
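Several of these practices fit in one small sketch: a raw-string pattern compiled once, documented inline with re.VERBOSE, plus the equivalent capture-group form that engines without lookaround support (such as the one behind Polars) would need. LOOKBEHIND_FORM and CAPTURE_FORM are illustrative names, and the sample strings reuse the pandas example’s data.

```python
import re

texts = ["Price: $99.99", "Value: 50", "Total: €200.00", "Discount: $10"]

# Compiled once; re.VERBOSE ignores whitespace and allows # comments in the pattern.
LOOKBEHIND_FORM = re.compile(
    r"""
    (?<=\$)        # preceded by a dollar sign (checked, not consumed)
    \d+            # whole-dollar digits
    (?:\.\d{2})?   # optional cents
    """,
    re.VERBOSE,
)

# Same extraction without lookaround: consume "$" but capture only the amount.
CAPTURE_FORM = re.compile(r"\$(\d+(?:\.\d{2})?)")

for t in texts:
    a = LOOKBEHIND_FORM.findall(t)
    b = CAPTURE_FORM.findall(t)  # with one group, findall returns that group's text
    print(t, a, b)
```

Both forms return the same amounts here (['99.99'], [], [], ['10']); the lookbehind version keeps findall() output free of the "$" without relying on group semantics.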
Lookaround (lookahead/lookbehind) checks conditions without consuming text: positive for presence, negative for absence. In 2026, use lookahead for “followed by”, lookbehind for “preceded by”, raw strings, and compiled patterns, and vectorize with pandas .str methods (in Polars, express the same conditions with capture groups). Master lookaround, and you’ll create conditional, precise regex patterns for validation, extraction, and cleaning.
Next time you need to match only if followed/preceded by something — use lookaround. It’s Python’s cleanest way to say: “Match this, but only if this condition is true/false.”