Look-ahead

Look-ahead (also called a lookahead assertion) in Python’s re module is a zero-width assertion: it checks whether the current position is (or is not) followed by a pattern, without consuming that text or including it in the match. Positive look-ahead (?=...) succeeds only if the position is followed by the specified pattern; negative look-ahead (?!...) succeeds only if it is not. This makes look-ahead the natural tool for conditional matching: match something only when a specific context follows it (or does not), such as words before punctuation, numbers before units, or keywords before certain delimiters. In 2026, look-ahead remains a key regex feature for data validation, text extraction, cleaning, and log parsing, and it works inside vectorized pandas string methods such as .str.extract() and .str.contains(), so a conditional pattern runs across a whole column without helper columns or post-processing. Polars is the exception: its regex engine (the Rust regex crate) does not support look-ahead or look-behind.
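To make “zero-width” concrete, here is a minimal sketch using only the standard re module. It compares the span of a look-ahead match with a plain match, then shows the overlapping-match idiom that follows from the look-ahead consuming nothing. The sample strings are illustrative.

```python
import re

# The look-ahead is checked but not consumed: the match stops before "bar".
m = re.search(r"foo(?=bar)", "foobar")
print(m.group(), m.span())    # foo (0, 3)

# Without the look-ahead, "bar" becomes part of the match.
m2 = re.search(r"foobar", "foobar")
print(m2.group(), m2.span())  # foobar (0, 6)

# Because nothing is consumed, a capturing group inside a look-ahead
# can report overlapping matches: every 3-digit window, one per position.
print(re.findall(r"(?=(\d{3}))", "12345"))  # ['123', '234', '345']
```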

Here’s a practical guide to look-ahead in Python regex: positive and negative look-ahead, syntax and examples, real-world patterns, and best practices with raw strings, compilation, and pandas integration (plus a caveat for Polars).

Positive look-ahead (?=pattern) — matches the position only if followed by pattern (doesn’t consume it).


import re

text = "foobar foobaz food"

# Match "foo" only if followed by "bar"
pattern_pos = r"foo(?=bar)"
matches_pos = re.findall(pattern_pos, text)
print(matches_pos)   # ['foo'] (only the "foo" in "foobar"; "bar" itself is not matched)

# Match numbers followed by "px" (CSS units)
css = "font-size: 16px; margin: 20px 10px;"
print(re.findall(r'\d+(?=px)', css))   # ['16', '20', '10']
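A classic use of positive look-ahead with a nested negative look-ahead is inserting thousands separators: re.sub() replaces zero-width positions, so the digits themselves are never consumed. This is a minimal sketch, assuming the input is a plain run of digits (no sign or decimal part); THOUSANDS_GAP and add_thousands_separators are hypothetical names.

```python
import re

# Pattern pieces (all zero-width, so sub() inserts "," without removing digits):
#   \B          a position inside the number, not at its start or end
#   (?=...)     positive look-ahead: what follows must be...
#   (?:\d{3})+  ...one or more groups of exactly three digits...
#   (?!\d)      ...with no further digit after the last group (negative look-ahead)
THOUSANDS_GAP = re.compile(r"\B(?=(?:\d{3})+(?!\d))")

def add_thousands_separators(digits: str) -> str:
    """Insert commas into a plain run of digits, e.g. "1234567" -> "1,234,567"."""
    return THOUSANDS_GAP.sub(",", digits)

print(add_thousands_separators("1234567"))  # 1,234,567
print(add_thousands_separators("1000"))     # 1,000
print(add_thousands_separators("999"))      # 999
```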

Negative look-ahead (?!pattern) — matches the position only if NOT followed by pattern.


# Match "foo" only if NOT followed by "bar"
text_neg = "foobar foobaz food"
pattern_neg = r"foo(?!bar)"
matches_neg = re.findall(pattern_neg, text_neg)
print(matches_neg)   # ['foo', 'foo'] (from "foobaz" and "food"; "foobar" is skipped)

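# Match whole words NOT followed by punctuation.
# The \b anchors stop \w+ from backtracking to a partial word
# (without them, "Hello," would yield "Hell" and "you?" would yield "yo").
print(re.findall(r'\b\w+\b(?![.,!?])', "Hello, world! How are you?"))   # ['How', 'are']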
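Positive and negative look-aheads also stack. Several look-aheads at the start of a pattern each scan ahead from the same position, which is a common way to require multiple independent conditions at once. This is a minimal sketch, assuming an illustrative password policy (at least 8 characters, one uppercase letter, one digit, no whitespace); the policy, PASSWORD_RE, and is_valid_password are hypothetical.

```python
import re

# Hypothetical policy: 8+ characters, an uppercase letter, a digit, no whitespace.
# fullmatch() evaluates every look-ahead at position 0, and each one scans
# ahead on its own, so the conditions can be listed in any order.
PASSWORD_RE = re.compile(
    r"(?=.*[A-Z])"  # positive look-ahead: an uppercase letter somewhere
    r"(?=.*\d)"     # positive look-ahead: a digit somewhere
    r"(?!.*\s)"     # negative look-ahead: no whitespace anywhere
    r".{8,}"        # the only consuming part: at least 8 characters
)

def is_valid_password(candidate: str) -> bool:
    return PASSWORD_RE.fullmatch(candidate) is not None

for pw in ["Secret123", "secret123", "Secret 123", "Sec1"]:
    print(pw, is_valid_password(pw))
# Secret123 True / secret123 False / Secret 123 False / Sec1 False
```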

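Real-world pattern: conditional extraction and cleaning in pandas. Look-around assertions let you match only in specific contexts without capturing the surrounding text. The example below uses look-behind, the mirror image of look-ahead: (?<=...) requires, and (?<!...) forbids, a pattern immediately before the current position.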


import pandas as pd

df = pd.DataFrame({
    'text': [
        "Price: $99.99 (discounted)",
        "Value: 50",
        "Total: €200.00",
        "Discount: $10"
    ]
})

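# Extract dollar amounts only: positive look-behind (?<=\$) requires a "$" before the number.
# str.extract() needs a capture group; decimals are optional so "$10" is included.
df['dollar_amount'] = df['text'].str.extract(r'(?<=\$)(\d+(?:\.\d{2})?)', expand=False)

# Extract amounts NOT in dollars: negative look-behind (?<![$\d.]) rejects a number
# preceded by "$", or starting mid-number (so "$99.99" cannot yield "9.99").
df['non_dollar'] = df['text'].str.extract(r'(?<![$\d.])(\d+(?:\.\d{2})?)', expand=False)

print(df)
# dollar_amount: '99.99', NaN, NaN, '10'
# non_dollar:    NaN, '50', '200.00', NaN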
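The pandas example above relies on look-behind. For a look-ahead counterpart, here is a minimal sketch that captures an amount only when a “(discounted)” label follows it, and builds a boolean mask with str.contains(). The column names discounted_amount and has_dollar_amount are illustrative. The explicit object dtype is an assumption made so that Python’s re module evaluates the patterns, since pyarrow-backed string dtypes may route some methods through a regex engine without look-around support.

```python
import pandas as pd

# Sample rows mirroring the example above; dtype=object keeps plain Python
# strings, which pandas matches with Python's re module.
prices = pd.DataFrame({
    "text": pd.Series([
        "Price: $99.99 (discounted)",
        "Value: 50",
        "Total: €200.00",
        "Discount: $10",
    ], dtype=object)
})

# Positive look-ahead: capture the amount only when " (discounted)" follows it;
# the label is checked but not extracted.
prices["discounted_amount"] = prices["text"].str.extract(
    r"(\d+(?:\.\d{2})?)(?= \(discounted\))", expand=False
)

# Positive look-ahead in a boolean mask: a "$" immediately followed by a digit.
prices["has_dollar_amount"] = prices["text"].str.contains(r"\$(?=\d)", regex=True)

print(prices)
```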



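Best practices make look-ahead safe, readable, and performant. Use positive look-ahead for “followed by” conditions: foo(?=bar) matches “foo” without including “bar” in the match. Use negative look-ahead for exclusions: foo(?!bar) matches “foo” only when “bar” does not follow. Use raw strings r'pattern' so each backslash is written once: r'\$\d+' matches a dollar sign and digits, while r'\\$' means a literal backslash followed by the end-of-string anchor. Compile patterns with re.compile() for repeated use; it gives the pattern a name and skips the per-call cache lookup that module-level functions like re.findall() perform. Combine with pandas string methods, for example df['col'].str.contains(r'pattern(?=condition)', regex=True), for vectorized conditional checks. Use look-ahead to keep findall() results clean: text checked by a look-ahead is not returned, but a capturing group inside the look-ahead is, so write (?:...) there. Add type hints (str, or pd.Series for pandas columns) to help static analysis. Watch performance: a look-ahead re-scans text at every position where it is tried, so unanchored patterns like (?=.*x) on long strings can be slow; test on realistic data.

Polars caveat: Polars evaluates regexes with the Rust regex crate, which supports neither look-ahead nor look-behind, so a pattern like (?<=\$)\d+ fails in pl.col("text").str.extract(). Use an explicit capture group instead, for example pl.col("text").str.extract(r'\$(\d+(?:\.\d{2})?)', 1).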
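Several of these practices fit in one small sketch: a compiled pattern written with re.VERBOSE so each part carries a comment, plus the findall() caveat about a capturing group inside the look-ahead. The pattern name QUANTITY_BEFORE_UNIT and the sample text are illustrative.

```python
import re

# Compile once, reuse many times; re.VERBOSE lets the pattern carry comments.
QUANTITY_BEFORE_UNIT = re.compile(
    r"""
    \d+(?:\.\d+)?       # a number, optionally with a decimal part
    (?=\s?(?:kg|g)\b)   # positive look-ahead: a mass unit follows (not consumed)
    """,
    re.VERBOSE,
)

text = "Flour 2.5 kg, sugar 500g, eggs 12"
print(QUANTITY_BEFORE_UNIT.findall(text))  # ['2.5', '500']

# Caveat: a capturing group inside the look-ahead changes what findall() returns,
# which is why the compiled pattern above uses (?:kg|g).
print(re.findall(r"\d+(?=\s?(kg|g)\b)", text))  # ['kg', 'g']
```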

Look-ahead ((?=...) positive, (?!...) negative) checks conditions without consuming text: it matches only if the current position is (or is not) followed by something. In 2026, use positive look-ahead for context requirements and negative look-ahead for exclusions, write patterns as raw strings, compile the ones you reuse, and vectorize them with pandas string methods (in Polars, rewrite look-arounds as capture groups). Master look-ahead, and you’ll create conditional, precise regex patterns for validation, extraction, and cleaning.

Next time you need to match only if followed by something — use look-ahead. It’s Python’s cleanest way to say: “Match this, but only if this comes next.”
