Positive look-ahead

Positive look-ahead is a zero-width assertion in Python’s re module: it matches a position in the string only if that position is immediately followed by a specified pattern, without consuming the following text or including it in the match. The syntax is (?=pattern), where pattern is the required following content. Positive look-ahead is ideal for conditional matching: match something only if it’s followed by a specific context (e.g., words before punctuation, numbers before units, keywords before delimiters) while keeping the match clean and focused. In 2026, positive look-ahead remains a key regex feature, useful in data validation, text extraction, cleaning, log parsing, and vectorized pandas string operations, where context-dependent matching avoids extra post-processing.

Here’s a complete, practical guide to positive look-ahead in Python regex: syntax and mechanics, examples, real-world patterns, and modern best practices with raw strings, flags, compilation, and pandas integration, plus a note on Polars.

Positive look-ahead (?=pattern) succeeds only if the current position is followed by pattern — the match itself stops before the look-ahead content.


import re

text = "foobar matches, but foobaz does not."

# Match "foo" only if followed by "bar"
pattern = r"foo(?=bar)"
matches = re.findall(pattern, text)
print(matches)   # ['foo']

# Match numbers followed by "px" (CSS units)
css = "font-size: 16px; margin: 20px 10px;"
print(re.findall(r'\d+(?=px)', css))   # ['16', '20', '10']
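
To make the "zero-width" behaviour concrete, the following minimal sketch inspects the span of a match and shows that the looked-ahead text is still available to the next match. The sample string and variable names are illustrative assumptions, not part of the original examples.

```python
import re

text = "foobar foobaz"

# The look-ahead checks for "bar" but leaves it out of the match.
m = re.search(r"foo(?=bar)", text)
print(m.group())   # foo
print(m.span())    # (0, 3): the match ends before "bar"

# Because "bar" was not consumed, the scan resumes right on it,
# so the second alternative can still match it.
tokens = re.findall(r"foo(?=bar)|bar", text)
print(tokens)      # ['foo', 'bar']
```

Had the pattern been plain foobar, the first match would have swallowed "bar" and nothing would be left for the second alternative.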

Positive look-ahead combines well with quantifiers: the quantified part matches only when something specific follows, and that following text stays out of the result.


# Match words followed by punctuation (but keep punctuation out)
print(re.findall(r'\w+(?=[.,!?])', "Hello, world! How are you?"))   # ['Hello', 'world', 'you']

# Match "ERROR" only if followed by ":"
logs = "ERROR: failed INFO: success WARNING: low memory"
print(re.findall(r'ERROR(?=:)', logs))   # ['ERROR'] (the colon is not part of the match)
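
Since a look-ahead consumes nothing, several of them can be stacked at the same position, which is a common way to express validation rules. The sketch below assumes a hypothetical password policy (at least 8 characters, one digit, one uppercase letter); the rule and the names PASSWORD_RE and is_valid_password are invented for illustration.

```python
import re

# Each look-ahead is tested from the start of the string and consumes nothing;
# only the final .{8,} actually matches characters.
PASSWORD_RE = re.compile(r"(?=.*\d)(?=.*[A-Z]).{8,}")


def is_valid_password(candidate: str) -> bool:
    """Return True if candidate satisfies the hypothetical policy above."""
    return PASSWORD_RE.fullmatch(candidate) is not None


print(is_valid_password("Passw0rdX"))   # True
print(is_valid_password("password1"))   # False (no uppercase letter)
```

Each condition stays independent of the others, so adding a rule means adding one more look-ahead rather than rewriting the whole pattern.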

Real-world pattern: conditional extraction and cleaning in pandas. Look-around assertions match only in specific contexts without capturing the surrounding text. This example keys on a preceding "$", which calls for positive look-behind, (?<=...), the mirror image of look-ahead.


import pandas as pd

df = pd.DataFrame({
    'text': [
        "Price: $99.99 (discounted)",
        "Value: 50",
        "Total: €200.00",
        "Discount: $10"
    ]
})

# Extract dollar amounts only. "Preceded by $" is a look-behind condition.
# str.extract() needs a capture group; the cents are optional so "$10" matches.
df['dollar_amount'] = df['text'].str.extract(r'(?<=\$)(\d+(?:\.\d{2})?)', expand=False)

print(df)
#                          text dollar_amount
# 0  Price: $99.99 (discounted)         99.99
# 1                   Value: 50           NaN
# 2              Total: €200.00           NaN
# 3               Discount: $10            10
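
For a look-ahead counterpart on the same sample strings, the sketch below pulls out the label of each row only when a dollar amount follows it. It uses plain re so it runs without pandas; the names rows, LABEL_BEFORE_DOLLAR, and labels are illustrative assumptions.

```python
import re

rows = [
    "Price: $99.99 (discounted)",
    "Value: 50",
    "Total: €200.00",
    "Discount: $10",
]

# Match a word only if it is immediately followed by ": $".
LABEL_BEFORE_DOLLAR = re.compile(r"\w+(?=: \$)")

labels = [
    m.group() if (m := LABEL_BEFORE_DOLLAR.search(row)) else None
    for row in rows
]
print(labels)   # ['Price', None, None, 'Discount']
```

In pandas, the same idea would need a capture group around \w+ when passed to .str.extract().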

Best practices keep positive look-ahead safe, readable, and performant. Use it for “followed by” conditions: foo(?=bar) matches “foo” without including “bar”. Combine it with quantifiers, as in \d+(?=px), to match only in context. Write patterns as raw strings, r'pattern', so backslashes need no double escaping. Compile patterns with re.compile() when they are reused; the re module already caches recent patterns, so the gain is mostly readability plus a small saving per call. Add type hints such as str or pd.Series to helper functions to help static analysis. In pandas, df['col'].str.contains(r'pattern(?=condition)', regex=True) performs a vectorized conditional check; note that .str.extract() requires at least one capture group. Polars is often faster than pandas on large text columns, but its regex engine (the Rust regex crate) does not support look-around, so there the condition has to be rewritten with a capture group, for example pl.col("text").str.extract(r'\$(\d+\.\d{2})', group_index=1). Use look-ahead to avoid capturing, which keeps findall() results clean. Avoid overuse: a look-ahead adds work at every candidate position, so test performance on large data.
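
Two of these tips can be sketched together: compiling a pattern once for reuse, and rewriting a look-ahead as a capture group for engines that lack look-around. The sample CSS string and the constant names are assumptions made for illustration.

```python
import re

# Compiled once, reused for every input.
PX_LOOKAHEAD = re.compile(r"\d+(?=px)")   # look-ahead form
PX_CAPTURE = re.compile(r"(\d+)px")       # capture-group form, no look-around needed

css = "font-size: 16px; margin: 20px 10px; line-height: 2em"

lookahead_hits = PX_LOOKAHEAD.findall(css)
capture_hits = PX_CAPTURE.findall(css)    # findall returns the group contents

print(lookahead_hits)   # ['16', '20', '10']
print(capture_hits)     # ['16', '20', '10']
```

The two forms agree for extraction like this. They differ when the trailing text must stay available for a later match, because the capture-group form consumes "px" while the look-ahead form does not.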

Positive look-ahead, (?=...), matches only if a pattern follows, without consuming or capturing the following text. Use it to express context requirements, write patterns as raw strings, compile the ones you reuse, and vectorize with pandas string methods. Master positive look-ahead, and you’ll create conditional, precise regex patterns for validation, extraction, and cleaning.

Next time you need to match only if followed by something — use positive look-ahead. It’s Python’s cleanest way to say: “Match this, but only if this comes next.”
