Positive look-behind

Positive look-behind is a zero-width assertion in Python’s re module that matches a position in the string only if it is immediately preceded by a specified pattern — without consuming or including the preceding text in the match. The syntax is (?<=pattern), where pattern is the required prefix that must appear before the match. Positive look-behind is ideal for context-dependent matching from the left: match something only if it comes right after a specific marker, such as numbers after currency symbols, words after titles, or tags after opening delimiters. In 2026, positive look-behind remains a powerful regex feature — essential in data extraction, validation, cleaning (e.g., extract only after certain prefixes), log parsing, and vectorized pandas/Polars string operations where backward context matching scales efficiently across large datasets without capturing extra text.

Here’s a complete, practical guide to positive look-behind in Python regex: syntax and mechanics, examples, fixed-width requirements, real-world patterns, and modern best practices with raw strings, flags, compilation, and pandas/Polars integration.

Positive look-behind (?<=pattern) succeeds only if the current position is preceded by pattern — the match itself starts after the look-behind content.


import re

text = "The bar is preceded by foo, but not by baz."

# Match "bar" only if preceded by "foo "
pattern = r"(?<=foo )bar"
matches = re.findall(pattern, text)
print(matches)   # ['bar']

# Match numbers preceded by "$" (currency)
prices = "Total $99.99 €200.00 £50"
print(re.findall(r'(?<=\\$)\\d+\\.\\d{2}', prices))   # ['99.99']

Positive look-behind with quantifiers — ensures the match is preceded by something specific, without including it.


# Match words preceded by "the "
print(re.findall(r'(?<=the )\\w+', "the cat in the hat"))   # ['cat', 'hat']

# Match digits preceded by "ID:"
ids = "ID:123 User ID:456 Session:789"
print(re.findall(r'(?<=ID:)\\d+', ids))   # ['123', '456']

Important note — look-behind in Python requires fixed-width patterns (since Python 3.7, variable-length is supported in some cases but can be slower or less reliable). Positive look-behind must have a predictable length.


# Fixed-width look-behind — valid
print(re.findall(r'(?<=abc)def', "abcdef"))   # ['def']

# Variable-width look-behind — may raise error or behave unexpectedly
# Avoid patterns like (?<=a.*) — use alternatives or positive look-ahead instead

Real-world pattern: context-aware extraction in pandas — positive look-behind matches only after specific prefixes without capturing them.


import pandas as pd

df = pd.DataFrame({
    'text': [
        "Price: $99.99 (discounted)",
        "Value: 50",
        "Total: €200.00",
        "Discount: $10"
    ]
})

# Extract amounts preceded by "$" (positive look-behind)
df['dollar_amount'] = df['text'].str.extract(r'(?<=\\$)\\d+\\.\\d{2}')

print(df)
#                            text dollar_amount
# 0  Price: $99.99 (discounted)        99.99
# 1                    Value: 50          NaN
# 2              Total: €200.00          NaN
# 3               Discount: $10          10.00

Best practices make positive look-behind safe, readable, and performant. Use positive look-behind for “preceded by” conditions — (?<=foo)bar — without including “foo” in the match. Stick to fixed-width look-behind patterns — avoid variable-length quantifiers inside look-behind for reliability. Modern tip: use Polars for large text columns — pl.col("text").str.extract(r'(?<=\\$)\\d+\\.\\d{2}') is 10–100× faster than pandas .str.extract(). Add type hints — str or pd.Series[str] — improves static analysis. Use raw strings r'pattern' — avoids double-escaping backslashes. Compile patterns with re.compile() for repeated use — faster and clearer. Combine with pandas.str — df['col'].str.contains(r'(?<=prefix)pattern', regex=True) for vectorized prefix-conditioned checks. Use look-behind to avoid capturing prefixes — keeps matches clean. Avoid overuse — look-behind can slow matching; test performance on large data.

Positive look-behind ((?<=...)) matches only if preceded by a pattern — without consuming or capturing the preceding text. In 2026, use it for required prefixes, stick to fixed-width, raw strings, compile patterns, and vectorize in pandas/Polars. Master positive look-behind, and you’ll create context-aware, precise regex patterns for extraction, validation, and cleaning.

Next time you need to match only if preceded by something — use positive look-behind. It’s Python’s cleanest way to say: “Match this, but only if this came before.”

Generating content...