Negative look-behind

Negative look-behind is a zero-width assertion in Python’s re module that matches a position in the string only if it is NOT immediately preceded by a specified pattern, without consuming or including the preceding text in the match. The syntax is (?<!pattern), where pattern is the forbidden prefix that must not appear before the match. Negative look-behind is ideal for exclusionary matching from the left: match something only if it does not come right after an unwanted marker, such as numbers not preceded by currency symbols, words not after titles, or tags not after certain opening delimiters. In 2026, negative look-behind remains a powerful regex feature for data extraction, validation, cleaning (e.g., excluding false positives after specific prefixes), log parsing, and vectorized pandas string operations, where the exclusion is applied across a whole column without capturing extra text.



Here’s a complete, practical guide to negative look-behind in Python regex: syntax and mechanics, examples, fixed-width requirements, real-world patterns, and modern best practices with raw strings, flags, compilation, and pandas/Polars integration.

Negative look-behind (?<!pattern) succeeds only if the current position is NOT preceded by pattern. The assertion consumes no characters, so the match itself starts at the position where the check is made and never includes the prefix.



import re

text = "The bar is preceded by foo, but not by baz."

# Match "bar" only if NOT preceded by "foo "
pattern = r"(?<!foo )bar"
print(re.findall(pattern, text))  # ['bar'] (here "bar" follows "The ", not "foo ")
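To make the zero-width behaviour concrete, here is a minimal sketch that inspects the match span. The sample string is an assumption chosen for illustration; it shows that a rejected occurrence is skipped and that the reported span covers only "bar", never the text before it.

```python
import re

text = "foo bar, baz bar"

# "bar" must not be immediately preceded by "foo ".
pattern = re.compile(r"(?<!foo )bar")

match = pattern.search(text)

# The first "bar" (index 4) is rejected because "foo " precedes it,
# so the search moves on to the second "bar".
print(match.group())  # bar
print(match.span())   # (13, 16)
```

The span has length 3, which confirms that the look-behind only inspects the preceding characters and does not add them to the match.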


Negative look-behind with quantifiers — excludes matches in unwanted prefix contexts while keeping the match clean.


# Match words NOT preceded by "the "
print(re.findall(r'(?<!the )\b\w+', "the cat sat on the mat"))  # ['the', 'sat', 'on', 'the']


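Important note: look-behind in Python’s built-in re module requires a fixed-width pattern. A look-behind whose length can vary, for example one containing *, +, ?, {m,n}, or alternatives of different lengths, raises re.error ("look-behind requires fixed-width pattern"). Variable-length look-behind is available in the third-party regex module, not in re. In re, a negative look-behind must therefore have a predictable length.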


# Fixed-width negative look-behind — valid
print(re.findall(r'(?<!\$)\b\d+', "Cost: $100, quantity: 200"))  # ['200']
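The fixed-width rule can be checked directly. The sketch below is illustrative and its sample text is an assumption. It first shows that re rejects a look-behind whose alternatives differ in length, then shows one common workaround: chaining several fixed-width look-behinds, one per forbidden prefix, so that all of them are tested at the same position.

```python
import re

# Alternatives of different widths ("USD " is 4 characters, "$" is 1)
# are rejected by the standard re module at compile time.
try:
    re.compile(r"(?<!USD |\$)\b\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Workaround: one fixed-width look-behind per forbidden prefix.
# Both assertions must hold at the position where the number starts.
not_after_currency = re.compile(r"(?<!USD )(?<!\$)\b\d+")

text = "USD 250, $100, 300 units"
numbers = not_after_currency.findall(text)

print(variable_width_ok)  # False
print(numbers)            # ['300']
```

The \b before \d+ matters here: without it, the engine could start a match in the middle of a rejected number, such as "00" inside "$100".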


Real-world pattern: context-aware extraction in pandas — negative look-behind matches only when unwanted prefixes are absent, without capturing them.


import pandas as pd

df = pd.DataFrame({
    'text': [
        "Price: $99.99 (discounted)",
        "Value: 50",
        "Total: €200.00",
        "Discount: $10"
    ]
})

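# Extract amounts NOT preceded by "$" or "€" (negative look-behind).
# \d and . are also excluded so a match cannot start in the middle of a priced number.
df['plain_number'] = df['text'].str.extract(r'(?<![$€\d.])(\d+(?:\.\d+)?)', expand=False)
print(df)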


Best practices make negative look-behind safe, readable, and performant. Use negative look-behind, (?<!prefix)target, for exclusion conditions: it matches only when the prefix is absent. Stick to fixed-width look-behind patterns; the re module rejects variable-length quantifiers inside a look-behind. Be careful with Polars on large text columns: its native string expressions, such as pl.col("text").str.extract(...), are built on the Rust regex crate, which does not support look-around, so a look-behind pattern has to be restructured (for example, by capturing the optional prefix and filtering on it) or applied through Python’s re. Add type hints such as str or pd.Series to improve static analysis. Use raw strings, r'pattern', to avoid double-escaping backslashes. Compile patterns with re.compile() for repeated use; it is clearer and avoids repeated cache lookups. Combine with pandas string methods, e.g. df['col'].str.contains(r'(?<!prefix)target'), for vectorized exclusion-conditioned checks. Use negative look-behind to avoid false positives and keep matches precise. Avoid overuse: look-behind can slow matching, so test performance on large data.
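Several of these practices (raw string, compiled pattern, type hints, fixed-width look-behind) can be combined in one small helper. This is a sketch only: the names PLAIN_NUMBER and plain_numbers are hypothetical, and the sample rows reuse the strings from the pandas example.

```python
import re

# Hypothetical module-level constant: compiled once, reused on every call.
# The character class is one character wide, so the look-behind is fixed-width.
PLAIN_NUMBER = re.compile(r"(?<![$€\d.])\d+(?:\.\d+)?")


def plain_numbers(text: str) -> list[str]:
    """Return numbers that are not directly preceded by "$" or "€"."""
    return PLAIN_NUMBER.findall(text)


rows = [
    "Price: $99.99 (discounted)",
    "Value: 50",
    "Total: €200.00",
    "Discount: $10",
]
results = [plain_numbers(row) for row in rows]
print(results)  # [[], ['50'], [], []]
```

Excluding \d and . in the look-behind, alongside the currency symbols, is a design choice: it prevents the pattern from matching the tail of a priced amount such as "9.99" inside "$99.99".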


Negative look-behind, (?<!pattern), matches only if the position is NOT preceded by a pattern, without consuming or capturing the preceding text. In 2026, use it for exclusions, stick to fixed-width patterns, write them as raw strings, compile patterns you reuse, and vectorize with pandas string methods. Master negative look-behind, and you’ll create precise, exclusion-based regex patterns for extraction, validation, and cleaning.


Next time you need to match only if NOT preceded by something — use negative look-behind. It’s Python’s cleanest way to say: “Match this, but only if this did NOT come before.”
