Repeated characters

Repetition is one of the most powerful features of regular expressions: it lets you match exact counts, ranges, or unbounded runs of a character, group, or pattern. In Python’s re module, repetition is controlled by quantifiers: {n} for exactly n times, {m,n} for between m and n times, * for zero or more, + for one or more, and ? for zero or one. Quantifiers are greedy by default (they match as much as possible) but can be made lazy by appending ? (e.g., *?) or possessive by appending + (e.g., *+, available from Python 3.11 and rarely needed). Quantifiers show up constantly in data validation (phone numbers, ZIP codes), text extraction, cleaning, log parsing, and vectorized pandas/Polars string operations over large datasets.

Here’s a complete, practical guide to matching repeated characters in Python regex: exact counts, ranges, unbounded quantifiers, greedy vs lazy vs possessive, real-world patterns, and modern best practices with raw strings, flags, compilation, and pandas/Polars integration.

Exact repetition with {n} — matches the preceding element exactly n times.


import re

text = "aaa bbbb ccccc dddddd"

print(re.findall(r'a{3}', text))     # ['aaa']
print(re.findall(r'b{4}', text))     # ['bbbb']
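print(re.findall(r'\w{5}', text))      # ['ccccc', 'ddddd'] (also matches the first 5 chars of 'dddddd')
print(re.findall(r'\b\w{5}\b', text))  # ['ccccc'] (word boundaries restrict it to 5-letter words)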
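
One detail worth illustrating is what "the preceding element" means: a quantifier binds only to the single character, class, or group directly before it. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

sample = "abab abb ababab"

# The quantifier applies only to the element immediately before it.
single = re.findall(r'ab{2}', sample)       # 'a' followed by exactly two 'b's
grouped = re.findall(r'(?:ab){2}', sample)  # the two-character unit 'ab', twice in a row

print(single)   # ['abb']
print(grouped)  # ['abab', 'abab']
```

The non-capturing group (?:...) is used so that findall returns the whole match rather than the content of a capture group.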

Ranges with {m,n} — matches between m and n times (inclusive); omit n for open-ended {m,}.


print(re.findall(r'c{2,5}', text))   # ['ccccc'] (between 2 and 5 'c's)
print(re.findall(r'd{3,}', text))    # ['dddddd'] (3 or more 'd's)
print(re.findall(r'\d{3,5}', "123 45678 9"))  # ['123', '45678'] (3–5 digits)
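
Counted quantifiers are a natural fit for the validation use cases mentioned earlier. The sketch below assumes two simplified formats (a US-style ZIP code and a dash-separated phone number); the pattern names and helper functions are hypothetical, and real-world formats are usually looser than this.

```python
import re

# Simplified, illustrative formats only.
ZIP_RE = re.compile(r'\d{5}(?:-\d{4})?')     # 12345 or 12345-6789
PHONE_RE = re.compile(r'\d{3}-\d{3}-\d{4}')  # 555-123-4567


def is_zip(value: str) -> bool:
    # fullmatch requires the whole string to fit the pattern,
    # so "123456" is rejected even though it contains five digits.
    return ZIP_RE.fullmatch(value) is not None


def is_phone(value: str) -> bool:
    return PHONE_RE.fullmatch(value) is not None


print(is_zip("12345"), is_zip("12345-6789"), is_zip("123456"))
print(is_phone("555-123-4567"), is_phone("5551234567"))
```

Using fullmatch rather than search or findall is the design choice that matters here: findall would happily report a 5-digit hit inside a longer number.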

Shorthand quantifiers: * (zero or more), + (one or more), and ? (zero or one). All three are greedy by default, but each has a lazy form (*?, +?, ??) and, in Python 3.11+, a possessive form (*+, ++, ?+).


html = "<b>Hello</b> <i>World</i>"
greedy = re.findall(r'<.*>', html)    # ['<b>Hello</b> <i>World</i>'] (greedy, matches too much)
lazy = re.findall(r'<.*?>', html)     # ['<b>', '</b>', '<i>', '</i>'] (lazy, one match per tag)
print(greedy)
print(lazy)
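
To make the three modes concrete, here is a small sketch comparing greedy, lazy, and possessive matching on the same input. The possessive branch is guarded by a version check because that syntax is only accepted from Python 3.11; the variable names are illustrative.

```python
import re
import sys

text = "aaa"

# Greedy: a* first takes all three 'a's, then backtracks by one
# so that the final literal 'a' can still match.
greedy_match = re.fullmatch(r'a*a', text)

# Lazy: a*? starts with zero 'a's and expands only as far as needed.
lazy_match = re.match(r'a*?a', text)

# Possessive (Python 3.11+): a*+ takes all three 'a's and never gives
# any back, so the trailing 'a' has nothing left to match.
if sys.version_info >= (3, 11):
    possessive_match = re.fullmatch(r'a*+a', text)
else:
    possessive_match = None  # syntax not available before 3.11

print(greedy_match.group())  # aaa
print(lazy_match.group())    # a
print(possessive_match)      # None
```

The possessive form fails here by design: it trades flexibility for the guarantee that the engine will not backtrack into the repeated part.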

Real-world pattern: validating and extracting repeated patterns in pandas — vectorized .str methods handle repetition efficiently.


import pandas as pd

df = pd.DataFrame({
    'text': [
        "abc123def456",
        "aabbcc112233",
        "aaa bbb ccc",
        "error: 500 500 500"
    ]
})

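# Count runs of three or more identical letters / digits
# (the backreference \1 repeats whatever the group captured)
df['repeated_letters'] = df['text'].str.count(r'([a-z])\1{2,}')
df['repeated_digits'] = df['text'].str.count(r'(\d)\1{2,}')
# True where any character appears three times in a row
# (pandas may emit a UserWarning because the pattern contains a capture group)
df['has_triple'] = df['text'].str.contains(r'(.)\1\1')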

print(df)
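
When the goal is to extract the repeated runs rather than count them, the capture group needed for the backreference changes what findall returns. The sketch below uses plain re on a made-up string to show the difference; the same pattern idea carries over to the pandas .str methods.

```python
import re

text = "aaa bbbb cc 111222"

# Any character followed by two or more copies of itself.
run_re = re.compile(r'(.)\1{2,}')

# findall returns the captured group, i.e. only the repeated character.
chars = run_re.findall(text)
print(chars)  # ['a', 'b', '1', '2']

# finditer exposes the whole match, so the full run is recoverable.
runs = [m.group(0) for m in run_re.finditer(text)]
print(runs)   # ['aaa', 'bbbb', '111', '222']
```

Note that 'cc' is skipped because {2,} after the backreference requires at least three identical characters in total.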

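Best practices make repetition matching safe, readable, and performant. Use raw strings (r'pattern') to avoid double-escaping backslashes. Compile patterns with re.compile() when they are reused: the re module already caches recent patterns, so the speed gain is usually small, but a named pattern object is clearer. Use lazy quantifiers (*?, +?) to prevent over-matching, especially with .* in HTML or log parsing. For large text columns, Polars offers pl.col("text").str.count_matches(...) (named count_match in older releases), which is often considerably faster than pandas .str.count(). Note that Polars uses the Rust regex engine, which does not support backreferences, so a pattern such as (.)\1{2,} has to be rewritten without \1 (for example a{3,} for a known character). Add type hints such as str or pd.Series to help static analysis. Prefer a counted quantifier to a long repeated literal: a{8} is easier to read than aaaaaaaa. Possessive quantifiers (*+) and atomic groups ((?>...)), both available from Python 3.11, avoid backtracking where it is unwanted. To guard against catastrophic backtracking, avoid nested quantifiers such as (a+)+ and put explicit upper bounds on repetition. Combine with pandas .str methods, e.g. df['col'].str.contains(r'(.)\1{2,}') for vectorized checks (regex=True is the default). Use re.escape() when embedding literal substrings in a pattern.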
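
Several of these practices can be combined in a few lines. The following sketch, built around a hypothetical helper named repeated_literal, compiles a pattern that matches a literal substring repeated a minimum number of times, using re.escape so that metacharacters in the literal are treated as plain text.

```python
import re


def repeated_literal(literal: str, min_count: int = 2) -> re.Pattern:
    """Compile a pattern matching `literal` repeated at least `min_count` times in a row."""
    # re.escape neutralises metacharacters such as '.', so "1.5"
    # matches only the text "1.5" and not "1x5".
    return re.compile(r'(?:%s){%d,}' % (re.escape(literal), min_count))


pattern = repeated_literal("1.5")
found = pattern.findall("1.51.51.5 and 1x51x5 and 1.5")

print(pattern.pattern)  # (?:1\.5){2,}
print(found)            # ['1.51.51.5']
```

Without re.escape, the unescaped dot would also match "1x51x5", which is exactly the kind of silent over-matching the advice above is meant to prevent.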

Matching repeated characters with quantifiers ({n}, {m,n}, *, +, ?) gives precise control over repetition in regex patterns. Use raw strings, reach for lazy quantifiers when greedy ones over-match, compile reused patterns, vectorize in pandas/Polars, and escape literals correctly. Master repetition, and you’ll validate, extract, and analyze text patterns with accuracy and efficiency.

Next time you need to match repeated characters — use quantifiers. It’s Python’s cleanest way to say: “Match this thing exactly n times, or between m and n.”
