Counting occurrences of a substring, character, or pattern within a string is a fundamental text analysis task in Python — it helps quantify frequency, detect duplicates, validate data, measure density (e.g., keyword counts), and support preprocessing steps like feature extraction or anomaly detection. Python’s built-in count() method provides a fast, simple way to count non-overlapping occurrences of a substring, while re (regular expressions) enables overlapping counts, case-insensitive searches, and complex patterns. In 2026, counting remains essential — especially in data cleaning, NLP token frequency, log analysis, spam detection, and pandas/Polars string column summarization where vectorized .str.count() scales to millions of rows efficiently.
Here’s a complete, practical guide to counting occurrences in Python: basic count() usage, overlapping counts, case sensitivity, real-world patterns, regex alternatives, performance notes, and modern best practices with type hints, pandas/Polars vectorization, and edge-case handling.
str.count(sub) returns the number of non-overlapping occurrences of substring sub. Matching is always case-sensitive (normalize case yourself for insensitive searches), and optional start/end arguments restrict the search to a slice.
text = "She sells seashells by the seashore."
print(text.count("se")) # 3 (non-overlapping: sells, seashells, seashore)
print(text.count("s")) # 7 (lowercase only; the capital S in "She" doesn't match)
print(text.count("the")) # 1
print(text.count("Python")) # 0 (not found)
print(text.count("se", 10)) # 2 (search starting from index 10)
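Equality checks on count() also make lightweight validators, one of the data-validation uses mentioned above. A minimal sketch; the helper name and rules are illustrative, not a real email validator:

```python
def looks_like_email(address: str) -> bool:
    """Cheap sanity check: exactly one "@" and at least one "." after it."""
    if address.count("@") != 1:
        return False
    local, _, domain = address.partition("@")
    return bool(local) and domain.count(".") >= 1

print(looks_like_email("user@example.com"))   # True
print(looks_like_email("user@@example.com"))  # False
print(looks_like_email("user@localhost"))     # False
```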
Overlapping substrings are NOT counted by count() — it advances past the end of each match before searching again, so "aaa".count("aa") returns 1, not 2 (only the match at positions 0-1 is found; the scan resumes at index 2).
overlapping = "aaaaaa"
print(overlapping.count("aa")) # 3 (non-overlapping pairs)
print(overlapping.count("aaa")) # 2 (non-overlapping triples)
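When overlapping matches do matter, a zero-width regex lookahead counts every starting position, because the lookahead matches without consuming characters. A small sketch; the helper name is my own:

```python
import re

def count_overlapping(text: str, sub: str) -> int:
    """Count every occurrence of sub, including overlapping ones."""
    if not sub:
        raise ValueError("substring must be non-empty")
    # (?=...) asserts a match at the current position without consuming
    # it, so the scan advances one character at a time.
    return len(re.findall(f"(?={re.escape(sub)})", text))

print(count_overlapping("aaaaaa", "aa"))   # 5
print(count_overlapping("aaaaaa", "aaa"))  # 4
print("aaaaaa".count("aa"))                # 3 (non-overlapping, for contrast)
```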
Real-world pattern: frequency analysis and validation in pandas — vectorized .str.count() counts patterns across entire columns efficiently.
import re
import pandas as pd
df = pd.DataFrame({
'text': [
"Error: connection failed",
"Success: data loaded successfully",
"Warning: low memory warning",
"Error: timeout error occurred"
]
})
# Vectorized counting
df['error_count'] = df['text'].str.count("error", flags=re.IGNORECASE)
df['warning_count'] = df['text'].str.count("warning", flags=re.IGNORECASE)
df['has_success'] = df['text'].str.contains("success", case=False)
print(df)
# text error_count warning_count has_success
# 0 Error: connection failed 1 0 False
# 1 Success: data loaded successfully 0 0 True
# 2 Warning: low memory warning 0 2 False
# 3 Error: timeout error occurred 2 0 False
Best practices make counting occurrences fast, safe, and readable:
- Use count() for simple, non-overlapping substrings — it is the fastest and clearest option.
- Make searches case-insensitive with .lower().count() or regex re.IGNORECASE.
- For large text columns, consider Polars — pl.col("text").str.count_matches("pattern") (renamed from the older count_match) is typically much faster than pandas .str.count().
- Add type hints — str, or pd.Series for string columns — to improve static analysis.
- For overlapping counts, use a regex lookahead — len(re.findall(r'(?=(aa))', text)) returns 5 for "aa" in "aaaaaa".
- Use str.contains() for boolean checks — clearer and faster than count() > 0.
- Chain methods — text.lower().count("error") — to normalize case first.
- Handle empty strings — "".count("x") returns 0 safely, but text.count("") returns len(text) + 1.
- Combine with value_counts() in pandas — df['text'].str.count("error").value_counts() gives a frequency distribution.
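The case-handling and edge-case points above can be demonstrated in a few lines; a minimal sketch with a made-up log string:

```python
log = "Error: timeout ERROR after error"

# Case-sensitive by default: only the exact lowercase form matches.
print(log.count("error"))          # 1
# Normalize case first for case-insensitive counting.
print(log.lower().count("error"))  # 3

# Edge cases: counting in an empty string safely returns 0, while
# counting the empty substring returns len(text) + 1 (one match at
# every position, including the end).
print("".count("error"))  # 0
print("abc".count(""))    # 4
```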
Counting occurrences with count() quantifies substrings efficiently — non-overlapping, fast, and vectorized in pandas/Polars. In 2026, use it for simple frequency, regex for overlap/complexity, vectorize for scale, and add type hints for safety. Master counting, and you’ll analyze text frequency, detect keywords, and preprocess data accurately and quickly.
Next time you need to know how many times something appears — reach for count(). It’s Python’s cleanest way to say: “How many of these are there?”