Finding substrings is a core string operation in Python: it lets you check whether a smaller string (the substring) occurs within a larger one, where it occurs, and how many times, which supports tasks such as validation, extraction, parsing, filtering, and text analysis. Python provides several built-in tools for this (find(), index(), count(), startswith(), endswith(), the in operator, and regular expressions via the re module), each with trade-offs in functionality, error handling, and performance. In 2026, substring search remains essential, especially in data cleaning, log parsing, API response validation, NLP preprocessing, and pandas/Polars string-column operations, where vectorized .str.contains() handles millions of rows efficiently.
Here’s a complete, practical guide to finding substrings in Python: core methods with examples, differences and use cases, real-world patterns, regex for advanced patterns, performance notes, and modern best practices with type hints, pandas/Polars vectorization, and safety.
find() returns the index of the first occurrence, or -1 if the substring is not found. It never raises, so it is the safe and simple choice, and it accepts an optional start index.
text = "Hello, World!"
print(text.find("World")) # 7 (first occurrence)
print(text.find("Python")) # -1 (not found)
print(text.find("o")) # 4 (first 'o')
print(text.find("o", 5)) # 8 (search starting from index 5)
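Because find() accepts a start index, it can be called repeatedly to collect every position of a substring. The following is a minimal sketch of that idea; the helper name find_all is hypothetical and not part of the standard library.

```python
def find_all(text: str, sub: str) -> list[int]:
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume just past the current match so it is not found again.
        start = text.find(sub, start + len(sub))
    return positions


print(find_all("Hello, World!", "o"))  # [4, 8]
print(find_all("Hello, World!", "l"))  # [2, 3, 10]
print(find_all("Hello, World!", "z"))  # []
```

The empty-substring guard is a design choice for this sketch: find("") always succeeds, so without the guard the loop would never terminate.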
index() is similar but raises ValueError if not found — use when absence is an error.
print(text.index("World")) # 7
try:
    print(text.index("Python"))
except ValueError:
    print("Not found") # Not found
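One place where index() fits well is parsing input that must contain a separator, so a missing separator should stop processing rather than pass silently. The sketch below assumes a simple "key=value" line format; the function name split_key_value and the sample lines are hypothetical.

```python
def split_key_value(line: str) -> tuple[str, str]:
    """Split 'key=value' at the first '='; raise ValueError if there is none."""
    sep = line.index("=")  # raises ValueError when '=' is absent
    return line[:sep].strip(), line[sep + 1:].strip()


print(split_key_value("timeout = 30"))  # ('timeout', '30')

try:
    split_key_value("timeout 30")
except ValueError:
    print("Malformed line")  # Malformed line
```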
count() returns the number of non-overlapping occurrences — useful for frequency or validation.
print(text.count("o")) # 2
print(text.count("l")) # 3
print(text.count("Python")) # 0
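The "non-overlapping" detail matters when a substring can overlap with itself. If overlapping matches should be counted as well, one possible approach is to advance the search by a single character after each hit. The helper count_overlapping below is a hypothetical sketch, not a built-in.

```python
def count_overlapping(text: str, sub: str) -> int:
    """Count occurrences of sub in text, including overlapping ones."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Step forward by one character only, so overlaps are still seen.
        start = text.find(sub, start + 1)
    return count


print("aaaa".count("aa"))               # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))  # 3 (overlapping)
```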
startswith() and endswith() return booleans — fast for prefix/suffix checks.
print(text.startswith("Hello")) # True
print(text.endswith("!")) # True
print(text.startswith("hi")) # False
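Both methods also accept a tuple of candidates and optional start/end positions, which often removes the need for a chain of or-conditions. A short sketch, with made-up file names and URL:

```python
filenames = ["report.csv", "notes.txt", "data.tsv", "image.png"]

# A tuple matches if any one of its elements matches.
tabular = [name for name in filenames if name.endswith((".csv", ".tsv"))]
print(tabular)  # ['report.csv', 'data.tsv']

url = "https://example.com"
is_web = url.startswith(("http://", "https://"))
print(is_web)  # True

# An optional start index checks the prefix from that position onward.
at_seven = "Hello, World!".startswith("World", 7)
print(at_seven)  # True
```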
The in operator checks existence — simple and readable for conditional logic.
if "World" in text:
    print("Found!") # Found!
if "Python" not in text:
    print("Not found") # Not found
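The in operator combines naturally with any() when a line should be kept if it contains any of several keywords. The sketch below uses illustrative log lines and a hypothetical keywords tuple.

```python
lines = [
    "Error: connection failed",
    "Success: data loaded",
    "Warning: low memory",
    "Error: timeout",
]
keywords = ("Error", "Warning")

# Keep a line when at least one keyword occurs in it.
flagged = [line for line in lines if any(k in line for k in keywords)]
print(flagged)
# ['Error: connection failed', 'Warning: low memory', 'Error: timeout']

# Note: the empty string is contained in every string.
print("" in "Hello")  # True
```

The last line is worth remembering when the search term comes from user input: an empty term matches everything, so it may need to be rejected explicitly.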
Real-world pattern: searching and validating text columns in pandas — vectorized .str.contains() finds patterns across entire Series efficiently.
import pandas as pd
df = pd.DataFrame({
    'text': ['Error: connection failed', 'Success: data loaded', 'Warning: low memory', 'Error: timeout']
})
# Vectorized search
df['has_error'] = df['text'].str.contains("Error", case=False)
df['error_pos'] = df['text'].str.find("Error")
df['error_count'] = df['text'].str.count("Error")
print(df)
#                        text  has_error  error_pos  error_count
# 0  Error: connection failed       True          0            1
# 1      Success: data loaded      False         -1            0
# 2       Warning: low memory      False         -1            0
# 3            Error: timeout       True          0            1
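Two details of .str.contains() are easy to miss: the pattern is interpreted as a regular expression by default, and missing values propagate unless na is set. The sketch below uses made-up sample data to show the regex=False and na=False parameters.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
s = pd.Series(["cost: $5.00", "cost: $5x00", None])

# Default behaviour: the pattern is a regex, so "." matches any character.
as_regex = s.str.contains("5.0", na=False)

# regex=False: the pattern is matched as a literal substring.
as_literal = s.str.contains("5.0", regex=False, na=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```

Passing na=False makes missing values count as "no match", which keeps the result usable as a boolean mask for filtering.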
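A few practices keep substring search fast, safe, and readable. Prefer find() over index() when absence is a valid outcome, since it returns -1 instead of raising. Use in for simple existence checks; it is the clearest form in if-statements. Use startswith()/endswith() for prefix and suffix checks; they state the intent directly and avoid building a slice or compiling a regex. For case-insensitive search, lowercase both the text and the pattern, as in text.lower().find("pattern"), or use re.search(r'pattern', text, re.IGNORECASE). Use regex for complex patterns, such as re.search(r'\bword\b', text) for whole words. Avoid calling find() repeatedly for the same substring inside a loop: cache the result, or use str.count() when only the frequency is needed. To replace conditionally, locate the substring first and call replace() only when it is present. For large text columns, consider Polars: pl.col("text").str.contains("pattern") is often substantially faster than pandas .str.contains(), although the actual gain depends on the data, the pattern, and the library versions, so benchmark on your own workload. Finally, add type hints (str for scalars, pd.Series or pd.Series[str] for columns where your type checker and pandas stubs support it) to improve static analysis.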
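The regex and case-insensitive tips can be sketched with the standard re module alone. The sample strings below are illustrative, and re.escape is shown as one way to search for user-supplied text literally.

```python
import re

# Whole-word search: \b marks a word boundary.
whole = re.search(r"\bword\b", "A word about wordsmiths")
print(whole.start())  # 2
no_whole = re.search(r"\bword\b", "wordsmiths only")
print(no_whole)  # None

# Case-insensitive search with a flag.
ci = re.search("world", "Hello, World!", re.IGNORECASE)
print(ci.start(), ci.group())  # 7 World

# Case-insensitive search without regex: fold both sides.
# The index maps back to the original only when folding keeps the
# length unchanged, which holds for plain ASCII text like this.
folded_pos = "Hello, World!".casefold().find("WORLD".casefold())
print(folded_pos)  # 7

# re.escape makes special characters in the search term literal.
term = "1.5"
haystack = "version 1x5, then 1.5"
raw_pos = re.search(term, haystack).start()
escaped_pos = re.search(re.escape(term), haystack).start()
print(raw_pos, escaped_pos)  # 8 18
```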
Finding substrings with find(), index(), count(), startswith(), endswith(), and in lets you locate and validate text efficiently. In 2026, use the safe methods, vectorize with .str in pandas/Polars, add type hints, and reach for regex when patterns get complex. Master substring search, and you’ll parse, validate, extract, and analyze text data quickly and correctly.
Next time you need to locate a substring — reach for find() or in. It’s Python’s cleanest way to say: “Is this part here, and where?”