set() is Python’s built-in constructor for mutable sets: unordered collections of unique, hashable elements. It removes duplicates automatically, tests membership in O(1) time on average, and supports the standard set operations (union, intersection, difference, symmetric difference). set() is useful in data science (extracting unique values from pandas, Polars, or Dask columns, deduplication, fast lookups), software engineering (dependency tracking, graph algorithms), and competitive programming.
Here’s a complete, practical guide to using set() in Python: creation patterns, common operations, real-world patterns (earthquake place deduplication, unique IDs, fast filtering), and modern best practices with type hints, performance, mutability, and integration with pandas/Polars/Dask/NumPy/frozenset.
Creating sets — from iterables, literals, or empty.
# Empty set (use set(), not {})
empty = set()
print(empty) # set()
# From list/tuple (removes duplicates)
unique_nums = set([1, 2, 2, 3, 4])
print(unique_nums) # {1, 2, 3, 4}
# From string (unique characters)
unique_chars = set("hello world")
print(unique_chars) # e.g. {'h', 'e', 'l', 'o', ' ', 'w', 'r', 'd'} (order varies between runs)
# From dict (keys only)
keys = set({"a": 1, "b": 2})
print(keys) # {'a', 'b'} (order may vary)
# Literal syntax (preferred for static sets)
s = {1, 2, 3}
print(s) # {1, 2, 3}
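Two pitfalls in set creation are easier to see in running code: {} creates a dict rather than a set, and every element must be hashable. The following is a minimal sketch using only built-ins; the variable names are illustrative and not part of any API.

```python
# {} is an empty dict; only set() gives an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# Elements must be hashable: lists are rejected, tuples are accepted
try:
    bad = {[1, 2], [3, 4]}
except TypeError as exc:
    print(f"TypeError: {exc}")

points = {(1, 2), (3, 4), (1, 2)}  # the repeated tuple collapses into one
print(len(points))  # 2

# A set comprehension builds a set from any iterable expression
squares = {n * n for n in range(-3, 4)}
print(sorted(squares))  # [0, 1, 4, 9]
```

Sorting before printing gives a stable display, since a set itself has no defined order.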
Common set operations — add/remove, membership, union/intersection/difference.
s = {1, 2, 3}
# Add / remove
s.add(4) # {1, 2, 3, 4}
s.remove(2) # {1, 3, 4}
s.discard(99) # no error if missing
# Membership
print(3 in s) # True
print(99 in s) # False
# pop() removes and returns an arbitrary element (KeyError if the set is empty)
removed = s.pop()
# Set operations
a = {1, 2, 3}
b = {3, 4, 5}
print(a | b) # union: {1, 2, 3, 4, 5}
print(a & b) # intersection: {3}
print(a - b) # difference: {1, 2}
print(a ^ b) # symmetric difference: {1, 2, 4, 5}
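The operators shown above have method equivalents that behave slightly differently: the methods accept any iterable, while the operators require a set on both sides. A short sketch of that difference, along with relationship checks and the in-place variants (all names here are illustrative):

```python
a = {1, 2, 3}
b = {3, 4, 5}

# Methods accept any iterable, not just sets
print(sorted(a.union([3, 4, 5])))            # [1, 2, 3, 4, 5]
print(sorted(a.intersection(range(3, 10))))  # [3]

# Operators require a set (or frozenset) on both sides
try:
    a | [3, 4, 5]
except TypeError:
    print("operator | needs a set on both sides")

# Relationship checks
print({1, 2} <= a)            # True: {1, 2} is a subset of a
print(a.issuperset({1, 2}))   # True
print(a.isdisjoint({7, 8}))   # True: no shared elements

# In-place variants mutate the left-hand set
c = set(a)   # copy, so a stays unchanged
c |= b       # same effect as c.update(b)
c -= {1}     # same effect as c.difference_update({1})
print(sorted(c))  # [2, 3, 4, 5]
```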
Real-world pattern: earthquake data deduplication & unique places — fast with set().
import pandas as pd
df = pd.read_csv('earthquakes.csv')
# Unique places (fast deduplication)
unique_places = set(df['place'])
print(f"Unique places: {len(unique_places)}")
print(list(unique_places)[:5])
# Deduplicate events by (time, lat, lon) tuple
event_keys = set()
duplicates = []
for _, row in df.iterrows():
    key = (row['time'], row['latitude'], row['longitude'])
    if key in event_keys:
        duplicates.append(row)
    else:
        event_keys.add(key)
print(f"Found {len(duplicates)} duplicate events")
# Polars: unique values
import polars as pl
pl_df = pl.from_pandas(df)
unique_countries = pl_df['country'].unique().to_list()
print(f"Unique countries: {len(unique_countries)}")
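The seen-set idea behind the row-by-row loop above also works without pandas. The sketch below applies it to a plain list of tuples and keeps first-seen order. The sample records and the helper name split_duplicates are hypothetical, made up for illustration only. In pandas itself, df.duplicated(subset=['time', 'latitude', 'longitude']) is usually a faster way to flag the same rows than iterating.

```python
# Hypothetical sample records: (time, latitude, longitude, place)
events = [
    ("2024-01-01T00:00:00", 35.1, -117.6, "Ridgecrest, CA"),
    ("2024-01-01T00:05:00", 61.2, -149.9, "Anchorage, AK"),
    ("2024-01-01T00:00:00", 35.1, -117.6, "Ridgecrest, CA"),  # repeat of the first
    ("2024-01-01T00:09:00", 19.4, -155.3, "Volcano, HI"),
]

def split_duplicates(rows):
    """Return (unique_rows, duplicate_rows), keeping first-seen order."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = row[:3]  # (time, latitude, longitude) identifies an event
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates

unique_events, duplicate_events = split_duplicates(events)
print(len(unique_events), len(duplicate_events))  # 3 1
print([row[3] for row in unique_events])
```

The key is a tuple because tuples of hashable values are themselves hashable, so they can be stored in a set; a list key would raise TypeError.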
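Best practices for set() in Python and data workflows.
Creation: prefer set literals {...} for static sets; use set() for an empty set, for dynamic creation, or to convert another iterable. Use frozenset when the set itself must be hashable (as a dict key or as an element of another set).
Membership and deduplication: use x in s for O(1) average lookups, and len(set(items)) for a unique count. A comprehension such as [x for x in lst if x not in seen_set] filters a list against an existing set; it does not update seen_set as it runs, so deduplicating while preserving order needs an explicit loop that adds to the set.
Mutation: add() inserts one element and update() adds every element of an iterable. discard() removes an element without raising if it is missing, whereas remove() raises KeyError. pop() removes and returns an arbitrary element, and clear() empties the set.
Set algebra: union(), intersection(), difference(), and symmetric_difference() mirror the |, &, -, and ^ operators, and the methods accept any iterable. issubset(), issuperset(), and isdisjoint() test relationships between sets. set.union(*sets) and set.intersection(*sets) combine several sets at once; symmetric_difference() takes exactly one argument, so an XOR across several sets needs a fold such as functools.reduce.
Data tools: set(df['col']) returns the unique values of a pandas Series as a set, and set(df.columns) makes it easy to compare column names between frames. For large columns the libraries' own methods are typically faster: Series.unique() in pandas, pl.col('col').unique() in Polars, and ddf['col'].unique().compute() in Dask. To find duplicate rows, use df[df.duplicated()]; set(df.duplicated()) only yields a subset of {True, False}. numpy.unique() returns the sorted unique values as an array, which is useful for cross-checking a set-based result. set(counter) gives the distinct keys of a collections.Counter.
Type hints: annotate the element type, for example def get_unique_places(df: pd.DataFrame) -> set[str]: return set(df['place']).
Caching: a set of already-seen items is a cheap way to skip work that has been done before.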
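Two of the practices above are easy to get wrong without an example: using frozenset as a dict key, and applying set operations across more than two sets. The following is a minimal sketch using only the standard library; the data and variable names are illustrative assumptions.

```python
from functools import reduce
from operator import xor

# frozenset is hashable, so it can serve as a dict key or a set element
pair_counts = {}
for left, right in [("x", "y"), ("y", "x"), ("x", "z")]:
    key = frozenset((left, right))  # order-insensitive key
    pair_counts[key] = pair_counts.get(key, 0) + 1
print(pair_counts[frozenset(("x", "y"))])  # 2

groups = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}]

common = set.intersection(*groups)  # elements present in every set
combined = set.union(*groups)       # elements present in at least one set
print(sorted(common))    # [3, 4]
print(sorted(combined))  # [1, 2, 3, 4, 5, 6]

# symmetric_difference() takes a single argument, so fold ^ across the list;
# the result holds the elements that appear in an odd number of sets
odd_membership = reduce(xor, groups)
print(sorted(odd_membership))  # [1, 3, 4, 6]
```

Both set.intersection(*groups) and set.union(*groups) raise TypeError when groups is empty, so guard that case if the list can be empty.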
set(iterable) creates a mutable set of unique elements, giving fast membership tests, deduplication, and set operations. Use it for unique extraction, fast lookups, and grouping keys, and combine it with pandas, Polars, or Dask for efficient data processing. Master set(), and you’ll eliminate duplicates, speed up checks, and perform set algebra efficiently in any Python workflow.
Next time you need unique elements or fast membership — use set(). It’s Python’s cleanest way to say: “Give me only the distinct items — unordered and blazing fast to check.”