
Sets in Python are mutable, unordered collections of unique, hashable elements, well suited to membership testing, deduplication, and mathematical set operations when order does not matter. Unlike dicts, which preserve insertion order (guaranteed by the language since Python 3.7), sets make no ordering guarantee: iteration order is arbitrary, and for strings it can change between interpreter runs because of hash randomization. What sets offer is uniqueness and fast O(1) average-time lookups. In 2026, sets remain a cornerstone in data science (unique values, overlap analysis, filtering), software engineering (permission sets, tag collections, cache keys), and algorithms (graph vertices/edges, reconciliation), especially when paired with tuples for composite uniqueness, frozensets for hashability, and Polars/Dask/pandas for large-scale unique operations.

Here’s a complete, practical guide to sets in Python: creation, uniqueness enforcement, membership testing, set operations (union, intersection, difference, symmetric difference), real-world patterns (earthquake country uniqueness, event deduplication, category filtering), and modern best practices with type hints, performance, frozensets, and integration with Polars/pandas/Dask/NumPy.

1. Creating Sets — Literals, Constructors, from Iterables


# Set literal (preferred for static data)
fruits = {"apple", "banana", "orange"}
print(fruits)  # {'banana', 'orange', 'apple'} — order arbitrary

# From list (deduplication)
colors = set(["red", "green", "blue", "red", "green"])
print(colors)  # {'red', 'green', 'blue'}

# From range or string
nums = set(range(5))               # {0, 1, 2, 3, 4}
letters = set("hello")             # {'h', 'e', 'l', 'o'}

# Empty set (note: {} is empty dict!)
empty_set = set()
print(empty_set)                   # set()
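
Two details from this section are easy to check interactively: the empty-literal pitfall and the requirement that elements be hashable. The snippet below is a small sketch; the sample words and pairs are invented for illustration. It also shows a set comprehension, which builds a set from an expression over any iterable.

```python
# {} creates an empty dict, not a set; set() is the only spelling of an empty set.
assert type({}) is dict
assert type(set()) is set

# Set comprehension: normalize and deduplicate in one pass.
words = ["Apple", "banana", "APPLE", "Banana", "cherry"]
normalized = {w.lower() for w in words}
print(sorted(normalized))  # ['apple', 'banana', 'cherry']

# Elements must be hashable: lists are rejected, tuples are accepted.
try:
    bad = {[1, 2], [3, 4]}
except TypeError as exc:
    print(f"TypeError: {exc}")

pairs = {(1, 2), (3, 4), (1, 2)}
print(len(pairs))  # 2
```

Printing through sorted() gives a stable display, since the set's own iteration order is arbitrary.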

2. Uniqueness & Membership — Core Set Properties


numbers = {1, 2, 3, 2, 4, 3}   # duplicates removed
print(numbers)                 # {1, 2, 3, 4}

# Fast membership (O(1) average)
print(3 in numbers)            # True
print(5 in numbers)            # False

# Add & remove
numbers.add(5)
numbers.discard(2)             # safe remove (no error if missing)
print(numbers)                 # {1, 3, 4, 5}
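
The difference between discard() and remove() only shows up when the element is missing. The sketch below, using made-up values, contrasts the two and also shows update() for bulk additions and pop() for removing an arbitrary element.

```python
numbers = {1, 3, 4, 5}

# discard() is silent when the element is absent.
numbers.discard(99)

# remove() raises KeyError when the element is absent.
try:
    numbers.remove(99)
    removed = True
except KeyError:
    removed = False
print(removed)  # False

# update() adds every element of an iterable in place; duplicates are ignored.
numbers.update([5, 6, 7])
print(sorted(numbers))  # [1, 3, 4, 5, 6, 7]

# pop() removes and returns an arbitrary element (which one is not specified).
taken = numbers.pop()
print(taken in numbers)  # False
```

A reasonable rule of thumb: use discard() when absence is an expected case, and remove() when a missing element indicates a bug you want surfaced.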

3. Set Operations — Union, Intersection, Difference, Symmetric Difference


set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# Union — all unique elements
union = set1 | set2
print(union)                   # {1, 2, 3, 4, 5, 6, 7, 8}

# Intersection — common elements
common = set1 & set2
print(common)                  # {4, 5}

# Difference — in set1 but not set2
only_set1 = set1 - set2
print(only_set1)               # {1, 2, 3}

# Symmetric difference — exclusive to each
exclusive = set1 ^ set2
print(exclusive)               # {1, 2, 3, 6, 7, 8}
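
Beyond the four operators, sets support comparison checks and in-place variants. The following sketch uses hypothetical permission names to show subset, superset, and disjointness tests, the fact that method forms accept any iterable, and the augmented-assignment operators.

```python
required = {"read", "write"}
granted = {"read", "write", "admin"}
blocked = {"delete"}

# Subset / superset / disjoint checks
print(required <= granted)          # True
print(required.issubset(granted))   # True
print(granted >= required)          # True
print(granted.isdisjoint(blocked))  # True

# Method forms accept any iterable; the operators need sets on both sides.
combined = required.union(["audit", "read"])
print(sorted(combined))  # ['audit', 'read', 'write']

# In-place variants modify the left-hand set instead of building a new one.
active = {1, 2, 3}
active |= {3, 4}        # union:                {1, 2, 3, 4}
active &= {2, 3, 4, 5}  # intersection:         {2, 3, 4}
active -= {2}           # difference:           {3, 4}
active ^= {4, 5}        # symmetric difference: {3, 5}
print(sorted(active))   # [3, 5]
```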

Real-world pattern: earthquake data uniqueness & overlap analysis


import polars as pl

df = pl.read_csv('earthquakes.csv')

# Unique countries
countries = set(df['country'].to_list())
print(f"Unique countries: {len(countries)}")
print(sorted(countries)[:5])

# Unique event keys (time, lat, lon) for deduplication
event_keys = set(
    (row['time'], row['latitude'], row['longitude'])
    for row in df.iter_rows(named=True)
)
print(f"Unique events: {len(event_keys)}")

# Countries with strong quakes (mag >= 7.0)
strong_countries = set(
    row['country'] for row in df.filter(pl.col('mag') >= 7.0).iter_rows(named=True)
)
print("Countries with strong quakes:", sorted(strong_countries))

# Overlap: countries with both strong (mag >= 7.0) and weaker (mag < 7.0) events
weak_countries = set(
    row['country'] for row in df.filter(pl.col('mag') < 7.0).iter_rows(named=True)
)
overlap = strong_countries & weak_countries
print("Overlap:", sorted(overlap))
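
The same pattern can be tried without a CSV file or Polars. The sketch below uses a handful of invented records (the values are illustrative only, not real earthquake data) to show tuple keys for event deduplication and set operators for comparing countries with strong and weaker events.

```python
# Hypothetical sample records; field names mirror the columns used above.
records = [
    {"time": "2024-01-01T00:00", "latitude": 35.0, "longitude": 139.0, "country": "Japan", "mag": 7.2},
    {"time": "2024-01-01T00:00", "latitude": 35.0, "longitude": 139.0, "country": "Japan", "mag": 7.2},  # duplicate row
    {"time": "2024-02-10T08:30", "latitude": 36.1, "longitude": 140.2, "country": "Japan", "mag": 4.8},
    {"time": "2024-03-05T12:00", "latitude": -33.4, "longitude": -70.6, "country": "Chile", "mag": 7.5},
    {"time": "2024-04-20T17:45", "latitude": 38.0, "longitude": 23.7, "country": "Greece", "mag": 5.1},
]

# Tuples are hashable, so (time, lat, lon) works as a composite uniqueness key.
event_keys = {(r["time"], r["latitude"], r["longitude"]) for r in records}
print(len(event_keys))  # 4

strong = {r["country"] for r in records if r["mag"] >= 7.0}
weak = {r["country"] for r in records if r["mag"] < 7.0}

print(sorted(strong & weak))  # ['Japan']   both strong and weaker events
print(sorted(strong - weak))  # ['Chile']   strong events only
print(sorted(weak - strong))  # ['Greece']  weaker events only
```

Note that exact float equality on latitude and longitude treats two readings as the same event only if they match exactly; rounding the coordinates before building the key is a common adjustment.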

Best practices for sets in Python 2026.

Creation and typing. Prefer set literals such as {"a", "b"} for static sets, and call set() on an iterable to deduplicate it: set(lst). Use frozenset when the set itself must be hashable (as a dict key or as an element of another set). Add type hints: set[int] on Python 3.9+ (typing.Set[int] on older versions), or AbstractSet[str] for read-only parameters.

Membership and size. Use x in s for an O(1) average membership test and len(s) for cardinality.

Adding and removing. Use add() for a single element and update() for bulk addition from an iterable. Use discard() for safe removal (no error if missing) and remove() when a missing element should raise KeyError. Use pop() to remove and return an arbitrary element, and clear() to empty the set.

Operations that return a new set. Use union() or |, intersection() or &, difference() or -, and symmetric_difference() or ^.

In-place operations. Use update() or |= for in-place union (there is no union_update() method), intersection_update() or &= for in-place intersection, difference_update() or -= for in-place difference, and symmetric_difference_update() or ^= for in-place XOR.

Comparisons. Use issubset() or <= for a subset check, issuperset() or >= for a superset check, and isdisjoint() to test that two sets share no elements.

Scale. For large columns, let the dataframe library deduplicate: Polars unique(), pandas df['col'].unique(), or Dask unique().compute() for distributed data.

Common uses. Validation: required.issubset(available). Filtering: valid - invalid. Configuration: unique allowed values. Caching: tracking seen items. Graph algorithms: adjacency sets. Rate limiting: unique IPs per minute. Anomaly detection: rare events. Data cleaning: removing invalid categories. Testing: asserting unique counts.
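
Three of these practices (type hints, validation with set difference, and frozenset for hashability) fit in one short sketch. The function name missing_permissions and the sample data are hypothetical, chosen only for illustration; the built-in generic syntax assumes Python 3.9 or later.

```python
from collections.abc import Set  # read-only set interface, accepts set and frozenset


def missing_permissions(required: Set[str], available: Set[str]) -> set[str]:
    """Return the required permissions that are not available."""
    return set(required - available)


print(missing_permissions({"read", "write"}, frozenset({"read"})))  # {'write'}

# frozenset is hashable, so it can serve as a dict key.
# Here an unordered pair of node names keys an undirected edge cost.
route_costs: dict[frozenset[str], int] = {
    frozenset({"A", "B"}): 5,
    frozenset({"B", "C"}): 7,
}
print(route_costs[frozenset({"B", "A"})])  # 5

# frozensets can also be elements of another set; equal contents collapse.
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2
```

Annotating parameters with the abstract Set type signals that the function only reads its inputs, so callers can pass either a set or a frozenset.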

Sets excel at unordered, unique collections — fast membership, deduplication, and mathematical operations. In 2026, combine with tuples for composite uniqueness, frozenset for hashability, Polars/pandas/Dask for scale, and type hints for safety. Master sets, and you’ll handle uniqueness, comparison, filtering, and overlap efficiently in any Python workflow.

Next time you need uniqueness or set-based comparison — reach for set. It’s Python’s cleanest way to say: “Unique elements only — no duplicates, no order, just fast membership and operations.”
