hash() is a built-in Python function that returns a fixed-size integer hash value for an object — a unique (within the current Python session) numerical representation used primarily for fast lookups in hash tables (dict keys, set elements) and for equality testing. In 2026, hash() remains critical in data science (fast grouping, deduplication, caching), software engineering (custom hashing, memoization), and performance-critical code — with deterministic hashes for immutable built-ins (strings, tuples, frozensets) and randomized hashes for security (strings, bytes) unless disabled via PYTHONHASHSEED=0. Only hashable objects can be hashed; mutable types (list, dict, set) raise TypeError.
Here’s a complete, practical guide to using hash() in Python: basic hashing, hashable vs unhashable types, custom __hash__, real-world patterns (earthquake deduplication, caching, fast grouping), and modern best practices with type hints, security, performance, and integration with Dask/Polars/pandas/NumPy.
Basic hash() — returns integer hash for hashable objects.
print(hash(42)) # 42 (integers hash to themselves)
print(hash("hello")) # large integer (randomized per session)
print(hash((1, 2, 3))) # hash of immutable tuple
print(hash(frozenset([1,2]))) # hash of immutable set
# Same content ? same hash (within same session)
print(hash("hello") == hash("hello")) # True
# Different content ? different hash
print(hash("hello") == hash("world")) # False
Hashable vs unhashable — what can and cannot be hashed.
# Hashable (immutable)
print(hash(42)) # int
print(hash("string")) # str
print(hash((1, 2))) # tuple
print(hash(frozenset([1,2]))) # frozenset
# Unhashable (mutable) ? TypeError
try:
hash([1, 2]) # list
except TypeError as e:
print(e)
try:
hash({"a": 1}) # dict
except TypeError as e:
print(e)
Custom __hash__ & __eq__ — make user classes hashable.
@dataclass(frozen=True) # frozen makes it hashable automatically
class Earthquake:
mag: float
place: str
e1 = Earthquake(7.2, "Japan")
e2 = Earthquake(7.2, "Japan")
print(hash(e1) == hash(e2)) # True (same content)
print(e1 == e2) # True (dataclass equality)
# Manual implementation (if not using frozen)
class Event:
def __init__(self, mag, place):
self.mag = mag
self.place = place
def __hash__(self):
return hash((self.mag, self.place))
def __eq__(self, other):
if not isinstance(other, Event):
return NotImplemented
return (self.mag, self.place) == (other.mag, other.place)
e3 = Event(7.2, "Japan")
e4 = Event(7.2, "Japan")
print(hash(e3) == hash(e4)) # True
Real-world pattern: earthquake event deduplication & fast lookup using hashable keys.
import pandas as pd
df = pd.read_csv('earthquakes.csv')
# Create hashable composite key for deduplication
df['event_key'] = df.apply(
lambda row: hash((row['time'], row['latitude'], row['longitude'], row['mag'])), axis=1
)
# Deduplicate based on hash key
deduped = df.drop_duplicates(subset='event_key')
print(f"Original: {len(df)}, Deduped: {len(deduped)}")
# Fast lookup dict with hashable keys
event_dict = {hash((row['time'], row['place'])): row['mag'] for _, row in deduped.iterrows()}
print(f"Fast lookup example: {event_dict.get(hash(('2025-01-01', 'Japan')), 'Not found')}")
Best practices for hash() in Python & data workflows. Use hash() for fast lookups — dict keys, set membership, caching. Modern tip: use Polars pl.struct(['col1', 'col2']).hash() — fast composite hashing; Dask for distributed sets/dicts. Make custom classes hashable — implement __hash__ + __eq__ consistently. Use @dataclass(frozen=True) — auto-generates hash & eq for immutable objects. Use frozenset — for unordered immutable collections as keys. Add type hints — def hash_event(e: Event) -> int: return hash(e). Never rely on hash values being stable across Python sessions — randomized for security (strings, bytes). Disable hash randomization — PYTHONHASHSEED=0 for reproducible hashes (debugging only). Use hash() in @cache/@lru_cache — requires hashable arguments. Use hash(tuple) — for combining multiple values. Use hash(frozenset) — for unordered sets. Avoid hashing mutable objects — TypeError. Profile performance — hash() is very fast (O(1)). Use __hash__ = None — to make class unhashable. Use object.__hash__ — default identity hash for custom classes. Use hashlib — for cryptographic hashes (sha256, etc.).
hash(obj) returns a fixed-size integer hash value — used for dict keys, set membership, caching, and deduplication. In 2026, make objects hashable with __hash__ + __eq__, use frozenset for immutable sets, and integrate with pandas/Polars/Dask for fast grouping and lookups. Master hash(), and you’ll build efficient, scalable data structures and algorithms in Python.
Next time you need fast lookups or unique identification — use hash(). It’s Python’s cleanest way to say: “Give me a unique number for this object — fast and consistent.”