Computing Fraction of Long Trips with `delayed` Functions in Dask – Python 2026
Calculating fractions (e.g., "what percentage of trips were longer than 30 minutes?") on large datasets is a common analytical task. Using dask.delayed gives you full control over the computation graph, making it ideal for complex or custom fraction calculations that don’t fit neatly into standard Dask DataFrame methods.
TL;DR — Pattern Overview
- Use `@delayed` to define custom computation steps
- Build the full task graph first
- Call `.compute()` only once at the end
- This approach offers maximum flexibility and clear parallelism
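The points above can be sketched on plain Python lists, without any files on disk. The chunk values, the `executed` log, and the function names are made up for illustration; the synchronous scheduler is used only to keep the run deterministic.

```python
from dask import delayed
from dask.delayed import Delayed

executed = []  # illustrative log of which tasks have actually run


@delayed
def count_long(durations, threshold=30):
    """Count durations strictly above the threshold (minutes)."""
    executed.append("count_long")
    return sum(1 for d in durations if d > threshold)


@delayed
def count_all(durations):
    """Count all durations in the chunk."""
    executed.append("count_all")
    return len(durations)


# Made-up trip durations, split into three chunks
chunks = [[12, 45, 31], [5, 30, 90, 61], [28]]

# Build the full graph first: these calls only record tasks
total_long = delayed(sum)([count_long(c) for c in chunks])
total_trips = delayed(sum)([count_all(c) for c in chunks])
fraction = total_long / total_trips

is_lazy = isinstance(fraction, Delayed)
ran_before_compute = len(executed)

# A single compute() at the end executes every task in the graph
result = fraction.compute(scheduler="synchronous")
ran_after_compute = len(executed)

print(is_lazy, ran_before_compute, ran_after_compute, result)
```

Nothing runs while the graph is being assembled; the six counting tasks execute only when `.compute()` is called.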
1. Basic Version with `delayed`

    from glob import glob

    import pandas as pd
    from dask import delayed


    @delayed
    def count_long_trips(chunk):
        """Count trips longer than 30 minutes in a chunk."""
        return (chunk["trip_duration_minutes"] > 30).sum()


    @delayed
    def count_total_trips(chunk):
        """Count all trips in a chunk."""
        return len(chunk)


    # Expand the wildcard ourselves so that each file becomes its own task
    files = sorted(glob("trips/part_*.parquet"))

    # Build the computation graph; nothing is read or counted yet
    total_long = 0
    total_trips = 0
    for f in files:
        chunk = delayed(pd.read_parquet)(f)  # lazy read, shared by both counters
        total_long += count_long_trips(chunk)
        total_trips += count_total_trips(chunk)

    # Final fraction: a single compute() runs the whole graph
    fraction_long = total_long / total_trips
    print("Fraction of long trips:", fraction_long.compute())

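Because both counters depend on the same lazy `chunk`, computing the final fraction in one call lets the two counts share each read. The sketch below illustrates this with a hypothetical `load_chunk` function standing in for `delayed(pd.read_parquet)(f)`; the durations and the `load_calls` log are invented for the demonstration.

```python
from dask import delayed

load_calls = []  # records every simulated read


@delayed
def load_chunk(chunk_id):
    """Hypothetical stand-in for delayed(pd.read_parquet)(f)."""
    load_calls.append(chunk_id)
    return [12, 45, 31, 8]  # made-up trip durations in minutes


@delayed
def count_long(durations):
    return sum(1 for d in durations if d > 30)


@delayed
def count_all(durations):
    return len(durations)


chunks = [load_chunk(i) for i in range(3)]
total_long = delayed(sum)([count_long(c) for c in chunks])
total_trips = delayed(sum)([count_all(c) for c in chunks])

# One compute() on the combined graph: each chunk is loaded once
fraction = (total_long / total_trips).compute(scheduler="synchronous")
loads_single = len(load_calls)

# Two separate compute() calls: every chunk is loaded again for the second call
load_calls.clear()
n_long = total_long.compute(scheduler="synchronous")
n_total = total_trips.compute(scheduler="synchronous")
loads_separate = len(load_calls)

print(fraction, loads_single, loads_separate)
```

With real Parquet files the extra loads would be repeated disk reads, which is the practical reason to divide the two delayed totals first and call `.compute()` on the result.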
2. Cleaner & More Scalable Version

    import dask.dataframe as dd
    from dask import delayed


    @delayed
    def count_trips(df):
        """Count long and total trips in one partition."""
        long_trips = int((df["trip_duration_minutes"] > 30).sum())
        return {"long": long_trips, "total": len(df)}


    # Let Dask DataFrame handle file discovery and partitioning
    ddf = dd.read_parquet("trips/year=2025/*.parquet")

    # to_delayed() yields one Delayed per partition; apply the delayed function to each
    # (map_partitions expects a plain function, not a @delayed one)
    partition_counts = [count_trips(part) for part in ddf.to_delayed()]

    # Aggregate the per-partition counts, then divide once
    total_long = delayed(sum)([r["long"] for r in partition_counts])
    total_trips = delayed(sum)([r["total"] for r in partition_counts])

    fraction = (total_long / total_trips).compute()
    print(f"Fraction of long trips: {fraction:.4f} ({fraction*100:.2f}%)")

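The per-partition function returns counts rather than a fraction. One likely reason is that averaging per-partition fractions gives a different answer whenever partitions differ in size. A minimal sketch with two made-up in-memory partitions (the `partition_counts` helper and the data are illustrative, not part of the pipeline above):

```python
import pandas as pd
from dask import delayed


@delayed
def partition_counts(df):
    """Return long-trip and total counts for one partition."""
    long_trips = int((df["trip_duration_minutes"] > 30).sum())
    return {"long": long_trips, "total": len(df)}


# Two made-up partitions of unequal size
parts = [
    pd.DataFrame({"trip_duration_minutes": [40, 50]}),                 # 2 of 2 long
    pd.DataFrame({"trip_duration_minutes": [5, 10, 15, 20, 25, 35]}),  # 1 of 6 long
]
results = [partition_counts(p) for p in parts]

# Sum the counts, then divide once
total_long = delayed(sum)([r["long"] for r in results])
total_trips = delayed(sum)([r["total"] for r in results])
overall = (total_long / total_trips).compute()

# Naive alternative: average the per-partition fractions
naive = (delayed(sum)([r["long"] / r["total"] for r in results]) / len(results)).compute()

print(overall, naive)
```

Here 3 of the 8 trips are long, so the overall fraction is 0.375, while the unweighted mean of the two partition fractions is about 0.583.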
3. Best Practices in 2026
- Use `@delayed` for custom logic that doesn’t fit Dask’s built-in methods
- Keep delayed functions small and focused
- Build the entire graph before calling `.compute()`
- Use `dd.from_delayed()` or `map_partitions()` when working with tabular data
- Visualize the task graph with `.visualize()` to verify parallelism
- Prefer native Dask methods (e.g., boolean masking + `.mean()`) when possible; they are usually faster
Conclusion
Computing fractions with dask.delayed gives you fine-grained control over parallel execution. In 2026, this pattern is particularly useful when you need custom aggregation logic or when combining multiple delayed steps. For simple fraction calculations, native Dask DataFrame methods are often faster, but delayed remains the go-to tool for maximum flexibility.
Next steps:
- Try rewriting one of your fraction-based calculations using `@delayed` for better control