Computing Fraction of Long Trips with `delayed` Functions in Dask – Python 2026

Calculating a fraction (e.g., "what percentage of trips were longer than 30 minutes?") over a large dataset is a common analytical task. With `dask.delayed` you build the task graph yourself, one function call at a time, which suits complex or custom fraction calculations that don’t fit neatly into standard Dask DataFrame methods.

TL;DR — Pattern Overview

Use @delayed to define custom computation steps
Build the full task graph first
Call .compute() only once at the end
Independent tasks in the graph, such as per-file counts, can then run in parallel
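The pattern above can be sketched on a toy scale. This is a minimal illustration, not the article's pipeline: the two in-memory DataFrames and the helper names `count_long` and `count_all` are made up for the example and stand in for real files.

```python
import pandas as pd
from dask import delayed

# Two tiny in-memory "chunks" standing in for files (made-up data).
chunks = [
    pd.DataFrame({"trip_duration_minutes": [12, 45, 31]}),
    pd.DataFrame({"trip_duration_minutes": [5, 90]}),
]

@delayed
def count_long(chunk):
    """Number of trips longer than 30 minutes in one chunk."""
    return int((chunk["trip_duration_minutes"] > 30).sum())

@delayed
def count_all(chunk):
    """Number of trips in one chunk."""
    return len(chunk)

# Step 1: build the graph. Nothing runs yet; these are Delayed objects.
long_total = delayed(sum)([count_long(c) for c in chunks])
trip_total = delayed(sum)([count_all(c) for c in chunks])
fraction = long_total / trip_total

# Step 2: a single .compute() call executes the whole graph.
result = fraction.compute()
print(result)
```

Three of the five made-up trips exceed 30 minutes, so the computed fraction should be 0.6.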

1. Basic Version with `delayed`


from glob import glob

import pandas as pd
from dask import delayed

@delayed
def count_long_trips(chunk):
    """Count trips longer than 30 minutes in a chunk."""
    return (chunk["trip_duration_minutes"] > 30).sum()

@delayed
def count_total_trips(chunk):
    """Count all trips in a chunk."""
    return len(chunk)

# Build computation graph
# pd.read_parquet does not expand wildcards, so list the matching files first
files = sorted(glob("trips/part_*.parquet"))

total_long = 0
total_trips = 0

for f in files:
    chunk = delayed(pd.read_parquet)(f)
    total_long += count_long_trips(chunk)
    total_trips += count_total_trips(chunk)

# Final fraction
fraction_long = total_long / total_trips

print("Fraction of long trips:", fraction_long.compute())

2. Cleaner & More Scalable Version


from dask import delayed
import dask.dataframe as dd

@delayed
def compute_fraction(df):
    """Compute fraction of long trips in one DataFrame chunk."""
    long_trips = (df["trip_duration_minutes"] > 30).sum()
    total = len(df)
    return {"long": long_trips, "total": total}

# Use Dask DataFrame for better scalability
ddf = dd.read_parquet("trips/year=2025/*.parquet")

# Apply the delayed function to each partition
# (to_delayed() returns one Delayed object per partition)
delayed_results = [compute_fraction(part) for part in ddf.to_delayed()]

# Aggregate all partition results
total_long = delayed(sum)([r["long"] for r in delayed_results])
total_trips = delayed(sum)([r["total"] for r in delayed_results])

fraction = (total_long / total_trips).compute()

print(f"Fraction of long trips: {fraction:.4f} ({fraction*100:.2f}%)")
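`compute_fraction` returns counts rather than a per-partition fraction, and the counts are summed before dividing. A short sketch of why that matters, using hypothetical per-partition results (the numbers are invented for illustration): when partitions differ in size, averaging per-partition fractions gives a different answer from pooling the counts.

```python
# Hypothetical per-partition results, in the same shape compute_fraction returns.
partition_results = [
    {"long": 90, "total": 100},   # small partition, 90% long trips
    {"long": 100, "total": 900},  # large partition, about 11% long trips
]

# Pooling the counts weights every trip equally.
pooled = (
    sum(r["long"] for r in partition_results)
    / sum(r["total"] for r in partition_results)
)

# Averaging per-partition fractions weights every partition equally instead.
mean_of_fractions = (
    sum(r["long"] / r["total"] for r in partition_results)
    / len(partition_results)
)

print(pooled)             # 190 long trips out of 1000
print(mean_of_fractions)  # roughly 0.506, skewed by the small partition
```

Only the pooled value answers "what fraction of all trips were long?", which is why the aggregation step sums `long` and `total` separately.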

3. Best Practices in 2026

Use @delayed for custom logic that doesn’t fit Dask’s built-in methods
Keep delayed functions small and focused
Build the entire graph before calling .compute()
Use dd.from_delayed() or map_partitions() when working with tabular data; map_partitions() expects a plain function, not one wrapped in @delayed
Visualize the task graph with .visualize() to verify parallelism
Prefer native Dask methods (e.g., boolean masking + .mean()) when possible — they are usually faster

Conclusion

Computing fractions with dask.delayed gives you fine-grained control over parallel execution. In 2026, this pattern is particularly useful when you need custom aggregation logic or when combining multiple delayed steps. For simple fraction calculations, native Dask DataFrame methods are often faster, but delayed remains the go-to tool for maximum flexibility.

Next steps:

Try rewriting one of your fraction-based calculations using @delayed for better control
