Glob Expressions with Dask in Python 2026 – Best Practices
Glob expressions (path patterns with wildcards such as `*` and `?`) are usually the simplest way to read many files with Dask. In 2026, the Dask DataFrame and Dask Bag readers accept a glob string directly and expand it into the list of matching files; for Dask Arrays, glob support depends on the loader you use.
TL;DR: Common Glob Patterns
- `"data/*.csv"`: all CSV files in a directory
- `"logs/2025/*.log"`: all log files in a year folder
- `"data/year=2025/month=*/part-*.parquet"`: Hive-style partitioned data
- `"s3://bucket/prefix/*.jsonl"`: files on S3
1. Basic Glob Usage
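Before the Dask calls, here is a small sketch of the wildcard rules themselves, using Python's standard-library `fnmatch` on made-up file names. Dask resolves globs through its filesystem layer rather than `fnmatch`, so treat this as an illustration of the usual `*`, `?`, and `[...]` semantics, not as Dask's exact matching code.

```python
from fnmatch import fnmatchcase

# Made-up file names, used only to illustrate the matching rules.
names = ["part-0.csv", "part-1.csv", "part-10.csv", "notes.txt"]


def matching(pattern, candidates):
    """Return the candidates that match a shell-style pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


star = matching("*.csv", names)                 # * matches any run of characters
single = matching("part-?.csv", names)          # ? matches exactly one character
char_class = matching("part-[0-9].csv", names)  # [...] matches one character from a set

print(star)
print(single)
print(char_class)
```

Note that `part-10.csv` is matched by `*` but not by `?` or by a single character class, which is a common reason for files going silently missing from a load.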
import dask.dataframe as dd
import dask.bag as db
# CSV files
df = dd.read_csv("sales_data/*.csv", blocksize="64MB")
# JSON Lines files
bag = db.read_text("logs/2025/*.jsonl")
# Parquet files with Hive partitioning
ddf = dd.read_parquet("data/year=2025/month=*/*.parquet")
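For CSV input, each matched file is split on its own, so the number of partitions depends on both the glob and `blocksize`. The helper below is a rough sketch for local, uncompressed files only. `estimate_csv_partitions` is a hypothetical name, not a Dask function, and the real count can differ (for example with compressed files, which generally cannot be split into blocks).

```python
import glob
import math
import os


def estimate_csv_partitions(pattern, blocksize_bytes):
    """Roughly estimate partitions for local, uncompressed files matching pattern."""
    total = 0
    for path in glob.glob(pattern):
        size = os.path.getsize(path)
        # Each file is split separately; even a tiny file yields one partition.
        total += max(1, math.ceil(size / blocksize_bytes))
    return total
```

Many small files produce many small partitions regardless of `blocksize`. Comparing this estimate with `df.npartitions` after loading is a quick sanity check.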
2. Advanced Glob Patterns
# Multiple extensions: each file format needs its own reader
csv_df = dd.read_csv("data/*.csv", blocksize="128MB")
parquet_df = dd.read_parquet("data/*.parquet") # Prefer Parquet when both formats are available
# Recursive glob (all subdirectories)
bag = db.read_text("logs/**/*.log")
# Specific date range with glob
ddf = dd.read_parquet("data/year=2025/month=0[1-6]/*.parquet") # January to June
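A character class such as `0[1-6]` covers a single digit position, so a range like September to November cannot be written as one class. One workaround, sketched here with a hypothetical helper `month_patterns`, is to generate one pattern per month and pass the list to the reader. This assumes the zero-padded `month=MM` layout used above, and that the reader expands wildcards inside a list; if yours does not, expand each pattern with `glob` first.

```python
def month_patterns(root, year, first_month, last_month):
    """Build one Hive-style glob per month in an inclusive range."""
    if not 1 <= first_month <= last_month <= 12:
        raise ValueError("expected 1 <= first_month <= last_month <= 12")
    return [
        f"{root}/year={year}/month={month:02d}/*.parquet"
        for month in range(first_month, last_month + 1)
    ]


# September to November 2025 under a hypothetical "data" root.
patterns = month_patterns("data", 2025, 9, 11)
print(patterns)
```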
3. Best Practices for Glob Expressions in 2026
- Use wildcards (`*`) instead of hand-maintained file lists, and keep each pattern as narrow as the job allows so that only the files you need are listed and opened
- Prefer Parquet over CSV when possible (`dd.read_parquet("data/year=*/month=*/*.parquet")`)
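- Set an appropriate `blocksize` when reading (for example `blocksize="64MB"` in `dd.read_csv`), or adjust partitions afterwards with `repartition(partition_size=...)`
- Use Hive-style partitioning (`year=2025/month=03/`) so that a glob or filter can skip whole directories
- Combine globs with early filtering, such as selecting columns and applying row filters at read time, to reduce data volume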
- Monitor the Dask Dashboard to see how files are being distributed across partitions
Conclusion
Glob expressions are one of the most convenient features when working with Dask. In 2026, using wildcards with `dd.read_csv()`, `dd.read_parquet()`, or `db.read_text()` is the standard way to read multiple files efficiently. Combined with proper chunking and early filtering, glob patterns enable scalable processing of thousands or even millions of files with clean, readable code.
Next steps:
- Replace your manual file lists with glob patterns in your Dask workflows