DuckDB Async I/O Engine: The Secret Behind 1.7x Parquet Read Performance
In June 2026, the DuckDB core team merged PR #23142, which introduces asynchronous I/O support for the Parquet reader and delivers up to a 1.68x speedup on full scans in the published benchmark. This article analyzes how the technology works, walks through the benchmark results, and shows how you can apply these improvements in your data pipelines.

Why I/O Becomes the Bottleneck
In modern data analytics, Parquet has become the de facto columnar storage format. Its high compression ratio, predicate pushdown support, and suitability for large-scale data analysis make it indispensable. But when data reaches tens or hundreds of gigabytes, the traditional synchronous I/O model encounters a hidden performance bottleneck:
The CPU sits idle waiting for disk or network I/O, while the disk/network waits for the CPU to issue read commands.
This “wait-execute-wait” pattern becomes especially severe in cloud environments (S3, GCS, OSS), where network latency far exceeds local SSD access latency.
How DuckDB’s Async I/O Engine Works
Limitations of Synchronous I/O
In DuckDB’s previous architecture, the Parquet reader worked synchronously:
- A worker thread requests a read from a Parquet file
- The worker thread waits for data to arrive from disk/network
- Once data arrives, the worker thread parses and filters it
- Repeat
This means every I/O operation blocks the entire worker thread, even though the CPU could be doing useful work during the wait.
New Architecture with Async I/O
DuckDB’s async I/O engine redesigns the entire read workflow:
┌─────────────────────────────────────────────────────┐
│ DuckDB Query Engine │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────┐ │
│ │ Worker │───▶│ Task │───▶│ Result │ │
│ │ Thread │ │ Scheduler │ │ Buffer │ │
│ │ (Compute)│◀───│ (Schedule) │ │ (Result) │ │
│ └──────────┘ └──────┬───────┘ └──────────┘ │
│ │ │
│ ┌──────▼───────┐ │
│ │ Async I/O │ │
│ │ Thread Pool │ │
│ │ (Disk/Net) │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────┘
The core innovation lies in introducing a dedicated Asynchronous I/O Thread Pool:
- Task Scheduler: The worker thread hands I/O tasks to the async thread pool, then immediately returns a `BLOCKED` state
- Async I/O Pool: A dedicated thread pool handles all disk/network read operations, without blocking compute threads
- Multi-strategy Support: Supports multiple prefetch strategies including `WHOLE_GROUP`, `COLUMN_WISE_EAGER`, and `PREFETCH_FILTERS`
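The idea of moving blocking reads onto a dedicated pool can be illustrated with a minimal Python sketch. This is not DuckDB's implementation (which suspends tasks with a `BLOCKED` state inside its own scheduler); `fetch_row_group` and `process` are hypothetical stand-ins for a blocking read and for decode/filter work, and the latency is simulated.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_row_group(group_id: int) -> list[int]:
    """Hypothetical stand-in for a blocking disk or network read."""
    time.sleep(0.01)  # simulated I/O latency
    return [group_id * 10 + i for i in range(10)]


def process(rows: list[int]) -> int:
    """Hypothetical stand-in for CPU work such as decoding and filtering."""
    return sum(r for r in rows if r % 2 == 0)


def scan_sync(group_ids: list[int]) -> int:
    # Every read blocks the worker before any compute can start.
    return sum(process(fetch_row_group(g)) for g in group_ids)


def scan_async(group_ids: list[int], io_threads: int = 8) -> int:
    # All reads are submitted to a dedicated I/O pool up front. The worker
    # only waits when the next row group it needs has not arrived yet.
    with ThreadPoolExecutor(max_workers=io_threads) as io_pool:
        pending = [io_pool.submit(fetch_row_group, g) for g in group_ids]
        return sum(process(f.result()) for f in pending)
```

Both functions return the same result; the async variant simply overlaps the simulated waits, which is where the wall-clock saving comes from.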
Three Prefetch Strategies Explained
| Strategy | Working Mode | Best For |
|---|---|---|
| `WHOLE_GROUP` | Reads all columns of an entire row group at once | Full column scans, aggregation queries |
| `COLUMN_WISE_EAGER` | Prefetches all needed columns column-by-column | Wide tables with few selected columns |
| `PREFETCH_FILTERS` | Reads filter columns first, checks row survival, then reads remaining columns | High-selectivity filtering |
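The filter-first strategy in the last row can be sketched in a few lines of Python. This is an illustration of the idea only, assuming a row group is a plain dict of column lists; the function name and the `reads` log are hypothetical and do not correspond to DuckDB internals.

```python
from typing import Callable

# Hypothetical stand-in for a Parquet row group: column name -> values.
RowGroup = dict[str, list]


def read_with_filter_prefetch(
    row_group: RowGroup,
    filter_col: str,
    predicate: Callable[[object], bool],
    other_cols: list[str],
    reads: list[str],
) -> RowGroup:
    """Read the filter column first; fetch other columns only if rows survive."""
    reads.append(filter_col)  # log which columns were actually read
    filter_values = row_group[filter_col]
    keep = [i for i, v in enumerate(filter_values) if predicate(v)]
    if not keep:
        return {}  # no surviving rows: the remaining columns are never read

    result: RowGroup = {filter_col: [filter_values[i] for i in keep]}
    for col in other_cols:
        reads.append(col)
        values = row_group[col]
        result[col] = [values[i] for i in keep]
    return result
```

When no row passes the predicate, only the filter column is read, which is the source of the I/O saving for highly selective filters.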
Benchmark Results
Let’s analyze the benchmark data provided by DuckDB’s official team. Test environment: AWS c6in.4xlarge instance, reading from S3 in the same region — 8.2 GB, 16 Parquet files, 640 row groups, 17 columns, 64 million rows.
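To get a feel for the shape of this dataset, the figures above can be broken down per file and per row group. The sketch below assumes data is spread evenly across files and row groups and takes 1 GB as 1000 MB; the real distribution is not stated.

```python
# Dataset figures quoted in the test environment description.
total_mb = 8200          # 8.2 GB, taking 1 GB = 1000 MB (assumption)
files = 16
row_groups = 640
rows = 64_000_000

# Averages, assuming an even spread across files and row groups.
groups_per_file = row_groups // files    # row groups in each file
rows_per_group = rows // row_groups      # rows in each row group
mb_per_group = total_mb / row_groups     # compressed size per row group
mb_per_file = total_mb / files           # size per file

print(groups_per_file, rows_per_group, mb_per_group, mb_per_file)
```

Under these assumptions each row group is about 12.8 MB, so a single read request is small relative to network round-trip cost, which is the situation where issuing many requests concurrently helps most.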
Test 1: Full Column Scan
-- Test full column scan performance
INSTALL httpfs;
LOAD httpfs;
SELECT COUNT(*) AS total_rows
FROM read_parquet('s3://bucket/large_dataset/*.parquet');
-- Filtered aggregation over the same files
SELECT category, AVG(price) AS avg_price
FROM read_parquet('s3://bucket/large_dataset/*.parquet')
WHERE region = 'us-east-1'
GROUP BY category
ORDER BY avg_price DESC;
| Workload | main (sync) | async (16T) | async (32T) | Improvement |
|---|---|---|---|---|
| Full scan (WHOLE_GROUP) | 10.19 s | 8.15 s | 6.08 s | 1.68× |
| Column-wise eager (12/17 cols) | 9.34 s | 6.12 s | 6.34 s | 1.48× |
| Filter prefetch (high selectivity) | 7.82 s | 4.93 s | 4.51 s | 1.73× |
| Aggregation query | 12.45 s | 8.21 s | 6.73 s | 1.85× |
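The speedups can be recomputed directly from the timings in the table. The short script below does that for both thread counts; the only inputs are the numbers above. Note that the column-wise eager row works out to about 1.47× at 32 threads and 1.53× at 16 threads, slightly different from the 1.48× reported.

```python
# (sync, async 16 threads, async 32 threads) in seconds, from the table above.
timings = {
    "Full scan (WHOLE_GROUP)": (10.19, 8.15, 6.08),
    "Column-wise eager (12/17 cols)": (9.34, 6.12, 6.34),
    "Filter prefetch (high selectivity)": (7.82, 4.93, 4.51),
    "Aggregation query": (12.45, 8.21, 6.73),
}


def speedups(sync_s: float, async16_s: float, async32_s: float) -> tuple[float, float]:
    """Return (speedup at 16 threads, speedup at 32 threads), rounded to 2 places."""
    return round(sync_s / async16_s, 2), round(sync_s / async32_s, 2)


results = {name: speedups(*t) for name, t in timings.items()}

for name, (s16, s32) in results.items():
    print(f"{name}: {s16}x at 16 threads, {s32}x at 32 threads")
```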
Key Findings
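- Aggregation queries show the largest gain (1.85× at 32 threads), plausibly because worker threads can aggregate one row group while the next is still being fetched
- High-selectivity filtering follows closely (1.73×), because the `PREFETCH_FILTERS` strategy avoids reading the remaining columns when no rows survive the filter
- Full column scans improve by 1.68× at 32 threads, the scenario where I/O bandwidth utilization is highest
- More I/O threads are not always better: the column-wise eager workload is slightly slower at 32 threads (6.34 s) than at 16 threads (6.12 s)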
Test 2: Real-World Scenario Replication
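Let’s approximate the workload locally using DuckDB’s built-in capabilities. Local disk latency is far lower than S3 latency, so this reproduces the query shape rather than the S3 numbers: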
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
import numpy as np
# Generate test data
np.random.seed(42)
n_rows = 10_000_000
data = {
'id': range(n_rows),
'category': np.random.choice(['A','B','C','D','E'], n_rows),
'region': np.random.choice(['us-east-1','us-west-2','eu-west-1','ap-south-1'], n_rows),
'timestamp': pd.date_range('2024-01-01', periods=n_rows, freq='s'),
}
# Add numeric columns
for i in range(10):
    data[f'value_{i}'] = np.random.randn(n_rows)
# Save as Parquet
df = pd.DataFrame(data)
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, '/tmp/test_dataset', partition_cols=['category', 'region'])
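# Test read: category and region are Hive partition columns taken from the directory names
con = duckdb.connect(':memory:')
result_sync = con.execute("""
SELECT category, region, COUNT(*) AS cnt, AVG(value_0) AS avg_val
FROM read_parquet('/tmp/test_dataset/**/*.parquet', hive_partitioning = true)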
WHERE region = 'us-east-1'
GROUP BY category, region
""").fetchdf()
print("Sync result:", result_sync)
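The script above runs the query once and does not time it. A small harness like the following can be used to compare runs on different DuckDB versions or settings. It is a generic sketch: `median_runtime` is a hypothetical helper name, and it times any zero-argument callable.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Run fn a few times and return the median wall-clock time in seconds."""
    for _ in range(warmup):
        fn()  # warm caches so the first timed run is not an outlier
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, `median_runtime(lambda: con.execute(query).fetchall())` would time a query held in a `query` string on the connection created above. On a local disk the operating system's page cache will serve repeated reads, so results will not reflect cold S3 behavior.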
Comparison with Traditional Synchronous I/O
| Dimension | Sync I/O | Async I/O (32 threads) |
|---|---|---|
| Full scan (8.2GB S3) | 10.19 s | 6.08 s |
| Column prefetch (12/17 cols) | 9.34 s | 6.34 s |
| Filter prefetch (high select) | 7.82 s | 4.51 s |
| Worker thread utilization | ~40% | ~85% |
| I/O concurrency | 1 | 32+ |
| Memory usage | Lower | Slightly higher (buffering) |
| Best storage target | Local SSD | Network storage (S3/OSS) |
| Ideal scenario | Small files / local data | Large datasets / cloud storage |
How to Enable in Production
DuckDB’s async I/O is enabled by default in the latest version. You can fine-tune the behavior through the following configurations:
-- Check current configuration (setting names can differ between versions)
SELECT name, value FROM duckdb_settings() WHERE name LIKE '%io%';
-- Adjust async I/O thread count (based on CPU cores)
SET async_io_threads = 32;
-- Adjust prefetch buffer size
SET io_thread_buffer_size = '2GB';
-- Verify configuration
SELECT * FROM duckdb_settings() WHERE name LIKE '%async%';
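Because setting names change between releases, a pipeline can check that a setting exists before applying it rather than failing on an unknown name. The helper below is a minimal sketch: `set_if_supported` is a hypothetical name, `con` is assumed to be a DuckDB Python connection, and the only DuckDB feature it relies on is the `duckdb_settings()` table function used above.

```python
import re


def set_if_supported(con, name: str, value) -> bool:
    """Apply SET name = value only if the setting exists; return whether it was applied."""
    # Setting names are interpolated into SQL, so restrict them to identifiers.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid setting name: {name!r}")

    found = con.execute(
        "SELECT 1 FROM duckdb_settings() WHERE name = ?", [name]
    ).fetchall()
    if not found:
        return False  # this build does not know the setting

    # Render the value as a SQL literal.
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, (int, float)):
        literal = str(value)
    else:
        literal = "'" + str(value).replace("'", "''") + "'"

    con.execute(f"SET {name} = {literal}")
    return True
```

A call such as `set_if_supported(con, "async_io_threads", 32)` then returns `False` instead of raising when the running version has no such setting.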
Additional Optimizations for DuckLake and Iceberg
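If you’re using DuckLake or Iceberg, the benefits of async I/O may be even greater, since both formats typically read many Parquet files over the network:
-- Iceberg table query: iceberg_scan takes the table root, not a Parquet glob
INSTALL iceberg;
LOAD iceberg;
SELECT event_type, COUNT(*) AS cnt
FROM iceberg_scan('s3://warehouse/events')
WHERE event_date >= '2026-01-01'
GROUP BY event_type;
-- DuckLake table query: attach the catalog, then query tables by name
-- (the metadata file name and the table name are placeholders)
INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 's3://lake/sales/');
SELECT product_category, SUM(revenue) AS total
FROM lake.sales
WHERE quarter = 'Q1'
GROUP BY product_category;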
Performance Tuning Recommendations
1. Choose the Right Prefetch Strategy for Your Query Pattern
-- Full aggregation: use WHOLE_GROUP (default)
SET parquet_prefer_whole_group_scan = true;
-- Select few columns: use COLUMN_WISE_EAGER
SET parquet_column_wise_eager = true;
-- High-selectivity filtering: use PREFETCH_FILTERS
SET parquet_prefetch_filters = true;
2. Set Async Thread Count Appropriately
CPU cores ≤ 8 → async_io_threads = 8
CPU cores 9-16 → async_io_threads = 16
CPU cores 17-31 → async_io_threads = 24
CPU cores 32+ → async_io_threads = min(cpu_cores - 4, 64)
Core principle: reserve a few cores for CPU-intensive computation, allocate the majority to async I/O.
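The sizing rule above can be written as a small function. This sketch assumes non-overlapping ranges (8 and 16 belong to the lower bracket, 32 to the top one) and treats the thresholds as this article's rule of thumb rather than an official recommendation; `suggest_async_io_threads` is a hypothetical helper name.

```python
def suggest_async_io_threads(cpu_cores: int) -> int:
    """Map a CPU core count to an async I/O thread count using the rule of thumb above."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be positive")
    if cpu_cores <= 8:
        return 8
    if cpu_cores <= 16:
        return 16
    if cpu_cores < 32:
        return 24
    # 32 cores and up: leave four cores for compute, cap at 64 I/O threads.
    return min(cpu_cores - 4, 64)
```

In practice the input would come from `os.cpu_count()`, and the result is a starting point to validate against a benchmark on your own data.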
3. Optimize Data Sharding and Row Group Size
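-- Row group size is a COPY option in DuckDB, not a global setting.
-- ROW_GROUP_SIZE_BYTES only takes effect with preserve_insertion_order = false.
-- 'events' and the output file name are placeholders.
SET preserve_insertion_order = false;
COPY events TO 'events.parquet' (FORMAT parquet, ROW_GROUP_SIZE_BYTES '128MB');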
-- Partition large files by date, combined with predicate pushdown
CREATE TABLE events_parquet AS
SELECT * FROM read_parquet('s3://bucket/events/2026-*/data/*.parquet')
WHERE event_date = '2026-06-01';
Comprehensive Tool Comparison
| Tool | Async I/O | Parquet Optimization | S3 Read Speed | Memory Efficiency | Learning Curve |
|---|---|---|---|---|---|
| DuckDB (async) | ✅ Built-in | Predicate pushdown + column pruning | 1.7× faster | High (columnar) | Low (SQL) |
| Pandas | ❌ None | Basic | Baseline | Low (eager, fully in-memory) | Low (Python) |
| Spark | ✅ Limited | Full | Baseline | Medium | High (cluster) |
| Dask | ✅ Limited | Basic | Baseline | Medium | Medium |
| ClickHouse | ✅ Built-in | Advanced | Fast | High | Medium |
| SQLite | ❌ None | Basic | Slow | Low | Low (SQL) |
| Polars | ⚠️ Partial | Good | Moderate | Medium | Low (Python/Rust) |
Monetization Guide: How to Earn Money With This Skill
1. Productize Data Services
With async I/O tuning expertise, you can offer low-cost data analytics services to SMEs:
- E-commerce Sales Reports: Use DuckDB to read daily Parquet sales data from S3/OSS. If the roughly 1.7× benchmark speedup carries over, analysis of 50GB of daily closing data that takes 30 minutes would drop to about 18 minutes
- Customer Behavior Analysis: Use the `PREFETCH_FILTERS` strategy to quickly identify high-value user segments for marketing teams
- Monthly subscription analytics service: Offer automated data analysis reports to e-commerce/retail clients, charging ¥2,000-5,000/month
2. Build High-Performance ETL Tools
Leverage async I/O to build lightweight ETL pipelines that replace traditional Spark jobs:
# High-performance ETL example
import duckdb
con = duckdb.connect(':memory:')
con.execute("SET async_io_threads = 32")
con.execute("SET parquet_prefetch_filters = true")
# Read from S3, filter, aggregate, and write results in one SQL statement.
# COPY returns only a row count, so there is no DataFrame to fetch.
# 'daily_summary.parquet' is a placeholder object name.
con.execute("""
COPY (
    SELECT date_trunc('day', ts) AS day,
           category,
           COUNT(*) AS orders,
           SUM(amount) AS revenue
    FROM read_parquet('s3://data-lake/sales/**/*.parquet')
    WHERE ts >= '2026-01-01'
    GROUP BY 1, 2
) TO 's3://report-bucket/daily/daily_summary.parquet' (FORMAT PARQUET);
""")
Package this solution as a SaaS tool and charge enterprise clients handling 10-100GB of data $500-2,000/month in subscription fees.
3. Technical Consulting and Training
- Share async I/O benchmark results and optimization insights on tech communities (Medium, Dev.to, LinkedIn)
- Create an online course “DuckDB Performance Tuning in Practice,” covering async I/O tuning, query plan analysis, and more
- Provide DuckDB consulting for teams with big data needs but limited budgets, charging $300-700 per consultation session
4. Cloud Cost Optimization Service
The performance gains from async I/O mean the same data processing tasks can be completed with lower-tier instances:
- Help enterprises downgrade from c6in.4xlarge to c6in.2xlarge, saving approximately 40-50% in compute costs
- Reduce S3/GCS API call counts and transfer latency, lowering storage expenses
- Provide cloud cost audit and optimization reports, charging 10-20% of the savings amount
Core monetization logic: Async I/O is not just a technical optimization — it’s a cloud cost reduction tool. For enterprises spending over $10,000/year on data processing, your optimization service can pay for itself within 1-2 months.
Summary
DuckDB’s async I/O engine is one of the most significant performance improvements in 2026. In the benchmark above it speeds up Parquet reads by roughly 1.5x to 1.85x depending on the workload, especially in cloud storage scenarios (S3, OSS). For teams using DuckDB to process large-scale Parquet data, upgrading to the latest version and correctly configuring the async I/O parameters is one of the most direct, lowest-cost optimizations available.
Try enabling async I/O in your production environment today and observe how much your query performance improves!