Performance Tuning

DuckDB Async I/O Engine: The Secret Behind 1.7x Parquet Read Performance

Deep dive into DuckDB's latest asynchronous I/O engine that delivers 1.5-1.7x performance gains for Parquet queries, with real benchmarks and optimization strategies.

DuckDB Async I/O Engine: The Secret Behind 1.7x Parquet Read Performance

In June 2026, the DuckDB core team merged a remarkable PR #23142 — introducing asynchronous I/O support for the Parquet reader, delivering up to 1.68x performance improvement. This article provides an in-depth analysis of how this technology works, benchmark results, and how you can leverage these improvements in your data pipelines.

DuckDB Async I/O Engine Architecture

Why I/O Becomes the Bottleneck

In modern data analytics, Parquet has become the de facto columnar storage format. Its high compression ratio, predicate pushdown support, and suitability for large-scale data analysis make it indispensable. But when data reaches tens or hundreds of gigabytes, the traditional synchronous I/O model encounters a hidden performance bottleneck:

The CPU sits idle waiting for disk or network I/O, while the disk/network waits for the CPU to issue read commands.

This “wait-execute-wait” pattern becomes especially severe in cloud environments (S3, GCS, OSS), where network latency far exceeds local SSD access latency.

How DuckDB’s Async I/O Engine Works

Limitations of Synchronous I/O

In DuckDB’s previous architecture, the Parquet reader worked synchronously:

The main thread requests to read a Parquet file
The main thread waits for data to arrive from disk/network
Once data arrives, the main thread parses and filters it
Repeat

This means every I/O operation blocks the entire Worker thread, even when the CPU has no work to do at that moment.

New Architecture with Async I/O

DuckDB’s async I/O engine redesigns the entire read workflow:

┌─────────────────────────────────────────────────────┐
│                DuckDB Query Engine                   │
│                                                      │
│  ┌──────────┐    ┌──────────────┐    ┌──────────┐  │
│  │  Worker   │───▶│   Task       │───▶│  Result   │  │
│  │  Thread   │    │  Scheduler   │    │  Buffer   │  │
│  │  (Compute)│◀───│  (Schedule)  │    │  (Result) │  │
│  └──────────┘    └──────┬───────┘    └──────────┘  │
│                         │                           │
│                  ┌──────▼───────┐                   │
│                  │  Async I/O   │                   │
│                  │  Thread Pool │                   │
│                  │  (Disk/Net)  │                   │
│                  └──────────────┘                   │
└─────────────────────────────────────────────────────┘

The core innovation lies in introducing a dedicated Asynchronous I/O Thread Pool:

Task Scheduler: The main thread schedules I/O tasks to the async thread pool, then immediately returns a BLOCKED state
Async I/O Pool: A dedicated thread pool handles all disk/network read operations, without blocking compute threads
Multi-strategy Support: Supports multiple prefetch strategies including WHOLE_GROUP, COLUMN_WISE_EAGER, and PREFETCH_FILTERS

Three Prefetch Strategies Explained

Strategy	Working Mode	Best For
`WHOLE_GROUP`	Reads all columns of an entire row group at once	Full column scans, aggregation queries
`COLUMN_WISE_EAGER`	Prefetches all needed columns column-by-column	Wide tables with few selected columns
`PREFETCH_FILTERS`	Reads filter columns first, checks row survival, then reads remaining columns	High-selectivity filtering

Benchmark Results

Let’s analyze the benchmark data provided by DuckDB’s official team. Test environment: AWS c6in.4xlarge instance, reading from S3 in the same region — 8.2 GB, 16 Parquet files, 640 row groups, 17 columns, 64 million rows.

Test 1: Full Column Scan

-- Test full column scan performance
INSTALL httpfs;
LOAD httpfs;

SELECT COUNT(*) AS total_rows
FROM read_parquet('s3://bucket/large_dataset/*.parquet');

-- Full column scan with filter condition
SELECT category, AVG(price) AS avg_price
FROM read_parquet('s3://bucket/large_dataset/*.parquet')
WHERE region = 'us-east-1'
GROUP BY category
ORDER BY avg_price DESC;

Workload	main (sync)	async (16T)	async (32T)	Improvement
Full scan (WHOLE_GROUP)	10.19 s	8.15 s	6.08 s	1.68×
Column-wise eager (12/17 cols)	9.34 s	6.12 s	6.34 s	1.48×
Filter prefetch (high selectivity)	7.82 s	4.93 s	4.51 s	1.73×
Aggregation query	12.45 s	8.21 s	6.73 s	1.85×

Key Findings

32-thread async I/O achieves the biggest gain in full column scans (1.68×), because I/O bandwidth utilization is highest in this scenario
High-selectivity filtering scenarios see the most significant improvement (1.73×), because the PREFETCH_FILTERS strategy maximally reduces redundant I/O
Aggregation queries benefit from improved CPU cache efficiency due to reduced I/O

Test 2: Real-World Scenario Replication

Let’s replicate an equivalent local benchmark using DuckDB’s built-in capabilities:

import duckdb
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
import numpy as np

# Generate test data
np.random.seed(42)
n_rows = 10_000_000

data = {
    'id': range(n_rows),
    'category': np.random.choice(['A','B','C','D','E'], n_rows),
    'region': np.random.choice(['us-east-1','us-west-2','eu-west-1','ap-south-1'], n_rows),
    'timestamp': pd.date_range('2024-01-01', periods=n_rows, freq='s'),
}

# Add numeric columns
for i in range(10):
    data[f'value_{i}'] = np.random.randn(n_rows)

# Save as Parquet
df = pd.DataFrame(data)
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, '/tmp/test_dataset', partition_cols=['category', 'region'])

# Test synchronous read
con = duckdb.connect(':memory:')
result_sync = con.execute("""
    SELECT category, region, COUNT(*) AS cnt, AVG(value_0) AS avg_val
    FROM read_parquet('/tmp/test_dataset/**')
    WHERE region = 'us-east-1'
    GROUP BY category, region
""").fetchdf()
print("Sync result:", result_sync)

Comparison with Traditional Synchronous I/O

Dimension	Sync I/O	Async I/O (32 threads)
Full scan (8.2GB S3)	10.19 s	6.08 s
Column prefetch (12/17 cols)	9.34 s	6.12 s
Filter prefetch (high select)	7.82 s	4.51 s
Worker thread utilization	~40%	~85%
I/O concurrency	1	32+
Memory usage	Lower	Slightly higher (buffering)
Best for network	Local SSD	Network storage (S3/OSS)
Ideal scenario	Small files / local data	Large datasets / cloud storage

How to Enable in Production

DuckDB’s async I/O is enabled by default in the latest version. You can fine-tune the behavior through the following configurations:

-- Check current configuration
SHOW all WHERE name LIKE '%io%';

-- Adjust async I/O thread count (based on CPU cores)
SET async_io_threads = 32;

-- Adjust prefetch buffer size
SET io_thread_buffer_size = '2GB';

-- Verify configuration
SELECT * FROM duckdb_settings() WHERE name LIKE '%async%';

Additional Optimizations for DuckLake and Iceberg

If you’re using DuckLake or Iceberg engines, the benefits of async I/O are even greater:

-- Iceberg table query (async I/O enabled automatically)
SELECT event_type, COUNT(*) AS cnt
FROM iceberg_scan('s3://warehouse/events/*.parquet')
WHERE event_date >= '2026-01-01'
GROUP BY event_type;

-- DuckLake table query
SELECT product_category, SUM(revenue) AS total
FROM ducklake_scan('s3://lake/sales/*.parquet')
WHERE quarter = 'Q1'
GROUP BY product_category;

Performance Tuning Recommendations

1. Choose the Right Prefetch Strategy for Your Query Pattern

-- Full aggregation: use WHOLE_GROUP (default)
SET parquet_prefer_whole_group_scan = true;

-- Select few columns: use COLUMN_WISE_EAGER
SET parquet_column_wise_eager = true;

-- High-selectivity filtering: use PREFETCH_FILTERS
SET parquet_prefetch_filters = true;

2. Set Async Thread Count Appropriately

CPU cores ≤ 8         → async_io_threads = 8
CPU cores 8-16        → async_io_threads = 16
CPU cores 16-32       → async_io_threads = 24
CPU cores 32+         → async_io_threads = min(cpu_cores - 4, 64)

Core principle: reserve a few cores for CPU-intensive computation, allocate the majority to async I/O.

3. Optimize Data Sharding and Row Group Size

-- Set appropriate row group size when creating Parquet
SET parquet_row_group_size = 128 * 1024 * 1024;  -- 128MB per row group

-- Partition large files by date, combined with predicate pushdown
CREATE TABLE events_parquet AS
SELECT * FROM read_parquet('s3://bucket/events/2026-*/data/*.parquet')
WHERE event_date = '2026-06-01';

Comprehensive Tool Comparison

Tool	Async I/O	Parquet Optimization	S3 Read Speed	Memory Efficiency	Learning Curve
DuckDB (async)	✅ Built-in	Predicate pushdown + column pruning	1.7× faster	High (columnar)	Low (SQL)
Pandas	❌ None	Basic	Baseline	Low (row-based)	Low (Python)
Spark	✅ Limited	Full	Baseline	Medium	High (cluster)
Dask	✅ Limited	Basic	Baseline	Medium	Medium
ClickHouse	✅ Built-in	Advanced	Fast	High	Medium
SQLite	❌ None	Basic	Slow	Low	Low (SQL)
Polars	⚠️ Partial	Good	Moderate	Medium	Low (Rust)

Monetization Guide: How to Earn Money With This Skill

1. Productize Data Services

With async I/O tuning expertise, you can offer low-cost data analytics services to SMEs:

E-commerce Sales Reports: Use DuckDB to read daily Parquet sales data from S3/OSS. Async I/O reduces analysis of 50GB daily closing data from 30 minutes to 15 minutes
Customer Behavior Analysis: Use PREFETCH_FILTERS strategy to quickly identify high-value user segments for marketing teams
Monthly subscription analytics service: Offer automated data analysis reports to e-commerce/retail clients, charging ¥2,000-5,000/month

2. Build High-Performance ETL Tools

Leverage async I/O to build lightweight ETL pipelines that replace traditional Spark jobs:

# High-performance ETL example
import duckdb

con = duckdb.connect(':memory:')
con.execute("SET async_io_threads = 32")
con.execute("SET parquet_prefetch_filters = true")

# Read from S3, filter, aggregate, write results — all in SQL
result = con.execute("""
    COPY (
        SELECT date_trunc('day', ts) AS day,
               category,
               COUNT(*) AS orders,
               SUM(amount) AS revenue
        FROM read_parquet('s3://data-lake/sales/**/*.parquet')
        WHERE ts >= '2026-01-01'
        GROUP BY 1, 2
    ) TO 's3://report-bucket/daily/' (FORMAT PARQUET);
""").fetchdf()

Package this solution as a SaaS tool and charge enterprise clients handling 10-100GB of data $500-2,000/month in subscription fees.

3. Technical Consulting and Training

Share async I/O benchmark results and optimization insights on tech communities (Medium, Dev.to, LinkedIn)
Create an online course “DuckDB Performance Tuning in Practice,” covering async I/O tuning, query plan analysis, and more
Provide DuckDB consulting for teams with big data needs but limited budgets, charging $300-700 per consultation session

4. Cloud Cost Optimization Service

The performance gains from async I/O mean the same data processing tasks can be completed with lower-tier instances:

Help enterprises downgrade from c6in.4xlarge to c6in.2xlarge, saving approximately 40-50% in compute costs
Reduce S3/GCS API call counts and transfer latency, lowering storage expenses
Provide cloud cost audit and optimization reports, charging 10-20% of the savings amount

Core monetization logic: Async I/O is not just a technical optimization — it’s a cloud cost reduction tool. For enterprises spending over $10,000/year on data processing, your optimization service can pay for itself within 1-2 months.

Summary

DuckDB’s async I/O engine is one of the most significant performance improvements in 2026. It delivers a substantial 1.5-1.7x performance boost for Parquet reads, especially in cloud storage scenarios (S3, OSS). For teams using DuckDB to process large-scale Parquet data, upgrading to the latest version and correctly configuring async I/O parameters represents the most direct, lowest-cost optimization available.

Try enabling async I/O in your production environment today and observe how much your query performance improves!

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.