DuckDB Performance Tuning: 5 Tips from Slow Queries to Millisecond Response

DuckDB Performance Tuning: 5 Tips from Slow Queries to Millisecond Response

Query too slow? Don't guess — learn to read execution plans, choose the right file format, manage memory, and tune parallelism. 5 actionable SQL tips with real benchmarks.

The Problem: Your 3-Second Query Just Hit 3 Minutes

You run a simple aggregation on DuckDB:

SELECT category, SUM(revenue), AVG(discount)
FROM sales_1b
WHERE date >= '2026-01-01'
GROUP BY category;

Three minutes later — still waiting. RAM is maxed out. Fans are spinning.

Where’s the bottleneck? The data? The SQL? Or DuckDB itself?

The answer in most cases: it’s not DuckDB — it’s how you’re using it. DuckDB’s columnar engine and vectorized execution are already fast, but default settings and habitual query patterns can hold it back.

These 5 tips cover the most commonly overlooked performance levers. Every one includes executable SQL you can try on your own queries right now.

5 Performance Tuning Tips

Tip 1: Stop Guessing — Read the Execution Plan

EXPLAIN ANALYZE is the single most important tool for diagnosing performance. It’s also the one most people skip.

Most engineers optimize by instinct — “joins are slow, add an index” or “data is large, buy more RAM.” But 90% of the time, the real problem is different from what you guess.

The command is straightforward:

EXPLAIN ANALYZE
SELECT category, SUM(revenue)
FROM sales_1b
WHERE date >= '2026-01-01'
GROUP BY category;

The output has two parts:

Logical Plan — what DuckDB intends to do
Physical Plan — what actually happened, with timing and row counts

┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││        Actual Time: 12.34s       ││
││    Hash GroupBy: 9750000 rows    ││
││      82% of total time           ││  ← bottleneck here
││        card estimate: 100        ││
││        actual cardinality: 2000  ││  ← severely underestimated!
│└───────────────────────────────────┘│
│┌───────────────────────────────────┐│
││      Seq Scan: sales_1b          ││
││      Actual Time: 2.11s          ││
││      rows scanned: 1B -> 600M    ││  ← predicate pushdown pruned 40%
│└───────────────────────────────────┘│
└─────────────────────────────────────┘

Three things to read in the output:

Metric	What to check	Red flag
Actual Time	Time/percentage per node	Any node above 50% is your bottleneck
Cardinality estimate vs actual	Row estimate vs real count	10x+ mismatch means optimizer chose wrong join strategy
Filter efficiency	Rows after scan	Predicate pushdown failed — check WHERE clause

Real-world case: A query dropped from 45s to 0.3s — not by rewriting SQL, but by discovering DuckDB’s cardinality estimate was off by 100x, causing a nested-loop join instead of hash join. Running ANALYZE to update statistics fixed it instantly.

What Each Plan Signal Means

"HASH_GROUP_BY" + "ACTUAL_TIME: 80%" — too many group keys or high cardinality
"CROSS_PRODUCT" — missing join condition
"SEQ_SCAN: rows=1B" — full table scan is inevitable, but column pruning can reduce data read
"card estimate: 2" / "actual: 500000" — stale statistics, run ANALYZE

Do this now: Prefix any slow query with EXPLAIN ANALYZE. Find the bottleneck, then optimize.

Tip 2: File Format & Partitioning Strategy

Your data source choice directly impacts DuckDB’s read efficiency. The priority order:

Parquet > DuckDB Native Format > CSV/JSON

CSV vs Parquet: Real Benchmarks

Metric	CSV	Parquet
100M row scan	18.4s	1.2s
File size	4.2GB	780MB
Query 3 columns only	Still reads all columns	Reads only requested columns
Predicate pushdown	Not supported (reads everything)	Supported (stripe-level pruning)

-- CSV: must parse and scan the whole file
SELECT SUM(amount) FROM 'sales.csv';  -- 18s

-- Parquet: column pruning kicks in automatically
SELECT SUM(amount) FROM 'sales.parquet';  -- 1.2s

-- With predicate pushdown, Parquet's advantage grows
SELECT SUM(amount) FROM 'sales.parquet'
WHERE date >= '2026-06-01';  -- 0.3s (scans only relevant stripes)

Partitioned Reads with HIVE_PARTITIONING

If you have millions of files, read_parquet with hive partitioning can slash scan volume from all files to only matching subdirectories:

-- Full scan: 100 parquet files
SELECT region, SUM(sales)
FROM read_parquet('data/*.parquet')
GROUP BY region;

-- Partition-pruned: only January files
SELECT region, SUM(sales)
FROM read_parquet('data/*/*.parquet',
    hive_partitioning = true)
WHERE month = '2026-01'
  AND region = 'APAC';  -- reads only matching partition dirs

Same query, but the first scans 100 files and the second reads 2-3. In production, this is typically the difference between 30 seconds and 1 second.

FILE_GLOB Patterns

When you need precise file selection:

SELECT * FROM read_parquet('data/2026-{01,02,03}/*.parquet');
-- Or with glob patterns
SELECT * FROM read_parquet('data/2026-0[1-3]/*.parquet');

Do this now: If your data is still in CSV, spend 10 minutes converting to Parquet. It’s the highest-ROI optimization you can make.

Tip 3: Indexes — They Exist, but Not How You Think

Coming from PostgreSQL/MySQL, your first instinct might be “query is slow, add an index.” DuckDB’s index support is more limited in scope.

DuckDB’s Index Types

Type	Best for	When it won’t help
ART (Adaptive Radix Tree)	Point lookups `WHERE id = 123`	Range scans, aggregations, joins
B-tree (Zone Maps)	Auto-maintained column min/max stats	Limited effect on high-cardinality columns

-- ART index helps point lookups
CREATE INDEX idx_user ON users USING ART(user_id);

SELECT * FROM users WHERE user_id = 42;  -- uses index, microseconds
SELECT * FROM users WHERE user_id > 100; -- full table scan, index ignored

The Real Performance Weapon: Pre-Aggregated Tables

DuckDB’s columnar storage and vectorized engine mean that materialized pre-aggregation beats indexes every time for analytical queries.

-- Before optimization: aggregates 1B rows on every query
SELECT DATE_TRUNC('day', ts), region,
       SUM(revenue), COUNT(DISTINCT user_id)
FROM raw_events
WHERE ts >= NOW() - INTERVAL '7 days'
GROUP BY ALL;

-- After: pre-aggregate to hourly level (transform-on-write)
CREATE TABLE hourly_metrics AS
SELECT DATE_TRUNC('hour', ts) AS hour,
       region,
       SUM(revenue) AS total_revenue,
       COUNT(DISTINCT user_id) AS unique_users
FROM raw_events
GROUP BY ALL;

-- Query from hourly table: scan 168 rows instead of 1B
SELECT DATE_TRUNC('day', hour) AS day,
       region,
       SUM(total_revenue),
       SUM(unique_users)
FROM hourly_metrics
WHERE hour >= NOW() - INTERVAL '7 days'
GROUP BY ALL;  -- millisecond

-- Alternative: CREATE MACRO for lightweight caching
CREATE MACRO daily_active_users(d DATE) AS (
    SELECT COUNT(DISTINCT user_id)
    FROM sessions
    WHERE session_date = d
);

-- Call: DuckDB caches macro results
SELECT daily_active_users('2026-05-01');

Do this now: Find your most frequently run aggregation query. Build a pre-aggregated table at a coarser granularity — row scan drops from hundreds of millions to thousands, query time from minutes to milliseconds.

Tip 4: Memory Management — Is Your Query Spilling to Disk?

The most common hidden cause of slow DuckDB queries: data doesn’t fit in memory and spills to disk.

Typical spill symptoms:

Query starts fast, then abruptly slows down
Disk I/O spikes while CPU usage stays low
Same query, same data, wildly different run times

How to Check for Spilling

-- Check temp file directory
PRAGMA show_temporary_files;

-- Or query directly
SELECT * FROM duckdb_temporary_files();

If a running query is producing temp files (default: /tmp/duckdb), your data doesn’t fit in memory and DuckDB is writing to disk — 10-100x slower.

Memory Configuration Trio

-- 1. Allocate enough memory (default is 75% of available RAM)
PRAGMA memory_limit = '8GB';

-- 2. Point temp files to SSD (not HDD or network mounts)
PRAGMA temp_directory = '/mnt/ssd/duckdb_tmp';

-- 3. Cap per-operation memory (prevent one query from starving others)
PRAGMA hash_table_size_limit = '2GB';
PRAGMA out_of_core_threshold = '2GB';

💡 Pro Tip: Unless you’re certain everything fits in memory (e.g., a single 100MB table), always set temp_directory to an SSD. The default /tmp is often a ramdisk — spilling there doesn’t help (you’re competing for the same RAM).

Minimum Memory by Operation

Operation	Minimum	Suggested
`GROUP BY` (full table)	Result set size	Usually < 1GB
`ORDER BY` (full table)	1.2x data size	Data < 80% of RAM
`HASH JOIN` (two large tables)	Left table size	Left table < available RAM
`DISTINCT` (high cardinality)	Distinct values size	Watch out above 10M distinct
`UNION` / `UNION ALL`	Input size	UNION consumes ~2x memory

Do this now: Add PRAGMA memory_limit = '80% of RAM' + PRAGMA temp_directory = 'SSD path', then re-run your slow query. If it’s 5x+ faster, you were spilling to disk.

Tip 5: Parallelism — Your 8-Core Machine May Be Using 1

DuckDB uses Morsel-Driven Parallelism — queries are split into small chunks (morsels) processed concurrently by multiple threads.

But default settings don’t always match your hardware.

-- Check current thread count
SELECT current_setting('threads');

-- Set explicitly (physical core count is usually the sweet spot)
SET threads = 8;

-- For production: diminishing returns above 16 cores
SET threads = 16;

Which Operations Parallelize?

Operation	Parallel?	Scalability
`Seq Scan` (Parquet)	✅ File-level	Linear
`HASH_GROUP_BY`	✅ Phased parallel	Near-linear
`HASH_JOIN`	✅ Build phase	Good
`ORDER BY`	✅ Multi-way merge	Moderate
`WINDOW` Functions	⚠️ Partial	Depends on PARTITION BY
`UNION ALL`	✅ Per-query	Good
`COPY` writes	⚠️ File-lock limited	Set `preserve_insertion_order=false`

Accelerating Bulk INSERT

-- Default: DuckDB preserves insertion order (for MVCC safety)
-- Disable it for faster bulk imports:
SET preserve_insertion_order = false;

-- Bulk insert becomes 2-3x faster
INSERT INTO large_table
SELECT * FROM read_parquet('batch_*.parquet');

Thread Count Sweet Spot

Real benchmark (64GB RAM, 1B row CSV aggregation):

Threads	Time	vs 1 Core
1	84s	1x
2	43s	1.9x
4	22s	3.8x
8	11s	7.6x
16	7s	12x
32	6.2s	13.5x (diminishing returns)

Beyond physical cores, linear scaling stops and context switching overhead kicks in.

-- Recommended: set to physical core count
SET threads = 8;  -- for 8-core
-- Or auto-detect: DuckDB detects by default, but explicit is better

Do this now: Add SET threads = <your physical core count> before your slow query. It’s the simplest optimization on this list.

More Resources

Optimization Checklist

Step	Action	Expected Impact
1	Run `EXPLAIN ANALYZE`	Find the >50% bottleneck
2	Check cardinality estimate	Run `ANALYZE` if 10x+ off
3	Convert CSV → Parquet	Usually 5-15x faster
4	Set `memory_limit` + SSD `temp_directory`	Prevent disk spill
5	`SET threads = N`	2-8x on multi-core
6	Build pre-aggregated table	50-100x for frequent queries

Diagnostic Script

-- One-shot config check
SELECT name, value
FROM duckdb_settings()
WHERE name IN ('threads', 'memory_limit',
               'temp_directory', 'preserve_insertion_order');

-- Check temp files (if a query is running)
SELECT * FROM duckdb_temporary_files()
ORDER BY size DESC;

Coming Next: DuckDB vs Polars — a real-world performance comparison on the same data with the same query.

This post is part of our Wednesday Quick Tips series. For weekend deep dives: DuckDB vs Pandas on 100GB-scale data.

DuckDB Performance Tuning

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.