The Problem: Your 3-Second Query Just Hit 3 Minutes
You run a simple aggregation on DuckDB:
SELECT category, SUM(revenue), AVG(discount)
FROM sales_1b
WHERE date >= '2026-01-01'
GROUP BY category;
Three minutes later — still waiting. RAM is maxed out. Fans are spinning.
Where’s the bottleneck? The data? The SQL? Or DuckDB itself?
The answer in most cases: it’s not DuckDB — it’s how you’re using it. DuckDB’s columnar engine and vectorized execution are already fast, but default settings and habitual query patterns can hold it back.
These 5 tips cover the most commonly overlooked performance levers. Every one includes executable SQL you can try on your own queries right now.
5 Performance Tuning Tips
Tip 1: Stop Guessing — Read the Execution Plan
EXPLAIN ANALYZE is the single most important tool for diagnosing performance. It’s also the one most people skip.
Most engineers optimize by instinct — “joins are slow, add an index” or “data is large, buy more RAM.” But 90% of the time, the real problem is different from what you guess.
The command is straightforward:
EXPLAIN ANALYZE
SELECT category, SUM(revenue)
FROM sales_1b
WHERE date >= '2026-01-01'
GROUP BY category;
The output has two parts:
- Logical Plan — what DuckDB intends to do
- Physical Plan — what actually happened, with timing and row counts
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Actual Time: 12.34s ││
││ Hash GroupBy: 9750000 rows ││
││ 82% of total time ││ ← bottleneck here
││ card estimate: 100 ││
││ actual cardinality: 2000 ││ ← severely underestimated!
│└───────────────────────────────────┘│
│┌───────────────────────────────────┐│
││ Seq Scan: sales_1b ││
││ Actual Time: 2.11s ││
││ rows scanned: 1B -> 600M ││ ← predicate pushdown pruned 40%
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
Three things to read in the output:
| Metric | What to check | Red flag |
|---|---|---|
| Actual Time | Time/percentage per node | Any node above 50% is your bottleneck |
| Cardinality estimate vs actual | Row estimate vs real count | 10x+ mismatch means optimizer chose wrong join strategy |
| Filter efficiency | Rows after scan | Predicate pushdown failed — check WHERE clause |
Real-world case: A query dropped from 45s to 0.3s — not by rewriting SQL, but by discovering DuckDB’s cardinality estimate was off by 100x, causing a nested-loop join instead of hash join. Running
ANALYZEto update statistics fixed it instantly.
What Each Plan Signal Means
"HASH_GROUP_BY"+"ACTUAL_TIME: 80%"— too many group keys or high cardinality"CROSS_PRODUCT"— missing join condition"SEQ_SCAN: rows=1B"— full table scan is inevitable, but column pruning can reduce data read"card estimate: 2"/"actual: 500000"— stale statistics, runANALYZE
Do this now: Prefix any slow query with EXPLAIN ANALYZE. Find the bottleneck, then optimize.
Tip 2: File Format & Partitioning Strategy
Your data source choice directly impacts DuckDB’s read efficiency. The priority order:
Parquet > DuckDB Native Format > CSV/JSON
CSV vs Parquet: Real Benchmarks
| Metric | CSV | Parquet |
|---|---|---|
| 100M row scan | 18.4s | 1.2s |
| File size | 4.2GB | 780MB |
| Query 3 columns only | Still reads all columns | Reads only requested columns |
| Predicate pushdown | Not supported (reads everything) | Supported (stripe-level pruning) |
-- CSV: must parse and scan the whole file
SELECT SUM(amount) FROM 'sales.csv'; -- 18s
-- Parquet: column pruning kicks in automatically
SELECT SUM(amount) FROM 'sales.parquet'; -- 1.2s
-- With predicate pushdown, Parquet's advantage grows
SELECT SUM(amount) FROM 'sales.parquet'
WHERE date >= '2026-06-01'; -- 0.3s (scans only relevant stripes)
Partitioned Reads with HIVE_PARTITIONING
If you have millions of files, read_parquet with hive partitioning can slash scan volume from all files to only matching subdirectories:
-- Full scan: 100 parquet files
SELECT region, SUM(sales)
FROM read_parquet('data/*.parquet')
GROUP BY region;
-- Partition-pruned: only January files
SELECT region, SUM(sales)
FROM read_parquet('data/*/*.parquet',
hive_partitioning = true)
WHERE month = '2026-01'
AND region = 'APAC'; -- reads only matching partition dirs
Same query, but the first scans 100 files and the second reads 2-3. In production, this is typically the difference between 30 seconds and 1 second.
FILE_GLOB Patterns
When you need precise file selection:
SELECT * FROM read_parquet('data/2026-{01,02,03}/*.parquet');
-- Or with glob patterns
SELECT * FROM read_parquet('data/2026-0[1-3]/*.parquet');
Do this now: If your data is still in CSV, spend 10 minutes converting to Parquet. It’s the highest-ROI optimization you can make.
Tip 3: Indexes — They Exist, but Not How You Think
Coming from PostgreSQL/MySQL, your first instinct might be “query is slow, add an index.” DuckDB’s index support is more limited in scope.
DuckDB’s Index Types
| Type | Best for | When it won’t help |
|---|---|---|
| ART (Adaptive Radix Tree) | Point lookups WHERE id = 123 | Range scans, aggregations, joins |
| B-tree (Zone Maps) | Auto-maintained column min/max stats | Limited effect on high-cardinality columns |
-- ART index helps point lookups
CREATE INDEX idx_user ON users USING ART(user_id);
SELECT * FROM users WHERE user_id = 42; -- uses index, microseconds
SELECT * FROM users WHERE user_id > 100; -- full table scan, index ignored
The Real Performance Weapon: Pre-Aggregated Tables
DuckDB’s columnar storage and vectorized engine mean that materialized pre-aggregation beats indexes every time for analytical queries.
-- Before optimization: aggregates 1B rows on every query
SELECT DATE_TRUNC('day', ts), region,
SUM(revenue), COUNT(DISTINCT user_id)
FROM raw_events
WHERE ts >= NOW() - INTERVAL '7 days'
GROUP BY ALL;
-- After: pre-aggregate to hourly level (transform-on-write)
CREATE TABLE hourly_metrics AS
SELECT DATE_TRUNC('hour', ts) AS hour,
region,
SUM(revenue) AS total_revenue,
COUNT(DISTINCT user_id) AS unique_users
FROM raw_events
GROUP BY ALL;
-- Query from hourly table: scan 168 rows instead of 1B
SELECT DATE_TRUNC('day', hour) AS day,
region,
SUM(total_revenue),
SUM(unique_users)
FROM hourly_metrics
WHERE hour >= NOW() - INTERVAL '7 days'
GROUP BY ALL; -- millisecond
-- Alternative: CREATE MACRO for lightweight caching
CREATE MACRO daily_active_users(d DATE) AS (
SELECT COUNT(DISTINCT user_id)
FROM sessions
WHERE session_date = d
);
-- Call: DuckDB caches macro results
SELECT daily_active_users('2026-05-01');
Do this now: Find your most frequently run aggregation query. Build a pre-aggregated table at a coarser granularity — row scan drops from hundreds of millions to thousands, query time from minutes to milliseconds.
Tip 4: Memory Management — Is Your Query Spilling to Disk?
The most common hidden cause of slow DuckDB queries: data doesn’t fit in memory and spills to disk.
Typical spill symptoms:
- Query starts fast, then abruptly slows down
- Disk I/O spikes while CPU usage stays low
- Same query, same data, wildly different run times
How to Check for Spilling
-- Check temp file directory
PRAGMA show_temporary_files;
-- Or query directly
SELECT * FROM duckdb_temporary_files();
If a running query is producing temp files (default: /tmp/duckdb), your data doesn’t fit in memory and DuckDB is writing to disk — 10-100x slower.
Memory Configuration Trio
-- 1. Allocate enough memory (default is 75% of available RAM)
PRAGMA memory_limit = '8GB';
-- 2. Point temp files to SSD (not HDD or network mounts)
PRAGMA temp_directory = '/mnt/ssd/duckdb_tmp';
-- 3. Cap per-operation memory (prevent one query from starving others)
PRAGMA hash_table_size_limit = '2GB';
PRAGMA out_of_core_threshold = '2GB';
💡 Pro Tip: Unless you’re certain everything fits in memory (e.g., a single 100MB table), always set
temp_directoryto an SSD. The default/tmpis often a ramdisk — spilling there doesn’t help (you’re competing for the same RAM).
Minimum Memory by Operation
| Operation | Minimum | Suggested |
|---|---|---|
GROUP BY (full table) | Result set size | Usually < 1GB |
ORDER BY (full table) | 1.2x data size | Data < 80% of RAM |
HASH JOIN (two large tables) | Left table size | Left table < available RAM |
DISTINCT (high cardinality) | Distinct values size | Watch out above 10M distinct |
UNION / UNION ALL | Input size | UNION consumes ~2x memory |
Do this now: Add PRAGMA memory_limit = '80% of RAM' + PRAGMA temp_directory = 'SSD path', then re-run your slow query. If it’s 5x+ faster, you were spilling to disk.
Tip 5: Parallelism — Your 8-Core Machine May Be Using 1
DuckDB uses Morsel-Driven Parallelism — queries are split into small chunks (morsels) processed concurrently by multiple threads.
But default settings don’t always match your hardware.
-- Check current thread count
SELECT current_setting('threads');
-- Set explicitly (physical core count is usually the sweet spot)
SET threads = 8;
-- For production: diminishing returns above 16 cores
SET threads = 16;
Which Operations Parallelize?
| Operation | Parallel? | Scalability |
|---|---|---|
Seq Scan (Parquet) | ✅ File-level | Linear |
HASH_GROUP_BY | ✅ Phased parallel | Near-linear |
HASH_JOIN | ✅ Build phase | Good |
ORDER BY | ✅ Multi-way merge | Moderate |
WINDOW Functions | ⚠️ Partial | Depends on PARTITION BY |
UNION ALL | ✅ Per-query | Good |
COPY writes | ⚠️ File-lock limited | Set preserve_insertion_order=false |
Accelerating Bulk INSERT
-- Default: DuckDB preserves insertion order (for MVCC safety)
-- Disable it for faster bulk imports:
SET preserve_insertion_order = false;
-- Bulk insert becomes 2-3x faster
INSERT INTO large_table
SELECT * FROM read_parquet('batch_*.parquet');
Thread Count Sweet Spot
Real benchmark (64GB RAM, 1B row CSV aggregation):
| Threads | Time | vs 1 Core |
|---|---|---|
| 1 | 84s | 1x |
| 2 | 43s | 1.9x |
| 4 | 22s | 3.8x |
| 8 | 11s | 7.6x |
| 16 | 7s | 12x |
| 32 | 6.2s | 13.5x (diminishing returns) |
Beyond physical cores, linear scaling stops and context switching overhead kicks in.
-- Recommended: set to physical core count
SET threads = 8; -- for 8-core
-- Or auto-detect: DuckDB detects by default, but explicit is better
Do this now: Add SET threads = <your physical core count> before your slow query. It’s the simplest optimization on this list.
More Resources
Optimization Checklist
| Step | Action | Expected Impact |
|---|---|---|
| 1 | Run EXPLAIN ANALYZE | Find the >50% bottleneck |
| 2 | Check cardinality estimate | Run ANALYZE if 10x+ off |
| 3 | Convert CSV → Parquet | Usually 5-15x faster |
| 4 | Set memory_limit + SSD temp_directory | Prevent disk spill |
| 5 | SET threads = N | 2-8x on multi-core |
| 6 | Build pre-aggregated table | 50-100x for frequent queries |
Diagnostic Script
-- One-shot config check
SELECT name, value
FROM duckdb_settings()
WHERE name IN ('threads', 'memory_limit',
'temp_directory', 'preserve_insertion_order');
-- Check temp files (if a query is running)
SELECT * FROM duckdb_temporary_files()
ORDER BY size DESC;
Coming Next: DuckDB vs Polars — a real-world performance comparison on the same data with the same query.
This post is part of our Wednesday Quick Tips series. For weekend deep dives: DuckDB vs Pandas on 100GB-scale data.
