Featured image of post DuckDB Performance Tuning: 5 Tips from Slow Queries to Millisecond Response

DuckDB Performance Tuning: 5 Tips from Slow Queries to Millisecond Response

Query too slow? Don't guess — learn to read execution plans, choose the right file format, manage memory, and tune parallelism. 5 actionable SQL tips with real benchmarks.

The Problem: Your 3-Second Query Just Hit 3 Minutes

You run a simple aggregation on DuckDB:

SELECT category, SUM(revenue), AVG(discount)
FROM sales_1b
WHERE date >= '2026-01-01'
GROUP BY category;

Three minutes later — still waiting. RAM is maxed out. Fans are spinning.

Where’s the bottleneck? The data? The SQL? Or DuckDB itself?

The answer in most cases: it’s not DuckDB — it’s how you’re using it. DuckDB’s columnar engine and vectorized execution are already fast, but default settings and habitual query patterns can hold it back.

These 5 tips cover the most commonly overlooked performance levers. Every one includes executable SQL you can try on your own queries right now.


5 Performance Tuning Tips

Tip 1: Stop Guessing — Read the Execution Plan

EXPLAIN ANALYZE is the single most important tool for diagnosing performance. It’s also the one most people skip.

Most engineers optimize by instinct — “joins are slow, add an index” or “data is large, buy more RAM.” But 90% of the time, the real problem is different from what you guess.

The command is straightforward:

EXPLAIN ANALYZE
SELECT category, SUM(revenue)
FROM sales_1b
WHERE date >= '2026-01-01'
GROUP BY category;

The output has two parts:

  • Logical Plan — what DuckDB intends to do
  • Physical Plan — what actually happened, with timing and row counts
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││        Actual Time: 12.34s       ││
││    Hash GroupBy: 9750000 rows    ││
││      82% of total time           ││  ← bottleneck here
││        card estimate: 100        ││
││        actual cardinality: 2000  ││  ← severely underestimated!
│└───────────────────────────────────┘│
│┌───────────────────────────────────┐│
││      Seq Scan: sales_1b          ││
││      Actual Time: 2.11s          ││
││      rows scanned: 1B -> 600M    ││  ← predicate pushdown pruned 40%
│└───────────────────────────────────┘│
└─────────────────────────────────────┘

Three things to read in the output:

MetricWhat to checkRed flag
Actual TimeTime/percentage per nodeAny node above 50% is your bottleneck
Cardinality estimate vs actualRow estimate vs real count10x+ mismatch means optimizer chose wrong join strategy
Filter efficiencyRows after scanPredicate pushdown failed — check WHERE clause

Real-world case: A query dropped from 45s to 0.3s — not by rewriting SQL, but by discovering DuckDB’s cardinality estimate was off by 100x, causing a nested-loop join instead of hash join. Running ANALYZE to update statistics fixed it instantly.

What Each Plan Signal Means

  • "HASH_GROUP_BY" + "ACTUAL_TIME: 80%" — too many group keys or high cardinality
  • "CROSS_PRODUCT" — missing join condition
  • "SEQ_SCAN: rows=1B" — full table scan is inevitable, but column pruning can reduce data read
  • "card estimate: 2" / "actual: 500000" — stale statistics, run ANALYZE

Do this now: Prefix any slow query with EXPLAIN ANALYZE. Find the bottleneck, then optimize.


Tip 2: File Format & Partitioning Strategy

Your data source choice directly impacts DuckDB’s read efficiency. The priority order:

Parquet > DuckDB Native Format > CSV/JSON

CSV vs Parquet: Real Benchmarks

MetricCSVParquet
100M row scan18.4s1.2s
File size4.2GB780MB
Query 3 columns onlyStill reads all columnsReads only requested columns
Predicate pushdownNot supported (reads everything)Supported (stripe-level pruning)
-- CSV: must parse and scan the whole file
SELECT SUM(amount) FROM 'sales.csv';  -- 18s

-- Parquet: column pruning kicks in automatically
SELECT SUM(amount) FROM 'sales.parquet';  -- 1.2s

-- With predicate pushdown, Parquet's advantage grows
SELECT SUM(amount) FROM 'sales.parquet'
WHERE date >= '2026-06-01';  -- 0.3s (scans only relevant stripes)

Partitioned Reads with HIVE_PARTITIONING

If you have millions of files, read_parquet with hive partitioning can slash scan volume from all files to only matching subdirectories:

-- Full scan: 100 parquet files
SELECT region, SUM(sales)
FROM read_parquet('data/*.parquet')
GROUP BY region;

-- Partition-pruned: only January files
SELECT region, SUM(sales)
FROM read_parquet('data/*/*.parquet',
    hive_partitioning = true)
WHERE month = '2026-01'
  AND region = 'APAC';  -- reads only matching partition dirs

Same query, but the first scans 100 files and the second reads 2-3. In production, this is typically the difference between 30 seconds and 1 second.

FILE_GLOB Patterns

When you need precise file selection:

SELECT * FROM read_parquet('data/2026-{01,02,03}/*.parquet');
-- Or with glob patterns
SELECT * FROM read_parquet('data/2026-0[1-3]/*.parquet');

Do this now: If your data is still in CSV, spend 10 minutes converting to Parquet. It’s the highest-ROI optimization you can make.


Tip 3: Indexes — They Exist, but Not How You Think

Coming from PostgreSQL/MySQL, your first instinct might be “query is slow, add an index.” DuckDB’s index support is more limited in scope.

DuckDB’s Index Types

TypeBest forWhen it won’t help
ART (Adaptive Radix Tree)Point lookups WHERE id = 123Range scans, aggregations, joins
B-tree (Zone Maps)Auto-maintained column min/max statsLimited effect on high-cardinality columns
-- ART index helps point lookups
CREATE INDEX idx_user ON users USING ART(user_id);

SELECT * FROM users WHERE user_id = 42;  -- uses index, microseconds
SELECT * FROM users WHERE user_id > 100; -- full table scan, index ignored

The Real Performance Weapon: Pre-Aggregated Tables

DuckDB’s columnar storage and vectorized engine mean that materialized pre-aggregation beats indexes every time for analytical queries.

-- Before optimization: aggregates 1B rows on every query
SELECT DATE_TRUNC('day', ts), region,
       SUM(revenue), COUNT(DISTINCT user_id)
FROM raw_events
WHERE ts >= NOW() - INTERVAL '7 days'
GROUP BY ALL;

-- After: pre-aggregate to hourly level (transform-on-write)
CREATE TABLE hourly_metrics AS
SELECT DATE_TRUNC('hour', ts) AS hour,
       region,
       SUM(revenue) AS total_revenue,
       COUNT(DISTINCT user_id) AS unique_users
FROM raw_events
GROUP BY ALL;

-- Query from hourly table: scan 168 rows instead of 1B
SELECT DATE_TRUNC('day', hour) AS day,
       region,
       SUM(total_revenue),
       SUM(unique_users)
FROM hourly_metrics
WHERE hour >= NOW() - INTERVAL '7 days'
GROUP BY ALL;  -- millisecond
-- Alternative: CREATE MACRO for lightweight caching
CREATE MACRO daily_active_users(d DATE) AS (
    SELECT COUNT(DISTINCT user_id)
    FROM sessions
    WHERE session_date = d
);

-- Call: DuckDB caches macro results
SELECT daily_active_users('2026-05-01');

Do this now: Find your most frequently run aggregation query. Build a pre-aggregated table at a coarser granularity — row scan drops from hundreds of millions to thousands, query time from minutes to milliseconds.


Tip 4: Memory Management — Is Your Query Spilling to Disk?

The most common hidden cause of slow DuckDB queries: data doesn’t fit in memory and spills to disk.

Typical spill symptoms:

  • Query starts fast, then abruptly slows down
  • Disk I/O spikes while CPU usage stays low
  • Same query, same data, wildly different run times

How to Check for Spilling

-- Check temp file directory
PRAGMA show_temporary_files;

-- Or query directly
SELECT * FROM duckdb_temporary_files();

If a running query is producing temp files (default: /tmp/duckdb), your data doesn’t fit in memory and DuckDB is writing to disk — 10-100x slower.

Memory Configuration Trio

-- 1. Allocate enough memory (default is 75% of available RAM)
PRAGMA memory_limit = '8GB';

-- 2. Point temp files to SSD (not HDD or network mounts)
PRAGMA temp_directory = '/mnt/ssd/duckdb_tmp';

-- 3. Cap per-operation memory (prevent one query from starving others)
PRAGMA hash_table_size_limit = '2GB';
PRAGMA out_of_core_threshold = '2GB';

💡 Pro Tip: Unless you’re certain everything fits in memory (e.g., a single 100MB table), always set temp_directory to an SSD. The default /tmp is often a ramdisk — spilling there doesn’t help (you’re competing for the same RAM).

Minimum Memory by Operation

OperationMinimumSuggested
GROUP BY (full table)Result set sizeUsually < 1GB
ORDER BY (full table)1.2x data sizeData < 80% of RAM
HASH JOIN (two large tables)Left table sizeLeft table < available RAM
DISTINCT (high cardinality)Distinct values sizeWatch out above 10M distinct
UNION / UNION ALLInput sizeUNION consumes ~2x memory

Do this now: Add PRAGMA memory_limit = '80% of RAM' + PRAGMA temp_directory = 'SSD path', then re-run your slow query. If it’s 5x+ faster, you were spilling to disk.


Tip 5: Parallelism — Your 8-Core Machine May Be Using 1

DuckDB uses Morsel-Driven Parallelism — queries are split into small chunks (morsels) processed concurrently by multiple threads.

But default settings don’t always match your hardware.

-- Check current thread count
SELECT current_setting('threads');

-- Set explicitly (physical core count is usually the sweet spot)
SET threads = 8;

-- For production: diminishing returns above 16 cores
SET threads = 16;

Which Operations Parallelize?

OperationParallel?Scalability
Seq Scan (Parquet)✅ File-levelLinear
HASH_GROUP_BY✅ Phased parallelNear-linear
HASH_JOIN✅ Build phaseGood
ORDER BY✅ Multi-way mergeModerate
WINDOW Functions⚠️ PartialDepends on PARTITION BY
UNION ALL✅ Per-queryGood
COPY writes⚠️ File-lock limitedSet preserve_insertion_order=false

Accelerating Bulk INSERT

-- Default: DuckDB preserves insertion order (for MVCC safety)
-- Disable it for faster bulk imports:
SET preserve_insertion_order = false;

-- Bulk insert becomes 2-3x faster
INSERT INTO large_table
SELECT * FROM read_parquet('batch_*.parquet');

Thread Count Sweet Spot

Real benchmark (64GB RAM, 1B row CSV aggregation):

ThreadsTimevs 1 Core
184s1x
243s1.9x
422s3.8x
811s7.6x
167s12x
326.2s13.5x (diminishing returns)

Beyond physical cores, linear scaling stops and context switching overhead kicks in.

-- Recommended: set to physical core count
SET threads = 8;  -- for 8-core
-- Or auto-detect: DuckDB detects by default, but explicit is better

Do this now: Add SET threads = <your physical core count> before your slow query. It’s the simplest optimization on this list.


More Resources

Optimization Checklist

StepActionExpected Impact
1Run EXPLAIN ANALYZEFind the >50% bottleneck
2Check cardinality estimateRun ANALYZE if 10x+ off
3Convert CSV → ParquetUsually 5-15x faster
4Set memory_limit + SSD temp_directoryPrevent disk spill
5SET threads = N2-8x on multi-core
6Build pre-aggregated table50-100x for frequent queries

Diagnostic Script

-- One-shot config check
SELECT name, value
FROM duckdb_settings()
WHERE name IN ('threads', 'memory_limit',
               'temp_directory', 'preserve_insertion_order');

-- Check temp files (if a query is running)
SELECT * FROM duckdb_temporary_files()
ORDER BY size DESC;

Coming Next: DuckDB vs Polars — a real-world performance comparison on the same data with the same query.


This post is part of our Wednesday Quick Tips series. For weekend deep dives: DuckDB vs Pandas on 100GB-scale data.

DuckDB Performance Tuning

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy