DuckDB vs SQLite: Million-Row Query Speed Comparison — How Much Faster?

Performance Optimization

DuckDB vs SQLite: Million-Row Query Speed Comparison — How Much Faster?

Real-world benchmark comparing DuckDB vs SQLite on a 1-million-row e-commerce dataset. 10 typical analytical queries tested head-to-head with full SQL code and timing results. Find out when to use each database.

The Problem: When SQLite Can’t Handle Analytical Queries

Alice runs an e-commerce site generating ~500,000 orders per day, all stored in SQLite. When her boss asks for a “quarterly sales trend by category,” she writes the SQL, hits enter — and 30 seconds later, there’s still no result.

SQLite is an excellent embedded OLTP database — it shines at single-row inserts and simple primary-key lookups. But when you throw GROUP BY, window functions, and multi-table aggregations at it, the story changes.

DuckDB fills this gap perfectly — it’s also embedded (no server), but purpose-built for OLAP (Online Analytical Processing).

The questions are: How much faster is DuckDB than SQLite? And when should you switch?

This article runs 10 benchmark queries on a real 1-million-row e-commerce dataset and gives you quantified answers.

Test Environment & Data

Hardware / Software

Component	Specification
CPU	AMD EPYC (4 vCPUs)
RAM	8 GB
Storage	NVMe SSD
OS	Ubuntu 22.04
DuckDB	v1.5.2
SQLite	3.45.1

Test Dataset

A synthetic 1-million-row e-commerce orders table:

Column	Type	Description
id	INTEGER	Primary key
category	VARCHAR	Product category (6 types)
product_name	VARCHAR	Product name (10,000 variants)
price	DOUBLE	Price ($5–$505)
quantity	INTEGER	Quantity (1–10)
discount	DOUBLE	Discount ($1–$101)
order_date	DATE	Random date in 2025
region	VARCHAR	Region (CN/US)
user_id	VARCHAR	User ID (50,000 users)

Generate Data (DuckDB version)

-- Generate 1 million rows of test data with DuckDB
COPY (
  SELECT 
    range + 1 AS id,
    CASE WHEN random() < 0.3 THEN 'electronics'
         WHEN random() < 0.5 THEN 'clothing'
         WHEN random() < 0.65 THEN 'home'
         WHEN random() < 0.78 THEN 'books'
         WHEN random() < 0.88 THEN 'sports'
         ELSE 'food' END AS category,
    'product_' || (range % 10000 + 1) AS product_name,
    ROUND(random() * 500 + 5, 2) AS price,
    (random() * 10 + 1)::INT AS quantity,
    ROUND(random() * 100 + 1, 2) AS discount,
    DATE '2025-01-01' + INTERVAL (random() * 364) DAY AS order_date,
    CASE WHEN random() < 0.5 THEN 'CN' ELSE 'US' END AS region,
    'user_' || (range % 50000 + 1) AS user_id
  FROM range(1000000)
) TO 'ecommerce_1m.csv' (HEADER, DELIMITER ',');

10 Benchmark Queries — Head to Head

Methodology

DuckDB: Queries CSV directly with read_csv_auto, zero ETL
SQLite: .import CSV into a table first, then query
Each query was run multiple times; representative timings are shown
Both databases run identical SQL (syntax adapted minimally where needed)

SQL Test Code

DuckDB version:

-- DuckDB: Load CSV directly (zero ETL)
CREATE TABLE sales AS 
  SELECT * FROM read_csv_auto('ecommerce_1m.csv');

-- Q1: Simple count
SELECT COUNT(*) FROM sales;

-- Q2: Total revenue
SELECT SUM(price * quantity) AS total_revenue FROM sales;

-- Q3: GROUP BY category
SELECT category, COUNT(*) AS orders, 
       SUM(price * quantity) AS revenue, AVG(price) AS avg_price
FROM sales GROUP BY category ORDER BY revenue DESC;

-- Q4: Date range filter
SELECT COUNT(*), SUM(price * quantity) 
FROM sales WHERE order_date BETWEEN '2025-06-01' AND '2025-08-31';

-- Q5: Multi-dimensional GROUP BY
SELECT region, category, COUNT(*) AS cnt, SUM(price * quantity) AS revenue
FROM sales GROUP BY region, category ORDER BY revenue DESC;

-- Q6: Window function - monthly running total
SELECT strftime(order_date, '%Y-%m') AS month, 
       SUM(price * quantity) AS monthly_revenue,
       SUM(SUM(price * quantity)) OVER (ORDER BY strftime(order_date, '%Y-%m')) AS running_total
FROM sales GROUP BY month ORDER BY month;

-- Q7: Top 10 products by revenue
SELECT product_name, SUM(price * quantity) AS revenue, COUNT(*) AS orders
FROM sales GROUP BY product_name ORDER BY revenue DESC LIMIT 10;

-- Q8: Average order value by region
SELECT region, AVG(price * quantity) AS avg_order_value, 
       COUNT(*) AS orders, SUM(price * quantity) AS total_revenue
FROM sales GROUP BY region;

-- Q9: High-frequency users (>5 orders)
SELECT user_id, COUNT(*) AS order_count, SUM(price * quantity) AS total_spent
FROM sales GROUP BY user_id 
HAVING COUNT(*) > 5 ORDER BY total_spent DESC LIMIT 20;

-- Q10: Conditional aggregation
SELECT 
  SUM(CASE WHEN price > 200 THEN 1 ELSE 0 END) AS expensive_orders,
  SUM(CASE WHEN discount > 50 THEN 1 ELSE 0 END) AS high_discount_orders,
  AVG(CASE WHEN region = 'CN' THEN price ELSE NULL END) AS cn_avg_price,
  AVG(CASE WHEN region = 'US' THEN price ELSE NULL END) AS us_avg_price
FROM sales;

SQLite version (compatible syntax):

-- SQLite: Import CSV first
.mode csv
.import ecommerce_1m.csv sales

-- Queries are identical to DuckDB above (SQLite supports the same SQL syntax)

Benchmark Results

#	Query Type	DuckDB	SQLite	Speedup
Q1	COUNT(*)	0.004s	0.035s	8.7x
Q2	SUM (total revenue)	0.005s	0.408s	81.6x
Q3	GROUP BY category	0.020s	2.275s	113.8x
Q4	Date range filter	0.006s	0.413s	68.8x
Q5	Multi-dim GROUP BY	0.020s	4.185s	209.3x
Q6	Window function	0.094s	1.555s	16.5x
Q7	Top 10 products	0.041s	1.473s	35.9x
Q8	Avg order value	0.010s	1.687s	168.7x
Q9	HAVING clause	0.056s	2.323s	41.5x
Q10	Conditional agg	0.043s	1.303s	30.3x

Key finding: On full-scan + aggregation queries (SUM, GROUP BY), DuckDB is 80–200x faster than SQLite. Simple COUNT queries show the smallest gap (8.7x) because SQLite can leverage B-Tree indexes.

Why Is DuckDB So Much Faster?

1. Columnar vs Row-Based Storage

	DuckDB (Columnar)	SQLite (Row-based)
Read pattern	Only touches needed columns	Reads entire rows, discards unwanted columns
Compression	High (same type = predictable patterns)	Low
Cache efficiency	Column data stored contiguously, CPU cache friendly	Row data scattered

Example: Q2 only needs price and quantity columns. DuckDB reads 2 columns; SQLite reads all 9 columns. That’s 4.5x more disk I/O right off the bat.

2. Vectorized Execution

DuckDB processes data in batches of 1024 rows, leveraging CPU SIMD instructions. SQLite processes row-by-row with significant function call overhead.

3. Multi-threaded Parallelism

DuckDB automatically uses all CPU cores for every query. SQLite is single-threaded by default (concurrent writes in SQLite aren’t safe, either).

4. Zero-Copy Reads

DuckDB can query CSV/Parquet files directly — no import phase needed. This saves massive time for one-off analytical tasks.

When to Use SQLite vs DuckDB

✅ Keep SQLite When

Web app backend: Low-latency single-row inserts/updates
Mobile/desktop apps: SQLite is the go-to embedded database (~600KB)
Transaction-heavy: Multiple concurrent writes, ACID compliance needed
Dataset < 100K rows: The performance gap isn’t noticeable

✅ Switch to DuckDB When

Analytics & reporting: GROUP BY, window functions, complex aggregations
Large dataset exploration: 1M+ rows, quick insights needed
ETL pipelines: Read CSV/Parquet/JSON, transform, output
Batch processing: High throughput, low latency not required

Golden Rule

OLTP → SQLite. OLAP → DuckDB.

If your data needs both transactional writes AND analytical queries — write with SQLite, then analyze with DuckDB’s sqlite extension which queries SQLite databases directly:

-- DuckDB can query SQLite databases directly!
INSTALL sqlite;
LOAD sqlite;
SELECT category, SUM(price * quantity) AS revenue
FROM sqlite_scan('myapp.db', 'orders')
GROUP BY category ORDER BY revenue DESC;

Real-World Migration Story

Here’s how Alice solved her problem:

Keep SQLite for writes: The e-commerce system continues writing to SQLite
DuckDB as analytics layer: Every night, DuckDB reads from SQLite to generate reports
Result: The 30-second monthly sales report now runs in 0.1 seconds

# Python: Analyze SQLite data with DuckDB
import duckdb

con = duckdb.connect()
con.execute("INSTALL sqlite; LOAD sqlite;")

# Analyze SQLite data directly — no export needed
result = con.execute("""
    SELECT strftime(order_date, '%Y-%m') AS month,
           category,
           SUM(price * quantity) AS revenue
    FROM sqlite_scan('ecommerce.db', 'orders')
    GROUP BY month, category
    ORDER BY month, revenue DESC
""").fetchdf()

print(result)

Summary

Dimension	DuckDB	SQLite
Design goal	OLAP analytics	OLTP transactions
1M-row GROUP BY	0.02s	2.28s
Multi-dim aggregation	0.02s	4.19s
Window functions	0.09s	1.56s
Install size	~50MB	~600KB
Concurrent writes	Not supported	✅ Supported
Direct CSV query	✅ Native	❌ Must import

Bottom line: If your dataset exceeds 100K rows and your queries involve aggregation or analytics — DuckDB is 10–200x faster than SQLite. They’re not competitors but complements: SQLite for writes, DuckDB for analysis. Use both, and get the best of both worlds.

Architecture Overview

Recommended Reading

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.