Featured image of post DuckDB vs SQLite: Million-Row Query Speed Comparison — How Much Faster?

DuckDB vs SQLite: Million-Row Query Speed Comparison — How Much Faster?

Real-world benchmark comparing DuckDB vs SQLite on a 1-million-row e-commerce dataset. 10 typical analytical queries tested head-to-head with full SQL code and timing results. Find out when to use each database.

The Problem: When SQLite Can’t Handle Analytical Queries

Alice runs an e-commerce site generating ~500,000 orders per day, all stored in SQLite. When her boss asks for a “quarterly sales trend by category,” she writes the SQL, hits enter — and 30 seconds later, there’s still no result.

SQLite is an excellent embedded OLTP database — it shines at single-row inserts and simple primary-key lookups. But when you throw GROUP BY, window functions, and multi-table aggregations at it, the story changes.

DuckDB fills this gap perfectly — it’s also embedded (no server), but purpose-built for OLAP (Online Analytical Processing).

The questions are: How much faster is DuckDB than SQLite? And when should you switch?

This article runs 10 benchmark queries on a real 1-million-row e-commerce dataset and gives you quantified answers.


Test Environment & Data

Hardware / Software

ComponentSpecification
CPUAMD EPYC (4 vCPUs)
RAM8 GB
StorageNVMe SSD
OSUbuntu 22.04
DuckDBv1.5.2
SQLite3.45.1

Test Dataset

A synthetic 1-million-row e-commerce orders table:

ColumnTypeDescription
idINTEGERPrimary key
categoryVARCHARProduct category (6 types)
product_nameVARCHARProduct name (10,000 variants)
priceDOUBLEPrice ($5–$505)
quantityINTEGERQuantity (1–10)
discountDOUBLEDiscount ($1–$101)
order_dateDATERandom date in 2025
regionVARCHARRegion (CN/US)
user_idVARCHARUser ID (50,000 users)

Generate Data (DuckDB version)

-- Generate 1 million rows of test data with DuckDB
COPY (
  SELECT 
    range + 1 AS id,
    CASE WHEN random() < 0.3 THEN 'electronics'
         WHEN random() < 0.5 THEN 'clothing'
         WHEN random() < 0.65 THEN 'home'
         WHEN random() < 0.78 THEN 'books'
         WHEN random() < 0.88 THEN 'sports'
         ELSE 'food' END AS category,
    'product_' || (range % 10000 + 1) AS product_name,
    ROUND(random() * 500 + 5, 2) AS price,
    (random() * 10 + 1)::INT AS quantity,
    ROUND(random() * 100 + 1, 2) AS discount,
    DATE '2025-01-01' + INTERVAL (random() * 364) DAY AS order_date,
    CASE WHEN random() < 0.5 THEN 'CN' ELSE 'US' END AS region,
    'user_' || (range % 50000 + 1) AS user_id
  FROM range(1000000)
) TO 'ecommerce_1m.csv' (HEADER, DELIMITER ',');

10 Benchmark Queries — Head to Head

Methodology

  • DuckDB: Queries CSV directly with read_csv_auto, zero ETL
  • SQLite: .import CSV into a table first, then query
  • Each query was run multiple times; representative timings are shown
  • Both databases run identical SQL (syntax adapted minimally where needed)

SQL Test Code

DuckDB version:

-- DuckDB: Load CSV directly (zero ETL)
CREATE TABLE sales AS 
  SELECT * FROM read_csv_auto('ecommerce_1m.csv');

-- Q1: Simple count
SELECT COUNT(*) FROM sales;

-- Q2: Total revenue
SELECT SUM(price * quantity) AS total_revenue FROM sales;

-- Q3: GROUP BY category
SELECT category, COUNT(*) AS orders, 
       SUM(price * quantity) AS revenue, AVG(price) AS avg_price
FROM sales GROUP BY category ORDER BY revenue DESC;

-- Q4: Date range filter
SELECT COUNT(*), SUM(price * quantity) 
FROM sales WHERE order_date BETWEEN '2025-06-01' AND '2025-08-31';

-- Q5: Multi-dimensional GROUP BY
SELECT region, category, COUNT(*) AS cnt, SUM(price * quantity) AS revenue
FROM sales GROUP BY region, category ORDER BY revenue DESC;

-- Q6: Window function - monthly running total
SELECT strftime(order_date, '%Y-%m') AS month, 
       SUM(price * quantity) AS monthly_revenue,
       SUM(SUM(price * quantity)) OVER (ORDER BY strftime(order_date, '%Y-%m')) AS running_total
FROM sales GROUP BY month ORDER BY month;

-- Q7: Top 10 products by revenue
SELECT product_name, SUM(price * quantity) AS revenue, COUNT(*) AS orders
FROM sales GROUP BY product_name ORDER BY revenue DESC LIMIT 10;

-- Q8: Average order value by region
SELECT region, AVG(price * quantity) AS avg_order_value, 
       COUNT(*) AS orders, SUM(price * quantity) AS total_revenue
FROM sales GROUP BY region;

-- Q9: High-frequency users (>5 orders)
SELECT user_id, COUNT(*) AS order_count, SUM(price * quantity) AS total_spent
FROM sales GROUP BY user_id 
HAVING COUNT(*) > 5 ORDER BY total_spent DESC LIMIT 20;

-- Q10: Conditional aggregation
SELECT 
  SUM(CASE WHEN price > 200 THEN 1 ELSE 0 END) AS expensive_orders,
  SUM(CASE WHEN discount > 50 THEN 1 ELSE 0 END) AS high_discount_orders,
  AVG(CASE WHEN region = 'CN' THEN price ELSE NULL END) AS cn_avg_price,
  AVG(CASE WHEN region = 'US' THEN price ELSE NULL END) AS us_avg_price
FROM sales;

SQLite version (compatible syntax):

-- SQLite: Import CSV first
.mode csv
.import ecommerce_1m.csv sales

-- Queries are identical to DuckDB above (SQLite supports the same SQL syntax)

Benchmark Results

#Query TypeDuckDBSQLiteSpeedup
Q1COUNT(*)0.004s0.035s8.7x
Q2SUM (total revenue)0.005s0.408s81.6x
Q3GROUP BY category0.020s2.275s113.8x
Q4Date range filter0.006s0.413s68.8x
Q5Multi-dim GROUP BY0.020s4.185s209.3x
Q6Window function0.094s1.555s16.5x
Q7Top 10 products0.041s1.473s35.9x
Q8Avg order value0.010s1.687s168.7x
Q9HAVING clause0.056s2.323s41.5x
Q10Conditional agg0.043s1.303s30.3x

Key finding: On full-scan + aggregation queries (SUM, GROUP BY), DuckDB is 80–200x faster than SQLite. Simple COUNT queries show the smallest gap (8.7x) because SQLite can leverage B-Tree indexes.


Why Is DuckDB So Much Faster?

1. Columnar vs Row-Based Storage

DuckDB (Columnar)SQLite (Row-based)
Read patternOnly touches needed columnsReads entire rows, discards unwanted columns
CompressionHigh (same type = predictable patterns)Low
Cache efficiencyColumn data stored contiguously, CPU cache friendlyRow data scattered

Example: Q2 only needs price and quantity columns. DuckDB reads 2 columns; SQLite reads all 9 columns. That’s 4.5x more disk I/O right off the bat.

2. Vectorized Execution

DuckDB processes data in batches of 1024 rows, leveraging CPU SIMD instructions. SQLite processes row-by-row with significant function call overhead.

3. Multi-threaded Parallelism

DuckDB automatically uses all CPU cores for every query. SQLite is single-threaded by default (concurrent writes in SQLite aren’t safe, either).

4. Zero-Copy Reads

DuckDB can query CSV/Parquet files directly — no import phase needed. This saves massive time for one-off analytical tasks.


When to Use SQLite vs DuckDB

✅ Keep SQLite When

  • Web app backend: Low-latency single-row inserts/updates
  • Mobile/desktop apps: SQLite is the go-to embedded database (~600KB)
  • Transaction-heavy: Multiple concurrent writes, ACID compliance needed
  • Dataset < 100K rows: The performance gap isn’t noticeable

✅ Switch to DuckDB When

  • Analytics & reporting: GROUP BY, window functions, complex aggregations
  • Large dataset exploration: 1M+ rows, quick insights needed
  • ETL pipelines: Read CSV/Parquet/JSON, transform, output
  • Batch processing: High throughput, low latency not required

Golden Rule

OLTP → SQLite. OLAP → DuckDB.

If your data needs both transactional writes AND analytical queries — write with SQLite, then analyze with DuckDB’s sqlite extension which queries SQLite databases directly:

-- DuckDB can query SQLite databases directly!
INSTALL sqlite;
LOAD sqlite;
SELECT category, SUM(price * quantity) AS revenue
FROM sqlite_scan('myapp.db', 'orders')
GROUP BY category ORDER BY revenue DESC;

Real-World Migration Story

Here’s how Alice solved her problem:

  1. Keep SQLite for writes: The e-commerce system continues writing to SQLite
  2. DuckDB as analytics layer: Every night, DuckDB reads from SQLite to generate reports
  3. Result: The 30-second monthly sales report now runs in 0.1 seconds
# Python: Analyze SQLite data with DuckDB
import duckdb

con = duckdb.connect()
con.execute("INSTALL sqlite; LOAD sqlite;")

# Analyze SQLite data directly — no export needed
result = con.execute("""
    SELECT strftime(order_date, '%Y-%m') AS month,
           category,
           SUM(price * quantity) AS revenue
    FROM sqlite_scan('ecommerce.db', 'orders')
    GROUP BY month, category
    ORDER BY month, revenue DESC
""").fetchdf()

print(result)

Summary

DimensionDuckDBSQLite
Design goalOLAP analyticsOLTP transactions
1M-row GROUP BY0.02s2.28s
Multi-dim aggregation0.02s4.19s
Window functions0.09s1.56s
Install size~50MB~600KB
Concurrent writesNot supported✅ Supported
Direct CSV query✅ Native❌ Must import

Bottom line: If your dataset exceeds 100K rows and your queries involve aggregation or analytics — DuckDB is 10–200x faster than SQLite. They’re not competitors but complements: SQLite for writes, DuckDB for analysis. Use both, and get the best of both worlds.


Architecture Overview

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy