The Problem: When SQLite Can’t Handle Analytical Queries
Alice runs an e-commerce site generating ~500,000 orders per day, all stored in SQLite. When her boss asks for a “quarterly sales trend by category,” she writes the SQL, hits enter — and 30 seconds later, there’s still no result.
SQLite is an excellent embedded OLTP database — it shines at single-row inserts and simple primary-key lookups. But when you throw GROUP BY, window functions, and multi-table aggregations at it, the story changes.
DuckDB fills this gap perfectly — it’s also embedded (no server), but purpose-built for OLAP (Online Analytical Processing).
The questions are: How much faster is DuckDB than SQLite? And when should you switch?
This article runs 10 benchmark queries on a real 1-million-row e-commerce dataset and gives you quantified answers.
Test Environment & Data
Hardware / Software
| Component | Specification |
|---|---|
| CPU | AMD EPYC (4 vCPUs) |
| RAM | 8 GB |
| Storage | NVMe SSD |
| OS | Ubuntu 22.04 |
| DuckDB | v1.5.2 |
| SQLite | 3.45.1 |
Test Dataset
A synthetic 1-million-row e-commerce orders table:
| Column | Type | Description |
|---|---|---|
| id | INTEGER | Primary key |
| category | VARCHAR | Product category (6 types) |
| product_name | VARCHAR | Product name (10,000 variants) |
| price | DOUBLE | Price ($5–$505) |
| quantity | INTEGER | Quantity (1–10) |
| discount | DOUBLE | Discount ($1–$101) |
| order_date | DATE | Random date in 2025 |
| region | VARCHAR | Region (CN/US) |
| user_id | VARCHAR | User ID (50,000 users) |
Generate Data (DuckDB version)
-- Generate 1 million rows of test data with DuckDB
COPY (
SELECT
range + 1 AS id,
CASE WHEN random() < 0.3 THEN 'electronics'
WHEN random() < 0.5 THEN 'clothing'
WHEN random() < 0.65 THEN 'home'
WHEN random() < 0.78 THEN 'books'
WHEN random() < 0.88 THEN 'sports'
ELSE 'food' END AS category,
'product_' || (range % 10000 + 1) AS product_name,
ROUND(random() * 500 + 5, 2) AS price,
(random() * 10 + 1)::INT AS quantity,
ROUND(random() * 100 + 1, 2) AS discount,
DATE '2025-01-01' + INTERVAL (random() * 364) DAY AS order_date,
CASE WHEN random() < 0.5 THEN 'CN' ELSE 'US' END AS region,
'user_' || (range % 50000 + 1) AS user_id
FROM range(1000000)
) TO 'ecommerce_1m.csv' (HEADER, DELIMITER ',');
10 Benchmark Queries — Head to Head
Methodology
- DuckDB: Queries CSV directly with
read_csv_auto, zero ETL - SQLite:
.importCSV into a table first, then query - Each query was run multiple times; representative timings are shown
- Both databases run identical SQL (syntax adapted minimally where needed)
SQL Test Code
DuckDB version:
-- DuckDB: Load CSV directly (zero ETL)
CREATE TABLE sales AS
SELECT * FROM read_csv_auto('ecommerce_1m.csv');
-- Q1: Simple count
SELECT COUNT(*) FROM sales;
-- Q2: Total revenue
SELECT SUM(price * quantity) AS total_revenue FROM sales;
-- Q3: GROUP BY category
SELECT category, COUNT(*) AS orders,
SUM(price * quantity) AS revenue, AVG(price) AS avg_price
FROM sales GROUP BY category ORDER BY revenue DESC;
-- Q4: Date range filter
SELECT COUNT(*), SUM(price * quantity)
FROM sales WHERE order_date BETWEEN '2025-06-01' AND '2025-08-31';
-- Q5: Multi-dimensional GROUP BY
SELECT region, category, COUNT(*) AS cnt, SUM(price * quantity) AS revenue
FROM sales GROUP BY region, category ORDER BY revenue DESC;
-- Q6: Window function - monthly running total
SELECT strftime(order_date, '%Y-%m') AS month,
SUM(price * quantity) AS monthly_revenue,
SUM(SUM(price * quantity)) OVER (ORDER BY strftime(order_date, '%Y-%m')) AS running_total
FROM sales GROUP BY month ORDER BY month;
-- Q7: Top 10 products by revenue
SELECT product_name, SUM(price * quantity) AS revenue, COUNT(*) AS orders
FROM sales GROUP BY product_name ORDER BY revenue DESC LIMIT 10;
-- Q8: Average order value by region
SELECT region, AVG(price * quantity) AS avg_order_value,
COUNT(*) AS orders, SUM(price * quantity) AS total_revenue
FROM sales GROUP BY region;
-- Q9: High-frequency users (>5 orders)
SELECT user_id, COUNT(*) AS order_count, SUM(price * quantity) AS total_spent
FROM sales GROUP BY user_id
HAVING COUNT(*) > 5 ORDER BY total_spent DESC LIMIT 20;
-- Q10: Conditional aggregation
SELECT
SUM(CASE WHEN price > 200 THEN 1 ELSE 0 END) AS expensive_orders,
SUM(CASE WHEN discount > 50 THEN 1 ELSE 0 END) AS high_discount_orders,
AVG(CASE WHEN region = 'CN' THEN price ELSE NULL END) AS cn_avg_price,
AVG(CASE WHEN region = 'US' THEN price ELSE NULL END) AS us_avg_price
FROM sales;
SQLite version (compatible syntax):
-- SQLite: Import CSV first
.mode csv
.import ecommerce_1m.csv sales
-- Queries are identical to DuckDB above (SQLite supports the same SQL syntax)
Benchmark Results
| # | Query Type | DuckDB | SQLite | Speedup |
|---|---|---|---|---|
| Q1 | COUNT(*) | 0.004s | 0.035s | 8.7x |
| Q2 | SUM (total revenue) | 0.005s | 0.408s | 81.6x |
| Q3 | GROUP BY category | 0.020s | 2.275s | 113.8x |
| Q4 | Date range filter | 0.006s | 0.413s | 68.8x |
| Q5 | Multi-dim GROUP BY | 0.020s | 4.185s | 209.3x |
| Q6 | Window function | 0.094s | 1.555s | 16.5x |
| Q7 | Top 10 products | 0.041s | 1.473s | 35.9x |
| Q8 | Avg order value | 0.010s | 1.687s | 168.7x |
| Q9 | HAVING clause | 0.056s | 2.323s | 41.5x |
| Q10 | Conditional agg | 0.043s | 1.303s | 30.3x |
Key finding: On full-scan + aggregation queries (SUM, GROUP BY), DuckDB is 80–200x faster than SQLite. Simple COUNT queries show the smallest gap (8.7x) because SQLite can leverage B-Tree indexes.
Why Is DuckDB So Much Faster?
1. Columnar vs Row-Based Storage
| DuckDB (Columnar) | SQLite (Row-based) | |
|---|---|---|
| Read pattern | Only touches needed columns | Reads entire rows, discards unwanted columns |
| Compression | High (same type = predictable patterns) | Low |
| Cache efficiency | Column data stored contiguously, CPU cache friendly | Row data scattered |
Example: Q2 only needs price and quantity columns. DuckDB reads 2 columns; SQLite reads all 9 columns. That’s 4.5x more disk I/O right off the bat.
2. Vectorized Execution
DuckDB processes data in batches of 1024 rows, leveraging CPU SIMD instructions. SQLite processes row-by-row with significant function call overhead.
3. Multi-threaded Parallelism
DuckDB automatically uses all CPU cores for every query. SQLite is single-threaded by default (concurrent writes in SQLite aren’t safe, either).
4. Zero-Copy Reads
DuckDB can query CSV/Parquet files directly — no import phase needed. This saves massive time for one-off analytical tasks.
When to Use SQLite vs DuckDB
✅ Keep SQLite When
- Web app backend: Low-latency single-row inserts/updates
- Mobile/desktop apps: SQLite is the go-to embedded database (~600KB)
- Transaction-heavy: Multiple concurrent writes, ACID compliance needed
- Dataset < 100K rows: The performance gap isn’t noticeable
✅ Switch to DuckDB When
- Analytics & reporting: GROUP BY, window functions, complex aggregations
- Large dataset exploration: 1M+ rows, quick insights needed
- ETL pipelines: Read CSV/Parquet/JSON, transform, output
- Batch processing: High throughput, low latency not required
Golden Rule
OLTP → SQLite. OLAP → DuckDB.
If your data needs both transactional writes AND analytical queries — write with SQLite, then analyze with DuckDB’s sqlite extension which queries SQLite databases directly:
-- DuckDB can query SQLite databases directly!
INSTALL sqlite;
LOAD sqlite;
SELECT category, SUM(price * quantity) AS revenue
FROM sqlite_scan('myapp.db', 'orders')
GROUP BY category ORDER BY revenue DESC;
Real-World Migration Story
Here’s how Alice solved her problem:
- Keep SQLite for writes: The e-commerce system continues writing to SQLite
- DuckDB as analytics layer: Every night, DuckDB reads from SQLite to generate reports
- Result: The 30-second monthly sales report now runs in 0.1 seconds
# Python: Analyze SQLite data with DuckDB
import duckdb
con = duckdb.connect()
con.execute("INSTALL sqlite; LOAD sqlite;")
# Analyze SQLite data directly — no export needed
result = con.execute("""
SELECT strftime(order_date, '%Y-%m') AS month,
category,
SUM(price * quantity) AS revenue
FROM sqlite_scan('ecommerce.db', 'orders')
GROUP BY month, category
ORDER BY month, revenue DESC
""").fetchdf()
print(result)
Summary
| Dimension | DuckDB | SQLite |
|---|---|---|
| Design goal | OLAP analytics | OLTP transactions |
| 1M-row GROUP BY | 0.02s | 2.28s |
| Multi-dim aggregation | 0.02s | 4.19s |
| Window functions | 0.09s | 1.56s |
| Install size | ~50MB | ~600KB |
| Concurrent writes | Not supported | ✅ Supported |
| Direct CSV query | ✅ Native | ❌ Must import |
Bottom line: If your dataset exceeds 100K rows and your queries involve aggregation or analytics — DuckDB is 10–200x faster than SQLite. They’re not competitors but complements: SQLite for writes, DuckDB for analysis. Use both, and get the best of both worlds.
