Introduction
PostgreSQL is one of the most feature-rich relational databases in the world, but its row-oriented storage and execution engine are inherently less efficient for analytical workloads than columnar databases. DuckDB, on the other hand, is an embedded columnar OLAP database with a decisive performance advantage for analytical queries.
In 2024, the DuckDB team partnered with Hydra and MotherDuck to launch pg_duckdb — a PostgreSQL extension that embeds DuckDB’s columnar engine directly into PostgreSQL. It lets you automatically benefit from DuckDB’s analytical acceleration without changing your existing PostgreSQL workflow.
As of May 2026, pg_duckdb has garnered over 3,000 GitHub Stars, with more than a million downloads, making it one of the fastest-growing projects in the DuckDB ecosystem.
This article provides a comprehensive guide to pg_duckdb, from installation to advanced use cases.
How pg_duckdb Works
Architecture
The core architecture of pg_duckdb can be summed up in one sentence: DuckDB serves as an analytical accelerator for PostgreSQL. When a SQL query enters PostgreSQL, pg_duckdb intercepts analytical queries, forwards them to DuckDB’s columnar-vectorized engine for execution, and returns the results to PostgreSQL.
┌─────────────────────────────────────┐
│ PostgreSQL │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ Row Engine│ │ pg_duckdb Ext │ │
│ │ (OLTP) │ │ ┌────────────┐ │ │
│ │ │ │ │ DuckDB Eng │ │ │
│ └──────────┘ │ │ (Columnar) │ │ │
│ │ └────────────┘ │ │
│ └──────────────────┘ │
└─────────────────────────────────────┘
Key Advantages
Unlike the traditional “export PostgreSQL data to DuckDB then query” approach, pg_duckdb achieves zero data movement acceleration:
- No data export required: Query existing PostgreSQL tables directly
- No SQL changes needed: Use standard SQL, no special syntax
- Automatic optimization: DuckDB automatically takes over for analytical queries
Comparison with Traditional Approaches
| Feature | pg_duckdb | Export to File + DuckDB CLI | PostgreSQL Native | PostgreSQL + Materialized Views |
|---|---|---|---|---|
| Data Movement | None | Required | None | Refresh needed |
| Analytics Performance | ⚡ 10x faster | Fastest | Slow | Medium |
| OLTP Compatibility | ✅ Full | ❌ | ✅ | ✅ |
| Data Lake Support | ✅ Parquet/Iceberg/Delta | ✅ | ❌ | ❌ |
| Real-time | Real-time | Delayed | Real-time | Delayed |
| Ops Complexity | Low | High | Low | Medium |
| Learning Curve | None | New tools needed | None | Materialized views |
| Cloud Native | ✅ MotherDuck | ❌ | ❌ | ❌ |
Quick Start
Installation
The easiest way to get started is via Docker:
# Run PostgreSQL with pg_duckdb pre-installed
docker run -d \
-e POSTGRES_PASSWORD=duckdb \
-p 5432:5432 \
pgduckdb/pgduckdb:18-v1.1.1
Build from source:
git clone https://github.com/duckdb/pg_duckdb
cd pg_duckdb
make install
Enabling DuckDB Acceleration
Connect to PostgreSQL and enable the acceleration with a single command:
-- Enable DuckDB execution engine
SET duckdb.force_execution = true;
After this, all analytical queries will automatically use the DuckDB engine.
Hands-on: Analyzing Million-Row Order Data
Let’s walk through a complete example:
-- Create sample orders table
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
amount DECIMAL(10, 2),
quantity INTEGER,
order_date DATE,
customer_id BIGINT,
region VARCHAR(50)
);
-- Insert 1 million rows of simulated data
INSERT INTO orders (product_name, category, amount, quantity, order_date, customer_id, region)
SELECT
('Product_' || (random() * 100)::INT) AS product_name,
(ARRAY['Electronics', 'Clothing', 'Food', 'Books', 'Home'])[
(random() * 4 + 1)::INT
] AS category,
(random() * 1000)::DECIMAL(10, 2) AS amount,
(random() * 10 + 1)::INT AS quantity,
(DATE '2025-01-01' + (random() * 500)::INT) AS order_date,
(random() * 10000)::BIGINT AS customer_id,
(ARRAY['North', 'South', 'East', 'West'])[
(random() * 3 + 1)::INT
] AS region
FROM generate_series(1, 1000000);
-- Run analytical query (automatically uses DuckDB)
SET duckdb.force_execution = true;
SELECT
category,
region,
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
SUM(quantity) AS total_items
FROM orders
WHERE order_date >= '2025-06-01'
GROUP BY category, region, DATE_TRUNC('month', order_date)
ORDER BY total_revenue DESC
LIMIT 20;
Querying Data Lake Parquet Files
One of pg_duckdb’s most powerful features is direct querying of remote data lake files:
-- Configure S3 access
SELECT duckdb.create_simple_secret(
type := 'S3',
key_id := 'your_access_key',
secret := 'your_secret_key',
region := 'us-east-1'
);
-- Query Parquet files on S3
SELECT
r['product_name'] AS product_name,
AVG(r['rating']) AS average_rating,
COUNT(*) AS review_count
FROM read_parquet('s3://your-bucket/reviews/*.parquet') r
GROUP BY r['product_name']
HAVING COUNT(*) > 10
ORDER BY average_rating DESC;
Joining PostgreSQL Tables with Data Lake Files
This is pg_duckdb’s killer feature — seamlessly combining local tables with remote data lakes:
-- Join local PostgreSQL orders with remote Parquet reviews
SELECT
o.category,
COUNT(DISTINCT o.product_name) AS products_sold,
SUM(o.amount) AS total_revenue,
AVG(r.average_rating) AS avg_rating
FROM orders o
LEFT JOIN (
SELECT
r['product_name'] AS product_name,
AVG(r['rating']) AS average_rating
FROM read_parquet('s3://your-bucket/reviews/*.parquet') r
GROUP BY r['product_name']
) r ON o.product_name = r.product_name
GROUP BY o.category
ORDER BY total_revenue DESC;
Advanced Usage
Iceberg and Delta Lake Support
pg_duckdb supports modern data lake formats:
-- Query Iceberg tables with time travel
SELECT duckdb.install_extension('iceberg');
SELECT * FROM iceberg_scan(
's3://warehouse/sales_iceberg',
version := '2026-01-15-snapshot'
);
-- Query Delta Lake tables
SELECT duckdb.install_extension('delta');
SELECT * FROM delta_scan('s3://lakehouse/user_events');
MotherDuck Cloud Integration
Integrate pg_duckdb with MotherDuck’s cloud analytics platform:
-- Connect to MotherDuck
CALL duckdb.enable_motherduck('your_motherduck_token');
-- Query cloud tables
SELECT region, COUNT(*) FROM my_cloud_analytics_table;
-- Create cloud-synced tables
CREATE TABLE real_time_kpis USING duckdb AS
SELECT
date_trunc('day', created_at) AS date,
COUNT(*) AS daily_signups,
SUM(revenue) AS daily_revenue
FROM user_events
GROUP BY date;
Performance Benchmarks
Test Environment
| Spec | Value |
|---|---|
| CPU | 8 vCPUs (Intel Xeon) |
| RAM | 32 GB |
| PostgreSQL | 18 |
| pg_duckdb | v1.1.1 |
| Data Volume | 10 million rows |
Test Results
| Query Type | PostgreSQL Native | pg_duckdb | Speedup |
|---|---|---|---|
| Simple Aggregation (COUNT/SUM) | 3.2s | 0.3s | 10.7x |
| Group By Aggregation | 5.8s | 0.5s | 11.6x |
| Multi-table JOIN | 8.4s | 0.9s | 9.3x |
| Window Functions | 6.1s | 0.6s | 10.2x |
| Date Range Aggregation | 4.5s | 0.4s | 11.3x |
| Complex CASE WHEN | 7.2s | 0.7s | 10.3x |
Average Speedup: 10.3x
Comparing DuckDB Integration Approaches
| Approach | Use Case | Pros | Cons |
|---|---|---|---|
| pg_duckdb | In-PostgreSQL analytics | Zero migration, real-time, data lake support | PostgreSQL only |
| DuckDB CLI | Offline data science | Most complete feature set | Data must be exported |
| DuckDB Python API | Python data pipelines | Flexible integration | Requires programming |
| DuckDB WASM | Browser-based analytics | Zero install | Limited data size |
| MotherDuck | Cloud collaborative analytics | Team collaboration | Requires cloud connection |
FAQ
Does pg_duckdb affect OLTP queries?
No. pg_duckdb only intercepts queries when duckdb.force_execution = true is set. For transactional queries (simple INSERT/UPDATE/DELETE), PostgreSQL continues to use its own row engine.
Are all PostgreSQL data types supported?
pg_duckdb supports the most common PostgreSQL data types (numeric, text, date, JSON, etc.). For special types (like PostGIS geometry types), refer to the official documentation.
Is it production-ready?
pg_duckdb is already in production use at multiple enterprises. We recommend validating performance in a test environment before rolling out to production.
Monetization Opportunities
- Consulting & Training: Offer pg_duckdb performance optimization consulting and team training, charging $500-$2,000 per engagement
- Analytics Acceleration SaaS: Build a PG analytics acceleration SaaS layer based on pg_duckdb, charging per query volume or speedup
- Cloud Marketplace: Publish pre-configured pg_duckdb images on AWS/GCP/Azure marketplaces
- Performance Audit Tool: Develop a PostgreSQL query performance auditing and optimization tool powered by pg_duckdb
- Technical Content: Write in-depth pg_duckdb tutorials and publish paid courses on Udemy/Pluralsight
Conclusion
pg_duckdb represents an important technical trend — letting specialized engines do what they do best. PostgreSQL handles OLTP transactions, DuckDB handles OLAP analytical queries, and the two work together seamlessly through pg_duckdb.
If you’re using PostgreSQL and hitting performance bottlenecks with analytical queries, pg_duckdb is likely the fastest and most cost-effective solution. No data migration, no new tools to learn, no architecture changes — just install an extension and get 10x faster analytics.
Try it now: docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:18-v1.1.1