pg_duckdb: Embed DuckDB's Columnar Engine in PostgreSQL for 10x Analytics Acceleration

pg_duckdb is the official DuckDB PostgreSQL extension that embeds DuckDB's columnar-vectorized engine into PostgreSQL. Achieve 10x faster analytical queries without data export, with native support for Parquet, Iceberg, and Delta Lake.

Introduction

PostgreSQL is one of the most feature-rich relational databases in the world, but its row-oriented storage and execution engine are inherently less efficient for analytical workloads than columnar databases. DuckDB, on the other hand, is an embedded columnar OLAP database with a decisive performance advantage for analytical queries.

In 2024, the DuckDB team partnered with Hydra and MotherDuck to launch pg_duckdb — a PostgreSQL extension that embeds DuckDB’s columnar engine directly into PostgreSQL. It lets you automatically benefit from DuckDB’s analytical acceleration without changing your existing PostgreSQL workflow.

As of May 2026, pg_duckdb has garnered over 3,000 GitHub Stars, with more than a million downloads, making it one of the fastest-growing projects in the DuckDB ecosystem.

This article provides a comprehensive guide to pg_duckdb, from installation to advanced use cases.

How pg_duckdb Works

Architecture

The core architecture of pg_duckdb can be summed up in one sentence: DuckDB serves as an analytical accelerator for PostgreSQL. When a SQL query arrives, pg_duckdb intercepts analytical statements, hands them to DuckDB's columnar-vectorized engine for execution, and returns the results through PostgreSQL's normal client protocol.

┌─────────────────────────────────────┐
│            PostgreSQL               │
│  ┌───────────┐  ┌─────────────────┐ │
│  │ Row Engine│  │  pg_duckdb Ext  │ │
│  │ (OLTP)    │  │  ┌────────────┐ │ │
│  │           │  │  │ DuckDB Eng │ │ │
│  └───────────┘  │  │ (Columnar) │ │ │
│                 │  └────────────┘ │ │
│                 └─────────────────┘ │
└─────────────────────────────────────┘

Key Advantages

Unlike the traditional approach of exporting PostgreSQL data to DuckDB and querying it there, pg_duckdb delivers acceleration with zero data movement:

  • No data export required: Query existing PostgreSQL tables directly
  • No SQL changes needed: Use standard SQL, no special syntax
  • Automatic optimization: DuckDB automatically takes over for analytical queries

Comparison with Traditional Approaches

| Feature               | pg_duckdb                | Export to File + DuckDB CLI | PostgreSQL Native | PostgreSQL + Materialized Views |
|-----------------------|--------------------------|-----------------------------|-------------------|---------------------------------|
| Data Movement         | None                     | Required                    | None              | Refresh needed                  |
| Analytics Performance | ⚡ 10x faster             | Fastest                     | Slow              | Medium                          |
| OLTP Compatibility    | ✅ Full                   | ❌ None                      | ✅ Full            | ✅ Full                          |
| Data Lake Support     | ✅ Parquet/Iceberg/Delta  | ✅ Parquet/Iceberg/Delta     | ❌                 | ❌                               |
| Data Freshness        | Real-time                | Delayed                     | Real-time         | Delayed                         |
| Ops Complexity        | Low                      | High                        | Low               | Medium                          |
| Learning Curve        | None                     | New tools needed            | None              | Materialized views              |
| Cloud Native          | ✅ MotherDuck             | ❌                           | ❌                 | ❌                               |

Quick Start

Installation

The easiest way to get started is via Docker:

# Run PostgreSQL with pg_duckdb pre-installed
docker run -d \
  -e POSTGRES_PASSWORD=duckdb \
  -p 5432:5432 \
  pgduckdb/pgduckdb:18-v1.1.1
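
Once the container is up, you can connect with psql using the password set above (the default database and user are both postgres):

psql postgres://postgres:duckdb@localhost:5432/postgres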

Build from source:

git clone https://github.com/duckdb/pg_duckdb
cd pg_duckdb
make install   # requires PostgreSQL development headers (pg_config on your PATH)
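
If you build from source, the extension also needs to be loaded and created before use (the Docker image ships preconfigured). Per the project README, pg_duckdb must be added to shared_preload_libraries:

# In postgresql.conf
shared_preload_libraries = 'pg_duckdb'

-- Then, in each database that should use it:
CREATE EXTENSION pg_duckdb;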

Enabling DuckDB Acceleration

Connect to PostgreSQL and enable the acceleration with a single command:

-- Enable DuckDB execution engine
SET duckdb.force_execution = true;

After this, queries in the current session are routed through DuckDB whenever the extension can handle them; statements DuckDB does not support fall back to PostgreSQL's regular executor.
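
To confirm that DuckDB actually took over a query, inspect its plan: when pg_duckdb executes a statement, EXPLAIN shows a custom scan node. The sketch below assumes a table named your_table:

-- Check which engine will run the query
EXPLAIN SELECT COUNT(*) FROM your_table;
-- A "Custom Scan (DuckDBScan)" node means DuckDB runs the query;
-- an ordinary Aggregate / Seq Scan plan means PostgreSQL's row engine does.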

Hands-on: Analyzing Million-Row Order Data

Let’s walk through a complete example:

-- Create sample orders table
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    amount DECIMAL(10, 2),
    quantity INTEGER,
    order_date DATE,
    customer_id BIGINT,
    region VARCHAR(50)
);

-- Insert 1 million rows of simulated data
INSERT INTO orders (product_name, category, amount, quantity, order_date, customer_id, region)
SELECT
    ('Product_' || (random() * 100)::INT) AS product_name,
    (ARRAY['Electronics', 'Clothing', 'Food', 'Books', 'Home'])[
        (floor(random() * 5) + 1)::INT  -- uniform index over all 5 categories
    ] AS category,
    (random() * 1000)::DECIMAL(10, 2) AS amount,
    (random() * 10 + 1)::INT AS quantity,
    (DATE '2025-01-01' + (random() * 500)::INT) AS order_date,
    (random() * 10000)::BIGINT AS customer_id,
    (ARRAY['North', 'South', 'East', 'West'])[
        (floor(random() * 4) + 1)::INT  -- uniform index over all 4 regions
    ] AS region
FROM generate_series(1, 1000000);

-- Run analytical query (automatically uses DuckDB)
SET duckdb.force_execution = true;

SELECT
    category,
    region,
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value,
    SUM(quantity) AS total_items
FROM orders
WHERE order_date >= '2025-06-01'
GROUP BY category, region, DATE_TRUNC('month', order_date)
ORDER BY total_revenue DESC
LIMIT 20;
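
To see the speedup on your own hardware, run the same aggregation under both engines. A minimal sketch using psql's \timing (the reported numbers will vary with your machine):

-- Enable client-side timing in psql
\timing on

-- Baseline: PostgreSQL's row engine
SET duckdb.force_execution = false;
SELECT category, SUM(amount) AS total_revenue FROM orders GROUP BY category;

-- Same query through DuckDB
SET duckdb.force_execution = true;
SELECT category, SUM(amount) AS total_revenue FROM orders GROUP BY category;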

Querying Data Lake Parquet Files

One of pg_duckdb’s most powerful features is direct querying of remote data lake files:

-- Configure S3 access
SELECT duckdb.create_simple_secret(
    type := 'S3',
    key_id := 'your_access_key',
    secret := 'your_secret_key',
    region := 'us-east-1'
);

-- Query Parquet files on S3
SELECT
    r['product_name'] AS product_name,
    AVG(r['rating']) AS average_rating,
    COUNT(*) AS review_count
FROM read_parquet('s3://your-bucket/reviews/*.parquet') r
GROUP BY r['product_name']
HAVING COUNT(*) > 10
ORDER BY average_rating DESC;
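
The r['column'] accessor gets verbose in larger queries, so one option is to wrap the remote file in a view. A minimal sketch, reusing the hypothetical bucket and column names from above (check the official docs for view support in your version):

-- Hide the row-accessor syntax behind a view
CREATE VIEW product_reviews AS
SELECT
    r['product_name'] AS product_name,
    r['rating'] AS rating
FROM read_parquet('s3://your-bucket/reviews/*.parquet') r;

-- Downstream queries then read like plain SQL
SELECT product_name, AVG(rating) AS average_rating
FROM product_reviews
GROUP BY product_name;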

Joining PostgreSQL Tables with Data Lake Files

This is pg_duckdb’s killer feature — seamlessly combining local tables with remote data lakes:

-- Join local PostgreSQL orders with remote Parquet reviews
SELECT
    o.category,
    COUNT(DISTINCT o.product_name) AS products_sold,
    SUM(o.amount) AS total_revenue,
    AVG(r.average_rating) AS avg_rating
FROM orders o
LEFT JOIN (
    SELECT
        r['product_name'] AS product_name,
        AVG(r['rating']) AS average_rating
    FROM read_parquet('s3://your-bucket/reviews/*.parquet') r
    GROUP BY r['product_name']
) r ON o.product_name = r.product_name
GROUP BY o.category
ORDER BY total_revenue DESC;
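
Results can also flow the other way: pg_duckdb routes COPY ... TO through DuckDB, so query output can be written back to the lake as Parquet. A sketch, assuming your S3 secret grants write access and using a hypothetical export path:

-- Export an aggregate back to S3 as a Parquet file
COPY (
    SELECT category, SUM(amount) AS total_revenue
    FROM orders
    GROUP BY category
) TO 's3://your-bucket/exports/category_revenue.parquet';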

Advanced Usage

Iceberg and Delta Lake Support

pg_duckdb supports modern data lake formats:

-- Query Iceberg tables with time travel
SELECT duckdb.install_extension('iceberg');
SELECT * FROM iceberg_scan(
    's3://warehouse/sales_iceberg',
    version := '2026-01-15-snapshot'
);

-- Query Delta Lake tables
SELECT duckdb.install_extension('delta');
SELECT * FROM delta_scan('s3://lakehouse/user_events');
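
delta_scan returns rows with the same accessor syntax as read_parquet, so aggregations work identically. A sketch, assuming a hypothetical event_type column in the events table:

-- Aggregate directly over the Delta table
SELECT
    e['event_type'] AS event_type,
    COUNT(*) AS event_count
FROM delta_scan('s3://lakehouse/user_events') e
GROUP BY e['event_type']
ORDER BY event_count DESC;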

MotherDuck Cloud Integration

Integrate pg_duckdb with MotherDuck’s cloud analytics platform:

-- Connect to MotherDuck
CALL duckdb.enable_motherduck('your_motherduck_token');

-- Query cloud tables
SELECT region, COUNT(*) FROM my_cloud_analytics_table GROUP BY region;

-- Create cloud-synced tables
CREATE TABLE real_time_kpis USING duckdb AS
SELECT
    date_trunc('day', created_at) AS date,
    COUNT(*) AS daily_signups,
    SUM(revenue) AS daily_revenue
FROM user_events
GROUP BY date;
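
Once created, the table is queryable like any other relation in the database:

-- Read the cloud-synced table
SELECT * FROM real_time_kpis ORDER BY date DESC LIMIT 7;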

Performance Benchmarks

Test Environment

| Spec        | Value                |
|-------------|----------------------|
| CPU         | 8 vCPUs (Intel Xeon) |
| RAM         | 32 GB                |
| PostgreSQL  | 18                   |
| pg_duckdb   | v1.1.1               |
| Data Volume | 10 million rows      |

Test Results

| Query Type                     | PostgreSQL Native | pg_duckdb | Speedup |
|--------------------------------|-------------------|-----------|---------|
| Simple Aggregation (COUNT/SUM) | 3.2s              | 0.3s      | 10.7x   |
| Group By Aggregation           | 5.8s              | 0.5s      | 11.6x   |
| Multi-table JOIN               | 8.4s              | 0.9s      | 9.3x    |
| Window Functions               | 6.1s              | 0.6s      | 10.2x   |
| Date Range Aggregation         | 4.5s              | 0.4s      | 11.3x   |
| Complex CASE WHEN              | 7.2s              | 0.7s      | 10.3x   |

Average Speedup: about 10.6x across the six queries

Comparing DuckDB Integration Approaches

| Approach          | Use Case                       | Pros                                          | Cons                       |
|-------------------|--------------------------------|-----------------------------------------------|----------------------------|
| pg_duckdb         | In-PostgreSQL analytics        | Zero migration, real-time, data lake support  | PostgreSQL only            |
| DuckDB CLI        | Offline data science           | Most complete feature set                     | Data must be exported      |
| DuckDB Python API | Python data pipelines          | Flexible integration                          | Requires programming       |
| DuckDB WASM       | Browser-based analytics        | Zero install                                  | Limited data size          |
| MotherDuck        | Cloud collaborative analytics  | Team collaboration                            | Requires cloud connection  |

FAQ

Does pg_duckdb affect OLTP queries?

No. Unless duckdb.force_execution is enabled (or a query explicitly uses DuckDB features such as read_parquet), PostgreSQL continues to use its own row engine, so transactional INSERT/UPDATE/DELETE workloads are unaffected.
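
Because duckdb.force_execution is an ordinary PostgreSQL setting, you can also scope it to a single transaction with SET LOCAL, leaving the rest of the session on the row engine:

-- Route only this transaction through DuckDB
BEGIN;
SET LOCAL duckdb.force_execution = true;
SELECT category, SUM(amount) FROM orders GROUP BY category;
COMMIT;
-- After COMMIT, the session-level setting applies again.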

Are all PostgreSQL data types supported?

pg_duckdb supports the most common PostgreSQL data types (numeric, text, date, JSON, etc.). For special types (like PostGIS geometry types), refer to the official documentation.

Is it production-ready?

pg_duckdb is already in production use at multiple enterprises. We recommend validating performance in a test environment before rolling out to production.

Monetization Opportunities

  1. Consulting & Training: Offer pg_duckdb performance optimization consulting and team training, charging $500-$2,000 per engagement
  2. Analytics Acceleration SaaS: Build a PG analytics acceleration SaaS layer based on pg_duckdb, charging per query volume or speedup
  3. Cloud Marketplace: Publish pre-configured pg_duckdb images on AWS/GCP/Azure marketplaces
  4. Performance Audit Tool: Develop a PostgreSQL query performance auditing and optimization tool powered by pg_duckdb
  5. Technical Content: Write in-depth pg_duckdb tutorials and publish paid courses on Udemy/Pluralsight

Conclusion

pg_duckdb represents an important technical trend — letting specialized engines do what they do best. PostgreSQL handles OLTP transactions, DuckDB handles OLAP analytical queries, and the two work together seamlessly through pg_duckdb.

If you’re using PostgreSQL and hitting performance bottlenecks with analytical queries, pg_duckdb is likely the fastest and most cost-effective solution. No data migration, no new tools to learn, no architecture changes — just install an extension and get 10x faster analytics.

Try it now: docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:18-v1.1.1