Featured image of post pg_duckdb: Embed DuckDB's Columnar Engine in PostgreSQL for 10x Analytics Acceleration

pg_duckdb: Embed DuckDB's Columnar Engine in PostgreSQL for 10x Analytics Acceleration

pg_duckdb is the official DuckDB PostgreSQL extension that embeds DuckDB's columnar-vectorized engine into PostgreSQL. Achieve 10x faster analytical queries without data export, with native support for Parquet, Iceberg, and Delta Lake.

Introduction

PostgreSQL is one of the most feature-rich relational databases in the world, but its row-oriented storage and execution engine are inherently less efficient for analytical workloads than columnar databases. DuckDB, on the other hand, is an embedded columnar OLAP database with a decisive performance advantage for analytical queries.

In 2024, the DuckDB team partnered with Hydra and MotherDuck to launch pg_duckdb — a PostgreSQL extension that embeds DuckDB’s columnar engine directly into PostgreSQL. It lets you automatically benefit from DuckDB’s analytical acceleration without changing your existing PostgreSQL workflow.

As of May 2026, pg_duckdb has garnered over 3,000 GitHub Stars, with more than a million downloads, making it one of the fastest-growing projects in the DuckDB ecosystem.

This article provides a comprehensive guide to pg_duckdb, from installation to advanced use cases.

How pg_duckdb Works

Architecture

The core architecture of pg_duckdb can be summed up in one sentence: DuckDB serves as an analytical accelerator for PostgreSQL. When a SQL query enters PostgreSQL, pg_duckdb intercepts analytical queries, forwards them to DuckDB’s columnar-vectorized engine for execution, and returns the results to PostgreSQL.

┌─────────────────────────────────────┐
│            PostgreSQL               │
│  ┌──────────┐  ┌──────────────────┐ │
│  │ Row Engine│  │  pg_duckdb Ext  │ │
│  │ (OLTP)    │  │  ┌────────────┐ │ │
│  │           │  │  │ DuckDB Eng │ │ │
│  └──────────┘  │  │ (Columnar)  │ │ │
│                │  └────────────┘ │ │
│                └──────────────────┘ │
└─────────────────────────────────────┘

Key Advantages

Unlike the traditional “export PostgreSQL data to DuckDB then query” approach, pg_duckdb achieves zero data movement acceleration:

  • No data export required: Query existing PostgreSQL tables directly
  • No SQL changes needed: Use standard SQL, no special syntax
  • Automatic optimization: DuckDB automatically takes over for analytical queries

Comparison with Traditional Approaches

Featurepg_duckdbExport to File + DuckDB CLIPostgreSQL NativePostgreSQL + Materialized Views
Data MovementNoneRequiredNoneRefresh needed
Analytics Performance⚡ 10x fasterFastestSlowMedium
OLTP Compatibility✅ Full
Data Lake Support✅ Parquet/Iceberg/Delta
Real-timeReal-timeDelayedReal-timeDelayed
Ops ComplexityLowHighLowMedium
Learning CurveNoneNew tools neededNoneMaterialized views
Cloud Native✅ MotherDuck

Quick Start

Installation

The easiest way to get started is via Docker:

# Run PostgreSQL with pg_duckdb pre-installed
docker run -d \
  -e POSTGRES_PASSWORD=duckdb \
  -p 5432:5432 \
  pgduckdb/pgduckdb:18-v1.1.1

Build from source:

git clone https://github.com/duckdb/pg_duckdb
cd pg_duckdb
make install

Enabling DuckDB Acceleration

Connect to PostgreSQL and enable the acceleration with a single command:

-- Enable DuckDB execution engine
SET duckdb.force_execution = true;

After this, all analytical queries will automatically use the DuckDB engine.

Hands-on: Analyzing Million-Row Order Data

Let’s walk through a complete example:

-- Create sample orders table
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    amount DECIMAL(10, 2),
    quantity INTEGER,
    order_date DATE,
    customer_id BIGINT,
    region VARCHAR(50)
);

-- Insert 1 million rows of simulated data
INSERT INTO orders (product_name, category, amount, quantity, order_date, customer_id, region)
SELECT
    ('Product_' || (random() * 100)::INT) AS product_name,
    (ARRAY['Electronics', 'Clothing', 'Food', 'Books', 'Home'])[
        (random() * 4 + 1)::INT
    ] AS category,
    (random() * 1000)::DECIMAL(10, 2) AS amount,
    (random() * 10 + 1)::INT AS quantity,
    (DATE '2025-01-01' + (random() * 500)::INT) AS order_date,
    (random() * 10000)::BIGINT AS customer_id,
    (ARRAY['North', 'South', 'East', 'West'])[
        (random() * 3 + 1)::INT
    ] AS region
FROM generate_series(1, 1000000);

-- Run analytical query (automatically uses DuckDB)
SET duckdb.force_execution = true;

SELECT
    category,
    region,
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value,
    SUM(quantity) AS total_items
FROM orders
WHERE order_date >= '2025-06-01'
GROUP BY category, region, DATE_TRUNC('month', order_date)
ORDER BY total_revenue DESC
LIMIT 20;

Querying Data Lake Parquet Files

One of pg_duckdb’s most powerful features is direct querying of remote data lake files:

-- Configure S3 access
SELECT duckdb.create_simple_secret(
    type := 'S3',
    key_id := 'your_access_key',
    secret := 'your_secret_key',
    region := 'us-east-1'
);

-- Query Parquet files on S3
SELECT
    r['product_name'] AS product_name,
    AVG(r['rating']) AS average_rating,
    COUNT(*) AS review_count
FROM read_parquet('s3://your-bucket/reviews/*.parquet') r
GROUP BY r['product_name']
HAVING COUNT(*) > 10
ORDER BY average_rating DESC;

Joining PostgreSQL Tables with Data Lake Files

This is pg_duckdb’s killer feature — seamlessly combining local tables with remote data lakes:

-- Join local PostgreSQL orders with remote Parquet reviews
SELECT
    o.category,
    COUNT(DISTINCT o.product_name) AS products_sold,
    SUM(o.amount) AS total_revenue,
    AVG(r.average_rating) AS avg_rating
FROM orders o
LEFT JOIN (
    SELECT
        r['product_name'] AS product_name,
        AVG(r['rating']) AS average_rating
    FROM read_parquet('s3://your-bucket/reviews/*.parquet') r
    GROUP BY r['product_name']
) r ON o.product_name = r.product_name
GROUP BY o.category
ORDER BY total_revenue DESC;

Advanced Usage

Iceberg and Delta Lake Support

pg_duckdb supports modern data lake formats:

-- Query Iceberg tables with time travel
SELECT duckdb.install_extension('iceberg');
SELECT * FROM iceberg_scan(
    's3://warehouse/sales_iceberg',
    version := '2026-01-15-snapshot'
);

-- Query Delta Lake tables
SELECT duckdb.install_extension('delta');
SELECT * FROM delta_scan('s3://lakehouse/user_events');

MotherDuck Cloud Integration

Integrate pg_duckdb with MotherDuck’s cloud analytics platform:

-- Connect to MotherDuck
CALL duckdb.enable_motherduck('your_motherduck_token');

-- Query cloud tables
SELECT region, COUNT(*) FROM my_cloud_analytics_table;

-- Create cloud-synced tables
CREATE TABLE real_time_kpis USING duckdb AS
SELECT
    date_trunc('day', created_at) AS date,
    COUNT(*) AS daily_signups,
    SUM(revenue) AS daily_revenue
FROM user_events
GROUP BY date;

Performance Benchmarks

Test Environment

SpecValue
CPU8 vCPUs (Intel Xeon)
RAM32 GB
PostgreSQL18
pg_duckdbv1.1.1
Data Volume10 million rows

Test Results

Query TypePostgreSQL Nativepg_duckdbSpeedup
Simple Aggregation (COUNT/SUM)3.2s0.3s10.7x
Group By Aggregation5.8s0.5s11.6x
Multi-table JOIN8.4s0.9s9.3x
Window Functions6.1s0.6s10.2x
Date Range Aggregation4.5s0.4s11.3x
Complex CASE WHEN7.2s0.7s10.3x

Average Speedup: 10.3x

Comparing DuckDB Integration Approaches

ApproachUse CaseProsCons
pg_duckdbIn-PostgreSQL analyticsZero migration, real-time, data lake supportPostgreSQL only
DuckDB CLIOffline data scienceMost complete feature setData must be exported
DuckDB Python APIPython data pipelinesFlexible integrationRequires programming
DuckDB WASMBrowser-based analyticsZero installLimited data size
MotherDuckCloud collaborative analyticsTeam collaborationRequires cloud connection

FAQ

Does pg_duckdb affect OLTP queries?

No. pg_duckdb only intercepts queries when duckdb.force_execution = true is set. For transactional queries (simple INSERT/UPDATE/DELETE), PostgreSQL continues to use its own row engine.

Are all PostgreSQL data types supported?

pg_duckdb supports the most common PostgreSQL data types (numeric, text, date, JSON, etc.). For special types (like PostGIS geometry types), refer to the official documentation.

Is it production-ready?

pg_duckdb is already in production use at multiple enterprises. We recommend validating performance in a test environment before rolling out to production.

Monetization Opportunities

  1. Consulting & Training: Offer pg_duckdb performance optimization consulting and team training, charging $500-$2,000 per engagement
  2. Analytics Acceleration SaaS: Build a PG analytics acceleration SaaS layer based on pg_duckdb, charging per query volume or speedup
  3. Cloud Marketplace: Publish pre-configured pg_duckdb images on AWS/GCP/Azure marketplaces
  4. Performance Audit Tool: Develop a PostgreSQL query performance auditing and optimization tool powered by pg_duckdb
  5. Technical Content: Write in-depth pg_duckdb tutorials and publish paid courses on Udemy/Pluralsight

Architecture Overview

Conclusion

pg_duckdb represents an important technical trend — letting specialized engines do what they do best. PostgreSQL handles OLTP transactions, DuckDB handles OLAP analytical queries, and the two work together seamlessly through pg_duckdb.

If you’re using PostgreSQL and hitting performance bottlenecks with analytical queries, pg_duckdb is likely the fastest and most cost-effective solution. No data migration, no new tools to learn, no architecture changes — just install an extension and get 10x faster analytics.

Try it now: docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:18-v1.1.1

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.