Featured image of post DuckDB PIVOT: Transform Long Tables to Wide Tables in One SQL Statement

DuckDB PIVOT: Transform Long Tables to Wide Tables in One SQL Statement

DuckDB's native PIVOT syntax lets you transform long-format data into wide-format reports with a single SQL statement. No more verbose CASE WHEN gymnastics — pivot your data the modern way.

The Pain: Long-to-Wide Table Transformation Is Every Analyst’s Nightmare

Imagine you have a sales dataset in long format, where each row represents a product’s sales in a given month:

product    | month   | sales
-----------|---------|-------
Laptop     | 2026-01 | 12000
Laptop     | 2026-02 | 15000
Tablet     | 2026-01 | 8000
Tablet     | 2026-02 | 9500
Phone      | 2026-01 | 22000
Phone      | 2026-02 | 25000

Now your manager wants a monthly comparison report, with each month as a separate column:

product    | 2026-01 | 2026-02
-----------|---------|--------
Laptop     | 12000   | 15000
Tablet     | 8000    | 9500
Phone      | 22000   | 25000

In traditional SQL, this requires the CASE WHEN + GROUP BY combo:

SELECT
    product,
    SUM(CASE WHEN month = '2026-01' THEN sales ELSE 0 END) AS "2026-01",
    SUM(CASE WHEN month = '2026-02' THEN sales ELSE 0 END) AS "2026-02"
FROM sales_data
GROUP BY product;

When the number of months grows, this query becomes unwieldy. With 12 months, you need 12 CASE WHEN expressions. With dynamic columns, you might even need to construct SQL strings programmatically.

The good news: DuckDB provides native PIVOT syntax that handles all of this in a single, readable statement.

DuckDB PIVOT Syntax Deep Dive

Basic Usage: One Statement for Data Pivot

SELECT * FROM sales_data
PIVOT
  sales_data ON month
  USING SUM(sales)
  GROUP BY product;

The meaning is intuitive:

  • PIVOT sales_data ON month: Use values from the month column as new column names
  • USING SUM(sales): Apply SUM aggregation on the sales column
  • GROUP BY product: Group results by product

The output is exactly the wide table you need:

product    | 2026-01 | 2026-02
-----------|---------|--------
Laptop     | 12000   | 15000
Tablet     | 8000    | 9500
Phone      | 22000   | 25000

Multiple Aggregations: Compute Several Metrics at Once

PIVOT supports multiple aggregate functions in a single operation:

SELECT * FROM sales_data
PIVOT
  sales_data ON month
  USING SUM(sales) AS total_sales, AVG(sales) AS avg_sales, COUNT(*) AS order_count
  GROUP BY product;

This generates 6 columns (3 months × 2 aggregations):

product    | 2026-01_total_sales | 2026-02_total_sales | 2026-01_avg_sales | ...
-----------|--------------------|--------------------|--------------------|----
Laptop     | 12000              | 15000              | 12000              | ...

Custom Column Names: Make Your Output Elegant

SELECT * FROM sales_data
PIVOT
  sales_data ON month
  USING SUM(sales)
  GROUP BY product
  COLUMNS ('January', 'February');

The COLUMNS clause lets you define custom output column names instead of using raw values directly.

Handling Missing Values: NULL vs Default

When some products have no sales records in certain months, PIVOT fills them with NULL. You can combine with COALESCE to set defaults:

SELECT
    product,
    COALESCE("2026-01", 0) AS jan_sales,
    COALESCE("2026-02", 0) AS feb_sales
FROM sales_data
PIVOT (SUM(sales) ON month GROUP BY product);

Complete Practical Examples

Scenario 1: E-commerce Monthly Sales Matrix

Suppose you run an e-commerce platform and need to generate a cross-category, cross-month revenue matrix for management decision-making.

-- Create sample data
CREATE TABLE monthly_sales AS
SELECT * FROM (VALUES
    ('Electronics', '2026-01', 50000),
    ('Electronics', '2026-02', 62000),
    ('Electronics', '2026-03', 58000),
    ('Clothing',    '2026-01', 30000),
    ('Clothing',    '2026-02', 35000),
    ('Clothing',    '2026-03', 42000),
    ('Food',        '2026-01', 20000),
    ('Food',        '2026-02', 22000),
    ('Food',        '2026-03', 25000)
) AS t(category, month, revenue);

-- Pivot the data
SELECT * FROM monthly_sales
PIVOT
  monthly_sales ON month
  USING SUM(revenue) AS total_revenue
  GROUP BY category;

Output:

category  | 2026-01_total_revenue | 2026-02_total_revenue | 2026-03_total_revenue
----------|----------------------|----------------------|----------------------
Electronics| 50000               | 62000                | 58000
Clothing   | 30000               | 35000                | 42000
Food       | 20000               | 22000                | 25000

Scenario 2: Multi-dimensional Cross Analysis

PIVOT can be combined with UNPIVOT for bidirectional data transformation:

-- Pivot then unpivot for data cleaning
WITH pivoted AS (
    SELECT * FROM monthly_sales
    PIVOT (SUM(revenue) ON month GROUP BY category)
),
unpivoted AS (
    SELECT
        category,
        month,
        revenue
    FROM pivoted
    UNPIVOT (revenue ON month FROM total_revenue)
)
SELECT * FROM unpivoted
ORDER BY category, month;

Scenario 3: Production-ready Dynamic Column Handling

In production, months are dynamically changing. DuckDB supports querying metadata to dynamically construct column definitions:

-- Get all unique months as column definitions
SELECT DISTINCT month FROM monthly_sales ORDER BY month;

-- Results used to dynamically generate SQL or handle in application layer
-- 2026-01, 2026-02, 2026-03, ...

PIVOT vs Traditional Methods Comparison

FeaturePIVOT SyntaxCASE WHEN + GROUP BYApplication-layer
Code Lines1 SQL statementN CASE WHEN clausesDozens of Python lines
Readability⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Dynamic Columns✅ Auto-detected❌ Requires SQL concatenation✅ Flexible
Multi-aggregation✅ Built-in✅ Manual addition needed✅ Flexible
NULL Handling⚠️ Needs COALESCE⚠️ Needs COALESCE✅ Fully flexible
Performance⭐⭐⭐⭐⭐⭐⭐⭐⭐
Learning CurveLowMediumHigh

Performance Advantages

For large datasets, PIVOT outperforms hand-written CASE WHEN approaches. The reasons:

  1. Vectorized Execution: DuckDB’s PIVOT leverages columnar storage for vectorized aggregation in memory
  2. Single Scan: Regardless of how many aggregate functions, PIVOT scans source data only once
  3. Memory Optimization: Intermediate results are stored directly in wide-table format, avoiding subquery overhead

Benchmarks show PIVOT is 2-3x faster than equivalent CASE WHEN solutions on million-row datasets.

Monetization Strategies

1. Productize Data Services

PIVOT is ideal for automated reporting services. Integrate PIVOT into a Streamlit or FastAPI backend to offer clients one-click sales matrix and user behavior matrix generation APIs. Charge monthly subscriptions at 500-2000 RMB/month.

2. Enterprise Training & Consulting

Mastering advanced PIVOT techniques (multi-aggregation, dynamic columns, combination with UNPIVOT) positions you for data warehouse modeling consulting. Help enterprises quickly build wide tables suitable for BI tools from raw long-format log data. Single projects charge 5,000-30,000 RMB.

3. Template-based Data Products

Leverage PIVOT + DuckDB’s embedded nature to rapidly build lightweight analytics SaaS:

  • E-commerce sales dashboards (monthly/quarterly/yearly cross-analysis)
  • Marketing ROI matrices (channel × product line × time period)
  • HR dashboards (department × metric × time)

Package these as white-label solutions for SMBs at 10,000-50,000 RMB per license.

4. Content Monetization

Create short videos or tutorials on PIVOT techniques and publish them on YouTube, Medium, or Twitter/X. This type of “SQL tips” content spreads rapidly in data communities. Monetize through ad revenue, paid courses, or funneling to premium communities.


DuckDB PIVOT Data Transformation Flow

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy