Featured image of post DuckDB PIVOT/UNPIVOT in Action: Turn Messy Data into Profitable Data Products

DuckDB PIVOT/UNPIVOT in Action: Turn Messy Data into Profitable Data Products

Master DuckDB's PIVOT and UNPIVOT to transform raw data into beautiful reports. Learn how to build profitable data products using row-column transformations, with real-world monetization strategies.

From Raw Data to Data Products: Row-Column Transformation is the Key Step

Anyone who has worked in the data industry knows that the data clients give you is never “ready to use.”

Either it’s a wide table exported from an ERP system—every month in a column, every product in a column, densely packed—or it’s a narrow table from sensor uploads—one reading per row, thousands of records stacked together.

And what your clients (boss, investors, external customers) want is always a specific format report.

PIVOT and UNPIVOT are the bridge between “raw data” and “target report.” Today, I’m not just teaching syntax—I’m showing you how to use these features to build a data product that actually makes money.

PIVOT/UNPIVOT Data Transformation Flow


1. PIVOT in Practice: Transform Narrow Tables into Boss-Approved Reports

Scenario: E-commerce Weekly Sales Report

Imagine you work in data analytics for an e-commerce company. Your database stores transaction logs (in narrow format):

-- Simulating e-commerce transaction logs
CREATE TABLE orders AS
SELECT * FROM (VALUES
    ('2026-06-01', 'Shanghai', 'iPhone 15', 7999, 1),
    ('2026-06-01', 'Beijing', 'MacBook Pro', 14999, 1),
    ('2026-06-01', 'Shanghai', 'AirPods Pro', 1899, 2),
    ('2026-06-02', 'Guangzhou', 'iPhone 15', 7999, 1),
    ('2026-06-02', 'Beijing', 'iPad Air', 4799, 1),
    ('2026-06-02', 'Shanghai', 'MacBook Pro', 14999, 1),
    ('2026-06-03', 'Shenzhen', 'iPhone 15', 7999, 2),
    ('2026-06-03', 'Guangzhou', 'AirPods Pro', 1899, 3),
    ('2026-06-03', 'Beijing', 'iPhone 15', 7999, 1),
    ('2026-06-03', 'Shenzhen', 'MacBook Pro', 14999, 1)
) AS t(order_date, city, product, price, quantity);

The boss wants a weekly report format: total sales per city per product, arranged horizontally.

The traditional approach uses CASE WHEN:

SELECT
    city,
    SUM(CASE WHEN product = 'iPhone 15' THEN price * quantity ELSE 0 END) AS "iPhone 15",
    SUM(CASE WHEN product = 'MacBook Pro' THEN price * quantity ELSE 0 END) AS "MacBook Pro",
    SUM(CASE WHEN product = 'AirPods Pro' THEN price * quantity ELSE 0 END) AS "AirPods Pro",
    SUM(CASE WHEN product = 'iPad Air' THEN price * quantity ELSE 0 END) AS "iPad Air"
FROM orders
GROUP BY city;

When there are many products, this SQL becomes extremely verbose. And every time a new product is added, you need to modify the SQL—which is a disaster for automated reporting.

One-Liner with PIVOT

SELECT * FROM orders
PIVOT ON product
USING SUM(price * quantity)
GROUP BY order_date, city;

Result:

order_date | city     | iPhone 15 | MacBook Pro | AirPods Pro | iPad Air
-----------|----------|-----------|-------------|-------------|----------
2026-06-01 | Shanghai  | 7999      |             | 3798        |
2026-06-01 | Beijing   |           | 14999       |             |
2026-06-02 | Guangzhou |           |             |             | 4799
2026-06-02 | Beijing   |           |             |             |
2026-06-02 | Shanghai  |           | 14999       |             |
2026-06-03 | Shenzhen  | 15998     | 14999       |             |
2026-06-03 | Guangzhou |           |             | 5697        |
2026-06-03 | Beijing   | 7999      |             |             |

Key advantages:

  1. Dynamic columns — PIVOT automatically detects and creates columns for all products
  2. Simplicity — One SQL line replaces dozens of CASE WHEN lines
  3. Maintainability — Adding new products requires zero code changes

Multiple Aggregations in One Query

In real business scenarios, you often need revenue, order count, and average order value simultaneously:

SELECT * FROM orders
PIVOT ON product
USING SUM(price * quantity) AS total_revenue,
      COUNT(*) AS order_count,
      AVG(price * quantity) AS avg_order_value
GROUP BY order_date, city;

The result automatically generates three groups of columns: total_revenue, order_count, and avg_order_value, with three data columns per product.


2. UNPIVOT in Practice: Flatten Excel Wide Tables Back to Analyzable Long Tables

Scenario: The Monthly Excel Nightmare

This is a nightmare every data analyst has experienced—colleagues from marketing send an Excel file structured like this:

ProductJanFebMarAprMayJun
iPhone 15500520480610590650
MacBook Pro200210195230225240
AirPods Pro150016001450170016501800

You want to do time series analysis, trend forecasting, month-over-month calculations—but there are too many columns to GROUP BY.

Flatten It with UNPIVOT

CREATE TABLE monthly_excel AS
SELECT 'iPhone 15' as product, 500 as "Jan", 520 as "Feb", 480 as "Mar", 610 as "Apr", 590 as "May", 650 as "Jun"
UNION ALL SELECT 'MacBook Pro', 200, 210, 195, 230, 225, 240
UNION ALL SELECT 'AirPods Pro', 1500, 1600, 1450, 1700, 1650, 1800;

SELECT * FROM monthly_excel
UNPIVOT ON "Jan", "Feb", "Mar", "Apr", "May", "Jun"
INTO NAME month VALUE sales;

Result:

product     | month | sales
------------|-------|------
iPhone 15   | Jan   | 500
iPhone 15   | Feb   | 520
iPhone 15   | Mar   | 480
iPhone 15   | Apr   | 610
iPhone 15   | May   | 590
iPhone 15   | Jun   | 650
MacBook Pro | Jan   | 200
...

Now you can do any time series analysis:

-- Calculate month-over-month growth rate
WITH unpivoted AS (
    SELECT * FROM monthly_excel
    UNPIVOT ON "Jan", "Feb", "Mar", "Apr", "May", "Jun"
    INTO NAME month VALUE sales
)
SELECT
    product,
    month,
    sales,
    ROUND(sales - LAG(sales) OVER (PARTITION BY product ORDER BY month), 2) AS mom_change
FROM unpivoted
ORDER BY product, month;

3. Combo Move: PIVOT + UNPIVOT to Build Data Pipelines

In real data products, PIVOT and UNPIVOT are often used together.

Typical Data Pipeline Flow

Raw Data (narrow) → UNPIVOT (clean) → Processing → PIVOT (report) → Deliver to client

Let’s look at a complete example: Multi-channel Sales Data Integration.

Assume you have sales data from three channels, each in a different format:

-- Channel A: Wide table (monthly columns)
CREATE TABLE channel_a AS
SELECT 'East China' as region, 100000 as jan, 120000 as feb, 110000 as mar;

-- Channel B: Narrow table (transaction logs)
CREATE TABLE channel_b AS
SELECT * FROM (VALUES
    ('East China', '2026-01-15', 'Electronics', 50000),
    ('East China', '2026-02-20', 'Daily Goods', 30000),
    ('East China', '2026-03-10', 'Electronics', 45000)
) AS t(region, date, category, amount);

-- Channel C: Mixed table (quarterly)
CREATE TABLE channel_c AS
SELECT 'East China' as region, 350000 as q1, 420000 as q2;

Step 1: UNPIVOT Channel A to unify format

SELECT region, 'jan' as period, jan as amount FROM channel_a
UNION ALL
SELECT region, 'feb', feb FROM channel_a
UNION ALL
SELECT region, 'mar', mar FROM channel_a;

Step 2: UNPIVOT Channel C to monthly granularity

SELECT region,
    'q1_jan' as period, q1/3 as amount  -- Simplified: divide quarterly evenly
FROM channel_c;

Step 3: Merge and PIVOT to final report

WITH unified AS (
    -- Channel A expanded
    SELECT region, 'jan' as month, jan as amount FROM channel_a
    UNION ALL SELECT region, 'feb', feb FROM channel_a
    UNION ALL SELECT region, 'mar', mar FROM channel_a
    -- Channel B aggregated by month
    SELECT region, STRFTIME(date, '%m') as month, SUM(amount)
    FROM channel_b GROUP BY region, STRFTIME(date, '%m')
    -- Channel C expanded
    SELECT region, 'q1_jan' as month, q1/3 FROM channel_c
    UNION ALL SELECT region, 'q1_feb', q1/3 FROM channel_c
    UNION ALL SELECT region, 'q1_mar', q1/3 FROM channel_c
)
SELECT * FROM unified
PIVOT ON month
USING SUM(amount)
GROUP BY region;

4. Python Integration: Seamless Connection in Data Products

Most data products have a Python backend. DuckDB’s duckdb.sql() can directly operate on Python variables, making PIVOT/UNPIVOT usage at the application level extremely convenient.

import duckdb
import pandas as pd
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/report/sales-pivot")
def get_sales_pivot(city: str = "Shanghai"):
    """Real-time sales pivot report"""
    
    # Read raw transaction data directly from database
    conn = duckdb.connect(":memory:")
    conn.execute("INSTALL httpfs; LOAD httpfs;")
    conn.execute("CALL httpfs_download('https://api.example.com/sales', '/tmp/sales.csv')")
    
    # Auto-read CSV
    conn.execute("CREATE TABLE raw_orders AS SELECT * FROM read_csv_auto('/tmp/sales.csv')")
    
    # PIVOT to generate report
    result = conn.execute(f"""
        SELECT * FROM raw_orders
        WHERE city = '{city}'
        PIVOT ON product
        USING SUM(price * quantity)
        GROUP BY order_date
    """).fetchdf()
    
    # Convert to JSON for frontend
    return result.to_dict(orient='records')

Key points:

  1. read_csv_auto automatically infers column types—no manual schema definition needed
  2. duckdb.sql() can directly reference Python variables without CREATE TABLE
  3. .fetchdf() converts results directly to DataFrame for further processing
  4. Entire flow runs in memory—no disk I/O required

5. Comparison: PIVOT/UNPIVOT vs Traditional Approaches

ApproachCode LinesDynamic ColumnsPerformanceMaintainability
DuckDB PIVOT1 SQL line✅ Automatic⚡ Vectorized⭐⭐⭐⭐⭐
CASE WHENN SQL lines❌ Manual🐢 Row-by-row⭐⭐
Python pivot_table3-5 lines✅ Automatic🐢 In-memory⭐⭐⭐
Excel PivotTableClick operations✅ Automatic🐢 Slow with big data⭐⭐⭐
Dynamic SQL concatComplex✅ Manual🐢 Compilation overhead

Why DuckDB PIVOT is the best choice?

  1. Zero code — One SQL line replaces N lines of CASE WHEN
  2. Vectorized execution — Uses SIMD instructions under the hood, 10-100x faster than Python
  3. Streaming processing — Data exceeding memory automatically spills to disk, no OOM
  4. Seamless integration — Combines with read_csv_auto, httpfs, and other features

6. Monetization Strategies: Turn Row-Column Transformation Skills into Revenue

Strategy 1: Automated Reporting Service (Monthly $400-$1100)

Many SMEs still manually create monthly reports in Excel. You can:

  1. Build an automated reporting pipeline with DuckDB
  2. Extract data from their databases/CSVs weekly
  3. Generate standardized reports using PIVOT
  4. Push automatically via email or Telegram

Pricing: Basic plan $70/month (1 report), Pro plan $280/month (5 reports + custom fields).

Strategy 2: Data Cleaning SaaS (Monthly $1400+)

Cross-border e-commerce sellers frequently receive sales data from different platforms in varying formats. You can build an online tool:

  1. Users upload CSV/Excel files
  2. DuckDB automatically UNPIVOTs wide tables and PIVOTs to desired format
  3. Outputs standardized CSV for ERP import

Tech stack: FastAPI + DuckDB + Streamlit frontend, deployed on a cheap VPS (monthly cost under $7).

Strategy 3: Data Product Template Marketplace

Package commonly used PIVOT/UNPIVOT templates into reusable SQL scripts and sell them on a platform:

  • E-commerce weekly sales report template
  • Financial report template
  • Social media analytics template

Each template priced at $14-$40, zero marginal cost.


Summary

PIVOT and UNPIVOT are not flashy advanced features—they are core skills that data analysts use every single day.

Master them, and you gain the ability to transform any data format into any report format. In a data-driven business world, that ability is pure gold.

Next steps:

  1. Open your DuckDB and try PIVOT and UNPIVOT
  2. Find a repetitive data format conversion in your work and rewrite it with PIVOT/UNPIVOT
  3. Consider packaging your reporting automation capability into a service

📖 The complete code examples and datasets from this article are compiled into tutorials. Visit duckdblab.org for more DuckDB实战 cases and monetization strategies.

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.