From Raw Data to Data Products: Row-Column Transformation is the Key Step
Anyone who has worked in the data industry knows that the data clients give you is never “ready to use.”
Either it’s a wide table exported from an ERP system—every month in a column, every product in a column, densely packed—or it’s a narrow table from sensor uploads—one reading per row, thousands of records stacked together.
And what your clients (boss, investors, external customers) want is always a specific format report.
PIVOT and UNPIVOT are the bridge between “raw data” and “target report.” Today, I’m not just teaching syntax—I’m showing you how to use these features to build a data product that actually makes money.

1. PIVOT in Practice: Transform Narrow Tables into Boss-Approved Reports
Scenario: E-commerce Weekly Sales Report
Imagine you work in data analytics for an e-commerce company. Your database stores transaction logs (in narrow format):
-- Simulating e-commerce transaction logs
CREATE TABLE orders AS
SELECT * FROM (VALUES
('2026-06-01', 'Shanghai', 'iPhone 15', 7999, 1),
('2026-06-01', 'Beijing', 'MacBook Pro', 14999, 1),
('2026-06-01', 'Shanghai', 'AirPods Pro', 1899, 2),
('2026-06-02', 'Guangzhou', 'iPhone 15', 7999, 1),
('2026-06-02', 'Beijing', 'iPad Air', 4799, 1),
('2026-06-02', 'Shanghai', 'MacBook Pro', 14999, 1),
('2026-06-03', 'Shenzhen', 'iPhone 15', 7999, 2),
('2026-06-03', 'Guangzhou', 'AirPods Pro', 1899, 3),
('2026-06-03', 'Beijing', 'iPhone 15', 7999, 1),
('2026-06-03', 'Shenzhen', 'MacBook Pro', 14999, 1)
) AS t(order_date, city, product, price, quantity);
The boss wants a weekly report format: total sales per city per product, arranged horizontally.
The traditional approach uses CASE WHEN:
SELECT
city,
SUM(CASE WHEN product = 'iPhone 15' THEN price * quantity ELSE 0 END) AS "iPhone 15",
SUM(CASE WHEN product = 'MacBook Pro' THEN price * quantity ELSE 0 END) AS "MacBook Pro",
SUM(CASE WHEN product = 'AirPods Pro' THEN price * quantity ELSE 0 END) AS "AirPods Pro",
SUM(CASE WHEN product = 'iPad Air' THEN price * quantity ELSE 0 END) AS "iPad Air"
FROM orders
GROUP BY city;
When there are many products, this SQL becomes extremely verbose. And every time a new product is added, you need to modify the SQL—which is a disaster for automated reporting.
One-Liner with PIVOT
SELECT * FROM orders
PIVOT ON product
USING SUM(price * quantity)
GROUP BY order_date, city;
Result:
order_date | city | iPhone 15 | MacBook Pro | AirPods Pro | iPad Air
-----------|----------|-----------|-------------|-------------|----------
2026-06-01 | Shanghai | 7999 | | 3798 |
2026-06-01 | Beijing | | 14999 | |
2026-06-02 | Guangzhou | | | | 4799
2026-06-02 | Beijing | | | |
2026-06-02 | Shanghai | | 14999 | |
2026-06-03 | Shenzhen | 15998 | 14999 | |
2026-06-03 | Guangzhou | | | 5697 |
2026-06-03 | Beijing | 7999 | | |
Key advantages:
- Dynamic columns — PIVOT automatically detects and creates columns for all products
- Simplicity — One SQL line replaces dozens of CASE WHEN lines
- Maintainability — Adding new products requires zero code changes
Multiple Aggregations in One Query
In real business scenarios, you often need revenue, order count, and average order value simultaneously:
SELECT * FROM orders
PIVOT ON product
USING SUM(price * quantity) AS total_revenue,
COUNT(*) AS order_count,
AVG(price * quantity) AS avg_order_value
GROUP BY order_date, city;
The result automatically generates three groups of columns: total_revenue, order_count, and avg_order_value, with three data columns per product.
2. UNPIVOT in Practice: Flatten Excel Wide Tables Back to Analyzable Long Tables
Scenario: The Monthly Excel Nightmare
This is a nightmare every data analyst has experienced—colleagues from marketing send an Excel file structured like this:
| Product | Jan | Feb | Mar | Apr | May | Jun |
|---|---|---|---|---|---|---|
| iPhone 15 | 500 | 520 | 480 | 610 | 590 | 650 |
| MacBook Pro | 200 | 210 | 195 | 230 | 225 | 240 |
| AirPods Pro | 1500 | 1600 | 1450 | 1700 | 1650 | 1800 |
You want to do time series analysis, trend forecasting, month-over-month calculations—but there are too many columns to GROUP BY.
Flatten It with UNPIVOT
CREATE TABLE monthly_excel AS
SELECT 'iPhone 15' as product, 500 as "Jan", 520 as "Feb", 480 as "Mar", 610 as "Apr", 590 as "May", 650 as "Jun"
UNION ALL SELECT 'MacBook Pro', 200, 210, 195, 230, 225, 240
UNION ALL SELECT 'AirPods Pro', 1500, 1600, 1450, 1700, 1650, 1800;
SELECT * FROM monthly_excel
UNPIVOT ON "Jan", "Feb", "Mar", "Apr", "May", "Jun"
INTO NAME month VALUE sales;
Result:
product | month | sales
------------|-------|------
iPhone 15 | Jan | 500
iPhone 15 | Feb | 520
iPhone 15 | Mar | 480
iPhone 15 | Apr | 610
iPhone 15 | May | 590
iPhone 15 | Jun | 650
MacBook Pro | Jan | 200
...
Now you can do any time series analysis:
-- Calculate month-over-month growth rate
WITH unpivoted AS (
SELECT * FROM monthly_excel
UNPIVOT ON "Jan", "Feb", "Mar", "Apr", "May", "Jun"
INTO NAME month VALUE sales
)
SELECT
product,
month,
sales,
ROUND(sales - LAG(sales) OVER (PARTITION BY product ORDER BY month), 2) AS mom_change
FROM unpivoted
ORDER BY product, month;
3. Combo Move: PIVOT + UNPIVOT to Build Data Pipelines
In real data products, PIVOT and UNPIVOT are often used together.
Typical Data Pipeline Flow
Raw Data (narrow) → UNPIVOT (clean) → Processing → PIVOT (report) → Deliver to client
Let’s look at a complete example: Multi-channel Sales Data Integration.
Assume you have sales data from three channels, each in a different format:
-- Channel A: Wide table (monthly columns)
CREATE TABLE channel_a AS
SELECT 'East China' as region, 100000 as jan, 120000 as feb, 110000 as mar;
-- Channel B: Narrow table (transaction logs)
CREATE TABLE channel_b AS
SELECT * FROM (VALUES
('East China', '2026-01-15', 'Electronics', 50000),
('East China', '2026-02-20', 'Daily Goods', 30000),
('East China', '2026-03-10', 'Electronics', 45000)
) AS t(region, date, category, amount);
-- Channel C: Mixed table (quarterly)
CREATE TABLE channel_c AS
SELECT 'East China' as region, 350000 as q1, 420000 as q2;
Step 1: UNPIVOT Channel A to unify format
SELECT region, 'jan' as period, jan as amount FROM channel_a
UNION ALL
SELECT region, 'feb', feb FROM channel_a
UNION ALL
SELECT region, 'mar', mar FROM channel_a;
Step 2: UNPIVOT Channel C to monthly granularity
SELECT region,
'q1_jan' as period, q1/3 as amount -- Simplified: divide quarterly evenly
FROM channel_c;
Step 3: Merge and PIVOT to final report
WITH unified AS (
-- Channel A expanded
SELECT region, 'jan' as month, jan as amount FROM channel_a
UNION ALL SELECT region, 'feb', feb FROM channel_a
UNION ALL SELECT region, 'mar', mar FROM channel_a
-- Channel B aggregated by month
SELECT region, STRFTIME(date, '%m') as month, SUM(amount)
FROM channel_b GROUP BY region, STRFTIME(date, '%m')
-- Channel C expanded
SELECT region, 'q1_jan' as month, q1/3 FROM channel_c
UNION ALL SELECT region, 'q1_feb', q1/3 FROM channel_c
UNION ALL SELECT region, 'q1_mar', q1/3 FROM channel_c
)
SELECT * FROM unified
PIVOT ON month
USING SUM(amount)
GROUP BY region;
4. Python Integration: Seamless Connection in Data Products
Most data products have a Python backend. DuckDB’s duckdb.sql() can directly operate on Python variables, making PIVOT/UNPIVOT usage at the application level extremely convenient.
import duckdb
import pandas as pd
from fastapi import FastAPI
from fastapi.responses import JSONResponse
app = FastAPI()
@app.get("/report/sales-pivot")
def get_sales_pivot(city: str = "Shanghai"):
"""Real-time sales pivot report"""
# Read raw transaction data directly from database
conn = duckdb.connect(":memory:")
conn.execute("INSTALL httpfs; LOAD httpfs;")
conn.execute("CALL httpfs_download('https://api.example.com/sales', '/tmp/sales.csv')")
# Auto-read CSV
conn.execute("CREATE TABLE raw_orders AS SELECT * FROM read_csv_auto('/tmp/sales.csv')")
# PIVOT to generate report
result = conn.execute(f"""
SELECT * FROM raw_orders
WHERE city = '{city}'
PIVOT ON product
USING SUM(price * quantity)
GROUP BY order_date
""").fetchdf()
# Convert to JSON for frontend
return result.to_dict(orient='records')
Key points:
read_csv_autoautomatically infers column types—no manual schema definition neededduckdb.sql()can directly reference Python variables withoutCREATE TABLE.fetchdf()converts results directly to DataFrame for further processing- Entire flow runs in memory—no disk I/O required
5. Comparison: PIVOT/UNPIVOT vs Traditional Approaches
| Approach | Code Lines | Dynamic Columns | Performance | Maintainability |
|---|---|---|---|---|
| DuckDB PIVOT | 1 SQL line | ✅ Automatic | ⚡ Vectorized | ⭐⭐⭐⭐⭐ |
| CASE WHEN | N SQL lines | ❌ Manual | 🐢 Row-by-row | ⭐⭐ |
| Python pivot_table | 3-5 lines | ✅ Automatic | 🐢 In-memory | ⭐⭐⭐ |
| Excel PivotTable | Click operations | ✅ Automatic | 🐢 Slow with big data | ⭐⭐⭐ |
| Dynamic SQL concat | Complex | ✅ Manual | 🐢 Compilation overhead | ⭐ |
Why DuckDB PIVOT is the best choice?
- Zero code — One SQL line replaces N lines of CASE WHEN
- Vectorized execution — Uses SIMD instructions under the hood, 10-100x faster than Python
- Streaming processing — Data exceeding memory automatically spills to disk, no OOM
- Seamless integration — Combines with read_csv_auto, httpfs, and other features
6. Monetization Strategies: Turn Row-Column Transformation Skills into Revenue
Strategy 1: Automated Reporting Service (Monthly $400-$1100)
Many SMEs still manually create monthly reports in Excel. You can:
- Build an automated reporting pipeline with DuckDB
- Extract data from their databases/CSVs weekly
- Generate standardized reports using PIVOT
- Push automatically via email or Telegram
Pricing: Basic plan $70/month (1 report), Pro plan $280/month (5 reports + custom fields).
Strategy 2: Data Cleaning SaaS (Monthly $1400+)
Cross-border e-commerce sellers frequently receive sales data from different platforms in varying formats. You can build an online tool:
- Users upload CSV/Excel files
- DuckDB automatically UNPIVOTs wide tables and PIVOTs to desired format
- Outputs standardized CSV for ERP import
Tech stack: FastAPI + DuckDB + Streamlit frontend, deployed on a cheap VPS (monthly cost under $7).
Strategy 3: Data Product Template Marketplace
Package commonly used PIVOT/UNPIVOT templates into reusable SQL scripts and sell them on a platform:
- E-commerce weekly sales report template
- Financial report template
- Social media analytics template
Each template priced at $14-$40, zero marginal cost.
Summary
PIVOT and UNPIVOT are not flashy advanced features—they are core skills that data analysts use every single day.
Master them, and you gain the ability to transform any data format into any report format. In a data-driven business world, that ability is pure gold.
Next steps:
- Open your DuckDB and try PIVOT and UNPIVOT
- Find a repetitive data format conversion in your work and rewrite it with PIVOT/UNPIVOT
- Consider packaging your reporting automation capability into a service
📖 The complete code examples and datasets from this article are compiled into tutorials. Visit duckdblab.org for more DuckDB实战 cases and monetization strategies.