
One: Window Functions — The Underrated Business Analysis Power Tool
Many data analysts still think of window functions as “interview questions” or “SQL flexing.” But if you’re using DuckDB to build data products, automated reports, or commercial analytics tools, window functions are the last mile from SQL to profit.
Why? Because window functions let you accomplish all common business analysis scenarios without breaking the original data granularity:
- Customer retention rate calculation (cohort analysis)
- Rolling revenue/loss statistics
- In-group ranking and Top-N filtering
- Row-to-row comparison (month-over-month, year-over-year)
- Cumulative sums and moving averages
These aren’t academic exercises — each one maps to a billable commercial analytics service.
Two: Use Case 1 — Customer Retention Cohort Analysis
Imagine you run a SaaS product and need to show clients monthly retention rates — this is a core analytics module you can charge projects for.
Traditional approach: multiple table JOINs or subqueries, long and slow code.
DuckDB window function solution:
-- Simulate user registration and activity data
CREATE TABLE user_activity AS
WITH dates AS (
SELECT unnest(generate_series(DATE '2025-01-01', DATE '2025-12-31', INTERVAL 1 DAY)) AS day
),
users AS (
SELECT unnest(generate_series(1, 5000)) AS user_id
),
user_cohorts AS (
SELECT
u.user_id,
MIN(da.day) AS first_active_day,
date_trunc('month', MIN(da.day)) AS cohort_month
FROM users u
CROSS JOIN dates da
WHERE random() < 0.3 -- 30% chance of being active on any given day
GROUP BY u.user_id
),
daily_activity AS (
SELECT
uc.user_id,
uc.cohort_month,
date_trunc('month', da.day) AS activity_month,
COUNT(*) AS active_days
FROM user_cohorts uc
CROSS JOIN dates da
WHERE random() < 0.15 -- 15% monthly active probability
GROUP BY uc.user_id, date_trunc('month', da.day)
)
SELECT * FROM daily_activity;
-- Core: Calculate retained users per cohort
WITH cohort_counts AS (
SELECT
cohort_month,
COUNT(DISTINCT user_id) AS cohort_size
FROM user_activity
GROUP BY cohort_month
),
monthly_retention AS (
SELECT
cohort_month,
activity_month,
datediff('month', cohort_month, activity_month) AS months_since_join,
COUNT(DISTINCT user_id) AS retained_users
FROM user_activity
GROUP BY cohort_month, activity_month, months_since_join
)
SELECT
mr.cohort_month,
mr.months_since_join,
mr.retained_users,
cc.cohort_size,
ROUND(mr.retained_users * 100.0 / cc.cohort_size, 2) AS retention_rate_pct,
-- Key window function: calculate month-over-month change
LAG(mr.retained_users) OVER (
PARTITION BY mr.cohort_month
ORDER BY mr.months_since_join
) AS prev_month_retained,
CASE
WHEN LAG(mr.retained_users) OVER (
PARTITION BY mr.cohort_month
ORDER BY mr.months_since_join
) IS NOT NULL
THEN ROUND(
(mr.retained_users - LAG(mr.retained_users) OVER (
PARTITION BY mr.cohort_month
ORDER BY mr.months_since_join
)) * 100.0 / LAG(mr.retained_users) OVER (
PARTITION BY mr.cohort_month
ORDER BY mr.months_since_join
), 2
)
ELSE NULL
END AS mom_change_pct
FROM monthly_retention mr
JOIN cohort_counts cc ON mr.cohort_month = cc.cohort_month
WHERE mr.months_since_join <= 6
ORDER BY mr.cohort_month, mr.months_since_join;
Business Value: This retention analysis report can be sold directly to SaaS companies as the core deliverable of a monthly paid service. One template, swap the data, repeat. Marginal cost approaches zero.
Three: Use Case 2 — E-commerce GMV Rolling Trends + Anomaly Detection
The metric e-commerce clients care about most is GMV trends. With window functions, you can accomplish trend analysis and anomaly flagging in a single SQL query.
-- Simulate e-commerce order data
CREATE TABLE orders AS
SELECT
generate_series(1, 10000) AS order_id,
date '2025-01-01' + (random() * 365)::INT AS order_date,
('electronics'::VARCHAR, 'clothing'::VARCHAR, 'food'::VARCHAR, 'home'::VARCHAR)[floor(random()*4)+1] AS category,
ROUND((random() * 500 + 10)::NUMERIC, 2) AS amount
FROM generate_series(1, 10000);
-- Daily/weekly GMV + moving average + anomaly detection
WITH daily_gmv AS (
SELECT
order_date::DATE AS day,
SUM(amount) AS daily_gmv,
COUNT(*) AS order_count
FROM orders
GROUP BY 1
),
enriched AS (
SELECT
day,
daily_gmv,
order_count,
-- 7-day moving average
AVG(daily_gmv) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS ma_7day,
-- 30-day moving average
AVG(daily_gmv) OVER (
ORDER BY day
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS ma_30day,
-- Yesterday comparison (MoM)
LAG(daily_gmv, 1) OVER (ORDER BY day) AS prev_day_gmv,
-- Same day last week comparison
LAG(daily_gmv, 7) OVER (ORDER BY day) AS same_week_prev_gmv,
-- Standard deviation (for anomaly detection)
STDDEV(daily_gmv) OVER (
ORDER BY day
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS rolling_stddev,
-- Ranking
RANK() OVER (
PARTITION BY EXTRACT(MONTH FROM day)
ORDER BY daily_gmv DESC
) AS day_rank_in_month,
-- Cumulative GMV (year-to-date)
SUM(daily_gmv) OVER (
ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS ytd_gmv
FROM daily_gmv
)
SELECT
day,
ROUND(daily_gmv, 2) AS daily_gmv,
ROUND(ma_7day, 2) AS ma_7day,
ROUND(ma_30day, 2) AS ma_30day,
ROUND(prev_day_gmv, 2) AS prev_day_gmv,
ROUND(same_week_prev_gmv, 2) AS same_week_prev_gmv,
ROUND(ytd_gmv, 2) AS ytd_gmv,
-- Anomaly flag: exceeds 2 standard deviations
CASE
WHEN ABS(daily_gmv - ma_30day) > 2 * COALESCE(rolling_stddev, daily_gmv * 0.1)
THEN '⚠️ Anomaly'
ELSE '✓ Normal'
END AS anomaly_flag
FROM enriched
ORDER BY day
LIMIT 30;
Business Value: This query template can be packaged into an “E-commerce Health Monitoring SaaS,” charged monthly. Each e-commerce client only needs to plug in their own orders table — the query logic is fully reusable.
Four: Use Case 3 — Employee Performance Rankings + Bonus Allocation
This is a scenario many SMBs would pay for: automated performance evaluation.
-- Simulate sales team performance data
CREATE TABLE sales_performance AS
SELECT
generate_series(1, 200) AS emp_id,
('Zhang San'::VARCHAR, 'Li Si'::VARCHAR, 'Wang Wu'::VARCHAR, 'Zhao Liu'::VARCHAR,
'Qian Qi'::VARCHAR, 'Sun Ba'::VARCHAR, 'Zhou Jiu'::VARCHAR, 'Wu Shi'::VARCHAR)
[floor(random()*8)+1] AS name,
('East China'::VARCHAR, 'South China'::VARCHAR, 'North China'::VARCHAR, 'West'::VARCHAR)[floor(random()*4)+1] AS region,
date '2025-01-01' + (random()*365)::INT AS month_date,
ROUND((random() * 100000 + 5000)::NUMERIC, 2) AS revenue,
floor(random() * 50 + 10)::INT AS deals_closed
FROM generate_series(1, 200);
-- Performance ranking + bonus calculation
WITH monthly_stats AS (
SELECT
name,
region,
month_date,
SUM(revenue) AS total_revenue,
SUM(deals_closed) AS total_deals,
-- Regional ranking
RANK() OVER (
PARTITION BY region, month_date
ORDER BY SUM(revenue) DESC
) AS region_rank,
-- Company-wide ranking
RANK() OVER (
PARTITION BY month_date
ORDER BY SUM(revenue) DESC
) AS company_rank,
-- Gap percentage vs regional average
ROUND(
(SUM(revenue) - AVG(SUM(revenue)) OVER (PARTITION BY region, month_date))
* 100.0 / AVG(SUM(revenue)) OVER (PARTITION BY region, month_date),
2
) AS vs_region_avg_pct,
-- Month-over-month growth
LAG(SUM(revenue)) OVER (
PARTITION BY name, region
ORDER BY month_date
) AS prev_month_revenue
FROM sales_performance
GROUP BY name, region, month_date
)
SELECT
name,
region,
month_date,
ROUND(total_revenue, 2) AS total_revenue,
total_deals,
region_rank,
company_rank,
ROUND(vs_region_avg_pct, 2) AS vs_region_avg_pct,
ROUND(prev_month_revenue, 2) AS prev_month_revenue,
CASE
WHEN prev_month_revenue IS NOT NULL
THEN ROUND((total_revenue - prev_month_revenue) * 100.0 / prev_month_revenue, 2)
ELSE NULL
END AS mom_growth_pct,
-- Bonus calculation: higher rank = higher bonus
CASE
WHEN company_rank = 1 THEN total_revenue * 0.15
WHEN company_rank <= 3 THEN total_revenue * 0.10
WHEN company_rank <= 10 THEN total_revenue * 0.05
ELSE total_revenue * 0.02
END AS bonus_amount
FROM monthly_stats
ORDER BY month_date, company_rank
LIMIT 50;
Business Value: This performance analysis system can be packaged as an “SMB HR Data Product,” sold to small and medium companies needing automated performance evaluation. One piece of code, N customers.
Five: Window Functions vs Traditional Methods Comparison
| Dimension | Traditional GROUP BY + JOIN | DuckDB Window Functions | DuckDB Advantage |
|---|---|---|---|
| Lines of code | 50-100 | 10-20 | 70%+ less code |
| Execution speed | Multiple full table scans | Single scan | 3-10x faster |
| Readability | Nested subquery hell | Linear logic flow | 60% less maintenance cost |
| Flexibility | Fixed grouping | Arbitrary sliding windows | Real-time dynamic analysis |
| Memory usage | Intermediate results explode | Stream processing | Large tables need no extra storage |
Six: Monetization Strategy — Turn Window Functions into Money Printers
The real value of window functions isn’t in the technology itself — it’s that they solve high-frequency business problems at extremely low marginal cost. Here are three monetization paths:
Path 1: Standardized Analytics Template Products
Package the cohort retention, GMV trends, and performance ranking queries above into configurable templates. Clients only provide a data connection; you deliver a complete analytics dashboard. Pricing: $70-280/template/month.
Path 2: Data as a Service (DaaS)
Build an “E-commerce/Retail Industry Data Monitoring Platform” that uses window functions to compute core metrics in real time, delivered via API or email daily reports. Pricing: $280-700/month/client.
Path 3: Consulting + Training Combo
Train enterprise data teams to use window functions to replace inefficient Pandas/Excel workflows. Single consulting engagement: $700-2,800; Training: $420-1,100/day.
Key Insight: Window functions are the most undervalued monetization lever in the DuckDB ecosystem. One well-crafted window function query can simultaneously serve three business scenarios — retention analysis, anomaly detection, and ranking statistics — in under 30 lines of code.
Learn more DuckDB battle-tested techniques → duckdblab.org