Building an Automated E-commerce Monitoring Dashboard with DuckDB
A data product blueprint: Build monthly sales monitoring dashboards for e-commerce brands, charging $400-700/month per client, with the entire pipeline under 100 lines of code.
One. From Requirement to Product
Many data analysts have received requests like this:
“Help me monitor the sales data of these stores. Generate automatic daily reports and alert me immediately when there are anomalies.”
What’s the traditional approach? Python script calling APIs → storing in MySQL → Jupyter Notebook analysis → exporting Excel → email delivery. The whole workflow takes at least three hours and has extremely high maintenance costs.
With DuckDB, the entire process compresses to under 100 lines of code. And this exact solution is what I use to provide monthly data services to three brand clients, charging $400-700/month per client.

Core idea: DuckDB can directly read CSV files (read_csv_auto) without importing into a database; use the LAG() window function for period-over-period calculations; use moving average ± standard deviation for anomaly detection; finally export an HTML dashboard for clients to view independently.
Two. Data Preparation
Assume your data sources look like this:
orders.csv— Orders table (order_id, order_date, product_id, customer_id, amount)products.csv— Products table (product_id, category, shop_name)customers.csv— Customers table (customer_id, city, member_level)
You can use any CSV exported from e-commerce platforms — just match the structure. DuckDB’s read_csv_auto() automatically infers column types, no manual schema definition needed.
import duckdb
import pandas as pd
con = duckdb.connect(':memory:')
# Directly read CSV files, no database import required
orders = con.execute("SELECT * FROM read_csv_auto('orders.csv')").fetchdf()
products = con.execute("SELECT * FROM read_csv_auto('products.csv')").fetchdf()
customers = con.execute("SELECT * FROM read_csv_auto('customers.csv')").fetchdf()
Pro tip: If data volume is large (exceeding memory), use
duckdb.connect('cache.duckdb')to open a persistent database. DuckDB will automatically create columnar storage format and use memory mapping.
Three. Core Analysis: Multi-dimensional Aggregation + Period-over-Period Calculation
Here comes the core part — use a single SQL query to accomplish multi-dimensional aggregation and period-over-period calculations simultaneously:
daily_dashboard = con.execute("""
WITH order_stats AS (
SELECT
DATE(o.order_date) AS sale_date,
p.category,
p.shop_name,
COUNT(*) AS order_count,
SUM(o.amount) AS total_revenue,
AVG(o.amount) AS avg_order_value,
COUNT(DISTINCT o.customer_id) AS unique_customers
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY DATE(o.order_date), p.category, p.shop_name
),
day_over_day AS (
SELECT *,
LAG(total_revenue) OVER (
PARTITION BY shop_name ORDER BY sale_date
) AS prev_day_revenue,
LAG(order_count) OVER (
PARTITION BY shop_name ORDER BY sale_date
) AS prev_day_orders
FROM order_stats
)
SELECT
sale_date,
shop_name,
category,
order_count,
ROUND(total_revenue, 2) AS total_revenue,
ROUND(avg_order_value, 2) AS avg_order_value,
unique_customers,
CASE
WHEN prev_day_revenue IS NOT NULL THEN
ROUND((total_revenue - prev_day_revenue) / prev_day_revenue * 100, 2)
ELSE NULL
END AS revenue_change_pct,
CASE
WHEN prev_day_orders IS NOT NULL AND prev_day_orders > 0 THEN
ROUND((order_count - prev_day_orders) / prev_day_orders * 100, 2)
ELSE NULL
END AS orders_change_pct
FROM day_over_day
WHERE sale_date >= CURRENT_DATE - INTERVAL '30' DAY
ORDER BY sale_date DESC, total_revenue DESC
""").fetchdf()
Two key techniques here:
- CTE chain queries — First use
order_statsfor aggregation, then useday_over_dayfor window function calculations. Clear logic, easy to maintain. - LAG window function — Calculate previous day’s data without Python loops, then compute period-over-period percentage changes.
Four. Anomaly Detection: Z-Score to Automatically Identify Data Irregularities
Just looking at reports isn’t enough — you also need to automatically identify anomalies. This code uses a 7-day moving average ± 2 standard deviations statistical method to find categories with significant sales drops:
alert_sql = """
WITH category_trend AS (
SELECT
category,
DATE(sale_date) AS dt,
SUM(total_revenue) AS daily_rev,
AVG(daily_rev) OVER (
ORDER BY dt
ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
) AS moving_avg_7d,
STDDEV(daily_rev) OVER (
ORDER BY dt
ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
) AS moving_std_7d
FROM order_stats
WHERE sale_date >= CURRENT_DATE - INTERVAL '14' DAY
GROUP BY category, DATE(sale_date)
)
SELECT
category,
dt,
ROUND(daily_rev, 2) AS today_revenue,
ROUND(moving_avg_7d, 2) AS seven_day_avg,
ROUND((daily_rev - moving_avg_7d) / NULLIF(moving_std_7d, 0), 2) AS z_score
FROM category_trend
WHERE dt = (SELECT MAX(dt) FROM category_trend)
AND daily_rev < moving_avg_7d - 2 * moving_std_7d
ORDER BY z_score ASC
"""
alerts = con.execute(alert_sql).fetchdf()
print(f"⚠️ Found {len(alerts)} anomalous categories")
for _, row in alerts.iterrows():
print(f" - {row['category']}: Today ${row['today_revenue']} vs Avg ${row['seven_day_avg']} (Z-Score: {row['z_score']})")
How it works:
- Calculate moving average and standard deviation from the previous 7 days of daily revenue
- If today’s data falls below
mean - 2 × standard_deviation, a significant drop has occurred - The more negative the Z-Score, the more severe the anomaly
All done in SQL — no Python loops needed.
Five. Generating the HTML Dashboard
The final step: export analysis results as an HTML file that can be shared directly:
import html
html_content = f"""
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>E-commerce Sales Dashboard - {pd.Timestamp.now().strftime('%Y-%m-%d')}</title>
<style>
body {{ font-family: -apple-system, sans-serif; margin: 20px; background: #f5f5f5; }}
.card {{ background: white; border-radius: 8px; padding: 20px; margin: 10px 0;
box-shadow: 0 1px 3px rgba(0,0,0,0.1); }}
.metric {{ display: inline-block; margin-right: 30px; }}
.metric-value {{ font-size: 28px; font-weight: bold; color: #1a73e8; }}
.metric-label {{ font-size: 12px; color: #999; }}
.negative {{ color: #e74c3c; }}
.positive {{ color: #27ae60; }}
</style>
</head>
<body>
<h1>📊 E-commerce Sales Monitoring Dashboard</h1>
<p>Data Date: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M')}</p>
"""
# Insert core metrics
summary = con.execute("""
SELECT
COUNT(*) as total_orders,
ROUND(SUM(total_revenue), 2) as total_revenue,
ROUND(AVG(avg_order_value), 2) as avg_order_value
FROM order_stats
""").fetchone()
html_content += f"""
<div class="card">
<h2>Core Metrics (Last 30 Days)</h2>
<div class="metrics">
<div class="metric"><div class="metric-value">{summary[0]}</div><div class="metric-label">Total Orders</div></div>
<div class="metric"><div class="metric-value">${summary[1]:,.0f}</div><div class="metric-label">Total Revenue</div></div>
<div class="metric"><div class="metric-value">${summary[2]:,.0f}</div><div class="metric-label">Avg Order Value</div></div>
</div>
</div>
"""
# Insert alert module
if len(alerts) > 0:
html_content += '<div class="card"><h2 style="color:#e74c3c">⚠️ Anomaly Alerts</h2><ul>'
for _, row in alerts.iterrows():
html_content += f"<li><b>{row['category']}</b>: Today ${row['today_revenue']} (Z-Score: {row['z_score']})</li>"
html_content += '</ul></div>'
html_content += '</body></html>'
with open('dashboard.html', 'w') as f:
f.write(html_content)
print("✅ Dashboard generated: dashboard.html")
The generated HTML file can be sent via email, placed on an intranet server, or published using any static hosting service.
Six. Comparison with Traditional Approaches
| Dimension | Traditional Approach (Python + MySQL + Jupyter) | DuckDB Approach |
|---|---|---|
| Data Loading | Import to MySQL first, then query | Direct CSV read, zero ETL |
| Period-over-Period | Python loops + merge | One-line SQL LAG() |
| Anomaly Detection | Need statsmodels library | Window functions + math ops |
| Lines of Code | 200+ | Under 100 |
| Runtime | 30 seconds ~ minutes | 1-3 seconds |
| Deployment Cost | Requires MySQL server | Single pip install duckdb |
Seven. Monetization Strategy
The core value of this solution lies in turning data analysis into a subscription-based data service product:
- Monthly Monitoring Service ($400-700/month/client): Provide automated sales dashboards for e-commerce brands with daily updates and anomaly alerts.
- SaaS Upgrade: Wrap with FastAPI as an API, use Streamlit or Evidence for multi-tenant dashboards, each client gets independent data sources.
- Industry Templates: Package this logic as a universal template (retail, dining, education) and sell in bulk to merchants in the same industry.
- Value-Added Data Services: Add competitor comparison and trend prediction features on top of monitoring, doubling the price.
Key insight: Don’t charge per project, charge per subscription. DuckDB’s low maintenance cost lets your data service cover many small clients — small amounts add up quickly.
📖 The complete runnable code (including sample data generation) is published on duckdblab.org with more detailed deployment steps and additional e-commerce scenario examples.
💡 Want to systematically learn DuckDB? duckdblab.org has a complete tutorial series from beginner to advanced.