Hands-on Projects

Building an Automated E-commerce Monitoring Dashboard with DuckDB

Build a complete e-commerce monitoring dashboard from scratch using DuckDB: CSV direct reading, YoY analysis, Z-Score anomaly detection, and HTML report generation — all in under 100 lines of code.

Building an Automated E-commerce Monitoring Dashboard with DuckDB

A data product blueprint: Build monthly sales monitoring dashboards for e-commerce brands, charging $400-700/month per client, with the entire pipeline under 100 lines of code.

One. From Requirement to Product

Many data analysts have received requests like this:

“Help me monitor the sales data of these stores. Generate automatic daily reports and alert me immediately when there are anomalies.”

What’s the traditional approach? Python script calling APIs → storing in MySQL → Jupyter Notebook analysis → exporting Excel → email delivery. The whole workflow takes at least three hours and has extremely high maintenance costs.

With DuckDB, the entire process compresses to under 100 lines of code. And this exact solution is what I use to provide monthly data services to three brand clients, charging $400-700/month per client.

Core idea: DuckDB can directly read CSV files (read_csv_auto) without importing into a database; use the LAG() window function for period-over-period calculations; use moving average ± standard deviation for anomaly detection; finally export an HTML dashboard for clients to view independently.

Two. Data Preparation

Assume your data sources look like this:

orders.csv — Orders table (order_id, order_date, product_id, customer_id, amount)
products.csv — Products table (product_id, category, shop_name)
customers.csv — Customers table (customer_id, city, member_level)

You can use any CSV exported from e-commerce platforms — just match the structure. DuckDB’s read_csv_auto() automatically infers column types, no manual schema definition needed.

import duckdb
import pandas as pd

con = duckdb.connect(':memory:')

# Directly read CSV files, no database import required
orders = con.execute("SELECT * FROM read_csv_auto('orders.csv')").fetchdf()
products = con.execute("SELECT * FROM read_csv_auto('products.csv')").fetchdf()
customers = con.execute("SELECT * FROM read_csv_auto('customers.csv')").fetchdf()

Pro tip: If data volume is large (exceeding memory), use duckdb.connect('cache.duckdb') to open a persistent database. DuckDB will automatically create columnar storage format and use memory mapping.

Three. Core Analysis: Multi-dimensional Aggregation + Period-over-Period Calculation

Here comes the core part — use a single SQL query to accomplish multi-dimensional aggregation and period-over-period calculations simultaneously:

daily_dashboard = con.execute("""
    WITH order_stats AS (
        SELECT 
            DATE(o.order_date) AS sale_date,
            p.category,
            p.shop_name,
            COUNT(*) AS order_count,
            SUM(o.amount) AS total_revenue,
            AVG(o.amount) AS avg_order_value,
            COUNT(DISTINCT o.customer_id) AS unique_customers
        FROM orders o
        JOIN products p ON o.product_id = p.product_id
        GROUP BY DATE(o.order_date), p.category, p.shop_name
    ),
    day_over_day AS (
        SELECT *,
            LAG(total_revenue) OVER (
                PARTITION BY shop_name ORDER BY sale_date
            ) AS prev_day_revenue,
            LAG(order_count) OVER (
                PARTITION BY shop_name ORDER BY sale_date
            ) AS prev_day_orders
        FROM order_stats
    )
    SELECT 
        sale_date,
        shop_name,
        category,
        order_count,
        ROUND(total_revenue, 2) AS total_revenue,
        ROUND(avg_order_value, 2) AS avg_order_value,
        unique_customers,
        CASE 
            WHEN prev_day_revenue IS NOT NULL THEN 
                ROUND((total_revenue - prev_day_revenue) / prev_day_revenue * 100, 2)
            ELSE NULL 
        END AS revenue_change_pct,
        CASE 
            WHEN prev_day_orders IS NOT NULL AND prev_day_orders > 0 THEN
                ROUND((order_count - prev_day_orders) / prev_day_orders * 100, 2)
            ELSE NULL 
        END AS orders_change_pct
    FROM day_over_day
    WHERE sale_date >= CURRENT_DATE - INTERVAL '30' DAY
    ORDER BY sale_date DESC, total_revenue DESC
""").fetchdf()

Two key techniques here:

CTE chain queries — First use order_stats for aggregation, then use day_over_day for window function calculations. Clear logic, easy to maintain.
LAG window function — Calculate previous day’s data without Python loops, then compute period-over-period percentage changes.

Four. Anomaly Detection: Z-Score to Automatically Identify Data Irregularities

Just looking at reports isn’t enough — you also need to automatically identify anomalies. This code uses a 7-day moving average ± 2 standard deviations statistical method to find categories with significant sales drops:

alert_sql = """
    WITH category_trend AS (
        SELECT 
            category,
            DATE(sale_date) AS dt,
            SUM(total_revenue) AS daily_rev,
            AVG(daily_rev) OVER (
                ORDER BY dt 
                ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
            ) AS moving_avg_7d,
            STDDEV(daily_rev) OVER (
                ORDER BY dt 
                ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
            ) AS moving_std_7d
        FROM order_stats
        WHERE sale_date >= CURRENT_DATE - INTERVAL '14' DAY
        GROUP BY category, DATE(sale_date)
    )
    SELECT 
        category,
        dt,
        ROUND(daily_rev, 2) AS today_revenue,
        ROUND(moving_avg_7d, 2) AS seven_day_avg,
        ROUND((daily_rev - moving_avg_7d) / NULLIF(moving_std_7d, 0), 2) AS z_score
    FROM category_trend
    WHERE dt = (SELECT MAX(dt) FROM category_trend)
      AND daily_rev < moving_avg_7d - 2 * moving_std_7d
    ORDER BY z_score ASC
"""

alerts = con.execute(alert_sql).fetchdf()
print(f"⚠️ Found {len(alerts)} anomalous categories")
for _, row in alerts.iterrows():
    print(f"  - {row['category']}: Today ${row['today_revenue']} vs Avg ${row['seven_day_avg']} (Z-Score: {row['z_score']})")

How it works:

Calculate moving average and standard deviation from the previous 7 days of daily revenue
If today’s data falls below mean - 2 × standard_deviation, a significant drop has occurred
The more negative the Z-Score, the more severe the anomaly

All done in SQL — no Python loops needed.

Five. Generating the HTML Dashboard

The final step: export analysis results as an HTML file that can be shared directly:

import html

html_content = f"""
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>E-commerce Sales Dashboard - {pd.Timestamp.now().strftime('%Y-%m-%d')}</title>
    <style>
        body {{ font-family: -apple-system, sans-serif; margin: 20px; background: #f5f5f5; }}
        .card {{ background: white; border-radius: 8px; padding: 20px; margin: 10px 0; 
                 box-shadow: 0 1px 3px rgba(0,0,0,0.1); }}
        .metric {{ display: inline-block; margin-right: 30px; }}
        .metric-value {{ font-size: 28px; font-weight: bold; color: #1a73e8; }}
        .metric-label {{ font-size: 12px; color: #999; }}
        .negative {{ color: #e74c3c; }}
        .positive {{ color: #27ae60; }}
    </style>
</head>
<body>
    <h1>📊 E-commerce Sales Monitoring Dashboard</h1>
    <p>Data Date: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M')}</p>
"""

# Insert core metrics
summary = con.execute("""
    SELECT 
        COUNT(*) as total_orders,
        ROUND(SUM(total_revenue), 2) as total_revenue,
        ROUND(AVG(avg_order_value), 2) as avg_order_value
    FROM order_stats
""").fetchone()

html_content += f"""
    <div class="card">
        <h2>Core Metrics (Last 30 Days)</h2>
        <div class="metrics">
            <div class="metric"><div class="metric-value">{summary[0]}</div><div class="metric-label">Total Orders</div></div>
            <div class="metric"><div class="metric-value">${summary[1]:,.0f}</div><div class="metric-label">Total Revenue</div></div>
            <div class="metric"><div class="metric-value">${summary[2]:,.0f}</div><div class="metric-label">Avg Order Value</div></div>
        </div>
    </div>
"""

# Insert alert module
if len(alerts) > 0:
    html_content += '<div class="card"><h2 style="color:#e74c3c">⚠️ Anomaly Alerts</h2><ul>'
    for _, row in alerts.iterrows():
        html_content += f"<li><b>{row['category']}</b>: Today ${row['today_revenue']} (Z-Score: {row['z_score']})</li>"
    html_content += '</ul></div>'

html_content += '</body></html>'

with open('dashboard.html', 'w') as f:
    f.write(html_content)

print("✅ Dashboard generated: dashboard.html")

The generated HTML file can be sent via email, placed on an intranet server, or published using any static hosting service.

Six. Comparison with Traditional Approaches

Dimension	Traditional Approach (Python + MySQL + Jupyter)	DuckDB Approach
Data Loading	Import to MySQL first, then query	Direct CSV read, zero ETL
Period-over-Period	Python loops + merge	One-line SQL `LAG()`
Anomaly Detection	Need statsmodels library	Window functions + math ops
Lines of Code	200+	Under 100
Runtime	30 seconds ~ minutes	1-3 seconds
Deployment Cost	Requires MySQL server	Single `pip install duckdb`

Seven. Monetization Strategy

The core value of this solution lies in turning data analysis into a subscription-based data service product:

Monthly Monitoring Service ($400-700/month/client): Provide automated sales dashboards for e-commerce brands with daily updates and anomaly alerts.
SaaS Upgrade: Wrap with FastAPI as an API, use Streamlit or Evidence for multi-tenant dashboards, each client gets independent data sources.
Industry Templates: Package this logic as a universal template (retail, dining, education) and sell in bulk to merchants in the same industry.
Value-Added Data Services: Add competitor comparison and trend prediction features on top of monitoring, doubling the price.

Key insight: Don’t charge per project, charge per subscription. DuckDB’s low maintenance cost lets your data service cover many small clients — small amounts add up quickly.

📖 The complete runnable code (including sample data generation) is published on duckdblab.org with more detailed deployment steps and additional e-commerce scenario examples.

💡 Want to systematically learn DuckDB? duckdblab.org has a complete tutorial series from beginner to advanced.

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials