Featured image of post E-Commerce Daily Report with DuckDB: 5 SQL Queries for Automated Sales Reports, ¥99/Month SaaS

E-Commerce Daily Report with DuckDB: 5 SQL Queries for Automated Sales Reports, ¥99/Month SaaS

Build a fully automated e-commerce daily report system with DuckDB. 5 core SQL queries cover sales overview, category analysis, city heatmap, repurchase rate, and trend comparison. Deploy as a SaaS product charging $99/month.

E-Commerce Daily Report with DuckDB: 5 SQL Queries for Automated Sales Reports

If you make a living doing data analysis freelance work, the most frustrating part isn’t writing SQL — it’s running the same daily report over and over: yesterday’s sales summary, category rankings, city heatmaps, repurchase trends. You charge ¥500-2000 per client, but each one takes 1-2 hours of manual processing.

Today I’ll break down a real, reproducible project: building a fully automated e-commerce daily report system with DuckDB, from data ingestion to report generation in one click. Once set up, this system can be sold to any e-commerce seller as a monthly SaaS subscription at ¥99/month.

E-Commerce Daily Report System Architecture


1. System Architecture: 3 Files, Complete System

The entire system needs just 3 files:

  1. generate_report.py — Core report generation script
  2. daily_report.md.j2 — Jinja2 template
  3. config.yaml — Configuration file

No database needed. No web framework needed. DuckDB reads CSV/Parquet directly, Jinja2 generates Markdown, and SMTP sends emails.

Data flow:

Order CSV → DuckDB (direct read) → 5 SQL queries → Jinja2 template → Markdown report → Email delivery

Let’s break down each of the 5 core SQL queries.


2. Query 1: Real-Time Sales Overview

First, calculate the day’s key KPIs in a single query.

SELECT
    COUNT(*) AS total_orders,
    ROUND(SUM(amount), 2) AS total_revenue,
    ROUND(AVG(amount), 2) AS avg_order_value,
    COUNT(DISTINCT customer) AS unique_customers,
    MAX(order_date) AS latest_order_date
FROM sales
WHERE order_date = CURRENT_DATE;

This query produces the KPI card at the top of your daily report:

📊 Yesterday's Sales Overview (2026-06-09)
━━━━━━━━━━━━━━━━━━━━━━━━━━
📦 Total Orders: 42
💰 Total Revenue: ¥186,420.00
🎯 Average Order Value: ¥4,438.57
👥 Unique Customers: 38

Monetization tip: Many e-commerce sellers rely on built-in dashboards from platforms like Taobao’s QianNiu or Baidu’s ShengYiCanMou. These tools don’t show custom metrics like “unique customers” or “average order value trends.” Your daily report adds 2-3 dimensions that give sellers a real edge — and that’s what they pay for.

Comparison with Traditional Tools

ToolSetup TimeCode LinesDeploymentBest For
Excel/VBA2-3 days50+MediumSmall shops, manual updates
Pandas + Jupyter1 day80+MediumData analysts’ personal use
Power BI1-2 weeksVisual drag-and-dropHighInteractive dashboards
DuckDB + Python2 hours30LowAutomated daily reports, zero ops

3. Query 2: Deep Category Analysis

Second, analyze revenue contribution and potential by category.

WITH category_stats AS (
    SELECT
        category,
        COUNT(*) AS orders,
        ROUND(SUM(amount), 2) AS revenue,
        ROUND(AVG(amount), 2) AS avg_price,
        COUNT(DISTINCT customer) AS customers,
        ROUND(
            SUM(amount) * 100.0 / SUM(SUM(amount)) OVER(), 1
        ) AS revenue_share
    FROM sales
    WHERE order_date = CURRENT_DATE
    GROUP BY category
),
ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (ORDER BY revenue DESC) AS rev_rank,
        ROW_NUMBER() OVER (ORDER BY customers DESC) AS cust_rank
    FROM category_stats
)
SELECT
    category,
    orders,
    revenue,
    avg_price,
    customers,
    revenue_share,
    rev_rank,
    cust_rank
FROM ranked
ORDER BY revenue DESC;

Sample output:

Category Revenue Ranking (2026-06-09)
━━━━━━━━━━━━━━━━━━━━━━━━━━
① Electronics — ¥86,390 (46.3%) — Avg ¥8,639
② Home Appliances — ¥35,994 (19.3%) — Avg ¥5,999
③ Peripherals — ¥10,096 (5.4%) — Avg ¥2,524
④ Personal Care — ¥4,598 (2.5%) — Avg ¥2,299

Key technical point: Window Functions. The SUM() OVER() pattern lets you calculate aggregate proportions in a single pass without subqueries. ROW_NUMBER() handles ranking. This is a core DuckDB capability — no additional installation needed.

Monetization tip: SUM() OVER() can be done in MySQL but is much slower, and in Pandas it requires ~10 lines of code. Emphasizing this efficiency gap is the key argument for convincing clients to migrate from Excel/Pandas to DuckDB.


4. Query 3: City Heatmap

Third, identify high-value cities.

SELECT
    city,
    COUNT(*) AS orders,
    ROUND(SUM(amount), 2) AS total_revenue,
    ROUND(AVG(amount), 2) AS avg_order,
    COUNT(DISTINCT customer) AS unique_customers,
    ROUND(PERCENT_RANK() OVER (ORDER BY total_revenue), 3) AS revenue_percentile,
    ROUND(PERCENT_RANK() OVER (ORDER BY avg_order), 3) AS avg_order_percentile
FROM sales
WHERE order_date = CURRENT_DATE
GROUP BY city
ORDER BY total_revenue DESC
LIMIT 15;

Sample output:

City Value Ranking Top 10
━━━━━━━━━━━━━━━━━━━━━━━━━━
Beijing   — ¥19,999  (Revenue percentile: 1.000) 🏆
Changsha  — ¥18,999  (Revenue percentile: 0.950)
Shanghai  — ¥15,999  (Revenue percentile: 0.900)
Dalian    — ¥14,999  (Revenue percentile: 0.850)
Ningbo    — ¥12,999  (Revenue percentile: 0.800)

Key technique: PERCENT_RANK() — This is one of DuckDB’s killer features. It quickly shows which cities, despite having fewer orders, have exceptionally high average order values — exactly the kind of insight you need to decide where to target ad spend. In Excel, you’d manually sort and calculate ranks. In DuckDB, one line of SQL.

PERCENT_RANK(): DuckDB vs Pandas

# Pandas approach: multiple steps
df['revenue_percentile'] = df['total_revenue'].rank(pct=True)
-- DuckDB: one line
PERCENT_RANK() OVER (ORDER BY total_revenue)

For million-row datasets, DuckDB typically processes 5-10x faster than Pandas.


5. Query 4: Repurchase Rate Analysis

Fourth, calculate the repurchase rate — the single metric e-commerce bosses care about most.

WITH customer_orders AS (
    SELECT
        customer,
        COUNT(*) AS order_count,
        MIN(order_date) AS first_order,
        MAX(order_date) AS last_order,
        ROUND(SUM(amount), 2) AS total_spent
    FROM sales
    GROUP BY customer
),
repurchase AS (
    SELECT
        COUNT(*) AS total_customers,
        SUM(CASE WHEN order_count > 1 THEN 1 ELSE 0 END) AS repurchase_count
    FROM customer_orders
)
SELECT
    total_customers,
    repurchase_count,
    ROUND(repurchase_count * 100.0 / total_customers, 1) AS repurchase_rate,
    ROUND(avg_clv, 2) AS avg_customer_lifetime_value
FROM repurchase
CROSS JOIN (
    SELECT AVG(total_spent) AS avg_clv
    FROM (
        SELECT SUM(amount) AS total_spent
        FROM sales
        GROUP BY customer
    ) sub
) avg_calc;

Sample output:

🔄 Repurchase Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━
👥 Total Customers: 128
🔁 Repurchasers: 23 (18.0%)
💎 Avg Customer Lifetime Value: ¥8,456.32

Key technique: CTEs (Common Table Expressions). The multi-step CTE approach — first aggregate orders by customer, then count repurchasers, then compute ratios — makes SQL logic as structured as writing code. Readability far exceeds nested subqueries.

Monetization tip: An 18% repurchase rate is average for e-commerce. But pushing it to 30%+ drastically reduces customer acquisition costs. This analysis directly ties to ROI: every 1% increase in repurchase rate equals additional monthly profit.


6. Query 5: Trend Comparison (vs. Yesterday)

Fifth, the most valuable part of any daily report — answering “is today better or worse than yesterday?”

WITH today AS (
    SELECT
        COUNT(*) AS orders,
        ROUND(SUM(amount), 2) AS revenue,
        ROUND(AVG(amount), 2) AS avg_order
    FROM sales
    WHERE order_date = CURRENT_DATE
),
yesterday AS (
    SELECT
        COUNT(*) AS orders,
        ROUND(SUM(amount), 2) AS revenue,
        ROUND(AVG(amount), 2) AS avg_order
    FROM sales
    WHERE order_date = CURRENT_DATE - INTERVAL 1 DAY
)
SELECT
    t.orders, t.revenue, t.avg_order,
    y.orders AS prev_orders,
    y.revenue AS prev_revenue,
    y.avg_order AS prev_avg_order,
    ROUND((t.revenue - y.revenue) * 100.0 / NULLIF(y.revenue, 0), 1) AS revenue_change_pct,
    ROUND((t.orders - y.orders) * 100.0 / NULLIF(y.orders, 0), 1) AS orders_change_pct
FROM today t
CROSS JOIN yesterday y;

Sample output:

📈 vs. Yesterday
━━━━━━━━━━━━━━━━━━━━━━━━━━
📦 Orders: 42 (+12.8%)
💰 Revenue: ¥186,420 (+8.3%)
🎯 Avg Order Value: ¥4,438.57 (-3.9%)

Key technique: NULLIF() for division-by-zero safety. This query contains the core pattern for time-series analysis — using CTEs for self-joins comparison, without any external tool.


7. Automated Deployment: cron + Python

Set up cron to auto-generate and email the report daily at 2 AM:

# Edit crontab
crontab -e

# Add this line: generate and send daily report at 2 AM
0 2 * * * cd /root/telegram-duckdb-bot && python3 generate_report.py --send-email

Core logic of generate_report.py:

import duckdb
import jinja2
import smtplib
from email.mime.text import MIMEText
from datetime import date

# 1. Connect DuckDB (reads Parquet/CSV directly)
con = duckdb.connect("ecommerce.duckdb")

# 2. Execute all queries
kpi_result = con.execute("""
    SELECT COUNT(*) AS total_orders,
           ROUND(SUM(amount), 2) AS total_revenue,
           ROUND(AVG(amount), 2) AS avg_order_value,
           COUNT(DISTINCT customer) AS unique_customers
    FROM sales
    WHERE order_date = CURRENT_DATE
""").fetchdf()

# 3. Render Markdown with Jinja2
template = jinja2.Template(open("daily_report.md.j2").read())
report_md = template.render(
    date=date.today(),
    kpi=kpi_result.to_dict('records')[0],
)

# 4. Send email
msg = MIMEText(report_md, 'html', 'utf-8')
msg['Subject'] = f'E-Commerce Daily Report {date.today()}'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'

smtp = smtplib.SMTP('smtp.yourdomain.com', 587)
smtp.starttls()
smtp.login('user', 'password')
smtp.send_message(msg)

The entire system — from data ingestion to email delivery — is under 50 lines of core code. DuckDB processes 1 million rows in typically 0.5-2 seconds, far faster than Pandas at 3-8 seconds.


8. Complete Comparison with Traditional Approaches

DimensionExcel/ManualPandas + JupyterPower BIDuckDB Daily Report
Setup TimeDaysHalf day1-2 weeks2 hours
MaintenanceHigh (manual each time)MediumHighLow (cron automated)
Query SpeedSlow (>100K rows lags)MediumMediumFast (columnar storage)
ScalabilityPoorMediumPoorGood (supports Parquet)
Monthly Cost¥0 + labor¥0 + labor¥500+/month¥0 + server ¥50
Suitable Data Volume<100K rows<10M rows<100M rowsUnlimited (disk-based)

9. Monetization: Three Paths to Revenue

Path A: Sell the Template (¥199 one-time)

Package the scripts as an “E-Commerce Daily Report Automation Template.” Share sample SQL on platforms like Zsxq, V2EX, or Juejin. Charge ¥199 for the full package.

Target: E-commerce operators, data analysts, small studios

Path B: SaaS Service (¥99/month per store)

Deploy the script on a cloud server as a web product. Clients upload their CSV orders and receive automated Markdown/PDF daily reports.

Pricing: ¥99/month per store, multi-store support Target: Small-to-medium e-commerce sellers who can’t afford a full-time data analyst

Path C: Freelance Delivery (¥500-2000 per project)

Help clients with custom configuration: integrate with their order systems (Shopify, YouZan, QianNiu), customize report templates, set up automated email delivery.

Delivery time: 1-2 days Profit margin: 80%+ (mostly SQL and Python scripts, near-zero marginal cost)

Path D: Data Monitoring as a Service (Advanced)

Add anomaly detection on top of the daily report: when sales drop more than 20% suddenly, automatically send a Telegram message to the boss. This premium feature can be sold separately at ¥49/month.


10. Advanced Expansion Directions

Once clients are hooked on the basic daily report, you can expand:

  1. Weekly/Monthly reports: Aggregate daily data into weekly/monthly views with YoY/MoM trend visualization
  2. Competitor analysis: Use DuckDB’s httpfs extension to scrape competitor prices and compare automatically
  3. Customer profiling: Use DuckDB’s UNNEST and list functions to analyze purchase patterns and build customer personas
  4. Visual upgrade: Connect Markdown reports to Evidence or Grafana for interactive dashboards
  5. Multi-store aggregation: Use DuckDB’s read_parquet with glob patterns to process Parquet files from multiple stores simultaneously

Summary

The core advantages of this DuckDB e-commerce daily report system:

  • 5 SQL queries cover 90% of daily report needs: sales overview, category analysis, city heatmap, repurchase rate, trend comparison
  • Zero-dependency deployment: no database, web framework, or BI tool required
  • Automated execution: cron + Python runs daily automatically, boss wakes up to a report
  • Highly scalable: from 1 store to 100 stores, code hardly changes

For data service professionals, this is a classic “build once, sell repeatedly” money-making project. Mastering DuckDB in these scenarios lets you deliver freelance projects and build SaaS products 10x faster than competitors.


📖 The complete version of this article is published at duckdblab.org, including the full code files, Jinja2 template, and downloadable sample dataset.

💡 Want to systematically learn DuckDB for e-commerce analytics? duckdblab.org has a complete tutorial series from beginner to monetization, covering advanced topics like Parquet processing, window function mastery, and automated deployment.

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy