E-Commerce Daily Report with DuckDB: 5 SQL Queries for Automated Sales Reports
If you make a living doing data analysis freelance work, the most frustrating part isn’t writing SQL — it’s running the same daily report over and over: yesterday’s sales summary, category rankings, city heatmaps, repurchase trends. You charge ¥500-2000 per client, but each one takes 1-2 hours of manual processing.
Today I’ll break down a real, reproducible project: building a fully automated e-commerce daily report system with DuckDB, from data ingestion to report generation in one click. Once set up, this system can be sold to any e-commerce seller as a monthly SaaS subscription at ¥99/month.

1. System Architecture: 3 Files, Complete System
The entire system needs just 3 files:
generate_report.py— Core report generation scriptdaily_report.md.j2— Jinja2 templateconfig.yaml— Configuration file
No database needed. No web framework needed. DuckDB reads CSV/Parquet directly, Jinja2 generates Markdown, and SMTP sends emails.
Data flow:
Order CSV → DuckDB (direct read) → 5 SQL queries → Jinja2 template → Markdown report → Email delivery
Let’s break down each of the 5 core SQL queries.
2. Query 1: Real-Time Sales Overview
First, calculate the day’s key KPIs in a single query.
SELECT
COUNT(*) AS total_orders,
ROUND(SUM(amount), 2) AS total_revenue,
ROUND(AVG(amount), 2) AS avg_order_value,
COUNT(DISTINCT customer) AS unique_customers,
MAX(order_date) AS latest_order_date
FROM sales
WHERE order_date = CURRENT_DATE;
This query produces the KPI card at the top of your daily report:
📊 Yesterday's Sales Overview (2026-06-09)
━━━━━━━━━━━━━━━━━━━━━━━━━━
📦 Total Orders: 42
💰 Total Revenue: ¥186,420.00
🎯 Average Order Value: ¥4,438.57
👥 Unique Customers: 38
Monetization tip: Many e-commerce sellers rely on built-in dashboards from platforms like Taobao’s QianNiu or Baidu’s ShengYiCanMou. These tools don’t show custom metrics like “unique customers” or “average order value trends.” Your daily report adds 2-3 dimensions that give sellers a real edge — and that’s what they pay for.
Comparison with Traditional Tools
| Tool | Setup Time | Code Lines | Deployment | Best For |
|---|---|---|---|---|
| Excel/VBA | 2-3 days | 50+ | Medium | Small shops, manual updates |
| Pandas + Jupyter | 1 day | 80+ | Medium | Data analysts’ personal use |
| Power BI | 1-2 weeks | Visual drag-and-drop | High | Interactive dashboards |
| DuckDB + Python | 2 hours | 30 | Low | Automated daily reports, zero ops |
3. Query 2: Deep Category Analysis
Second, analyze revenue contribution and potential by category.
WITH category_stats AS (
SELECT
category,
COUNT(*) AS orders,
ROUND(SUM(amount), 2) AS revenue,
ROUND(AVG(amount), 2) AS avg_price,
COUNT(DISTINCT customer) AS customers,
ROUND(
SUM(amount) * 100.0 / SUM(SUM(amount)) OVER(), 1
) AS revenue_share
FROM sales
WHERE order_date = CURRENT_DATE
GROUP BY category
),
ranked AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY revenue DESC) AS rev_rank,
ROW_NUMBER() OVER (ORDER BY customers DESC) AS cust_rank
FROM category_stats
)
SELECT
category,
orders,
revenue,
avg_price,
customers,
revenue_share,
rev_rank,
cust_rank
FROM ranked
ORDER BY revenue DESC;
Sample output:
Category Revenue Ranking (2026-06-09)
━━━━━━━━━━━━━━━━━━━━━━━━━━
① Electronics — ¥86,390 (46.3%) — Avg ¥8,639
② Home Appliances — ¥35,994 (19.3%) — Avg ¥5,999
③ Peripherals — ¥10,096 (5.4%) — Avg ¥2,524
④ Personal Care — ¥4,598 (2.5%) — Avg ¥2,299
Key technical point: Window Functions. The SUM() OVER() pattern lets you calculate aggregate proportions in a single pass without subqueries. ROW_NUMBER() handles ranking. This is a core DuckDB capability — no additional installation needed.
Monetization tip: SUM() OVER() can be done in MySQL but is much slower, and in Pandas it requires ~10 lines of code. Emphasizing this efficiency gap is the key argument for convincing clients to migrate from Excel/Pandas to DuckDB.
4. Query 3: City Heatmap
Third, identify high-value cities.
SELECT
city,
COUNT(*) AS orders,
ROUND(SUM(amount), 2) AS total_revenue,
ROUND(AVG(amount), 2) AS avg_order,
COUNT(DISTINCT customer) AS unique_customers,
ROUND(PERCENT_RANK() OVER (ORDER BY total_revenue), 3) AS revenue_percentile,
ROUND(PERCENT_RANK() OVER (ORDER BY avg_order), 3) AS avg_order_percentile
FROM sales
WHERE order_date = CURRENT_DATE
GROUP BY city
ORDER BY total_revenue DESC
LIMIT 15;
Sample output:
City Value Ranking Top 10
━━━━━━━━━━━━━━━━━━━━━━━━━━
Beijing — ¥19,999 (Revenue percentile: 1.000) 🏆
Changsha — ¥18,999 (Revenue percentile: 0.950)
Shanghai — ¥15,999 (Revenue percentile: 0.900)
Dalian — ¥14,999 (Revenue percentile: 0.850)
Ningbo — ¥12,999 (Revenue percentile: 0.800)
Key technique: PERCENT_RANK() — This is one of DuckDB’s killer features. It quickly shows which cities, despite having fewer orders, have exceptionally high average order values — exactly the kind of insight you need to decide where to target ad spend. In Excel, you’d manually sort and calculate ranks. In DuckDB, one line of SQL.
PERCENT_RANK(): DuckDB vs Pandas
# Pandas approach: multiple steps
df['revenue_percentile'] = df['total_revenue'].rank(pct=True)
-- DuckDB: one line
PERCENT_RANK() OVER (ORDER BY total_revenue)
For million-row datasets, DuckDB typically processes 5-10x faster than Pandas.
5. Query 4: Repurchase Rate Analysis
Fourth, calculate the repurchase rate — the single metric e-commerce bosses care about most.
WITH customer_orders AS (
SELECT
customer,
COUNT(*) AS order_count,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order,
ROUND(SUM(amount), 2) AS total_spent
FROM sales
GROUP BY customer
),
repurchase AS (
SELECT
COUNT(*) AS total_customers,
SUM(CASE WHEN order_count > 1 THEN 1 ELSE 0 END) AS repurchase_count
FROM customer_orders
)
SELECT
total_customers,
repurchase_count,
ROUND(repurchase_count * 100.0 / total_customers, 1) AS repurchase_rate,
ROUND(avg_clv, 2) AS avg_customer_lifetime_value
FROM repurchase
CROSS JOIN (
SELECT AVG(total_spent) AS avg_clv
FROM (
SELECT SUM(amount) AS total_spent
FROM sales
GROUP BY customer
) sub
) avg_calc;
Sample output:
🔄 Repurchase Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━
👥 Total Customers: 128
🔁 Repurchasers: 23 (18.0%)
💎 Avg Customer Lifetime Value: ¥8,456.32
Key technique: CTEs (Common Table Expressions). The multi-step CTE approach — first aggregate orders by customer, then count repurchasers, then compute ratios — makes SQL logic as structured as writing code. Readability far exceeds nested subqueries.
Monetization tip: An 18% repurchase rate is average for e-commerce. But pushing it to 30%+ drastically reduces customer acquisition costs. This analysis directly ties to ROI: every 1% increase in repurchase rate equals additional monthly profit.
6. Query 5: Trend Comparison (vs. Yesterday)
Fifth, the most valuable part of any daily report — answering “is today better or worse than yesterday?”
WITH today AS (
SELECT
COUNT(*) AS orders,
ROUND(SUM(amount), 2) AS revenue,
ROUND(AVG(amount), 2) AS avg_order
FROM sales
WHERE order_date = CURRENT_DATE
),
yesterday AS (
SELECT
COUNT(*) AS orders,
ROUND(SUM(amount), 2) AS revenue,
ROUND(AVG(amount), 2) AS avg_order
FROM sales
WHERE order_date = CURRENT_DATE - INTERVAL 1 DAY
)
SELECT
t.orders, t.revenue, t.avg_order,
y.orders AS prev_orders,
y.revenue AS prev_revenue,
y.avg_order AS prev_avg_order,
ROUND((t.revenue - y.revenue) * 100.0 / NULLIF(y.revenue, 0), 1) AS revenue_change_pct,
ROUND((t.orders - y.orders) * 100.0 / NULLIF(y.orders, 0), 1) AS orders_change_pct
FROM today t
CROSS JOIN yesterday y;
Sample output:
📈 vs. Yesterday
━━━━━━━━━━━━━━━━━━━━━━━━━━
📦 Orders: 42 (+12.8%)
💰 Revenue: ¥186,420 (+8.3%)
🎯 Avg Order Value: ¥4,438.57 (-3.9%)
Key technique: NULLIF() for division-by-zero safety. This query contains the core pattern for time-series analysis — using CTEs for self-joins comparison, without any external tool.
7. Automated Deployment: cron + Python
Set up cron to auto-generate and email the report daily at 2 AM:
# Edit crontab
crontab -e
# Add this line: generate and send daily report at 2 AM
0 2 * * * cd /root/telegram-duckdb-bot && python3 generate_report.py --send-email
Core logic of generate_report.py:
import duckdb
import jinja2
import smtplib
from email.mime.text import MIMEText
from datetime import date
# 1. Connect DuckDB (reads Parquet/CSV directly)
con = duckdb.connect("ecommerce.duckdb")
# 2. Execute all queries
kpi_result = con.execute("""
SELECT COUNT(*) AS total_orders,
ROUND(SUM(amount), 2) AS total_revenue,
ROUND(AVG(amount), 2) AS avg_order_value,
COUNT(DISTINCT customer) AS unique_customers
FROM sales
WHERE order_date = CURRENT_DATE
""").fetchdf()
# 3. Render Markdown with Jinja2
template = jinja2.Template(open("daily_report.md.j2").read())
report_md = template.render(
date=date.today(),
kpi=kpi_result.to_dict('records')[0],
)
# 4. Send email
msg = MIMEText(report_md, 'html', 'utf-8')
msg['Subject'] = f'E-Commerce Daily Report {date.today()}'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
smtp = smtplib.SMTP('smtp.yourdomain.com', 587)
smtp.starttls()
smtp.login('user', 'password')
smtp.send_message(msg)
The entire system — from data ingestion to email delivery — is under 50 lines of core code. DuckDB processes 1 million rows in typically 0.5-2 seconds, far faster than Pandas at 3-8 seconds.
8. Complete Comparison with Traditional Approaches
| Dimension | Excel/Manual | Pandas + Jupyter | Power BI | DuckDB Daily Report |
|---|---|---|---|---|
| Setup Time | Days | Half day | 1-2 weeks | 2 hours |
| Maintenance | High (manual each time) | Medium | High | Low (cron automated) |
| Query Speed | Slow (>100K rows lags) | Medium | Medium | Fast (columnar storage) |
| Scalability | Poor | Medium | Poor | Good (supports Parquet) |
| Monthly Cost | ¥0 + labor | ¥0 + labor | ¥500+/month | ¥0 + server ¥50 |
| Suitable Data Volume | <100K rows | <10M rows | <100M rows | Unlimited (disk-based) |
9. Monetization: Three Paths to Revenue
Path A: Sell the Template (¥199 one-time)
Package the scripts as an “E-Commerce Daily Report Automation Template.” Share sample SQL on platforms like Zsxq, V2EX, or Juejin. Charge ¥199 for the full package.
Target: E-commerce operators, data analysts, small studios
Path B: SaaS Service (¥99/month per store)
Deploy the script on a cloud server as a web product. Clients upload their CSV orders and receive automated Markdown/PDF daily reports.
Pricing: ¥99/month per store, multi-store support Target: Small-to-medium e-commerce sellers who can’t afford a full-time data analyst
Path C: Freelance Delivery (¥500-2000 per project)
Help clients with custom configuration: integrate with their order systems (Shopify, YouZan, QianNiu), customize report templates, set up automated email delivery.
Delivery time: 1-2 days Profit margin: 80%+ (mostly SQL and Python scripts, near-zero marginal cost)
Path D: Data Monitoring as a Service (Advanced)
Add anomaly detection on top of the daily report: when sales drop more than 20% suddenly, automatically send a Telegram message to the boss. This premium feature can be sold separately at ¥49/month.
10. Advanced Expansion Directions
Once clients are hooked on the basic daily report, you can expand:
- Weekly/Monthly reports: Aggregate daily data into weekly/monthly views with YoY/MoM trend visualization
- Competitor analysis: Use DuckDB’s
httpfsextension to scrape competitor prices and compare automatically - Customer profiling: Use DuckDB’s
UNNESTand list functions to analyze purchase patterns and build customer personas - Visual upgrade: Connect Markdown reports to Evidence or Grafana for interactive dashboards
- Multi-store aggregation: Use DuckDB’s
read_parquetwith glob patterns to process Parquet files from multiple stores simultaneously
Summary
The core advantages of this DuckDB e-commerce daily report system:
- 5 SQL queries cover 90% of daily report needs: sales overview, category analysis, city heatmap, repurchase rate, trend comparison
- Zero-dependency deployment: no database, web framework, or BI tool required
- Automated execution: cron + Python runs daily automatically, boss wakes up to a report
- Highly scalable: from 1 store to 100 stores, code hardly changes
For data service professionals, this is a classic “build once, sell repeatedly” money-making project. Mastering DuckDB in these scenarios lets you deliver freelance projects and build SaaS products 10x faster than competitors.
📖 The complete version of this article is published at duckdblab.org, including the full code files, Jinja2 template, and downloadable sample dataset.
💡 Want to systematically learn DuckDB for e-commerce analytics? duckdblab.org has a complete tutorial series from beginner to monetization, covering advanced topics like Parquet processing, window function mastery, and automated deployment.