Build a Price Monitoring & Arbitrage Discovery System with DuckDB

Why This Direction Makes Money
E-commerce price monitoring and arbitrage discovery is a severely undervalued money-making direction.
Look at shopping groups, cashback communities, deal-finding platforms like “Smzdm” — the essence of these platforms is someone helps you monitor prices, finds low prices, and tells you when to buy. Behind most of these platforms are expensive commercial tools or cumbersome ETL frameworks.
If you can build an automated price monitoring system with DuckDB, you can do three things:
- Build a Price Monitoring SaaS: Offer competitor price monitoring services for Taobao/Pinduoduo sellers, charged monthly
- Build a Cashback Affiliate Business: Find low-priced items → generate recommendations → monetize through affiliate links
- Build E-commerce Data Products: Package price data and sell to brands, distributors, and investment firms
The key is — the core technology stack for this system only requires Python + DuckDB, with zero database servers needed.
System Architecture
Data Collection → DuckDB Storage → SQL Analysis → Monitoring Reports → Telegram/WeChat Alerts
The entire workflow is driven by SQL, without writing a single line of Python business logic. DuckDB’s advantages include:
- Columnar storage: Extremely fast aggregation queries when price data accumulates as time series
- Built-in time series functions: LAG, LEAD, window functions ready to use
- Zero operations: A single
.duckdbfile handles all data storage - UNION ALL cross-table queries: Easily merge price data from multiple platforms
Complete Code Implementation
Step 1: Initialize Database and Product Catalog
import duckdb
import os
# Connect to database (auto-creates if not exists)
db_path = "/tmp/price_monitor.duckdb"
if os.path.exists(db_path):
os.remove(db_path)
con = duckdb.connect(db_path)
# Create product catalog table
con.execute("""
CREATE TABLE products (
product_id VARCHAR PRIMARY KEY,
name VARCHAR,
category VARCHAR,
base_price DOUBLE,
target_margin DOUBLE DEFAULT 0.05
);
INSERT INTO products VALUES
('P001', 'iPhone 15 Pro Max 256GB', 'Phones', 8999.00),
('P002', 'Samsung Galaxy S24 Ultra', 'Phones', 7999.00),
('P003', 'MacBook Air M3 13-inch', 'Laptops', 8999.00),
('P004', 'iPad Air M2 256GB', 'Tablets', 5499.00),
('P005', 'Sony WH-1000XM5', 'Headphones', 2499.00),
('P006', 'AirPods Pro 2', 'Headphones', 1899.00),
('P007', 'Nintendo Switch OLED', 'Gaming', 2349.00),
('P008', 'DJI Mini 4 Pro', 'Drones', 4788.00),
('P009', 'Logitech MX Master 3S', 'Mice', 799.00),
('P010', 'Keychron Q1 Pro', 'Keyboards', 1299.00);
""")
Step 2: Collect Prices from Multiple Platforms
# Taobao prices
con.execute("""
CREATE TABLE prices_taobao (
product_id VARCHAR, price DOUBLE,
seller VARCHAR,
captured_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO prices_taobao VALUES
('P001', 8299.00, 'Taobao Flagship A', CURRENT_TIMESTAMP),
('P002', 7499.00, 'Taobao Shop B', CURRENT_TIMESTAMP),
('P003', 8199.00, 'Taobao Flagship C', CURRENT_TIMESTAMP),
('P004', 5199.00, 'Taobao Shop D', CURRENT_TIMESTAMP),
('P005', 2199.00, 'Taobao Shop E', CURRENT_TIMESTAMP),
('P006', 1699.00, 'Taobao Flagship F', CURRENT_TIMESTAMP),
('P007', 2099.00, 'Taobao Shop G', CURRENT_TIMESTAMP),
('P008', 4388.00, 'Taobao Shop H', CURRENT_TIMESTAMP),
('P009', 649.00, 'Taobao Shop I', CURRENT_TIMESTAMP),
('P010', 1099.00, 'Taobao Shop J', CURRENT_TIMESTAMP);
""")
# JD.com prices
con.execute("""
CREATE TABLE prices_jd (
product_id VARCHAR, price DOUBLE,
seller VARCHAR,
captured_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO prices_jd VALUES
('P001', 8499.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P002', 7699.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P003', 8399.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P004', 5299.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P005', 2299.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P006', 1799.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P007', 2199.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P008', 4588.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P009', 699.00, 'JD Self-operated', CURRENT_TIMESTAMP),
('P010', 1199.00, 'JD Self-operated', CURRENT_TIMESTAMP);
""")
# Pinduoduo prices
con.execute("""
CREATE TABLE prices_pdd (
product_id VARCHAR, price DOUBLE,
seller VARCHAR,
captured_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO prices_pdd VALUES
('P001', 7899.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
('P002', 7199.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
('P003', 7799.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
('P004', 4899.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
('P005', 1999.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
('P006', 1599.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
('P007', 1899.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
('P008', 4188.00, 'PDD Shop L', CURRENT_TIMESTAMP),
('P009', 549.00, 'PDD Shop M', CURRENT_TIMESTAMP),
('P010', 949.00, 'PDD Subsidy', CURRENT_TIMESTAMP);
""")
Step 3: Cross-Platform Price Comparison
price_comparison = con.execute("""
SELECT
p.product_id, p.name, p.category, p.base_price,
(SELECT MIN(price) FROM prices_taobao t WHERE t.product_id = p.product_id) as taobao_min,
(SELECT MIN(price) FROM prices_jd j WHERE j.product_id = p.product_id) as jd_min,
(SELECT MIN(price) FROM prices_pdd pr WHERE pr.product_id = p.product_id) as pdd_min,
LEAST(
COALESCE((SELECT MIN(price) FROM prices_taobao t WHERE t.product_id = p.product_id), 999999),
COALESCE((SELECT MIN(price) FROM prices_jd j WHERE j.product_id = p.product_id), 999999),
COALESCE((SELECT MIN(price) FROM prices_pdd pr WHERE pr.product_id = p.product_id), 999999)
) as global_min,
GREATEST(
COALESCE((SELECT MIN(price) FROM prices_taobao t WHERE t.product_id = p.product_id), 0),
COALESCE((SELECT MIN(price) FROM prices_jd j WHERE j.product_id = p.product_id), 0),
COALESCE((SELECT MIN(price) FROM prices_pdd pr WHERE pr.product_id = p.product_id), 0)
) as global_max
FROM products p
ORDER BY
(GREATEST(...) - LEAST(...)) DESC
""").fetchdf()
Step 4: Spread Analysis and Arbitrage Opportunities
arbitrage_opportunities = con.execute("""
WITH platform_prices AS (
SELECT product_id, 'taobao' as platform, MIN(price) as min_price FROM prices_taobao GROUP BY product_id
UNION ALL
SELECT product_id, 'jd', MIN(price) FROM prices_jd GROUP BY product_id
UNION ALL
SELECT product_id, 'pdd', MIN(price) FROM prices_pdd GROUP BY product_id
),
price_spreads AS (
SELECT
pp1.product_id,
pp1.platform as buy_platform,
pp1.min_price as buy_price,
pp2.platform as sell_platform,
pp2.min_price as sell_price,
pp2.min_price - pp1.min_price as spread,
ROUND((pp2.min_price - pp1.min_price) / pp1.min_price * 100, 2) as spread_pct
FROM platform_prices pp1
JOIN platform_prices pp2
ON pp1.product_id = pp2.product_id
AND pp1.platform != pp2.platform
)
SELECT * FROM price_spreads
WHERE spread > 0
ORDER BY spread_pct DESC
LIMIT 20
""").fetchdf()
Step 5: Historical Price Trend Analysis
# Simulate historical price data
con.execute("""
CREATE TABLE price_history (
product_id VARCHAR,
platform VARCHAR,
price DOUBLE,
captured_at TIMESTAMP
);
INSERT INTO price_history VALUES
('P001', 'pdd', 8299.00, DATE '2026-06-15'),
('P001', 'pdd', 8099.00, DATE '2026-06-16'),
('P001', 'pdd', 7999.00, DATE '2026-06-17'),
('P001', 'pdd', 7899.00, DATE '2026-06-18'),
('P001', 'pdd', 7899.00, DATE '2026-06-19'),
('P001', 'pdd', 7899.00, DATE '2026-06-20'),
('P001', 'pdd', 7899.00, DATE '2026-06-21'),
('P001', 'pdd', 7899.00, DATE '2026-06-22');
""")
# Use LAG to calculate daily changes
trend_analysis = con.execute("""
SELECT
product_id,
platform,
captured_at,
price,
LAG(price) OVER (PARTITION BY product_id, platform ORDER BY captured_at) as prev_price,
price - LAG(price) OVER (PARTITION BY product_id, platform ORDER BY captured_at) as daily_change,
ROUND(price / LAG(price) OVER (PARTITION BY product_id, platform ORDER BY captured_at) * 100 - 100, 2) as change_pct
FROM price_history
WHERE product_id = 'P001' AND platform = 'pdd'
ORDER BY captured_at
""").fetchdf()
Comparison with Traditional Tools
| Tool | Deployment Difficulty | Query Speed (1M rows) | Memory Usage | Best For |
|---|---|---|---|---|
| MySQL + Python | High | Medium | High | Online services |
| PostgreSQL + Python | High | Medium | High | Complex transactions |
| Pandas + CSV | Low | Slow (full load) | Very High | Small datasets |
| DuckDB | Very Low | Very Fast | Low | Analytics/Monitoring |
DuckDB’s core advantage lies in columnar storage + vectorized execution + zero operations. For price monitoring scenarios that are read-heavy, require frequent aggregation analysis, DuckDB is more efficient than traditional relational databases and more memory-efficient than Pandas.
Three Monetization Paths in Detail
Path 1: Price Monitoring SaaS (Monthly Income: $3000-$10000)
Competitor price monitoring services for e-commerce sellers:
- Charge 99-299 RMB per month subscription
- Provide dashboards showing competitor price changes
- Automatic push notifications for price anomalies
Path 2: Affiliate Cashback (Monthly Income: $2000-$5000)
Find low-priced items → generate recommendations → monetize through affiliate links:
- Monitor low-priced items on Pinduoduo/Taobao
- Automatically generate recommendation copy
- Embed affiliate links, earn commission on purchases
Path 3: E-commerce Data Products (Monthly Income: $5000-$20000)
Package price data and sell to brands, distributors, and investment firms:
- Regular category price analysis reports
- Competitor price trend data
- Customized data products
Summary
Building a price monitoring system with DuckDB, the biggest advantage is simplicity, speed, and zero operations. No need to set up complex database clusters — a single .duckdb file handles all data storage and analysis. Combined with Python’s data collection capabilities, one person can run the complete loop from data collection to monetization.
The core value of this system is: achieving the highest data efficiency at the lowest cost. When you can discover pricing opportunities faster than others, you’ve built a moat around your ability to make money.
📖 The full version of this article is published on duckdblab.org, with more detailed steps and additional case studies. Learn more DuckDB practical experience → duckdblab.org