Featured image of post Build a Price Monitoring & Arbitrage Discovery System with DuckDB

Build a Price Monitoring & Arbitrage Discovery System with DuckDB

Build an e-commerce price monitoring and arbitrage discovery system using DuckDB. Cross-platform price comparison, spread analysis, and automated alerts — a money-making pipeline anyone can run.

Build a Price Monitoring & Arbitrage Discovery System with DuckDB

Price Monitoring Architecture

Why This Direction Makes Money

E-commerce price monitoring and arbitrage discovery is a severely undervalued money-making direction.

Look at shopping groups, cashback communities, deal-finding platforms like “Smzdm” — the essence of these platforms is someone helps you monitor prices, finds low prices, and tells you when to buy. Behind most of these platforms are expensive commercial tools or cumbersome ETL frameworks.

If you can build an automated price monitoring system with DuckDB, you can do three things:

  1. Build a Price Monitoring SaaS: Offer competitor price monitoring services for Taobao/Pinduoduo sellers, charged monthly
  2. Build a Cashback Affiliate Business: Find low-priced items → generate recommendations → monetize through affiliate links
  3. Build E-commerce Data Products: Package price data and sell to brands, distributors, and investment firms

The key is — the core technology stack for this system only requires Python + DuckDB, with zero database servers needed.

System Architecture

Data Collection → DuckDB Storage → SQL Analysis → Monitoring Reports → Telegram/WeChat Alerts

The entire workflow is driven by SQL, without writing a single line of Python business logic. DuckDB’s advantages include:

  • Columnar storage: Extremely fast aggregation queries when price data accumulates as time series
  • Built-in time series functions: LAG, LEAD, window functions ready to use
  • Zero operations: A single .duckdb file handles all data storage
  • UNION ALL cross-table queries: Easily merge price data from multiple platforms

Complete Code Implementation

Step 1: Initialize Database and Product Catalog

import duckdb
import os

# Connect to database (auto-creates if not exists)
db_path = "/tmp/price_monitor.duckdb"
if os.path.exists(db_path):
    os.remove(db_path)

con = duckdb.connect(db_path)

# Create product catalog table
con.execute("""
CREATE TABLE products (
    product_id VARCHAR PRIMARY KEY,
    name VARCHAR,
    category VARCHAR,
    base_price DOUBLE,
    target_margin DOUBLE DEFAULT 0.05
);

INSERT INTO products VALUES
    ('P001', 'iPhone 15 Pro Max 256GB', 'Phones', 8999.00),
    ('P002', 'Samsung Galaxy S24 Ultra', 'Phones', 7999.00),
    ('P003', 'MacBook Air M3 13-inch', 'Laptops', 8999.00),
    ('P004', 'iPad Air M2 256GB', 'Tablets', 5499.00),
    ('P005', 'Sony WH-1000XM5', 'Headphones', 2499.00),
    ('P006', 'AirPods Pro 2', 'Headphones', 1899.00),
    ('P007', 'Nintendo Switch OLED', 'Gaming', 2349.00),
    ('P008', 'DJI Mini 4 Pro', 'Drones', 4788.00),
    ('P009', 'Logitech MX Master 3S', 'Mice', 799.00),
    ('P010', 'Keychron Q1 Pro', 'Keyboards', 1299.00);
""")

Step 2: Collect Prices from Multiple Platforms

# Taobao prices
con.execute("""
CREATE TABLE prices_taobao (
    product_id VARCHAR, price DOUBLE,
    seller VARCHAR,
    captured_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO prices_taobao VALUES
    ('P001', 8299.00, 'Taobao Flagship A', CURRENT_TIMESTAMP),
    ('P002', 7499.00, 'Taobao Shop B', CURRENT_TIMESTAMP),
    ('P003', 8199.00, 'Taobao Flagship C', CURRENT_TIMESTAMP),
    ('P004', 5199.00, 'Taobao Shop D', CURRENT_TIMESTAMP),
    ('P005', 2199.00, 'Taobao Shop E', CURRENT_TIMESTAMP),
    ('P006', 1699.00, 'Taobao Flagship F', CURRENT_TIMESTAMP),
    ('P007', 2099.00, 'Taobao Shop G', CURRENT_TIMESTAMP),
    ('P008', 4388.00, 'Taobao Shop H', CURRENT_TIMESTAMP),
    ('P009', 649.00, 'Taobao Shop I', CURRENT_TIMESTAMP),
    ('P010', 1099.00, 'Taobao Shop J', CURRENT_TIMESTAMP);
""")

# JD.com prices
con.execute("""
CREATE TABLE prices_jd (
    product_id VARCHAR, price DOUBLE,
    seller VARCHAR,
    captured_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO prices_jd VALUES
    ('P001', 8499.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P002', 7699.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P003', 8399.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P004', 5299.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P005', 2299.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P006', 1799.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P007', 2199.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P008', 4588.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P009', 699.00, 'JD Self-operated', CURRENT_TIMESTAMP),
    ('P010', 1199.00, 'JD Self-operated', CURRENT_TIMESTAMP);
""")

# Pinduoduo prices
con.execute("""
CREATE TABLE prices_pdd (
    product_id VARCHAR, price DOUBLE,
    seller VARCHAR,
    captured_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO prices_pdd VALUES
    ('P001', 7899.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
    ('P002', 7199.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
    ('P003', 7799.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
    ('P004', 4899.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
    ('P005', 1999.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
    ('P006', 1599.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
    ('P007', 1899.00, 'PDD Subsidy', CURRENT_TIMESTAMP),
    ('P008', 4188.00, 'PDD Shop L', CURRENT_TIMESTAMP),
    ('P009', 549.00, 'PDD Shop M', CURRENT_TIMESTAMP),
    ('P010', 949.00, 'PDD Subsidy', CURRENT_TIMESTAMP);
""")

Step 3: Cross-Platform Price Comparison

price_comparison = con.execute("""
    SELECT 
        p.product_id, p.name, p.category, p.base_price,
        (SELECT MIN(price) FROM prices_taobao t WHERE t.product_id = p.product_id) as taobao_min,
        (SELECT MIN(price) FROM prices_jd j WHERE j.product_id = p.product_id) as jd_min,
        (SELECT MIN(price) FROM prices_pdd pr WHERE pr.product_id = p.product_id) as pdd_min,
        LEAST(
            COALESCE((SELECT MIN(price) FROM prices_taobao t WHERE t.product_id = p.product_id), 999999),
            COALESCE((SELECT MIN(price) FROM prices_jd j WHERE j.product_id = p.product_id), 999999),
            COALESCE((SELECT MIN(price) FROM prices_pdd pr WHERE pr.product_id = p.product_id), 999999)
        ) as global_min,
        GREATEST(
            COALESCE((SELECT MIN(price) FROM prices_taobao t WHERE t.product_id = p.product_id), 0),
            COALESCE((SELECT MIN(price) FROM prices_jd j WHERE j.product_id = p.product_id), 0),
            COALESCE((SELECT MIN(price) FROM prices_pdd pr WHERE pr.product_id = p.product_id), 0)
        ) as global_max
    FROM products p
    ORDER BY 
        (GREATEST(...) - LEAST(...)) DESC
""").fetchdf()

Step 4: Spread Analysis and Arbitrage Opportunities

arbitrage_opportunities = con.execute("""
    WITH platform_prices AS (
        SELECT product_id, 'taobao' as platform, MIN(price) as min_price FROM prices_taobao GROUP BY product_id
        UNION ALL
        SELECT product_id, 'jd', MIN(price) FROM prices_jd GROUP BY product_id
        UNION ALL
        SELECT product_id, 'pdd', MIN(price) FROM prices_pdd GROUP BY product_id
    ),
    price_spreads AS (
        SELECT 
            pp1.product_id,
            pp1.platform as buy_platform,
            pp1.min_price as buy_price,
            pp2.platform as sell_platform,
            pp2.min_price as sell_price,
            pp2.min_price - pp1.min_price as spread,
            ROUND((pp2.min_price - pp1.min_price) / pp1.min_price * 100, 2) as spread_pct
        FROM platform_prices pp1
        JOIN platform_prices pp2 
            ON pp1.product_id = pp2.product_id 
            AND pp1.platform != pp2.platform
    )
    SELECT * FROM price_spreads 
    WHERE spread > 0 
    ORDER BY spread_pct DESC
    LIMIT 20
""").fetchdf()

Step 5: Historical Price Trend Analysis

# Simulate historical price data
con.execute("""
CREATE TABLE price_history (
    product_id VARCHAR,
    platform VARCHAR,
    price DOUBLE,
    captured_at TIMESTAMP
);

INSERT INTO price_history VALUES
    ('P001', 'pdd', 8299.00, DATE '2026-06-15'),
    ('P001', 'pdd', 8099.00, DATE '2026-06-16'),
    ('P001', 'pdd', 7999.00, DATE '2026-06-17'),
    ('P001', 'pdd', 7899.00, DATE '2026-06-18'),
    ('P001', 'pdd', 7899.00, DATE '2026-06-19'),
    ('P001', 'pdd', 7899.00, DATE '2026-06-20'),
    ('P001', 'pdd', 7899.00, DATE '2026-06-21'),
    ('P001', 'pdd', 7899.00, DATE '2026-06-22');
""")

# Use LAG to calculate daily changes
trend_analysis = con.execute("""
    SELECT 
        product_id,
        platform,
        captured_at,
        price,
        LAG(price) OVER (PARTITION BY product_id, platform ORDER BY captured_at) as prev_price,
        price - LAG(price) OVER (PARTITION BY product_id, platform ORDER BY captured_at) as daily_change,
        ROUND(price / LAG(price) OVER (PARTITION BY product_id, platform ORDER BY captured_at) * 100 - 100, 2) as change_pct
    FROM price_history
    WHERE product_id = 'P001' AND platform = 'pdd'
    ORDER BY captured_at
""").fetchdf()

Comparison with Traditional Tools

ToolDeployment DifficultyQuery Speed (1M rows)Memory UsageBest For
MySQL + PythonHighMediumHighOnline services
PostgreSQL + PythonHighMediumHighComplex transactions
Pandas + CSVLowSlow (full load)Very HighSmall datasets
DuckDBVery LowVery FastLowAnalytics/Monitoring

DuckDB’s core advantage lies in columnar storage + vectorized execution + zero operations. For price monitoring scenarios that are read-heavy, require frequent aggregation analysis, DuckDB is more efficient than traditional relational databases and more memory-efficient than Pandas.

Three Monetization Paths in Detail

Path 1: Price Monitoring SaaS (Monthly Income: $3000-$10000)

Competitor price monitoring services for e-commerce sellers:

  • Charge 99-299 RMB per month subscription
  • Provide dashboards showing competitor price changes
  • Automatic push notifications for price anomalies

Path 2: Affiliate Cashback (Monthly Income: $2000-$5000)

Find low-priced items → generate recommendations → monetize through affiliate links:

  • Monitor low-priced items on Pinduoduo/Taobao
  • Automatically generate recommendation copy
  • Embed affiliate links, earn commission on purchases

Path 3: E-commerce Data Products (Monthly Income: $5000-$20000)

Package price data and sell to brands, distributors, and investment firms:

  • Regular category price analysis reports
  • Competitor price trend data
  • Customized data products

Summary

Building a price monitoring system with DuckDB, the biggest advantage is simplicity, speed, and zero operations. No need to set up complex database clusters — a single .duckdb file handles all data storage and analysis. Combined with Python’s data collection capabilities, one person can run the complete loop from data collection to monetization.

The core value of this system is: achieving the highest data efficiency at the lowest cost. When you can discover pricing opportunities faster than others, you’ve built a moat around your ability to make money.


📖 The full version of this article is published on duckdblab.org, with more detailed steps and additional case studies. Learn more DuckDB practical experience → duckdblab.org

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.