Featured image of post DuckDB in Action: Time Series Analysis — date_trunc, generate_series & Rolling Aggregations

DuckDB in Action: Time Series Analysis — date_trunc, generate_series & Rolling Aggregations

A deep dive into DuckDB's time series analysis toolkit: date_trunc for timestamp bucketing, generate_series for gap filling, and window-based rolling aggregations with real-world e-commerce and server monitoring scenarios.

Why Time Series Analysis Matters

Time series data is everywhere in data analytics — daily e-commerce sales, server CPU metrics, hourly website traffic. Nearly every business scenario involves a time dimension.

Traditional databases face two major pain points when handling time series:

  1. Inconsistent timestamp granularity — Raw data has millisecond precision, requiring bucketing into minutes/hours/days
  2. Gaps in time series — Some time periods have no data, breaking continuity in visualizations

DuckDB offers three powerful tools to solve these problems elegantly: date_trunc, generate_series, and window-based rolling aggregation functions.

Time Series Analysis Pipeline

Fig: DuckDB Time Series Analysis 3-Stage Pipeline — Bucket → Fill → Roll


1. date_trunc: Timestamp Bucketing

date_trunc truncates a timestamp to a specified precision. It’s the starting point of any time series analysis — normalizing messy timestamps into uniform buckets.

Basic Syntax

date_trunc('unit', timestamp_column)

Supported unit values: 'second', 'minute', 'hour', 'day', 'week', 'month', 'quarter', 'year'.

Example: Truncate Millisecond Timestamps to Hours

SELECT 
    date_trunc('hour', TIMESTAMP '2026-06-03 14:32:18.123') AS hour_bucket,
    date_trunc('day', TIMESTAMP '2026-06-03 14:32:18.123') AS day_bucket,
    date_trunc('week', TIMESTAMP '2026-06-03 14:32:18.123') AS week_bucket,
    date_trunc('month', TIMESTAMP '2026-06-03 14:32:18.123') AS month_bucket;

Result:

┌─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────┐
│     hour_bucket     │     day_bucket      │     week_bucket     │    month_bucket     │
├─────────────────────┼─────────────────────┼─────────────────────┼─────────────────────┤
│ 2026-06-03 14:00:00 │ 2026-06-03 00:00:00 │ 2026-06-01 00:00:00 │ 2026-06-01 00:00:00 │
└─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┘

Note that the week_bucket starts on Monday (2026-06-01), as DuckDB defaults to ISO week (Monday start).


2. Real-World Scenario: E-Commerce Order Time Series

Let’s analyze hourly order counts and GMV (Gross Merchandise Volume) for an e-commerce platform.

Create Sample Data

CREATE TABLE orders AS
SELECT * FROM (VALUES
    (TIMESTAMP '2026-06-01 08:15:00', 1, 299.00),
    (TIMESTAMP '2026-06-01 08:42:00', 2, 159.00),
    (TIMESTAMP '2026-06-01 09:05:00', 3, 899.00),
    (TIMESTAMP '2026-06-01 09:30:00', 4, 49.90),
    (TIMESTAMP '2026-06-01 09:55:00', 5, 1299.00),
    (TIMESTAMP '2026-06-01 10:10:00', 6, 79.00),
    (TIMESTAMP '2026-06-01 10:45:00', 7, 520.00),
    (TIMESTAMP '2026-06-01 11:20:00', 8, 89.90),
    (TIMESTAMP '2026-06-01 13:00:00', 9, 249.00),
    (TIMESTAMP '2026-06-01 13:35:00', 10, 168.00),
    (TIMESTAMP '2026-06-01 14:10:00', 11, 399.00),
    (TIMESTAMP '2026-06-01 14:50:00', 12, 79.90),
    (TIMESTAMP '2026-06-01 15:25:00', 13, 1899.00),
    (TIMESTAMP '2026-06-01 16:00:00', 14, 45.00),
    (TIMESTAMP '2026-06-01 16:40:00', 15, 599.00),
    (TIMESTAMP '2026-06-02 08:30:00', 16, 129.00),
    (TIMESTAMP '2026-06-02 09:00:00', 17, 799.00),
    (TIMESTAMP '2026-06-02 09:20:00', 18, 39.00),
    (TIMESTAMP '2026-06-02 10:15:00', 19, 2399.00),
    (TIMESTAMP '2026-06-02 11:50:00', 20, 89.00)
) AS t(order_time, order_id, amount);

Hourly Aggregation

SELECT 
    date_trunc('hour', order_time) AS hour_bucket,
    COUNT(*) AS order_count,
    ROUND(SUM(amount), 2) AS total_gmv,
    ROUND(AVG(amount), 2) AS avg_order_value
FROM orders
GROUP BY hour_bucket
ORDER BY hour_bucket;

Result:

┌─────────────────────┬─────────────┬───────────┬─────────────────┐
│     hour_bucket     │ order_count │ total_gmv │ avg_order_value │
├─────────────────────┼─────────────┼───────────┼─────────────────┤
│ 2026-06-01 08:00:00 │     2       │   458.00  │     229.00      │
│ 2026-06-01 09:00:00 │     3       │  2247.90  │     749.30      │
│ 2026-06-01 10:00:00 │     2       │   599.00  │     299.50      │
│ 2026-06-01 11:00:00 │     1       │    89.90  │      89.90      │
│ 2026-06-01 13:00:00 │     2       │   417.00  │     208.50      │
│ 2026-06-01 14:00:00 │     2       │   478.90  │     239.45      │
│ 2026-06-01 15:00:00 │     1       │  1899.00  │    1899.00      │
│ 2026-06-01 16:00:00 │     2       │   644.00  │     322.00      │
│ 2026-06-02 08:00:00 │     1       │   129.00  │     129.00      │
│ 2026-06-02 09:00:00 │     2       │   838.00  │     419.00      │
│ 2026-06-02 10:00:00 │     1       │  2399.00  │    2399.00      │
│ 2026-06-02 11:00:00 │     1       │    89.00  │      89.00      │
└─────────────────────┴─────────────┴───────────┴─────────────────┘

Notice the problem? June 1st has no data at 12:00 and 17:00-23:00 — the time series has gaps. If you were to chart this directly, the graph would break.


3. generate_series: Filling Time Gaps

generate_series creates a continuous time sequence, which can be used with a LEFT JOIN to fill gaps.

Generate a Continuous Hourly Series

SELECT 
    generate_series AS hour_bucket
FROM generate_series(
    TIMESTAMP '2026-06-01 08:00:00',
    TIMESTAMP '2026-06-02 12:00:00',
    INTERVAL '1 hour'
);

LEFT JOIN to Fill Gaps

WITH hours AS (
    SELECT generate_series AS hour_bucket
    FROM generate_series(
        TIMESTAMP '2026-06-01 08:00:00',
        TIMESTAMP '2026-06-02 12:00:00',
        INTERVAL '1 hour'
    )
),
hourly_stats AS (
    SELECT 
        date_trunc('hour', order_time) AS hour_bucket,
        COUNT(*) AS order_count,
        ROUND(SUM(amount), 2) AS total_gmv
    FROM orders
    GROUP BY hour_bucket
)
SELECT 
    h.hour_bucket,
    COALESCE(s.order_count, 0) AS order_count,
    COALESCE(s.total_gmv, 0.00) AS total_gmv
FROM hours h
LEFT JOIN hourly_stats s ON h.hour_bucket = s.hour_bucket
ORDER BY h.hour_bucket;

Result:

┌─────────────────────┬─────────────┬───────────┐
│     hour_bucket     │ order_count │ total_gmv │
├─────────────────────┼─────────────┼───────────┤
│ 2026-06-01 08:00:00 │     2       │   458.00  │
│ 2026-06-01 09:00:00 │     3       │  2247.90  │
│ 2026-06-01 10:00:00 │     2       │   599.00  │
│ 2026-06-01 11:00:00 │     1       │    89.90  │
│ 2026-06-01 12:00:00 │     0       │     0.00  │  ← gap filled
│ 2026-06-01 13:00:00 │     2       │   417.00  │
│ 2026-06-01 14:00:00 │     2       │   478.90  │
│ 2026-06-01 15:00:00 │     1       │  1899.00  │
│ 2026-06-01 16:00:00 │     2       │   644.00  │
│ 2026-06-01 17:00:00 │     0       │     0.00  │  ← gap filled
│ 2026-06-01 18:00:00 │     0       │     0.00  │
│ ...                 │   ...       │    ...    │
│ 2026-06-02 08:00:00 │     1       │   129.00  │
│ 2026-06-02 09:00:00 │     2       │   838.00  │
│ 2026-06-02 10:00:00 │     1       │  2399.00  │
│ 2026-06-02 11:00:00 │     1       │    89.00  │
│ 2026-06-02 12:00:00 │     0       │     0.00  │
└─────────────────────┴─────────────┴───────────┘

DuckDB CLI Output

Fig: Hourly order aggregation — gaps at 12:00 and 17:00-23:00 filled with zeros by generate_series

The time series is now fully continuous, ready for charting.


4. Rolling Aggregations: Moving Averages & Cumulative Statistics

Rolling aggregation is the core technique for time series analysis, smoothing short-term fluctuations and revealing long-term trends.

4.1 3-Hour Moving Average

Use the ROWS BETWEEN window clause for sliding window calculations:

WITH hourly_gmv AS (
    SELECT 
        date_trunc('hour', order_time) AS hour_bucket,
        ROUND(SUM(amount), 2) AS gmv
    FROM orders
    GROUP BY hour_bucket
)
SELECT 
    hour_bucket,
    gmv,
    ROUND(AVG(gmv) OVER (
        ORDER BY hour_bucket 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS moving_avg_3h
FROM hourly_gmv
ORDER BY hour_bucket;

Result:

┌─────────────────────┬────────┬───────────────┐
│     hour_bucket     │  gmv   │ moving_avg_3h │
├─────────────────────┼────────┼───────────────┤
│ 2026-06-01 08:00:00 │ 458.00 │    458.00     │
│ 2026-06-01 09:00:00 │2247.90 │   1352.95     │
│ 2026-06-01 10:00:00 │ 599.00 │   1101.63     │
│ 2026-06-01 11:00:00 │  89.90 │    978.93     │
│ 2026-06-01 13:00:00 │ 417.00 │    368.63     │
│ 2026-06-01 14:00:00 │ 478.90 │    328.60     │
│ 2026-06-01 15:00:00 │1899.00 │    931.63     │
│ 2026-06-01 16:00:00 │ 644.00 │   1007.30     │
│ 2026-06-02 08:00:00 │ 129.00 │    890.67     │
│ 2026-06-02 09:00:00 │ 838.00 │    537.00     │
│ 2026-06-02 10:00:00 │2399.00 │   1122.00     │
│ 2026-06-02 11:00:00 │  89.00 │   1108.67     │
└─────────────────────┴────────┴───────────────┘

The moving average smooths out the data; the large single order of 1899 at 15:00 gets distributed across the surrounding windows.

4.2 Cumulative Values (YTD / MTD)

WITH daily_gmv AS (
    SELECT 
        date_trunc('day', order_time) AS day_bucket,
        ROUND(SUM(amount), 2) AS daily_gmv
    FROM orders
    GROUP BY day_bucket
)
SELECT 
    day_bucket,
    daily_gmv,
    ROUND(SUM(daily_gmv) OVER (
        ORDER BY day_bucket
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ), 2) AS cumulative_gmv
FROM daily_gmv
ORDER BY day_bucket;

Result:

┌─────────────────────┬───────────┬─────────────────┐
│     day_bucket      │ daily_gmv │ cumulative_gmv  │
├─────────────────────┼───────────┼─────────────────┤
│ 2026-06-01 00:00:00 │  7089.70  │    7089.70      │
│ 2026-06-02 00:00:00 │  3455.00  │   10544.70      │
└─────────────────────┴───────────┴─────────────────┘

4.3 Month-over-Month Change Rate

WITH hourly_gmv AS (
    SELECT 
        date_trunc('hour', order_time) AS hour_bucket,
        ROUND(SUM(amount), 2) AS gmv
    FROM orders
    GROUP BY hour_bucket
)
SELECT 
    hour_bucket,
    gmv,
    ROUND(LAG(gmv) OVER (ORDER BY hour_bucket), 2) AS prev_hour_gmv,
    ROUND((gmv - LAG(gmv) OVER (ORDER BY hour_bucket)) / 
          NULLIF(LAG(gmv) OVER (ORDER BY hour_bucket), 0) * 100, 2) AS mom_change_pct
FROM hourly_gmv
ORDER BY hour_bucket;

Result:

┌─────────────────────┬────────┬──────────────┬─────────────────┐
│     hour_bucket     │  gmv   │ prev_hour_gmv│ mom_change_pct  │
├─────────────────────┼────────┼──────────────┼─────────────────┤
│ 2026-06-01 08:00:00 │ 458.00 │     NULL     │      NULL       │
│ 2026-06-01 09:00:00 │2247.90 │    458.00    │     390.81      │
│ 2026-06-01 10:00:00 │ 599.00 │   2247.90    │     -73.35      │
│ 2026-06-01 11:00:00 │  89.90 │    599.00    │     -84.99      │
│ 2026-06-01 13:00:00 │ 417.00 │     89.90    │     363.85      │
│ 2026-06-01 14:00:00 │ 478.90 │    417.00    │      14.84      │
│ 2026-06-01 15:00:00 │1899.00 │    478.90    │     296.53      │
│ 2026-06-01 16:00:00 │ 644.00 │   1899.00    │     -66.09      │
└─────────────────────┴────────┴──────────────┴─────────────────┘

5. Advanced Example: Server Monitoring Metrics

Let’s analyze a server monitoring scenario — CPU usage over the past 7 days.

Create Simulated Monitoring Data

CREATE TABLE cpu_metrics AS
SELECT 
    '2026-06-03 08:00:00'::TIMESTAMP + INTERVAL (i || ' minutes') AS ts,
    30 + random() * 40 AS cpu_usage,
    1024 + random() * 2048 AS memory_mb
FROM generate_series(0, 1439) AS t(i);  -- 24 hours of minute-level data

15-Minute Rolling Window Analysis

SELECT 
    date_trunc('hour', ts) AS hour_bucket,
    ROUND(AVG(cpu_usage), 1) AS avg_cpu,
    ROUND(MAX(cpu_usage), 1) AS max_cpu,
    ROUND(MIN(cpu_usage), 1) AS min_cpu,
    ROUND(AVG(memory_mb), 0) AS avg_memory_mb
FROM cpu_metrics
GROUP BY hour_bucket
ORDER BY hour_bucket;

Sliding Window Anomaly Detection

Use LAG and LEAD for CPU spike detection:

WITH cpu_by_minute AS (
    SELECT 
        ts,
        ROUND(cpu_usage, 1) AS cpu_usage,
        ROUND(AVG(cpu_usage) OVER (
            ORDER BY ts 
            ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
        ), 1) AS baseline
    FROM cpu_metrics
)
SELECT 
    ts,
    cpu_usage,
    baseline,
    ROUND(cpu_usage - baseline, 1) AS deviation
FROM cpu_by_minute
WHERE cpu_usage > baseline * 1.5  -- 50% above baseline = anomaly
ORDER BY deviation DESC
LIMIT 10;

6. Performance Optimization Tips

When processing large-scale time series data, these techniques can significantly improve query performance:

TechniqueDescriptionBest For
Table PartitioningPartition by time range (daily/monthly)Millions+ rows
Materialized AggregationsUse CREATE TABLE AS to pre-aggregate hourly statsFixed-granularity repeated queries
OrderingORDER BY ts on timestamp columnFrequent range scans
Parquet PartitioningStore by year/month/dayHot/cold data separation

Use ORDER BY to Accelerate Timestamp Scans

-- Adding ORDER BY during table creation speeds up range queries
CREATE TABLE orders_partitioned AS
SELECT * FROM read_parquet('orders.parquet')
ORDER BY order_time;

Summary

DuckDB’s time series analysis capabilities can be summarized in three layers:

  1. Bucketing (date_trunc) — Normalize fine-grained timestamps into uniform buckets, the starting point of any time series analysis
  2. Gap Filling (generate_series) — Generate continuous time sequences and LEFT JOIN to fill data gaps, ensuring complete charts
  3. Rolling Analysis (Window Functions) — Moving averages, cumulative values, and MoM changes to reveal underlying trends

Combined, these three tools handle over 90% of time series analysis needs. Whether it’s e-commerce GMV monitoring, server alerting, or IoT sensor data processing, this methodology applies.


💡 Try it yourself! Replace the orders table above with your own business data (logs, transactions, sensor readings) and run the full time series analysis pipeline immediately.

For more DuckDB in Action tips, follow DuckDB Lab (duckdblab.org)

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy