Featured image of post Completing Missing Time Series in DuckDB: generate_series + LAST_VALUE in Action

Completing Missing Time Series in DuckDB: generate_series + LAST_VALUE in Action

Master three time series completion strategies in DuckDB—zero-filling, forward fill, and linear interpolation—to solve missing data problems in reports.

Completing Missing Time Series in DuckDB: generate_series + LAST_VALUE in Action

Your boss wants “daily sales data for last week,” but your table only has dates with transactions. What do you do with the missing days?

This is one of the most classic problems data analysts face: gaps in time series data.

When you export sales records from a business database, typically only dates with actual transactions appear. Days with zero activity simply don’t exist in the table. But your boss wants data for every single day — even days with zero sales need to show up.

This article covers three time series completion strategies in DuckDB, from simple zero-filling to advanced forward fill and linear interpolation.

The Problem Scenario: Missing Sales Data

Suppose you have a sales table recording daily revenue:

CREATE TABLE sales AS
SELECT * FROM VALUES
  ('2024-01-01', 1500),
  ('2024-01-02', 2300),
  ('2024-01-04', 1800),
  ('2024-01-05', 3200)
AS t(day, amount);

Notice: January 3rd has no transaction record, so it doesn’t exist in the table.

If you aggregate directly by date, you get:

day        | amount
-----------+-------
2024-01-01 | 1500
2024-01-02 | 2300
2024-01-04 | 1800
2024-01-05 | 3200

The gap on Jan 3 breaks trend lines, skews moving averages, and creates visual discontinuities in charts.

Step One: Generate a Complete Date Sequence

DuckDB’s generate_series() function is the starting point for solving this problem. It can produce a continuous date sequence:

WITH dates AS (
  SELECT unnest(generate_series(
    DATE '2024-01-01', 
    DATE '2024-01-05', 
    INTERVAL '1' DAY
  )) AS day
)
SELECT * FROM dates;

Output:

        day        
-------------------
 2024-01-01
 2024-01-02
 2024-01-03
 2024-01-04
 2024-01-05

Key points:

  • generate_series(start, end, step) produces a sequence from start to end
  • unnest() expands the generated array into rows
  • Intervals can be INTERVAL '1' DAY, INTERVAL '1' HOUR, etc.

Step Two: LEFT JOIN to Fill Data

After generating a complete date sequence, use LEFT JOIN to attach actual sales data:

WITH dates AS (
  SELECT unnest(generate_series(
    DATE '2024-01-01', 
    DATE '2024-01-05', 
    INTERVAL '1' DAY
  )) AS day
)
SELECT d.day, COALESCE(s.amount, 0) AS amount
FROM dates d
LEFT JOIN sales s ON d.day = s.day;

Output:

        day        | amount
------------------+-------
 2024-01-01       | 1500
 2024-01-02       | 2300
 2024-01-03       |    0  ← filled with zero
 2024-01-04       | 1800
 2024-01-05       | 3200

COALESCE(s.amount, 0) replaces NULL with 0. This is simple and effective for count-type metrics (like order count, visits).

Strategy One: Zero-Filling (COALESCE)

Best for: Order counts, visits, event counts — metrics that can reasonably be assumed to be zero.

WITH dates AS (
  SELECT unnest(generate_series(
    DATE '2024-01-01', 
    DATE '2024-01-05', 
    INTERVAL '1' DAY
  )) AS day
)
SELECT d.day, COALESCE(s.amount, 0) AS amount
FROM dates d
LEFT JOIN sales s ON d.day = s.day;
ProsCons
Simple, one-linerMay mislead trend analysis
Reasonable for count metricsUnsuitable for ratios/averages
Excellent query performance

Strategy Two: Forward Fill (Last Observation Carried Forward)

Best for: Stock prices, inventory levels, device states — metrics that don’t change abruptly.

If the price on Jan 2 was 2300 and on Jan 4 was 1800, the most reasonable estimate for Jan 3 is 2300 (carry forward the last observation), not 0 or 2050 (average).

DuckDB’s LAST_VALUE(...) IGNORE NULLS window function implements this perfectly:

WITH dates AS (
  SELECT unnest(generate_series(
    DATE '2024-01-01', 
    DATE '2024-01-05', 
    INTERVAL '1' DAY
  )) AS day
)
SELECT d.day,
  LAST_VALUE(s.amount IGNORE NULLS) OVER (
    ORDER BY d.day 
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS forward_filled
FROM dates d
LEFT JOIN sales s ON d.day = s.day;

Output:

        day        | forward_filled
-------------------+---------------
 2024-01-01       |          1500
 2024-01-02       |          2300
 2024-01-03       |          2300  ← forward filled
 2024-01-04       |          1800
 2024-01-05       |          3200

Key Parameters Explained

  1. IGNORE NULLS: Tells the window function to skip NULL values and take the nearest non-null value
  2. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Defines the window range as “from the first row to the current row”
  3. Without IGNORE NULLS, the window function returns NULL when encountering a NULL value

Why Do You Need IGNORE NULLS?

Standard SQL window functions break accumulation when encountering NULL. Adding IGNORE NULLS makes LAST_VALUE skip the NULLs produced by the JOIN and always take the most recent observed value.

Strategy Three: Linear Interpolation

Best for: Sensor readings, temperature data, traffic metrics — data that changes smoothly over time.

Linear interpolation assumes the change between two known data points is linear, calculating missing values through a formula:

WITH dates AS (
  SELECT unnest(generate_series(
    DATE '2024-01-01', 
    DATE '2024-01-05', 
    INTERVAL '1' DAY
  )) AS day
),
joined AS (
  SELECT d.day, s.amount
  FROM dates d
  LEFT JOIN sales s ON d.day = s.day
),
with_lag_lead AS (
  SELECT 
    day,
    amount,
    LAG(amount) OVER (ORDER BY day) AS prev_amount,
    LEAD(amount) OVER (ORDER BY day) AS next_amount,
    LAG(day) OVER (ORDER BY day) AS prev_day,
    LEAD(day) OVER (ORDER BY day) AS next_day
  FROM joined
)
SELECT 
  day,
  CASE 
    WHEN amount IS NOT NULL THEN amount
    WHEN prev_amount IS NOT NULL AND next_amount IS NOT NULL THEN
      prev_amount + (next_amount - prev_amount) * 
        (day - prev_day::DATE)::INTEGER / 
        ((next_day - prev_day::DATE)::INTEGER + 1)
    WHEN prev_amount IS NOT NULL THEN prev_amount
    WHEN next_amount IS NOT NULL THEN next_amount
    ELSE 0
  END AS interpolated
FROM with_lag_lead;

Output:

        day        | interpolated
-------------------+-------------
 2024-01-01       |        1500
 2024-01-02       |        2300
 2024-01-03       |  2050.0  ← linear interpolation
 2024-01-04       |        1800
 2024-01-05       |        3200

Calculation: (2300 + (1800 - 2300) * 1 / 2) = 2300 - 250 = 2050

Comparison of Three Strategies

StrategySuitable Data TypeProsCons
Zero-Fill (COALESCE)Counts (orders, UV)Simple, fastMay mislead trends
Forward Fill (LAST_VALUE IGNORE NULLS)Stateful (stock price, inventory)Keeps latest observationDoesn’t reflect change
Linear Interpolation (LAG/LEAD)Continuous (sensors)Smooth transition, physically reasonableComplex, heavier computation

Advanced: Grouped Time Series Completion

In real-world scenarios, you often need to complete time series per product, region, or user. DuckDB supports this too:

WITH dates AS (
  SELECT unnest(generate_series(
    DATE '2024-01-01', 
    DATE '2024-01-05', 
    INTERVAL '1' DAY
  )) AS day
),
products AS (
  SELECT * FROM VALUES ('Phone'), ('Laptop'), ('Tablet') AS t(name)
),
date_product AS (
  SELECT d.day, p.name
  FROM dates d
  CROSS JOIN products p
),
joined AS (
  SELECT dp.day, dp.name, s.amount
  FROM date_product dp
  LEFT JOIN sales s ON dp.day = s.day
)
SELECT 
  day,
  name,
  LAST_VALUE(amount IGNORE NULLS) OVER (
    PARTITION BY name ORDER BY day
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS forward_filled
FROM joined;

The key here is PARTITION BY name — each product gets its own independent forward fill timeline.

Comparison with Traditional Tools

FeatureDuckDBPandasExcelSpark
Date sequence gengenerate_series()pd.date_range()Manual fillComplex
Forward fillLAST_VALUE IGNORE NULLSffill()Right-click fillfillna(method='ffill')
Linear interpolationLAG/LEAD + formulainterpolate()Not supportedinterp()
Group operationsPARTITION BYgroupby().transform()Pivot tablesgroupBy().agg()
PerformanceColumnar engine, blazing fastIn-memory, moderateSlowDistributed, great for huge data
Learning curveJust SQLNeeds PythonLowHigh
Zero installation✅ Runs directly❌ Needs Python❌ Needs software❌ Needs cluster

Performance Tips

  • For small datasets (< 1M rows), all methods above are very efficient
  • For large datasets, consider adding partition filters to generate_series to reduce Cartesian product size
  • If the time range is large (e.g., year-long), consider generating weekly or monthly sequences first, then refining as needed

Monetization Advice

1. SaaS Automated Reporting Service

Leverage DuckDB’s time series completion to offer automated daily/weekly reports for SMEs. Clients provide raw transaction data, and you generate complete reports with missing dates filled. Charge ¥500-2000/month per client.

2. Data Cleaning Microservice

Wrap generate_series + LEFT JOIN + forward fill into an API service for other systems. For example, the backend of an e-commerce “daily sales dashboard” that fills daily data before aggregation.

3. Sensor Data Monitoring Platform

For IoT scenarios, use linear interpolation to fill sensor gaps and build real-time monitoring dashboards. These B2B solutions typically range from ¥5,000 to ¥50,000 per deployment.

4. Paid Training Course

Expand this content into a premium course “DuckDB Time Series Mastery,” covering more scenarios (seasonal adjustment, anomaly detection, predictive filling), priced at ¥199-499/person.

5. Open Source Tool Commercialization

Build a CLI tool ts-fill based on DuckDB with YAML-configurable completion strategies. Offer a Pro version (custom interpolation algorithms, multi-source support) using a freemium model.


All code examples in this article can be tested directly in the DuckDB Online Shell.

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.