Completing Missing Time Series in DuckDB: generate_series + LAST_VALUE in Action
Your boss wants “daily sales data for last week,” but your table only has dates with transactions. What do you do with the missing days?
This is one of the most classic problems data analysts face: gaps in time series data.
When you export sales records from a business database, typically only dates with actual transactions appear. Days with zero activity simply don’t exist in the table. But your boss wants data for every single day — even days with zero sales need to show up.
This article covers three time series completion strategies in DuckDB, from simple zero-filling to advanced forward fill and linear interpolation.
The Problem Scenario: Missing Sales Data
Suppose you have a sales table recording daily revenue:
CREATE TABLE sales AS
SELECT * FROM VALUES
('2024-01-01', 1500),
('2024-01-02', 2300),
('2024-01-04', 1800),
('2024-01-05', 3200)
AS t(day, amount);
Notice: January 3rd has no transaction record, so it doesn’t exist in the table.
If you aggregate directly by date, you get:
day | amount
-----------+-------
2024-01-01 | 1500
2024-01-02 | 2300
2024-01-04 | 1800
2024-01-05 | 3200
The gap on Jan 3 breaks trend lines, skews moving averages, and creates visual discontinuities in charts.
Step One: Generate a Complete Date Sequence
DuckDB’s generate_series() function is the starting point for solving this problem. It can produce a continuous date sequence:
WITH dates AS (
SELECT unnest(generate_series(
DATE '2024-01-01',
DATE '2024-01-05',
INTERVAL '1' DAY
)) AS day
)
SELECT * FROM dates;
Output:
day
-------------------
2024-01-01
2024-01-02
2024-01-03
2024-01-04
2024-01-05
Key points:
generate_series(start, end, step)produces a sequence from start to endunnest()expands the generated array into rows- Intervals can be
INTERVAL '1' DAY,INTERVAL '1' HOUR, etc.
Step Two: LEFT JOIN to Fill Data
After generating a complete date sequence, use LEFT JOIN to attach actual sales data:
WITH dates AS (
SELECT unnest(generate_series(
DATE '2024-01-01',
DATE '2024-01-05',
INTERVAL '1' DAY
)) AS day
)
SELECT d.day, COALESCE(s.amount, 0) AS amount
FROM dates d
LEFT JOIN sales s ON d.day = s.day;
Output:
day | amount
------------------+-------
2024-01-01 | 1500
2024-01-02 | 2300
2024-01-03 | 0 ← filled with zero
2024-01-04 | 1800
2024-01-05 | 3200
COALESCE(s.amount, 0) replaces NULL with 0. This is simple and effective for count-type metrics (like order count, visits).
Strategy One: Zero-Filling (COALESCE)
Best for: Order counts, visits, event counts — metrics that can reasonably be assumed to be zero.
WITH dates AS (
SELECT unnest(generate_series(
DATE '2024-01-01',
DATE '2024-01-05',
INTERVAL '1' DAY
)) AS day
)
SELECT d.day, COALESCE(s.amount, 0) AS amount
FROM dates d
LEFT JOIN sales s ON d.day = s.day;
| Pros | Cons |
|---|---|
| Simple, one-liner | May mislead trend analysis |
| Reasonable for count metrics | Unsuitable for ratios/averages |
| Excellent query performance |
Strategy Two: Forward Fill (Last Observation Carried Forward)
Best for: Stock prices, inventory levels, device states — metrics that don’t change abruptly.
If the price on Jan 2 was 2300 and on Jan 4 was 1800, the most reasonable estimate for Jan 3 is 2300 (carry forward the last observation), not 0 or 2050 (average).
DuckDB’s LAST_VALUE(...) IGNORE NULLS window function implements this perfectly:
WITH dates AS (
SELECT unnest(generate_series(
DATE '2024-01-01',
DATE '2024-01-05',
INTERVAL '1' DAY
)) AS day
)
SELECT d.day,
LAST_VALUE(s.amount IGNORE NULLS) OVER (
ORDER BY d.day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS forward_filled
FROM dates d
LEFT JOIN sales s ON d.day = s.day;
Output:
day | forward_filled
-------------------+---------------
2024-01-01 | 1500
2024-01-02 | 2300
2024-01-03 | 2300 ← forward filled
2024-01-04 | 1800
2024-01-05 | 3200
Key Parameters Explained
IGNORE NULLS: Tells the window function to skip NULL values and take the nearest non-null valueROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Defines the window range as “from the first row to the current row”- Without
IGNORE NULLS, the window function returns NULL when encountering a NULL value
Why Do You Need IGNORE NULLS?
Standard SQL window functions break accumulation when encountering NULL. Adding IGNORE NULLS makes LAST_VALUE skip the NULLs produced by the JOIN and always take the most recent observed value.
Strategy Three: Linear Interpolation
Best for: Sensor readings, temperature data, traffic metrics — data that changes smoothly over time.
Linear interpolation assumes the change between two known data points is linear, calculating missing values through a formula:
WITH dates AS (
SELECT unnest(generate_series(
DATE '2024-01-01',
DATE '2024-01-05',
INTERVAL '1' DAY
)) AS day
),
joined AS (
SELECT d.day, s.amount
FROM dates d
LEFT JOIN sales s ON d.day = s.day
),
with_lag_lead AS (
SELECT
day,
amount,
LAG(amount) OVER (ORDER BY day) AS prev_amount,
LEAD(amount) OVER (ORDER BY day) AS next_amount,
LAG(day) OVER (ORDER BY day) AS prev_day,
LEAD(day) OVER (ORDER BY day) AS next_day
FROM joined
)
SELECT
day,
CASE
WHEN amount IS NOT NULL THEN amount
WHEN prev_amount IS NOT NULL AND next_amount IS NOT NULL THEN
prev_amount + (next_amount - prev_amount) *
(day - prev_day::DATE)::INTEGER /
((next_day - prev_day::DATE)::INTEGER + 1)
WHEN prev_amount IS NOT NULL THEN prev_amount
WHEN next_amount IS NOT NULL THEN next_amount
ELSE 0
END AS interpolated
FROM with_lag_lead;
Output:
day | interpolated
-------------------+-------------
2024-01-01 | 1500
2024-01-02 | 2300
2024-01-03 | 2050.0 ← linear interpolation
2024-01-04 | 1800
2024-01-05 | 3200
Calculation: (2300 + (1800 - 2300) * 1 / 2) = 2300 - 250 = 2050
Comparison of Three Strategies
| Strategy | Suitable Data Type | Pros | Cons |
|---|---|---|---|
| Zero-Fill (COALESCE) | Counts (orders, UV) | Simple, fast | May mislead trends |
| Forward Fill (LAST_VALUE IGNORE NULLS) | Stateful (stock price, inventory) | Keeps latest observation | Doesn’t reflect change |
| Linear Interpolation (LAG/LEAD) | Continuous (sensors) | Smooth transition, physically reasonable | Complex, heavier computation |
Advanced: Grouped Time Series Completion
In real-world scenarios, you often need to complete time series per product, region, or user. DuckDB supports this too:
WITH dates AS (
SELECT unnest(generate_series(
DATE '2024-01-01',
DATE '2024-01-05',
INTERVAL '1' DAY
)) AS day
),
products AS (
SELECT * FROM VALUES ('Phone'), ('Laptop'), ('Tablet') AS t(name)
),
date_product AS (
SELECT d.day, p.name
FROM dates d
CROSS JOIN products p
),
joined AS (
SELECT dp.day, dp.name, s.amount
FROM date_product dp
LEFT JOIN sales s ON dp.day = s.day
)
SELECT
day,
name,
LAST_VALUE(amount IGNORE NULLS) OVER (
PARTITION BY name ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS forward_filled
FROM joined;
The key here is PARTITION BY name — each product gets its own independent forward fill timeline.
Comparison with Traditional Tools
| Feature | DuckDB | Pandas | Excel | Spark |
|---|---|---|---|---|
| Date sequence gen | generate_series() | pd.date_range() | Manual fill | Complex |
| Forward fill | LAST_VALUE IGNORE NULLS | ffill() | Right-click fill | fillna(method='ffill') |
| Linear interpolation | LAG/LEAD + formula | interpolate() | Not supported | interp() |
| Group operations | PARTITION BY | groupby().transform() | Pivot tables | groupBy().agg() |
| Performance | Columnar engine, blazing fast | In-memory, moderate | Slow | Distributed, great for huge data |
| Learning curve | Just SQL | Needs Python | Low | High |
| Zero installation | ✅ Runs directly | ❌ Needs Python | ❌ Needs software | ❌ Needs cluster |
Performance Tips
- For small datasets (< 1M rows), all methods above are very efficient
- For large datasets, consider adding partition filters to
generate_seriesto reduce Cartesian product size - If the time range is large (e.g., year-long), consider generating weekly or monthly sequences first, then refining as needed
Monetization Advice
1. SaaS Automated Reporting Service
Leverage DuckDB’s time series completion to offer automated daily/weekly reports for SMEs. Clients provide raw transaction data, and you generate complete reports with missing dates filled. Charge ¥500-2000/month per client.
2. Data Cleaning Microservice
Wrap generate_series + LEFT JOIN + forward fill into an API service for other systems. For example, the backend of an e-commerce “daily sales dashboard” that fills daily data before aggregation.
3. Sensor Data Monitoring Platform
For IoT scenarios, use linear interpolation to fill sensor gaps and build real-time monitoring dashboards. These B2B solutions typically range from ¥5,000 to ¥50,000 per deployment.
4. Paid Training Course
Expand this content into a premium course “DuckDB Time Series Mastery,” covering more scenarios (seasonal adjustment, anomaly detection, predictive filling), priced at ¥199-499/person.
5. Open Source Tool Commercialization
Build a CLI tool ts-fill based on DuckDB with YAML-configurable completion strategies. Offer a Pro version (custom interpolation algorithms, multi-source support) using a freemium model.
All code examples in this article can be tested directly in the DuckDB Online Shell.
