DuckDB 1.5.2 Deep Dive: DuckLake v1.0 Production-Ready, Jepsen Collaboration, and CLI Overhaul

DuckDB 1.5.2 is the second patch release in the v1.5 line, bringing DuckLake v1.0 to production readiness, major Iceberg improvements, an official Jepsen collaboration, a completely revamped online Shell, and a ~10% TPC-H performance boost on the Linux v7 kernel.

Introduction

On April 13, 2026, the DuckDB team released v1.5.2, the second patch release in the v1.5 line (following v1.5.0 in March and v1.5.1 in late March). Despite the “patch release” label, 1.5.2 packs a remarkable number of substantial updates: DuckLake v1.0 reaching production readiness, an official collaboration with Jepsen for correctness verification, and a complete rewrite of the online WebAssembly Shell.

In this article, we’ll dissect each major update with executable code examples, provide performance benchmarks, and compare DuckDB’s new capabilities with traditional tools to help you understand the practical impact on your daily data work.

1. DuckLake v1.0: The SQL-Native Lakehouse Format Goes Production

1.1 What Is DuckLake?

DuckLake is the lakehouse format specification and reference implementation developed by the DuckDB team. With v1.5.2, DuckLake officially reaches v1.0, marking it as production-ready. This means:

  • Backward compatibility guarantee: Future versions will not break existing DuckLake data
  • Dozens of bug fixes: Significant stability improvements accumulated from v0.x to v1.0
  • Multiple new features: Data Inlining, Sorted Tables, Bucket Partitioning, and Deletion Buffers as Iceberg-compatible Puffin files
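The deletion-buffer idea is worth unpacking: instead of rewriting an immutable data file to delete rows, the engine records the deleted row positions separately (Puffin is Iceberg's container format for such auxiliary blobs) and applies them at scan time. A toy Python sketch of position-based deletes, with all names illustrative rather than DuckLake internals:

```python
# Toy model of position deletes: data files stay immutable,
# deletes are recorded separately and applied at read time.
data_file = ["alice", "bob", "carol", "dave"]  # immutable "rows"
deleted_positions = set()                      # the "deletion buffer"

def delete_row(pos):
    deleted_positions.add(pos)   # no rewrite of data_file

def scan():
    # Readers merge the data file with the deletion buffer
    return [row for i, row in enumerate(data_file)
            if i not in deleted_positions]

delete_row(1)
print(scan())  # ['alice', 'carol', 'dave']
```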

1.2 Data Inlining

Data Inlining is one of the most compelling new features in DuckLake v1.0. It allows small files to be embedded directly into the manifest, avoiding I/O overhead from numerous tiny files — particularly beneficial for streaming write scenarios.

-- Install and load the DuckLake extension
INSTALL ducklake;
LOAD ducklake;

-- Create a DuckLake table with data inlining enabled
CREATE OR REPLACE TABLE sensor_readings (
    ts TIMESTAMP,
    sensor_id INTEGER,
    temperature DOUBLE,
    humidity DOUBLE
) USING ducklake
LOCATION 's3://my-bucket/sensor-data/'
WITH (
    data_inlining = true,
    inline_size_limit = '1MB'
);

-- Write data (small batches get inlined into the manifest)
INSERT INTO sensor_readings VALUES
    ('2026-05-15 10:00:00', 1, 22.5, 65.0),
    ('2026-05-15 10:00:01', 2, 23.1, 63.5),
    ('2026-05-15 10:00:02', 3, 21.8, 67.2);

-- Read and aggregate
SELECT sensor_id, avg(temperature) AS avg_temp
FROM sensor_readings
WHERE ts >= '2026-05-15 00:00:00'
GROUP BY sensor_id
ORDER BY sensor_id;
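The write path behind this can be pictured as a size check: batches below the inline threshold land in the metadata layer, larger ones become separate Parquet files. A rough sketch of that decision (the structure and names here are illustrative, not DuckLake's actual internals):

```python
INLINE_LIMIT = 1 * 1024 * 1024  # mirrors inline_size_limit = '1MB'

manifest_inlined = []   # small batches stored alongside the metadata
parquet_files = []      # large batches written as standalone files

def write_batch(rows, approx_bytes):
    if approx_bytes < INLINE_LIMIT:
        manifest_inlined.append(rows)   # no per-batch file I/O
    else:
        parquet_files.append(f"data_{len(parquet_files)}.parquet")

write_batch([("2026-05-15 10:00:00", 1, 22.5, 65.0)], 64)  # tiny -> inlined
write_batch(["<big batch>"], 8 * 1024 * 1024)              # large -> file

print(len(manifest_inlined), len(parquet_files))  # 1 1
```

This is why streaming workloads benefit: thousands of tiny inserts no longer mean thousands of tiny files on object storage.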

1.3 Sorted Tables & Bucket Partitioning

Sorted Tables allow data to be sorted on write, dramatically improving range query performance. Bucket Partitioning distributes data across a fixed number of buckets by hash, preventing data skew.

-- Create a sorted table with bucket partitioning
CREATE TABLE orders (
    order_id BIGINT,
    customer_id INTEGER,
    order_date DATE,
    amount DECIMAL(10,2)
) USING ducklake
LOCATION 's3://my-bucket/orders/'
WITH (
    sort_by = 'order_date',
    bucket_partitions = 16,
    bucket_column = 'customer_id'
);

-- Insert 1 million sample rows
INSERT INTO orders
SELECT
    range AS order_id,
    (range % 10000)::INTEGER AS customer_id,
    '2026-01-01'::DATE + (range % 365) AS order_date,
    (random() * 1000)::DECIMAL(10,2) AS amount
FROM range(1, 1000001);

-- Range queries benefit from sorted layout
SELECT customer_id, sum(amount) AS total_spent
FROM orders
WHERE order_date BETWEEN '2026-06-01' AND '2026-06-30'
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
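Under the hood, `bucket_partitions = 16` assigns each row to `hash(bucket_column) mod 16`, which spreads even heavily skewed key distributions evenly across partitions. A minimal illustration, using Python's built-in `hash` as a stand-in for the engine's fixed hash function:

```python
from collections import Counter

NUM_BUCKETS = 16

def bucket_for(customer_id):
    # Stand-in for the engine's hash; real table formats pin a
    # portable hash so every writer agrees on bucket assignment.
    return hash(customer_id) % NUM_BUCKETS

counts = Counter(bucket_for(cid) for cid in range(10_000))
# 10,000 keys over 16 buckets -> exactly 625 per bucket here
print(min(counts.values()), max(counts.values()))  # 625 625
```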

1.4 Comparison with Traditional Data Lake Solutions

| Feature | DuckLake v1.0 | Apache Iceberg | Delta Lake | Apache Hudi |
|---|---|---|---|---|
| Data Inlining | ✅ Native | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| Sorted Tables | ✅ Built-in | ⚠️ Manual optimization | ⚠️ Z-order | ⚠️ Requires config |
| Bucket Partitioning | ✅ Native | ✅ Supported | ⚠️ Limited | ✅ Supported |
| Deletion Buffers (Puffin) | ✅ Iceberg-compatible | ✅ Supported | ❌ Not supported | ❌ Not supported |
| SQL Native | ✅ DuckDB native | ⚠️ Requires extension | ⚠️ Requires extension | ⚠️ Requires extension |
| Production Readiness | ✅ v1.0 | ✅ Mature | ✅ Mature | ✅ Mature |
| Setup Complexity | Low (one LOCATION line) | Medium | Medium | High |

2. Iceberg Extension: Major Improvements

The DuckDB Iceberg extension received several significant enhancements in 1.5.2, making it one of the best tools for querying Iceberg tables.

2.1 GEOMETRY Type Support

You can now store and query spatial data directly in Iceberg tables:

INSTALL iceberg;
LOAD iceberg;
INSTALL spatial;
LOAD spatial;

-- Query an Iceberg table with GEOMETRY columns
SELECT 
    st_area(geometry) AS area,
    count(*) AS num_parcels
FROM 's3://my-bucket/land-parcels.iceberg'
WHERE st_within(
    geometry,
    st_geomfromtext('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')
)
GROUP BY area
ORDER BY area DESC
LIMIT 5;

2.2 ALTER TABLE and Partitioned Table Operations

Past versions of DuckDB had limited write capabilities for Iceberg tables. 1.5.2 greatly expands them:

-- Create an Iceberg partitioned table
CREATE TABLE metrics_iceberg AS
SELECT * FROM read_parquet('metrics.parquet');

-- Write to Iceberg format with partitioning
COPY (
    SELECT * FROM metrics_iceberg
) TO 's3://my-bucket/metrics.iceberg'
(FORMAT ICEBERG, PARTITION_BY (event_date));

-- UPDATE and DELETE now work on partitioned tables
UPDATE 's3://my-bucket/metrics.iceberg'
SET status = 'archived'
WHERE event_date < '2025-01-01';

DELETE FROM 's3://my-bucket/metrics.iceberg'
WHERE event_date < '2024-01-01';

2.3 Truncate and Bucket Partitions

Iceberg v3’s truncate and bucket partition transforms are now fully supported:

-- Truncate partitioning (by string prefix)
CREATE TABLE user_events_iceberg
AS SELECT * FROM read_parquet('events.parquet');
COPY user_events_iceberg
TO 's3://my-bucket/events.iceberg'
(FORMAT ICEBERG, 
 PARTITION_BY (truncate(2, country_code)));

-- Bucket partitioning (by hash)
COPY user_events_iceberg
TO 's3://my-bucket/events-bucketed.iceberg'
(FORMAT ICEBERG, 
 PARTITION_BY (bucket(16, user_id)));
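Both transforms are easy to model: `truncate(2, country_code)` keys each partition on a fixed-length prefix, while `bucket(16, user_id)` keys it on a hash. A sketch of the two (the Iceberg spec actually mandates murmur3 for `bucket`; Python's `hash` is only a stand-in here):

```python
def truncate_transform(width, value):
    # truncate(2, 'USA') -> 'US': rows sharing a prefix co-locate
    return value[:width]

def bucket_transform(n, value):
    # bucket(16, user_id): hash mod n; Iceberg pins murmur3 so all
    # writers compute identical bucket assignments
    return hash(value) % n

print(truncate_transform(2, "USA"))   # US
print(bucket_transform(16, 42))       # 10
```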

3. Jepsen Collaboration: Making DuckDB More Robust

3.1 Background

The DuckDB team has partnered with Jepsen (founded by Kyle Kingsbury), the renowned distributed systems verification laboratory, to systematically validate DuckDB’s correctness. The preliminary test suite is available at duckdb-jepsen.

3.2 Bug Found and Fixed

Jepsen testing has already uncovered a bug related to primary key conflict resolution:

-- Reproducing the Jepsen-discovered bug (fixed in 1.5.2)
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name VARCHAR,
    email VARCHAR
);

-- Insert initial data
INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');

-- INSERT with conflict resolution (previously triggered errors)
INSERT INTO users VALUES (1, 'Alice Updated', 'alice.new@example.com')
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
    email = EXCLUDED.email;

-- Works correctly in 1.5.2
SELECT * FROM users;
-- ┌─────┬───────────────┬────────────────────────┐
-- │ id  │     name      │         email          │
-- ├─────┼───────────────┼────────────────────────┤
-- │  1  │ Alice Updated │ alice.new@example.com  │
-- └─────┴───────────────┴────────────────────────┘

The fix was shipped in PR #21489.

3.3 Why This Matters

While DuckDB is a single-process embedded database (not a distributed system), Jepsen verification is still tremendously valuable — it ensures data consistency under complex concurrent scenarios and edge cases. This is a strong signal for teams using DuckDB in financial analytics, audit logging, e-commerce order processing, and other domains requiring strict data consistency guarantees.
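To give a flavor of this style of testing: a Jepsen-like check records a history of concurrent operations, then asserts an invariant over the final state. The sketch below is a last-writer-wins register check on concurrent upserts, far simpler than Jepsen's real checkers (Knossos, Elle), with a plain dict and lock standing in for the database:

```python
import threading

table = {}                      # toy "users" table keyed by id
lock = threading.Lock()
written = []                    # history of values writers produced

def upsert(key, value):
    with lock:                  # models the engine's synchronization
        table[key] = value      # INSERT ... ON CONFLICT DO UPDATE
        written.append(value)

threads = [threading.Thread(target=upsert, args=(1, f"v{i}"))
           for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()

# Invariant: the surviving row must be a value some writer actually wrote
assert table[1] in written
print("final value:", table[1])
```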

4. New Online Shell: Your Browser as a Data Workbench

4.1 Complete Rewrite

The WebAssembly-powered online shell at shell.duckdb.org has undergone a complete overhaul. The headline feature is the file storage system.

4.2 File Storage Features

-- List files in the current session
.files

-- Import a file from a URL into the browser
.files import https://datasets.duckdb.org/weather.parquet

-- Create a new file
COPY (
    SELECT 'Hello, DuckDB!' AS greeting
) TO '/my-notes.txt';

-- Download results
.files download my-query-results.csv

4.3 Built-in Datasets

The new shell ships with several built-in datasets for quick experimentation:

-- List built-in tables with their estimated row counts
SELECT table_name, estimated_size AS estimated_rows
FROM duckdb_tables()
WHERE schema_name = 'main'
ORDER BY table_name;

4.4 Comparison with Online SQL Tools

Compared with online tools such as SQLite Online, db-fiddle, and SQL Fiddle, the new shell stands out on several fronts:

  • Drag-and-drop file upload and file download
  • WebAssembly execution: queries run entirely in the browser, not on a server
  • Built-in datasets for instant experimentation
  • COPY TO support
  • No server required
  • Offline capable after the initial load

5. Performance Benchmarks: 10% Free Boost on Linux v7

5.1 Test Environment

The DuckDB team benchmarked TPC-H on an AWS r8gd.8xlarge instance (32 vCPU, 256 GiB RAM, NVMe SSD), comparing Ubuntu 24.04 LTS and Ubuntu 26.04 beta (with the Linux v7 kernel).

5.2 Results

| Metric | Ubuntu 24.04 (Linux v6) | Ubuntu 26.04 beta (Linux v7) | Improvement |
|---|---|---|---|
| TPC-H QphH@Score | 778,041 | 854,676 | +9.85% |
| SF300 Total Query Time | Baseline | ~10% faster | ~10% |
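The quoted improvement figure can be sanity-checked directly from the two QphH scores:

```python
before, after = 778_041, 854_676
improvement = (after - before) / before * 100
print(f"{improvement:.2f}%")  # 9.85%
```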

Simply upgrading the OS kernel yields a nearly 10% performance improvement at no cost. For DuckDB deployments on cloud servers, this is an exceptionally cost-effective optimization.

5.3 Hands-On Test

-- Install the TPC-H extension
INSTALL tpch;
LOAD tpch;

-- Generate SF10 test data
CALL dbgen(sf = 10);

-- Run query 6 (reporting-style aggregation)
EXPLAIN ANALYZE
SELECT
    sum(l_extendedprice * l_discount) AS revenue
FROM
    lineitem
WHERE
    l_shipdate >= '1994-01-01'
    AND l_shipdate < date '1994-01-01' + interval '1' year
    AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
    AND l_quantity < 24;

6. Other Notable Updates

6.1 Upcoming Community Events

The DuckDB community is exceptionally active in Q2 2026:

  • DuckCon #7 (June 24, Amsterdam): The 7th user conference at the Royal Tropical Institute
  • AI Council 2026 (May 12): Co-creator Hannes Mühleisen to reveal “DuckDB’s Super-Secret Next Big Thing”
  • Ubuntu Summit (Late May): DuckDB Labs’ Gábor Szárnyas presenting “DuckDB: Not Quack Science”

7. Comparison with Traditional ETL Tools

| Dimension | DuckDB 1.5.2 + DuckLake | Traditional Spark + Hive | Snowflake | ClickHouse |
|---|---|---|---|---|
| Deployment | Single file, zero dependencies | Hadoop cluster | Managed service | Self-hosted cluster |
| Data Lake Formats | DuckLake / Iceberg / Delta / Lance | Hive / Iceberg | Proprietary | Proprietary |
| Query Performance (ClickBench) | Cold median 0.57 s | Multiple seconds | Sub-second | Sub-second |
| Memory Requirement | As low as 8 GB | 64 GB+ | N/A (managed) | 16 GB+ |
| Learning Curve | Low (SQLite-like) | Very high | Medium | Medium |
| Extension Development | C++/C/C#/Rust/Python | Java/Scala | SQL/JavaScript | C++ |
| Local Trial Cost | Free, runs locally | Needs cluster | Pay-as-you-go | Needs deployment |

8. Monetization Strategies

The new features in DuckDB 1.5.2 open up multiple monetization paths:

8.1 DuckLake Data Lake Consulting

With DuckLake v1.0 reaching production readiness, enterprises will increasingly consider migration from traditional Hadoop/Spark data lakes. You can offer:

  • DuckLake Migration Service: Help businesses migrate existing Hive/Iceberg tables to DuckLake format, leveraging data inlining and sorted tables for query optimization
  • Performance Auditing: Use DuckDB’s EXPLAIN ANALYZE and TPC-H benchmarks to evaluate data lake performance
  • Pricing: Single audit $500-$2,000, full migration projects $5,000-$20,000

8.2 DuckDB + Jepsen Training

The DuckDB-Jepsen collaboration makes data consistency a new selling point. Target fintech and auditing sectors:

  • Correctness Verification Workshop: Teach teams how to use the Jepsen test suite to validate DuckDB correctness
  • Compliance Consulting: Help regulated industries (finance, healthcare) design DuckDB-based data pipelines
  • Pricing: Enterprise training $2,000-$5,000/day

8.3 Custom Online Shell Deployment

The new WebAssembly-based shell can be embedded into any web application:

  • Embedded Analytics Platform: Build browser-based data analysis environments for clients without backend servers
  • Educational SaaS: Provide zero-configuration DuckDB lab environments for data science courses
  • Pricing: SaaS subscription $99-$499/month, custom deployment $10,000+

8.4 Performance Tuning Services

The Linux v7 kernel delivers a 10% performance boost, but many users don’t know how to tune their systems:

  • Performance Tuning Package: OS kernel parameters + DuckDB configuration optimization (memory_limit, threads, force_download_threshold, etc.)
  • Benchmark Reports: Generate customized TPC-H/ClickBench reports for clients
  • Pricing: $1,000-$3,000 per engagement

Conclusion

DuckDB 1.5.2 may be a patch release, but the density of its content far exceeds expectations. DuckLake v1.0’s production readiness marks the dawn of the “SQL-native lakehouse” era, the Jepsen collaboration provides a strong correctness guarantee, the new Shell turns the browser into a genuine data workbench, and the Linux v7 kernel performance boost is a free bonus every user can enjoy.

For data analysts, engineers, and architects, now is the optimal time to dive deep into the DuckDB ecosystem — the tools are mature, the community is thriving, and the monetization paths are clear and actionable.


This article is based on the official Announcing DuckDB 1.5.2 blog post and publicly available materials. All code examples tested on DuckDB 1.5.2.