Introduction
On May 29, 2026, the DuckDB team announced a major update to the DuckDB-Iceberg extension, shipping as part of DuckDB v1.5.3. This release brings several highly-anticipated features that significantly narrow the capability gap between DuckDB-Iceberg and traditional Iceberg engines, covering write operations, schema evolution, advanced partitioning strategies, and the latest Iceberg V3 format support.
Prior to this release, DuckDB’s Iceberg support was primarily focused on reading capabilities and basic data writes. With v1.5.3, critical features like MERGE INTO, ALTER TABLE, bucket/truncate partition transforms, and V3 format support are now fully available. This article provides an in-depth exploration of each new feature with executable SQL examples.
For teams building modern lakehouse architectures, these updates mean DuckDB can now integrate more seamlessly into existing Iceberg ecosystems, serving both as a query engine and a data writing tool.
1. MERGE INTO: One-Stop Upsert Operations
Overview
MERGE INTO (commonly known as Upsert) is one of the most frequently used write patterns in data lake scenarios. When the target table lacks a primary key constraint — which is true for all lakehouse formats — MERGE INTO becomes the standard way to express “insert or update” semantics.
Before v1.5.3, DuckDB-Iceberg users had to implement this functionality through a cumbersome workflow: query first, check conditions, then execute separate INSERT or UPDATE statements. This approach was not only verbose but also lacked atomicity guarantees. Now, a single MERGE INTO statement handles everything.
Code Example
Let’s start with a people table:
-- Attach to an Iceberg catalog and create a table
ATTACH 'my_warehouse' AS my_datalake (TYPE iceberg);
CREATE TABLE my_datalake.default.people (
id INTEGER,
name VARCHAR,
salary FLOAT
);
-- Insert initial data
INSERT INTO my_datalake.default.people
VALUES (1, 'John', 92000.0), (2, 'Anna', 100000.0);
-- View current data
SELECT * FROM my_datalake.default.people ORDER BY id;
Output:
┌───────┬─────────┬──────────┐
│ id │ name │ salary │
│ int32 │ varchar │ float │
├───────┼─────────┼──────────┤
│ 1 │ John │ 92000.0 │
│ 2 │ Anna │ 100000.0 │
└───────┴─────────┴──────────┘
Now, let’s update John’s salary and insert a new employee Sarah in one atomic operation:
MERGE INTO my_datalake.default.people AS target
USING (
FROM (VALUES
(1, 'John', 105000.0),
(3, 'Sarah', 95000.0)
) t(id, name, salary)
) AS upserts
ON (upserts.id = target.id)
WHEN MATCHED THEN UPDATE
WHEN NOT MATCHED THEN INSERT;
Verify the results:
SELECT * FROM my_datalake.default.people ORDER BY id;
┌───────┬─────────┬──────────┐
│ id │ name │ salary │
│ int32 │ varchar │ float │
├───────┼─────────┼──────────┤
│ 1 │ John │ 105000.0 │
│ 2 │ Anna │ 100000.0 │
│ 3 │ Sarah │ 95000.0 │
└───────┴─────────┴──────────┘
Advanced Usage: DELETE Branch
MERGE INTO also supports the WHEN MATCHED THEN DELETE branch, allowing you to handle updates and deletions in a single statement:
MERGE INTO my_datalake.default.people AS target
USING (VALUES (2, 'Anna', 0.0)) AS changes(id, name, salary)
ON (changes.id = target.id)
WHEN MATCHED AND changes.salary = 0.0 THEN DELETE
WHEN MATCHED THEN UPDATE
WHEN NOT MATCHED THEN INSERT;
DuckDB-Iceberg’s MERGE INTO uses merge-on-read semantics. When writing, deletion positions are recorded in the Iceberg table and merged at read time. This approach avoids rewriting entire data files, making it particularly suitable for frequently updated large tables.
2. ALTER TABLE: Full Schema Evolution
Overview
In DuckDB v1.4, a major limitation of the Iceberg extension was the lack of schema evolution support. Once a table was created, its structure could not be modified — no adding columns, renaming columns, or dropping columns. For production data lakes, this was clearly unacceptable.
v1.5.3 completely resolves this limitation. ALTER TABLE now supports the following operations:
| Operation | Description | Supported |
|---|---|---|
| RENAME TABLE | Rename a table | ✅ |
| ADD COLUMN | Add a new column | ✅ |
| RENAME COLUMN | Rename a column | ✅ |
| DROP COLUMN | Drop a column | ✅ |
| SET format-version | Set the Iceberg format version | ✅ |
Code Example
-- Create a test table
CREATE TABLE my_datalake.default.simple_table AS
FROM (VALUES
(1, 'Andy'),
(2, 'Bob'),
(3, 'Claire'),
(4, 'Mr. Duck')) t(col1, col2);
-- Rename the table
ALTER TABLE my_datalake.default.simple_table
RENAME TO renamed_table;
-- Add a column
ALTER TABLE my_datalake.default.renamed_table
ADD COLUMN col3 DOUBLE;
-- Rename a column
ALTER TABLE my_datalake.default.renamed_table
RENAME COLUMN col2 TO name;
-- Drop a column
ALTER TABLE my_datalake.default.renamed_table
DROP COLUMN col3;
-- Set format version to V3
ALTER TABLE my_datalake.default.renamed_table
SET ('format-version' = 3);
-- Query the result
SELECT * FROM my_datalake.default.renamed_table ORDER BY col1;
┌───────┬──────────┐
│ col1 │ name │
│ int32 │ varchar │
├───────┼──────────┤
│ 1 │ Andy │
│ 2 │ Bob │
│ 3 │ Claire │
│ 4 │ Mr. Duck │
└───────┴──────────┘
How It Works
Under the hood, each ALTER TABLE statement updates the current-schema-id of the Iceberg table. All changes are written through the Iceberg REST Catalog. Since Iceberg schema evolution is a metadata-only operation, no data files are rewritten, making execution extremely fast without affecting existing data. Other Iceberg-compatible engines (such as Spark or Trino) will immediately see these changes the next time they query the LoadTableInformation endpoint.
3. Bucket & Truncate Partition Transforms
Overview
The Iceberg specification defines several partition transforms that determine how data files are laid out on disk. v1.5.3 adds support for creating, inserting into, and updating tables that use the bucket and truncate partition transforms.
bucket(N, col): Hashes the column’s value into N buckets, useful for stable partitioning on high-cardinality columns like user IDs.truncate(W, col): Groups rows by the first W characters (for strings) or by the column’s value rounded down to a multiple of W (for numeric columns), useful for prefix-based partitioning.
Code Example
CREATE TABLE my_datalake.default.events (
event_id BIGINT,
user_id BIGINT,
country VARCHAR,
payload VARCHAR
)
PARTITIONED BY (bucket(16, user_id), truncate(2, country));
INSERT INTO my_datalake.default.events
VALUES
(1, 1001, 'United States', 'click'),
(2, 1002, 'United Kingdom', 'view'),
(3, 1003, 'Germany', 'click'),
(4, 1004, 'Netherlands', 'view');
Verify partitioning using the iceberg_metadata function:
SELECT file_path, record_count
FROM iceberg_metadata(my_datalake.default.events)
WHERE content = 'EXISTING';
Updates and deletes against bucket- and truncate-partitioned tables are also supported, using positional deletes under merge-on-read semantics.
4. Iceberg Schema Properties Management
Overview
Iceberg catalogs allow arbitrary key-value properties to be attached at the schema (namespace) level. These properties are typically used to record ownership, descriptions, default storage locations, or any other metadata that applies to every table in a schema.
v1.5.3 provides three new table functions for managing schema properties:
| Function | Purpose |
|---|---|
set_iceberg_schema_properties | Set schema properties |
iceberg_schema_properties | Read schema properties |
remove_iceberg_schema_properties | Remove schema properties |
Code Example
-- Set schema properties
CALL set_iceberg_schema_properties(my_datalake.default, {
'owner': 'analytics-team',
'description': 'Default analytics schema'
});
-- Read schema properties
SELECT * FROM iceberg_schema_properties(my_datalake.default);
┌─────────────┬──────────────────────────┐
│ key │ value │
│ varchar │ varchar │
├─────────────┼──────────────────────────┤
│ owner │ analytics-team │
│ description │ Default analytics schema │
└─────────────┴──────────────────────────┘
-- Remove schema properties
CALL remove_iceberg_schema_properties(
my_datalake.default,
['description']
);
Schema properties are written through the Iceberg REST Catalog, so any other Iceberg-aware engine attached to the same catalog will see the updates immediately. The returned value is the number of remaining schema properties.
5. Iceberg V3 Format Support
New V3 Features
The Iceberg V3 specification introduces several important new features that DuckDB-Iceberg now supports for both reads and writes:
| Feature | Description |
|---|---|
VARIANT data type | Semi-structured data support, similar to JSON |
TIMESTAMP_NS data type | Nanosecond-precision timestamps |
| Schema-level default values | Default value definitions for columns |
| Binary deletion vectors | More compact than V2 positional delete files |
| Row lineage tracking | Data row origin tracking |
The most impactful change in practice is binary deletion vectors. In V2 tables, DuckDB-Iceberg writes positional deletes as Parquet files; in V3 tables, the same information is encoded as a much more compact binary deletion vector (Puffin file). DuckDB automatically picks the right format based on the table’s format-version.
Code Example
-- Create a V3 table
CREATE TABLE my_datalake.default.v3_table
WITH ('format-version' = 3) AS
FROM (VALUES
(1, {'kind': 'click', 'x': 10}::VARIANT, TIMESTAMP_NS '2026-05-20 12:00:00.123456789'),
(2, {'kind': 'view'}::VARIANT, TIMESTAMP_NS '2026-05-20 12:00:00.987654321')
) t(id, payload, event_time);
-- Delete operations on V3 tables use binary deletion vectors
DELETE FROM my_datalake.default.v3_table
WHERE id = 1;
SELECT * FROM my_datalake.default.v3_table;
┌───────┬──────────────────┬───────────────────────────────┐
│ id │ payload │ event_time │
│ int32 │ variant │ timestamp_ns │
├───────┼──────────────────┼───────────────────────────────┤
│ 2 │ {"kind": "view"} │ 2026-05-20 12:00:00.987654321 │
└───────┴──────────────────┴───────────────────────────────┘
Verify the deletion vector format using iceberg_metadata:
SELECT manifest_content, content, file_format
FROM iceberg_metadata(my_datalake.default.v3_table);
┌──────────────────┬──────────────────┬─────────────┐
│ manifest_content │ content │ file_format │
│ varchar │ varchar │ varchar │
├──────────────────┼──────────────────┼─────────────┤
│ DATA │ EXISTING │ parquet │
│ DELETE │ POSITION_DELETES │ puffin │
└──────────────────┴──────────────────┴─────────────┘
Note: The
GEOMETRYtype andUnknowntype are not yet supported in DuckDB-Iceberg; these are planned for DuckDB v2.0.0.
6. Comparison with Traditional Solutions
| Feature | DuckDB-Iceberg v1.5.3 | Apache Spark + Iceberg | Trino + Iceberg |
|---|---|---|---|
| Deployment Complexity | Embedded, no cluster | Requires Spark cluster | Requires Trino cluster |
| Query Startup Time | Milliseconds | Seconds (Executor startup) | Seconds |
| MERGE INTO | ✅ Full support | ✅ Full support | ✅ Full support |
| ALTER TABLE | ✅ Full support | ✅ Full support | ✅ Full support |
| V3 Format | ✅ Read + Write | ✅ Read + Write | ⚠️ Partial |
| bucket/truncate | ✅ Write + Read | ✅ Full support | ✅ Full support |
| Schema Properties | ✅ Full management | ✅ Full management | ✅ Read support |
| VARIANT type | ✅ V3 support | ❌ Not native | ❌ Not native |
| Installation Size | ~50MB | Several GB | Hundreds of MB |
| Single-node Large Data | ✅ Excellent (vectorized) | ⚠️ Requires distributed | ⚠️ Requires distributed |
| Python Integration | ✅ Native support | ✅ PySpark | ❌ JDBC required |
DuckDB-Iceberg’s biggest advantage is the embedded architecture that delivers low-latency experiences. No cluster startup is needed — a single ATTACH command connects to any Iceberg REST Catalog, and you can immediately query and write using familiar SQL.
7. Roadmap
According to the official blog, future development priorities for DuckDB-Iceberg include:
- Further
UPDATE,DELETE,MERGEoptimizations: Improving write performance GEOMETRYtype support in DuckDB v2.0.0: Completing Iceberg type coverage- Deeper Quack protocol integration: Remote Iceberg table access via Quack
- Iceberg-based Catalog support in DuckLake: Unified lakehouse management
8. Monetization Recommendations
The new DuckDB-Iceberg v1.5.3 features open several monetization paths for data teams:
1. Lightweight Data Lakehouse Query Service Leverage DuckDB’s embedded nature to offer lightweight Iceberg query services for small-to-medium teams as an alternative to Spark/Trino. Charge by query volume or concurrent users. $500-$2,000/month per team.
2. Data Pipeline Automation Tools
Build automated ETL/ELT tools based on MERGE INTO and ALTER TABLE to help enterprises manage incremental updates and schema evolution in Iceberg tables. SaaS subscription, $200-$800/month.
3. Training and Consulting Provide enterprise training on DuckDB-Iceberg best practices, performance tuning, and architecture design. $3,000-$8,000 per workshop.
4. Open-Source Ecosystem Tools Develop management UIs, monitoring dashboards, or CI/CD integration tools for DuckDB-Iceberg. Monetize through an open-source community edition + enterprise edition with advanced features.
5. Cloud-Native Data Lakehouse Service Deploy managed DuckDB-Iceberg query services on AWS/GCP/Azure using serverless architecture for cost advantages. Target 30-50% profit margins.
Conclusion
The Iceberg extension update in DuckDB v1.5.3 is a milestone. MERGE INTO provides atomic Upsert semantics, ALTER TABLE removes schema evolution limitations, bucket/truncate partition transforms deliver more flexible data layout strategies, and Iceberg V3 support pushes DuckDB to the forefront of lakehouse technology.
For data engineers and architects, this means an embedded database under 50MB can now handle Iceberg management tasks that previously required distributed clusters. Whether for local development, CI/CD testing, or small-scale production environments, DuckDB-Iceberg is an extremely compelling choice.
Upgrade to DuckDB v1.5.3 today and experience these new features firsthand!
