DuckDB-Iceberg v1.5.3 Deep Dive: MERGE INTO, ALTER TABLE, and V3 Support

Data Lake Formats

DuckDB-Iceberg v1.5.3 Deep Dive: MERGE INTO, ALTER TABLE, and V3 Support

A comprehensive deep dive into DuckDB v1.5.3's Iceberg extension updates: MERGE INTO upserts, ALTER TABLE schema evolution, bucket/truncate partition transforms, Iceberg V3 format support, and Schema Properties management with executable SQL examples.

Introduction

On May 29, 2026, the DuckDB team announced a major update to the DuckDB-Iceberg extension, shipping as part of DuckDB v1.5.3. This release brings several highly-anticipated features that significantly narrow the capability gap between DuckDB-Iceberg and traditional Iceberg engines, covering write operations, schema evolution, advanced partitioning strategies, and the latest Iceberg V3 format support.

Prior to this release, DuckDB’s Iceberg support was primarily focused on reading capabilities and basic data writes. With v1.5.3, critical features like MERGE INTO, ALTER TABLE, bucket/truncate partition transforms, and V3 format support are now fully available. This article provides an in-depth exploration of each new feature with executable SQL examples.

For teams building modern lakehouse architectures, these updates mean DuckDB can now integrate more seamlessly into existing Iceberg ecosystems, serving both as a query engine and a data writing tool.

1. MERGE INTO: One-Stop Upsert Operations

Overview

MERGE INTO (commonly known as Upsert) is one of the most frequently used write patterns in data lake scenarios. When the target table lacks a primary key constraint — which is true for all lakehouse formats — MERGE INTO becomes the standard way to express “insert or update” semantics.

Before v1.5.3, DuckDB-Iceberg users had to implement this functionality through a cumbersome workflow: query first, check conditions, then execute separate INSERT or UPDATE statements. This approach was not only verbose but also lacked atomicity guarantees. Now, a single MERGE INTO statement handles everything.

Code Example

Let’s start with a people table:

-- Attach to an Iceberg catalog and create a table
ATTACH 'my_warehouse' AS my_datalake (TYPE iceberg);
CREATE TABLE my_datalake.default.people (
    id INTEGER,
    name VARCHAR,
    salary FLOAT
);

-- Insert initial data
INSERT INTO my_datalake.default.people
    VALUES (1, 'John', 92000.0), (2, 'Anna', 100000.0);

-- View current data
SELECT * FROM my_datalake.default.people ORDER BY id;

Output:

┌───────┬─────────┬──────────┐
│  id   │  name   │  salary  │
│ int32 │ varchar │  float   │
├───────┼─────────┼──────────┤
│     1 │ John    │  92000.0 │
│     2 │ Anna    │ 100000.0 │
└───────┴─────────┴──────────┘

Now, let’s update John’s salary and insert a new employee Sarah in one atomic operation:

MERGE INTO my_datalake.default.people AS target
    USING (
        FROM (VALUES
            (1, 'John', 105000.0),
            (3, 'Sarah', 95000.0)
        ) t(id, name, salary)
    ) AS upserts
    ON (upserts.id = target.id)
    WHEN MATCHED THEN UPDATE
    WHEN NOT MATCHED THEN INSERT;

Verify the results:

SELECT * FROM my_datalake.default.people ORDER BY id;

┌───────┬─────────┬──────────┐
│  id   │  name   │  salary  │
│ int32 │ varchar │  float   │
├───────┼─────────┼──────────┤
│     1 │ John    │ 105000.0 │
│     2 │ Anna    │ 100000.0 │
│     3 │ Sarah   │  95000.0 │
└───────┴─────────┴──────────┘

Advanced Usage: DELETE Branch

MERGE INTO also supports the WHEN MATCHED THEN DELETE branch, allowing you to handle updates and deletions in a single statement:

MERGE INTO my_datalake.default.people AS target
    USING (VALUES (2, 'Anna', 0.0)) AS changes(id, name, salary)
    ON (changes.id = target.id)
    WHEN MATCHED AND changes.salary = 0.0 THEN DELETE
    WHEN MATCHED THEN UPDATE
    WHEN NOT MATCHED THEN INSERT;

DuckDB-Iceberg’s MERGE INTO uses merge-on-read semantics. When writing, deletion positions are recorded in the Iceberg table and merged at read time. This approach avoids rewriting entire data files, making it particularly suitable for frequently updated large tables.

2. ALTER TABLE: Full Schema Evolution

Overview

In DuckDB v1.4, a major limitation of the Iceberg extension was the lack of schema evolution support. Once a table was created, its structure could not be modified — no adding columns, renaming columns, or dropping columns. For production data lakes, this was clearly unacceptable.

v1.5.3 completely resolves this limitation. ALTER TABLE now supports the following operations:

Operation	Description	Supported
RENAME TABLE	Rename a table	✅
ADD COLUMN	Add a new column	✅
RENAME COLUMN	Rename a column	✅
DROP COLUMN	Drop a column	✅
SET format-version	Set the Iceberg format version	✅

Code Example

-- Create a test table
CREATE TABLE my_datalake.default.simple_table AS
    FROM (VALUES
        (1, 'Andy'),
        (2, 'Bob'),
        (3, 'Claire'),
        (4, 'Mr. Duck')) t(col1, col2);

-- Rename the table
ALTER TABLE my_datalake.default.simple_table
    RENAME TO renamed_table;

-- Add a column
ALTER TABLE my_datalake.default.renamed_table
    ADD COLUMN col3 DOUBLE;

-- Rename a column
ALTER TABLE my_datalake.default.renamed_table
    RENAME COLUMN col2 TO name;

-- Drop a column
ALTER TABLE my_datalake.default.renamed_table
    DROP COLUMN col3;

-- Set format version to V3
ALTER TABLE my_datalake.default.renamed_table
    SET ('format-version' = 3);

-- Query the result
SELECT * FROM my_datalake.default.renamed_table ORDER BY col1;

┌───────┬──────────┐
│ col1  │   name   │
│ int32 │ varchar  │
├───────┼──────────┤
│     1 │ Andy     │
│     2 │ Bob      │
│     3 │ Claire   │
│     4 │ Mr. Duck │
└───────┴──────────┘

How It Works

Under the hood, each ALTER TABLE statement updates the current-schema-id of the Iceberg table. All changes are written through the Iceberg REST Catalog. Since Iceberg schema evolution is a metadata-only operation, no data files are rewritten, making execution extremely fast without affecting existing data. Other Iceberg-compatible engines (such as Spark or Trino) will immediately see these changes the next time they query the LoadTableInformation endpoint.

3. Bucket & Truncate Partition Transforms

Overview

The Iceberg specification defines several partition transforms that determine how data files are laid out on disk. v1.5.3 adds support for creating, inserting into, and updating tables that use the bucket and truncate partition transforms.

bucket(N, col): Hashes the column’s value into N buckets, useful for stable partitioning on high-cardinality columns like user IDs.
truncate(W, col): Groups rows by the first W characters (for strings) or by the column’s value rounded down to a multiple of W (for numeric columns), useful for prefix-based partitioning.

Code Example

CREATE TABLE my_datalake.default.events (
    event_id BIGINT,
    user_id BIGINT,
    country VARCHAR,
    payload VARCHAR
)
PARTITIONED BY (bucket(16, user_id), truncate(2, country));

INSERT INTO my_datalake.default.events
    VALUES
        (1, 1001, 'United States', 'click'),
        (2, 1002, 'United Kingdom', 'view'),
        (3, 1003, 'Germany', 'click'),
        (4, 1004, 'Netherlands', 'view');

Verify partitioning using the iceberg_metadata function:

SELECT file_path, record_count
FROM iceberg_metadata(my_datalake.default.events)
WHERE content = 'EXISTING';

Updates and deletes against bucket- and truncate-partitioned tables are also supported, using positional deletes under merge-on-read semantics.

4. Iceberg Schema Properties Management

Overview

Iceberg catalogs allow arbitrary key-value properties to be attached at the schema (namespace) level. These properties are typically used to record ownership, descriptions, default storage locations, or any other metadata that applies to every table in a schema.

v1.5.3 provides three new table functions for managing schema properties:

Function	Purpose
`set_iceberg_schema_properties`	Set schema properties
`iceberg_schema_properties`	Read schema properties
`remove_iceberg_schema_properties`	Remove schema properties

Code Example

-- Set schema properties
CALL set_iceberg_schema_properties(my_datalake.default, {
    'owner': 'analytics-team',
    'description': 'Default analytics schema'
});

-- Read schema properties
SELECT * FROM iceberg_schema_properties(my_datalake.default);

┌─────────────┬──────────────────────────┐
│     key     │          value           │
│   varchar   │         varchar          │
├─────────────┼──────────────────────────┤
│ owner       │ analytics-team           │
│ description │ Default analytics schema │
└─────────────┴──────────────────────────┘

-- Remove schema properties
CALL remove_iceberg_schema_properties(
    my_datalake.default,
    ['description']
);

Schema properties are written through the Iceberg REST Catalog, so any other Iceberg-aware engine attached to the same catalog will see the updates immediately. The returned value is the number of remaining schema properties.

5. Iceberg V3 Format Support

New V3 Features

The Iceberg V3 specification introduces several important new features that DuckDB-Iceberg now supports for both reads and writes:

Feature	Description
`VARIANT` data type	Semi-structured data support, similar to JSON
`TIMESTAMP_NS` data type	Nanosecond-precision timestamps
Schema-level default values	Default value definitions for columns
Binary deletion vectors	More compact than V2 positional delete files
Row lineage tracking	Data row origin tracking

The most impactful change in practice is binary deletion vectors. In V2 tables, DuckDB-Iceberg writes positional deletes as Parquet files; in V3 tables, the same information is encoded as a much more compact binary deletion vector (Puffin file). DuckDB automatically picks the right format based on the table’s format-version.

Code Example

-- Create a V3 table
CREATE TABLE my_datalake.default.v3_table
WITH ('format-version' = 3) AS
    FROM (VALUES
        (1, {'kind': 'click', 'x': 10}::VARIANT, TIMESTAMP_NS '2026-05-20 12:00:00.123456789'),
        (2, {'kind': 'view'}::VARIANT, TIMESTAMP_NS '2026-05-20 12:00:00.987654321')
    ) t(id, payload, event_time);

-- Delete operations on V3 tables use binary deletion vectors
DELETE FROM my_datalake.default.v3_table
WHERE id = 1;

SELECT * FROM my_datalake.default.v3_table;

┌───────┬──────────────────┬───────────────────────────────┐
│  id   │     payload      │          event_time           │
│ int32 │     variant      │         timestamp_ns          │
├───────┼──────────────────┼───────────────────────────────┤
│     2 │ {"kind": "view"} │ 2026-05-20 12:00:00.987654321 │
└───────┴──────────────────┴───────────────────────────────┘

Verify the deletion vector format using iceberg_metadata:

SELECT manifest_content, content, file_format
FROM iceberg_metadata(my_datalake.default.v3_table);

┌──────────────────┬──────────────────┬─────────────┐
│ manifest_content │     content      │ file_format │
│     varchar      │     varchar      │   varchar   │
├──────────────────┼──────────────────┼─────────────┤
│ DATA             │ EXISTING         │ parquet     │
│ DELETE           │ POSITION_DELETES │ puffin      │
└──────────────────┴──────────────────┴─────────────┘

Note: The GEOMETRY type and Unknown type are not yet supported in DuckDB-Iceberg; these are planned for DuckDB v2.0.0.

6. Comparison with Traditional Solutions

Feature	DuckDB-Iceberg v1.5.3	Apache Spark + Iceberg	Trino + Iceberg
Deployment Complexity	Embedded, no cluster	Requires Spark cluster	Requires Trino cluster
Query Startup Time	Milliseconds	Seconds (Executor startup)	Seconds
MERGE INTO	✅ Full support	✅ Full support	✅ Full support
ALTER TABLE	✅ Full support	✅ Full support	✅ Full support
V3 Format	✅ Read + Write	✅ Read + Write	⚠️ Partial
bucket/truncate	✅ Write + Read	✅ Full support	✅ Full support
Schema Properties	✅ Full management	✅ Full management	✅ Read support
VARIANT type	✅ V3 support	❌ Not native	❌ Not native
Installation Size	~50MB	Several GB	Hundreds of MB
Single-node Large Data	✅ Excellent (vectorized)	⚠️ Requires distributed	⚠️ Requires distributed
Python Integration	✅ Native support	✅ PySpark	❌ JDBC required

DuckDB-Iceberg’s biggest advantage is the embedded architecture that delivers low-latency experiences. No cluster startup is needed — a single ATTACH command connects to any Iceberg REST Catalog, and you can immediately query and write using familiar SQL.

7. Roadmap

According to the official blog, future development priorities for DuckDB-Iceberg include:

Further UPDATE, DELETE, MERGE optimizations: Improving write performance
GEOMETRY type support in DuckDB v2.0.0: Completing Iceberg type coverage
Deeper Quack protocol integration: Remote Iceberg table access via Quack
Iceberg-based Catalog support in DuckLake: Unified lakehouse management

8. Monetization Recommendations

The new DuckDB-Iceberg v1.5.3 features open several monetization paths for data teams:

1. Lightweight Data Lakehouse Query Service Leverage DuckDB’s embedded nature to offer lightweight Iceberg query services for small-to-medium teams as an alternative to Spark/Trino. Charge by query volume or concurrent users. $500-$2,000/month per team.

2. Data Pipeline Automation Tools Build automated ETL/ELT tools based on MERGE INTO and ALTER TABLE to help enterprises manage incremental updates and schema evolution in Iceberg tables. SaaS subscription, $200-$800/month.

3. Training and Consulting Provide enterprise training on DuckDB-Iceberg best practices, performance tuning, and architecture design. $3,000-$8,000 per workshop.

4. Open-Source Ecosystem Tools Develop management UIs, monitoring dashboards, or CI/CD integration tools for DuckDB-Iceberg. Monetize through an open-source community edition + enterprise edition with advanced features.

5. Cloud-Native Data Lakehouse Service Deploy managed DuckDB-Iceberg query services on AWS/GCP/Azure using serverless architecture for cost advantages. Target 30-50% profit margins.

Conclusion

The Iceberg extension update in DuckDB v1.5.3 is a milestone. MERGE INTO provides atomic Upsert semantics, ALTER TABLE removes schema evolution limitations, bucket/truncate partition transforms deliver more flexible data layout strategies, and Iceberg V3 support pushes DuckDB to the forefront of lakehouse technology.

For data engineers and architects, this means an embedded database under 50MB can now handle Iceberg management tasks that previously required distributed clusters. Whether for local development, CI/CD testing, or small-scale production environments, DuckDB-Iceberg is an extremely compelling choice.

Upgrade to DuckDB v1.5.3 today and experience these new features firsthand!

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.