DuckDB-Iceberg v1.5.3 Advanced Features: MERGE INTO, ALTER TABLE, and Full Iceberg V3 Support

DuckDB v1.5.3 brings major DuckDB-Iceberg extension updates: full MERGE INTO support, ALTER TABLE schema evolution, bucket/truncate partition transforms, Iceberg V3 spec compliance, and schema properties. Deep dive with executable SQL examples.

Introduction

On May 20, 2026, DuckDB released version v1.5.3, accompanied by the announcement of DuckCon #7 in Amsterdam. While the core DuckDB release focused primarily on bug fixes, the accompanying DuckDB-Iceberg extension delivered a substantial set of new features, described by the official team as “Part 2 of our Iceberg Writes in DuckDB series.”

In the v1.4 era, DuckDB’s support for Iceberg centered mainly on read capabilities. From v1.5.0’s DuckLake v1.0 standard to v1.5.3, DuckDB-Iceberg has evolved from a “read extension” into a fully-featured lakehouse format write engine.

This article provides an in-depth exploration of all core new features in DuckDB-Iceberg v1.5.3, with complete SQL examples to help you get started quickly.

1. Quick Start: Connecting to Iceberg REST Catalogs

Before exploring the new features, you need to connect to your Iceberg REST Catalog. DuckDB-Iceberg supports multiple catalog backends:

  • Apache Polaris — Apache Foundation’s recommended unified catalog service
  • Lakekeeper — High-performance cloud-native Iceberg catalog
  • Amazon S3 Tables — AWS’s native S3 table support

The connection command looks like this:

ATTACH 'warehouse_name' AS my_datalake (
    TYPE iceberg
    -- plus other options required by your catalog
);

Once connected, you can operate on Iceberg tables using standard SQL syntax — no new API to learn.
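
As a rough sketch of what the elided options might look like, the following assumes a REST catalog that authenticates with OAuth2 client credentials. The secret name, URLs, and credentials are placeholders, and the option names (CLIENT_ID, CLIENT_SECRET, OAUTH2_SERVER_URI, SECRET, ENDPOINT) are given as best recalled from the extension's REST catalog documentation; verify them against the documentation for your catalog before relying on them.

```sql
-- Load the extension from the default extension repository
INSTALL iceberg;
LOAD iceberg;

-- Hypothetical credentials and token URL: replace with your catalog's values
CREATE SECRET my_iceberg_secret (
    TYPE iceberg,
    CLIENT_ID 'my_client_id',
    CLIENT_SECRET 'my_client_secret',
    OAUTH2_SERVER_URI 'https://catalog.example.com/v1/oauth/tokens'
);

-- Attach the warehouse through the (hypothetical) REST endpoint
ATTACH 'warehouse_name' AS my_datalake (
    TYPE iceberg,
    SECRET my_iceberg_secret,
    ENDPOINT 'https://catalog.example.com'
);

-- List the tables visible through the attached catalog
SHOW ALL TABLES;
```

Keeping the credentials in a secret rather than inline in ATTACH means the same secret can be reused when attaching more than one warehouse.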

2. Full MERGE INTO Support — The Ultimate UPSERT Solution for Lakehouse Formats

DuckDB’s MERGE INTO statement is the recommended way to handle UPSERT (insert or update) operations. For lakehouse formats like Iceberg, which have no primary key constraints, MERGE INTO is especially critical.

2.1 Basic Usage

-- Create the target table
CREATE TABLE my_datalake.default.people (
    id INTEGER,
    name VARCHAR,
    salary FLOAT
);

INSERT INTO my_datalake.default.people
    VALUES (1, 'John', 92_000.0), (2, 'Anna', 100_000.0);

Querying the table with SELECT * FROM my_datalake.default.people returns:

┌───────┬─────────┬──────────┐
│  id   │  name   │  salary  │
│ int32 │ varchar │  float   │
├───────┼─────────┼──────────┤
│     1 │ John    │  92000.0 │
│     2 │ Anna    │ 100000.0 │
└───────┴─────────┴──────────┘

2.2 Executing UPSERT Operations

MERGE INTO my_datalake.default.people AS target
    USING (
        FROM (VALUES
            (1, 'John', 105_000.0),
            (3, 'Sarah', 95_000.0)
        ) t(id, name, salary)
    ) AS upserts
    ON (upserts.id = target.id)
    WHEN MATCHED THEN UPDATE
    WHEN NOT MATCHED THEN INSERT;

Result after MERGE:

┌───────┬─────────┬──────────┐
│  id   │  name   │  salary  │
│ int32 │ varchar │  float   │
├───────┼─────────┼──────────┤
│     1 │ John    │ 105000.0 │
│     2 │ Anna    │ 100000.0 │
│     3 │ Sarah   │  95000.0 │
└───────┴─────────┴──────────┘
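
The shorthand UPDATE and INSERT clauses above copy every column from the source row. When only some columns should change, MERGE INTO can also be written with explicit SET and VALUES lists. The sketch below runs against a local in-memory table (people_local is a hypothetical name) so it can be tried without a catalog; whether the Iceberg write path accepts the same explicit clauses is an assumption, not something this article confirms.

```sql
-- Local stand-in for the Iceberg table (hypothetical name)
CREATE TABLE people_local (id INTEGER, name VARCHAR, salary FLOAT);
INSERT INTO people_local
    VALUES (1, 'John', 92_000.0), (2, 'Anna', 100_000.0);

MERGE INTO people_local AS target
    USING (
        FROM (VALUES
            (1, 'John', 105_000.0),
            (3, 'Sarah', 95_000.0)
        ) t(id, name, salary)
    ) AS upserts
    ON (upserts.id = target.id)
    -- Matched rows: change only the salary column
    WHEN MATCHED THEN UPDATE SET salary = upserts.salary
    -- Unmatched rows: spell out the inserted columns and values
    WHEN NOT MATCHED THEN INSERT (id, name, salary)
        VALUES (upserts.id, upserts.name, upserts.salary);

SELECT * FROM people_local ORDER BY id;
```

Writing the column lists out is more verbose, but it keeps the statement stable if the source query later gains extra columns.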

2.3 Combined Delete Operations

MERGE INTO can also include deletion logic within the same statement:

MERGE INTO my_datalake.default.people AS target
    USING (VALUES (1, NULL, NULL)) AS src(id, name, salary)
    ON (src.id = target.id)
    WHEN MATCHED THEN DELETE;

Under the hood, MERGE INTO uses Merge-on-Read (MoR) semantics. For UPDATE and DELETE operations, it writes positional delete files without rewriting entire data files.
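
One way to observe this is to look at the table's manifest entries after running the MERGE statements. The sketch below reuses the iceberg_metadata call that appears later in this article and assumes it exposes manifest_content, file_format, and record_count for this table; the aliases files and records are made up for the example.

```sql
-- Summarize data files vs. delete files recorded for the table
SELECT
    manifest_content,
    file_format,
    count(*) AS files,
    sum(record_count) AS records
FROM iceberg_metadata(my_datalake.default.people)
GROUP BY manifest_content, file_format
ORDER BY manifest_content;
```

If the table is written with Merge-on-Read, the DELETE rows in this summary should grow with each update or delete while the existing data files stay in place.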

3. ALTER TABLE — Embrace Dynamic Schema Evolution

In v1.4, the lack of schema evolution for Iceberg tables was a documented limitation. v1.5.3 fills this gap, supporting the most common schema operations.

3.1 Complete Operation Example

-- Create initial table
CREATE TABLE my_datalake.default.simple_table AS
    FROM (VALUES
        (1, 'Andy'),
        (2, 'Bob'),
        (3, 'Claire'),
        (4, 'Mr. Duck')) t(col1, col2);

-- Rename the table
ALTER TABLE my_datalake.default.simple_table
    RENAME TO renamed_table;

-- Add a column
ALTER TABLE my_datalake.default.renamed_table
    ADD COLUMN col3 DOUBLE;

-- Rename a column
ALTER TABLE my_datalake.default.renamed_table
    RENAME COLUMN col2 TO name;

-- Drop a column
ALTER TABLE my_datalake.default.renamed_table
    DROP COLUMN col3;

-- Set the format version
ALTER TABLE my_datalake.default.renamed_table
    SET ('format-version' = 3);

3.2 How It Works Under the Hood

Each schema-changing ALTER TABLE operation (adding, renaming, or dropping a column) records a new schema in the table metadata and updates the table’s current-schema-id. Because Iceberg schema evolution is a pure metadata operation, no data files are rewritten, so the change completes quickly regardless of table size.

Changes become immediately visible to other Iceberg-aware engines connected to the same catalog.
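
A quick way to confirm the effect of these statements is to inspect the table before and after a change. The sketch below assumes DESCRIBE works on attached Iceberg tables as it does on native DuckDB tables, and note is a hypothetical column added only for illustration. Rows written before the column existed would be expected to read back as NULL for it, since no data files are rewritten.

```sql
-- Column list after the renames and drops from section 3.1
DESCRIBE my_datalake.default.renamed_table;

-- Add a hypothetical column; this should only touch table metadata
ALTER TABLE my_datalake.default.renamed_table
    ADD COLUMN note VARCHAR;

-- Existing rows predate the column, so note is expected to be NULL
SELECT col1, name, note
FROM my_datalake.default.renamed_table
ORDER BY col1;
```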

4. Partition Transforms — bucket and truncate Support

The Iceberg specification defines several partition transforms that determine how data files are laid out on disk. v1.5.3 adds support for bucket and truncate transforms.

4.1 Bucket Transform

bucket(N, col) hashes a column’s value into N buckets, ideal for stable partitioning on high-cardinality columns:

CREATE TABLE my_datalake.default.events (
    event_id BIGINT,
    user_id BIGINT,
    country VARCHAR,
    payload VARCHAR
)
PARTITIONED BY (bucket(16, user_id), truncate(2, country));

INSERT INTO my_datalake.default.events
    VALUES
        (1, 1001, 'United States', 'click'),
        (2, 1002, 'United Kingdom', 'view'),
        (3, 1003, 'Germany', 'click'),
        (4, 1004, 'Netherlands', 'view');

4.2 Truncate Transform

truncate(W, col) groups rows by the first W characters (or rounds numeric columns down to multiples of W), ideal for prefix-based partitioning scenarios.
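
To make the grouping concrete, the transform can be imitated with ordinary DuckDB expressions on local values, without touching a catalog. This is only an emulation of the rule as the Iceberg specification states it (the first W characters for strings, and v - (((v % W) + W) % W) for integers), not the extension's internal code; the column aliases are made up for the example.

```sql
-- String case: truncate(2, country) keeps the first two characters
SELECT country, left(country, 2) AS truncate_2
FROM (VALUES
    ('United States'),
    ('United Kingdom'),
    ('Germany'),
    ('Netherlands')
) t(country);
-- 'United States' and 'United Kingdom' both map to 'Un',
-- so they would share a partition value.

-- Integer case: truncate(10, v) rounds down to a multiple of 10,
-- including for negative values
SELECT v, v - (((v % 10) + 10) % 10) AS truncate_10
FROM (VALUES (17), (20), (-7)) t(v);
-- 17 -> 10, 20 -> 20, -7 -> -10
```

The first query shows why a small W on a text column can produce coarse partitions: distinct countries that share a prefix collapse into one partition value.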

4.3 Verify Partition Results

SELECT file_path, record_count
FROM iceberg_metadata(my_datalake.default.events)
WHERE content = 'EXISTING';

Updates and deletes against bucket/truncate-partitioned tables are also supported, using positional deletes under MoR semantics.

5. Iceberg Schema Properties — Namespace-Level Metadata Management

Iceberg catalogs allow arbitrary key-value properties to be attached at the namespace (schema) level. These properties are typically used for ownership records, descriptions, default storage locations, and other metadata.

5.1 Core Functions

DuckDB-Iceberg v1.5.3 provides three dedicated functions:

| Function | Purpose |
| --- | --- |
| iceberg_schema_properties(ns) | Read namespace properties |
| set_iceberg_schema_properties(ns, props) | Set/update properties |
| remove_iceberg_schema_properties(ns, keys) | Remove specified properties |

5.2 Usage Example

-- Set namespace properties
CALL set_iceberg_schema_properties(my_datalake.default, {
    'owner': 'analytics-team',
    'description': 'Default analytics schema'
});

-- View properties
SELECT * FROM iceberg_schema_properties(my_datalake.default);

Result:

┌─────────────┬──────────────────────────┐
│     key     │          value           │
│   varchar   │         varchar          │
├─────────────┼──────────────────────────┤
│ owner       │ analytics-team           │
│ description │ Default analytics schema │
└─────────────┴──────────────────────────┘

-- Remove a property
CALL remove_iceberg_schema_properties(
    my_datalake.default,
    ['description']
);

Properties are written through the Iceberg REST Catalog, so any other Iceberg-aware engine attached to the same catalog will see the updates immediately.

6. Iceberg V3 Spec Support — The Next-Gen Lakehouse Format

The Iceberg v3 specification introduces several major improvements. DuckDB-Iceberg v1.5.3 fully supports reading and writing V3 tables.

6.1 V3 Core Features

| Feature | Description |
| --- | --- |
| VARIANT type | Native support for semi-structured data storage |
| TIMESTAMP_NS type | Nanosecond-level timestamp precision |
| Column default values | Schema-level default value definitions |
| Binary deletion vectors | Puffin format replaces Parquet delete files |
| Row lineage tracking | Trace data row origins |

6.2 Binary Deletion Vectors

This is V3’s most significant improvement. In V2 tables, DuckDB-Iceberg writes deletions as Parquet files; in V3 tables, the same information is encoded as a much more compact binary deletion vector (Puffin file).

-- Create a V3 table
CREATE TABLE my_datalake.default.v3_table
WITH ('format-version' = 3) AS
    FROM (VALUES
        (1, {'kind': 'click', 'x': 10}::VARIANT, TIMESTAMP_NS '2026-05-20 12:00:00.123456789'),
        (2, {'kind': 'view'}::VARIANT, TIMESTAMP_NS '2026-05-20 12:00:00.987654321')
    ) t(id, payload, event_time);

-- Delete data (V3 tables auto-write binary deletion vectors)
DELETE FROM my_datalake.default.v3_table
WHERE id = 1;

SELECT * FROM my_datalake.default.v3_table;

Result:

┌───────┬──────────────────┬───────────────────────────────┐
│  id   │     payload      │          event_time           │
│ int32 │     variant      │         timestamp_ns          │
├───────┼──────────────────┼───────────────────────────────┤
│     2 │ {"kind": "view"} │ 2026-05-20 12:00:00.987654321 │
└───────┴──────────────────┴───────────────────────────────┘

Query the metadata to see that the delete was written as a Puffin file:

SELECT manifest_content, content, file_format
FROM iceberg_metadata(my_datalake.default.v3_table);

Result:

┌──────────────────┬──────────────────┬─────────────┐
│ manifest_content │     content      │ file_format │
│     varchar      │     varchar      │   varchar   │
├──────────────────┼──────────────────┼─────────────┤
│ DATA             │ EXISTING         │ parquet     │
│ DELETE           │ POSITION_DELETES │ puffin      │
└──────────────────┴──────────────────┴─────────────┘

DuckDB automatically selects the correct write format based on the table’s format-version.

7. Comparison with Traditional Tools

| Feature | DuckDB-Iceberg v1.5.3 | Apache Spark | Delta Lake (Spark) | AWS Glue |
| --- | --- | --- | --- | --- |
| Deployment Complexity | ⭐ Zero cluster | ⭐⭐⭐⭐⭐ YARN/K8s | ⭐⭐⭐⭐⭐ Spark cluster | ⭐⭐⭐⭐ Cloud-managed |
| MERGE INTO | ✅ Full support | ✅ DataFrame API | ✅ Full support | ⚠️ Limited |
| ALTER TABLE | ✅ Schema evolution | ⚠️ Requires rewrite | ✅ Full support | ⚠️ Limited |
| Partition Transforms | ✅ bucket/truncate | ✅ All transforms | ✅ All transforms | ⚠️ Limited |
| V3 Support | ✅ Read & write | ⚠️ Partial | ⚠️ Partial | |
| Binary Deletion Vectors | ✅ Puffin | | | |
| Schema Properties | ✅ Native functions | ⚠️ API | ⚠️ Delta props | |
| Query Performance | ⭐⭐⭐⭐⭐ Native | ⭐⭐ Startup overhead | ⭐⭐ Startup overhead | ⭐⭐ Cloud latency |
| Learning Curve | ⭐ Pure SQL | ⭐⭐⭐ DataFrame | ⭐⭐⭐ Spark SQL | ⭐⭐⭐ Console |

8. Monetization Strategies

8.1 Data Service Product Lines

Leveraging DuckDB-Iceberg’s low-ops-cost characteristics, you can build the following high-margin data services:

  1. Enterprise Data Lake Managed Service — Deploy and manage Iceberg data lakes for clients, using DuckDB’s zero-ops model to reduce costs to under 30% of Spark-based solutions
  2. Real-Time Data Sync SaaS — Build real-time data sync pipelines for e-commerce/finance using MERGE INTO upsert capabilities
  3. Data Lake Migration Consulting — Help enterprises migrate from traditional Hive/Parquet to Iceberg V3, using DuckDB’s compatibility for lossless migration

8.2 Tech Stack Recommendations

| Business Scenario | DuckDB + Iceberg + | Target Customer |
| --- | --- | --- |
| Real-Time Data Lake | Debezium + Kafka | E-commerce Platforms |
| BI Analysis Layer | Metabase/Superset | Small/Medium Businesses |
| AI Data Pipeline | DuckDB-Variant + Lance | AI Startups |
| Data Governance | Apache Polaris + Ranger | Financial Institutions |

8.3 Monetization Roadmap

Phase 1 (0-3 months): Build technical demos + blog-driven traffic
    └── Publish DuckDB-Iceberg tutorial series to build SEO traffic

Phase 2 (3-6 months): Launch standardized data lake solutions
    └── Productize deployment packages based on DuckDB-Iceberg

Phase 3 (6-12 months): Build data lake management platform SaaS
    └── Wrap DuckDB-Iceberg management operations into a Web platform

Conclusion

DuckDB-Iceberg in v1.5.3 has evolved from a “read-first” extension into a fully-featured lakehouse format write engine. MERGE INTO, ALTER TABLE, bucket/truncate partition transforms, V3 spec support, and Schema Properties close what were previously critical gaps relative to the Spark/Delta ecosystem.

With the official release of the DuckLake v1.0 standard and the Quack client-server protocol, DuckDB is building a complete self-hosted lakehouse ecosystem. For teams seeking low operational overhead and high query performance, DuckDB-Iceberg is now a mature choice.

⚠️ Note: GEOGRAPHY and UNKNOWN types are not yet supported in DuckDB-Iceberg and are planned for DuckDB v2.0.0.

If you want specific features prioritized, reach out in the DuckDB-Iceberg GitHub repository or contact the DuckDB Labs engineering team directly.

[Figure: DuckDB-Iceberg v1.5.3 Architecture]
