Featured image of post New DuckDB-Iceberg Features: MERGE INTO, ALTER TABLE, and Partition Transforms in v1.5.3

New DuckDB-Iceberg Features: MERGE INTO, ALTER TABLE, and Partition Transforms in v1.5.3

DuckDB v1.5.3 brings a comprehensive set of DuckDB-Iceberg features including MERGE INTO support, ALTER TABLE schema evolution, bucket/truncate partition transforms, Iceberg schema properties, and V3 specification support.

Overview

On May 29, 2026, the DuckDB team published a significant blog post titled “New DuckDB-Iceberg Features in v1.5.3.” While the team is heavily invested in developing DuckLake v1.0 and the Quack protocol, the DuckDB-Iceberg extension has been receiving a steady stream of powerful new features.

This post is essentially “Part 2” to the November 2025 article “Writes in DuckDB-Iceberg.” It demonstrates a suite of new capabilities in v1.5.3 that bring DuckDB-Iceberg significantly closer to a fully-featured data lake engine. In this article, we’ll explore each new feature hands-on and see how they transform your data lake workflow.

Prerequisites: Connecting to an Iceberg REST Catalog

Before diving into the new features, you need to connect to an Iceberg REST Catalog. DuckDB supports multiple catalog backends including Apache Polaris, Lakekeeper, and Amazon S3 Tables.

The connection setup is straightforward:

ATTACH 'warehouse_name' AS my_datalake (
    TYPE iceberg,
    other options
);

This single statement mounts your data lake warehouse as a logical catalog in DuckDB. All subsequent Iceberg operations access tables via the my_datalake.default.table_name naming convention.

Feature 1: MERGE INTO — Full Upsert Support for Data Lakes

In data engineering, merge operations (upserts) are among the most common and complex operations. Unlike traditional databases, Iceberg tables do not support primary key constraints, making DuckDB’s MERGE INTO statement the recommended approach for manipulating Iceberg tables.

Before v1.5.3, you had to simulate upserts using INSERT + DELETE combinations. Now, you can do it in a single, clean statement:

-- Create sample table
CREATE TABLE my_datalake.default.people (
    id INTEGER,
    name VARCHAR,
    salary FLOAT
);

INSERT INTO my_datalake.default.people
    VALUES (1, 'John', 92_000.0), (2, 'Anna', 100_000.0);

Now, execute a merge operation that updates John’s salary and adds a new employee Sarah:

MERGE INTO my_datalake.default.people AS target
USING (
    FROM (VALUES
        (1, 'John', 105_000.0),
        (3, 'Sarah', 95_000.0)
    ) t(id, name, salary)
) AS upserts
ON (upserts.id = target.id)
WHEN MATCHED THEN UPDATE
WHEN NOT MATCHED THEN INSERT;

The result:

┌───────┬─────────┬──────────┐
│  id   │  name   │  salary  │
│ int32 │ varchar │  float   │
├───────┼─────────┼──────────┤
│     1 │ John    │ 105000.0 │
│     2 │ Anna    │ 100000.0 │
│     3 │ Sarah   │  95000.0 │
└───────┴─────────┴──────────┘

Even more powerful, you can combine delete operations in the same statement:

MERGE INTO my_datalake.default.people AS target
USING (VALUES (2)) AS deletes(id)
ON (deletes.id = target.id)
WHEN MATCHED THEN DELETE;

Under the hood, MERGE INTO uses merge-on-read semantics and writes positional deletes to the Iceberg table, ensuring data consistency across all reads.

Feature 2: ALTER TABLE — Schema Evolution for Iceberg Tables

One of Iceberg’s core advantages is schema evolution. Prior to v1.5.3, DuckDB had no ALTER TABLE support for Iceberg tables, which was a significant limitation in data engineering workflows where schema changes are inevitable.

v1.5.3 brings comprehensive ALTER TABLE support:

-- Create initial table
CREATE TABLE my_datalake.default.simple_table AS
    FROM (VALUES
        (1, 'Andy'),
        (2, 'Bob'),
        (3, 'Claire'),
        (4, 'Mr. Duck')
    ) t(col1, col2);

-- Rename the table
ALTER TABLE my_datalake.default.simple_table
    RENAME TO renamed_table;

-- Add a column
ALTER TABLE my_datalake.default.renamed_table
    ADD COLUMN col3 DOUBLE;

-- Rename a column
ALTER TABLE my_datalake.default.renamed_table
    RENAME COLUMN col2 TO name;

-- Drop a column
ALTER TABLE my_datalake.default.renamed_table
    DROP COLUMN col3;

-- Set the format-version
ALTER TABLE my_datalake.default.renamed_table
    SET ('format-version' = 3);

Behind the scenes, each ALTER TABLE statement updates the Iceberg table’s current-schema-id. Other Iceberg-aware engines see the changes on their next LoadTableInformation call. Since Iceberg schema evolution is a metadata-only operation, no data files are rewritten, making it extremely efficient.

Feature 3: Bucket and Truncate Partition Transforms

The Iceberg specification defines several partition transforms that determine how data files are laid out on disk. v1.5.3 adds full support for bucket and truncate partition transforms for creation, insertion, and updates.

Bucket Partitioning

The bucket(N, col) transform hashes a column’s value into N buckets, ideal for stable partitioning on high-cardinality columns:

CREATE TABLE my_datalake.default.events (
    event_id BIGINT,
    user_id BIGINT,
    country VARCHAR,
    payload VARCHAR
)
PARTITIONED BY (bucket(16, user_id), truncate(2, country));

INSERT INTO my_datalake.default.events
    VALUES
        (1, 1001, 'United States', 'click'),
        (2, 1002, 'United Kingdom', 'view'),
        (3, 1003, 'Germany', 'click'),
        (4, 1004, 'Netherlands', 'view');

Truncate Partitioning

The truncate(W, col) transform groups rows by the first W characters (or by a column’s value rounded down to a multiple of W for numeric columns), useful for prefix-based partitioning strategies.

You can inspect the resulting data files using the iceberg_metadata() function:

SELECT file_path, record_count
FROM iceberg_metadata(my_datalake.default.events)
WHERE content = 'EXISTING';

Updates and deletes against bucket- and truncate-partitioned tables are also supported using positional deletes under merge-on-read semantics.

Feature 4: Iceberg Schema Properties

Iceberg catalogs allow arbitrary key-value properties to be attached at the schema (namespace) level. These are typically used to record ownership, descriptions, default storage locations, or any other metadata that applies to every table in a schema.

DuckDB provides three built-in functions for managing these properties:

-- Set schema properties
CALL set_iceberg_schema_properties(my_datalake.default, {
    'owner': 'analytics-team',
    'description': 'Default analytics schema'
});

-- Read schema properties
SELECT * FROM iceberg_schema_properties(my_datalake.default);
-- ┌─────────────┬──────────────────────────┐
-- │     key     │          value           │
-- │   varchar   │         varchar          │
-- ├─────────────┼──────────────────────────┤
-- │ owner       │ analytics-team           │
-- │ description │ Default analytics schema │
-- └─────────────┴──────────────────────────┘

-- Remove schema properties
CALL remove_iceberg_schema_properties(
    my_datalake.default,
    ['description']
);

Schema properties are written through the Iceberg REST Catalog, so any other Iceberg-aware engine connected to the same catalog sees the updates immediately.

Feature 5: Iceberg V3 Specification Support

The Iceberg V3 specification introduces several new features, and DuckDB-Iceberg now provides full read and write support for them. This includes:

  • Enhanced Schema Metadata: More comprehensive table metadata management
  • Identity Partitioning: More straightforward partitioning strategies
  • V3-Specific Features: Additional advanced data lake management capabilities

These additions make DuckDB one of the most comprehensive open-source tools supporting the Iceberg V3 specification.

Comparison with Traditional Data Lake Tools

FeatureDuckDB-Iceberg v1.5.3Apache SparkPresto/Trinodbt
MERGE INTO✅ Native support✅ Supported⚠️ Limited⚠️ SQL-only
ALTER TABLE Schema Evolution✅ Full support⚠️ Partial⚠️ Read-only❌ Not supported
Bucket Partition Transforms✅ Supported✅ Supported✅ Supported❌ Not supported
Truncate Partition Transforms✅ Supported✅ Supported⚠️ Limited❌ Not supported
Schema Properties✅ Supported❌ Not supported❌ Not supported❌ Not supported
Installation⭐ Zero-dependency❌ Heavy cluster⚠️ Requires cluster⚠️ Requires engine
Small Data Performance⭐ Extremely fast❌ High startup cost⚠️ Moderate⚠️ Depends on engine
Memory Usage⭐ 100MB-1GB❌ GB-level⚠️ Hundreds MB⚠️ Depends on engine

Real-World Example: Incremental Data Pipeline

Here’s a complete incremental data pipeline example demonstrating these new features in a production context:

-- 1. Mount Iceberg warehouse
ATTACH 's3://my-warehouse' AS warehouse (
    TYPE iceberg,
    S3_REGION 'us-east-1',
    AWS_ACCESS_KEY_ID '...',
    AWS_SECRET_ACCESS_KEY '...'
);

-- 2. Create partitioned incremental table
CREATE TABLE IF NOT EXISTS warehouse.default.user_events (
    event_id BIGINT,
    user_id BIGINT,
    event_type VARCHAR,
    payload VARCHAR,
    created_at TIMESTAMP
)
PARTITIONED BY (bucket(8, user_id), truncate(2, event_type));

-- 3. Daily incremental merge
MERGE INTO warehouse.default.user_events AS target
USING (
    SELECT * FROM read_parquet('/data/daily/2026-06-07/*.parquet')
) AS source
ON (source.event_id = target.event_id)
WHEN MATCHED THEN UPDATE SET
    event_type = source.event_type,
    payload = source.payload
WHEN NOT MATCHED THEN INSERT;

-- 4. Verify data
SELECT
    user_id,
    COUNT(*) AS event_count,
    SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchases
FROM warehouse.default.user_events
WHERE created_at >= '2026-06-01'
GROUP BY user_id
ORDER BY purchases DESC;

Monetization Strategies

After mastering these DuckDB-Iceberg features, here are several commercialization paths to consider:

  1. Data Consulting Services: Help enterprises build lightweight data lakes using DuckDB + Iceberg, replacing expensive Spark clusters. Many small and medium businesses lack dedicated big data engineers, and DuckDB’s low barrier to entry solves this pain point. Charge 50,000-150,000 RMB per project.

  2. Analytics SaaS Products: Leverage DuckDB’s MERGE INTO and incremental write capabilities to rapidly build real-time analytics dashboards for e-commerce, finance, and other sectors. Subscription model at 99-499 RMB/user/month.

  3. Enterprise Training: Offer DuckDB-Iceberg training courses covering MERGE INTO operations, partition strategies, and performance tuning. Online courses priced at 299-999 RMB, with enterprise on-site training at 20,000-50,000 RMB per session.

  4. Open Source Commercialization: Build an enterprise toolchain around DuckDB-Iceberg (data quality monitoring, automated pipeline orchestration) using an open-core + commercial addon model, following paths like DataCater and dbt Labs.

  5. Data Integration Middleware: Use DuckDB’s multi-format read capabilities and Iceberg’s standardized writes to build ETL middleware from CSV/Parquet/JSON to Iceberg, charged per invocation or subscription.

Summary

The DuckDB-Iceberg extension updates in v1.5.3 mark a pivotal moment in DuckDB’s evolution from a “query engine” to a “complete data lake platform.” MERGE INTO completes the upsert capability gap, ALTER TABLE enables true schema evolution, and partition transforms with schema properties make fine-grained data lake management possible.

For individual developers and small-to-medium businesses looking for a lightweight data lake solution, the DuckDB-Iceberg combination is arguably the best choice available today — zero configuration, high performance, and completely open source.


Based on the DuckDB official blog post New DuckDB-Iceberg Features in v1.5.3, published May 29, 2026.

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy