Introduction
On May 20, 2026, DuckDB released v1.5.3 and announced DuckCon #7 in Amsterdam. The core DuckDB release focused primarily on bug fixes, but the accompanying DuckDB-Iceberg extension shipped a substantial set of new features, described by the DuckDB team as “Part 2 of our Iceberg Writes in DuckDB series.”
In the v1.4 era, DuckDB’s Iceberg support centered mainly on reads. Between v1.5.0, which arrived alongside the DuckLake v1.0 standard, and v1.5.3, DuckDB-Iceberg has evolved from a “read extension” into a full-featured write engine for the lakehouse format.
This article provides an in-depth exploration of all core new features in DuckDB-Iceberg v1.5.3, with complete SQL examples to help you get started quickly.
1. Quick Start: Connecting to Iceberg REST Catalogs
Before exploring the new features, you need to connect to your Iceberg REST Catalog. DuckDB-Iceberg supports multiple catalog backends:
- Apache Polaris — Apache Foundation’s recommended unified catalog service
- Lakekeeper — High-performance cloud-native Iceberg catalog
- Amazon S3 Tables — AWS’s native S3 table support
The connection command looks like this:
ATTACH 'warehouse_name' AS my_datalake (
    TYPE iceberg
    -- other options (catalog-specific) go here, comma-separated
);
Once connected, you can operate on Iceberg tables using standard SQL syntax — no new API to learn.
2. Full MERGE INTO Support — The Ultimate UPSERT Solution for Lakehouse Formats
DuckDB’s MERGE INTO statement is the recommended way to handle UPSERT (insert-or-update) operations. It matters especially for lakehouse formats like Iceberg: they have no primary key constraints, so INSERT ... ON CONFLICT, which relies on such a constraint to detect conflicts, is not an option.
2.1 Basic Usage
-- Create the target table
CREATE TABLE my_datalake.default.people (
id INTEGER,
name VARCHAR,
salary FLOAT
);
INSERT INTO my_datalake.default.people
VALUES (1, 'John', 92_000.0), (2, 'Anna', 100_000.0);
Result:
┌───────┬─────────┬──────────┐
│ id │ name │ salary │
│ int32 │ varchar │ float │
├───────┼─────────┼──────────┤
│ 1 │ John │ 92000.0 │
│ 2 │ Anna │ 100000.0 │
└───────┴─────────┴──────────┘
2.2 Executing UPSERT Operations
MERGE INTO my_datalake.default.people AS target
USING (
FROM (VALUES
(1, 'John', 105_000.0),
(3, 'Sarah', 95_000.0)
) t(id, name, salary)
) AS upserts
ON (upserts.id = target.id)
WHEN MATCHED THEN UPDATE
WHEN NOT MATCHED THEN INSERT;
Result after MERGE:
┌───────┬─────────┬──────────┐
│ id │ name │ salary │
│ int32 │ varchar │ float │
├───────┼─────────┼──────────┤
│ 1 │ John │ 105000.0 │
│ 2 │ Anna │ 100000.0 │
│ 3 │ Sarah │ 95000.0 │
└───────┴─────────┴──────────┘
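The logical effect of the MERGE above can be sketched in a few lines of Python. This is only an illustration of the matching rules, not how DuckDB executes the statement; the function name `merge_upsert` and the dictionary-based rows are hypothetical, and the sketch assumes each key appears at most once in the source.

```python
def merge_upsert(target, source, key="id"):
    """Logical effect of WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT.

    Simplification: assumes each key appears at most once in `source`.
    """
    source_by_key = {row[key]: row for row in source}
    matched_keys = set()
    merged = []
    for row in target:
        if row[key] in source_by_key:
            # WHEN MATCHED THEN UPDATE: take every column from the source row
            merged.append(dict(source_by_key[row[key]]))
            matched_keys.add(row[key])
        else:
            # No match in the source: the target row is left untouched
            merged.append(dict(row))
    for row in source:
        if row[key] not in matched_keys:
            # WHEN NOT MATCHED THEN INSERT
            merged.append(dict(row))
    return merged


people = [
    {"id": 1, "name": "John", "salary": 92_000.0},
    {"id": 2, "name": "Anna", "salary": 100_000.0},
]
upserts = [
    {"id": 1, "name": "John", "salary": 105_000.0},
    {"id": 3, "name": "Sarah", "salary": 95_000.0},
]

merged_people = merge_upsert(people, upserts)
for row in merged_people:
    print(row)
```

John's salary is updated, Anna is untouched, and Sarah is inserted, which matches the three rows in the result table.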
2.3 Combined Delete Operations
MERGE INTO can also include deletion logic within the same statement:
MERGE INTO my_datalake.default.people AS target
USING (VALUES (1, NULL, NULL)) AS src(id, name, salary)
ON (src.id = target.id)
WHEN MATCHED THEN DELETE;
Under the hood, MERGE INTO uses Merge-on-Read (MoR) semantics. For UPDATE and DELETE operations, it writes positional delete files without rewriting entire data files.
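To make the merge-on-read idea concrete, here is a minimal Python sketch under simplifying assumptions. It models a data file as an immutable list of rows and a positional delete file as a set of (file path, row position) pairs. The names `DataFile`, `Table`, `apply_update`, and `scan` are hypothetical and do not correspond to DuckDB-Iceberg internals.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """A data file that is never modified after it is written (toy model)."""
    path: str
    rows: list


@dataclass
class Table:
    data_files: list = field(default_factory=list)
    # Positional deletes: (data file path, 0-based row position)
    position_deletes: set = field(default_factory=set)


def scan(table):
    """Read path: skip every row whose (path, position) is marked as deleted."""
    for data_file in table.data_files:
        for pos, row in enumerate(data_file.rows):
            if (data_file.path, pos) not in table.position_deletes:
                yield row


def apply_update(table, key, new_row, new_path):
    """Update = mark the old row's position as deleted, then append a new file."""
    for data_file in table.data_files:
        for pos, row in enumerate(data_file.rows):
            if row[0] == key and (data_file.path, pos) not in table.position_deletes:
                table.position_deletes.add((data_file.path, pos))
    table.data_files.append(DataFile(new_path, [new_row]))


table = Table([DataFile("data-0.parquet", [(1, "John", 92000.0), (2, "Anna", 100000.0)])])
apply_update(table, 1, (1, "John", 105000.0), "data-1.parquet")

result = sorted(scan(table))
print(result)
print(table.position_deletes)
```

The original file still holds both rows after the update; only the delete marker and the new file change what a reader sees. That is why updates and deletes avoid rewriting whole data files.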
3. ALTER TABLE — Embrace Dynamic Schema Evolution
In v1.4, schema evolution for Iceberg tables was a documented limitation: the feature was not yet implemented. v1.5.3 fills this gap and supports the most common schema operations.
3.1 Complete Operation Example
-- Create initial table
CREATE TABLE my_datalake.default.simple_table AS
FROM (VALUES
(1, 'Andy'),
(2, 'Bob'),
(3, 'Claire'),
(4, 'Mr. Duck')) t(col1, col2);
-- Rename the table
ALTER TABLE my_datalake.default.simple_table
RENAME TO renamed_table;
-- Add a column
ALTER TABLE my_datalake.default.renamed_table
ADD COLUMN col3 DOUBLE;
-- Rename a column
ALTER TABLE my_datalake.default.renamed_table
RENAME COLUMN col2 TO name;
-- Drop a column
ALTER TABLE my_datalake.default.renamed_table
DROP COLUMN col3;
-- Set the format version
ALTER TABLE my_datalake.default.renamed_table
SET ('format-version' = 3);
3.2 How It Works Under the Hood
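Each schema change (adding, renaming, or dropping a column) writes a new schema to the table metadata and updates the table’s current-schema-id. Iceberg tracks columns by field ID rather than by name or position, so schema evolution is a pure metadata operation: no data files are rewritten, and the change completes quickly regardless of table size.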
Changes become immediately visible to other Iceberg-aware engines connected to the same catalog.
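The following Python sketch illustrates why metadata-only schema evolution works when columns are tracked by field ID. It is a simplified model, not the Iceberg metadata format: the dictionaries, the helper names (`add_column`, `rename_column`, `drop_column`, `read`), and keeping `last-column-id` inside the schema are all assumptions made for illustration.

```python
# Rows as stored in an existing data file: values are keyed by field ID, not by name.
data_file_rows = [{1: 1, 2: "Andy"}, {1: 2, 2: "Bob"}]

schema_v0 = {
    "schema-id": 0,
    "last-column-id": 2,
    "fields": [{"id": 1, "name": "col1"}, {"id": 2, "name": "col2"}],
}


def add_column(schema, name):
    # New columns always get a fresh ID, so IDs of dropped columns are never reused.
    new_id = schema["last-column-id"] + 1
    return {
        "schema-id": schema["schema-id"] + 1,
        "last-column-id": new_id,
        "fields": schema["fields"] + [{"id": new_id, "name": name}],
    }


def rename_column(schema, old, new):
    fields = [
        {"id": f["id"], "name": new if f["name"] == old else f["name"]}
        for f in schema["fields"]
    ]
    return {**schema, "schema-id": schema["schema-id"] + 1, "fields": fields}


def drop_column(schema, name):
    fields = [f for f in schema["fields"] if f["name"] != name]
    return {**schema, "schema-id": schema["schema-id"] + 1, "fields": fields}


def read(rows, schema):
    """Project stored rows through a schema; IDs missing from the file read as None."""
    return [{f["name"]: row.get(f["id"]) for f in schema["fields"]} for row in rows]


schema_v1 = add_column(schema_v0, "col3")
schema_v2 = rename_column(schema_v1, "col2", "name")
schema_v3 = drop_column(schema_v2, "col3")

print(read(data_file_rows, schema_v2))
print(read(data_file_rows, schema_v3))
```

The stored rows are never touched. Each operation only produces a new schema version, and readers resolve column names through field IDs at scan time.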
4. Partition Transforms — bucket and truncate Support
The Iceberg specification defines several partition transforms that determine how data files are laid out on disk. v1.5.3 adds support for bucket and truncate transforms.
4.1 Bucket Transform
bucket(N, col) hashes a column’s value into N buckets, ideal for stable partitioning on high-cardinality columns:
CREATE TABLE my_datalake.default.events (
event_id BIGINT,
user_id BIGINT,
country VARCHAR,
payload VARCHAR
)
PARTITIONED BY (bucket(16, user_id), truncate(2, country));
INSERT INTO my_datalake.default.events
VALUES
(1, 1001, 'United States', 'click'),
(2, 1002, 'United Kingdom', 'view'),
(3, 1003, 'Germany', 'click'),
(4, 1004, 'Netherlands', 'view');
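For intuition about how a BIGINT value maps to a bucket, here is a Python sketch of the hashing rule in the Iceberg specification. The spec hashes integers with 32-bit Murmur3 over their 8-byte little-endian encoding, then computes (hash & Integer.MAX_VALUE) % N. The sketch covers integer columns only, the helper names are hypothetical, and it is not DuckDB-Iceberg's code.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32_long(value, seed=0):
    """Murmur3 x86 32-bit hash of a 64-bit integer encoded as 8 little-endian bytes."""
    data = value.to_bytes(8, "little", signed=True)
    h = seed
    for i in (0, 4):  # two 4-byte blocks, no tail bytes for an 8-byte input
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * 0xCC9E2D51) & MASK32
        k = _rotl32(k, 15)
        k = (k * 0x1B873593) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    h ^= 8  # input length in bytes
    # Final avalanche mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def bucket(n, value):
    """Bucket index in [0, n): clear the sign bit of the hash, then take it modulo n."""
    return (murmur3_32_long(value) & 0x7FFFFFFF) % n


# The Iceberg spec lists 2017239379 as the hash of the integer 34.
print(murmur3_32_long(34))

user_buckets = {user_id: bucket(16, user_id) for user_id in (1001, 1002, 1003, 1004)}
print(user_buckets)
```

Because the bucket depends only on the value and N, the same user_id always lands in the same bucket, which is what makes the layout stable for high-cardinality columns.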
4.2 Truncate Transform
truncate(W, col) groups rows by the first W characters (or rounds numeric columns down to multiples of W), ideal for prefix-based partitioning scenarios.
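The rule can be sketched in Python as follows. The function name `truncate` mirrors the transform, but this is a hypothetical helper covering only strings and integers, applied to the country values from the example table.

```python
from collections import defaultdict


def truncate(width, value):
    """Iceberg-style truncate for strings and integers (sketch only)."""
    if isinstance(value, str):
        # Python slices strings by code point, so this keeps the first `width` characters
        return value[:width]
    if isinstance(value, int):
        # Python's % is non-negative for width > 0, so this rounds toward negative infinity
        return value - (value % width)
    raise TypeError(f"unsupported type for this sketch: {type(value).__name__}")


countries = ["United States", "United Kingdom", "Germany", "Netherlands"]
partitions = defaultdict(list)
for country in countries:
    partitions[truncate(2, country)].append(country)
partitions = dict(partitions)

print(partitions)
print(truncate(10, 37), truncate(10, -1))
```

With a width of 2, "United States" and "United Kingdom" share the partition value "Un", so the four rows fall into three partitions. For numeric columns, negative values round down as well: -1 with a width of 10 maps to -10.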
4.3 Verify Partition Results
SELECT file_path, record_count
FROM iceberg_metadata(my_datalake.default.events)
WHERE content = 'EXISTING';
Updates and deletes against bucket/truncate-partitioned tables are also supported, using positional deletes under MoR semantics.
5. Iceberg Schema Properties — Namespace-Level Metadata Management
Iceberg catalogs allow arbitrary key-value properties to be attached at the namespace (schema) level. These properties are typically used for ownership records, descriptions, default storage locations, and other metadata.
5.1 Core Functions
DuckDB-Iceberg v1.5.3 provides three dedicated functions:
| Function | Purpose |
|---|---|
| iceberg_schema_properties(ns) | Read namespace properties |
| set_iceberg_schema_properties(ns, props) | Set/update properties |
| remove_iceberg_schema_properties(ns, keys) | Remove specified properties |
5.2 Usage Example
-- Set namespace properties
CALL set_iceberg_schema_properties(my_datalake.default, {
'owner': 'analytics-team',
'description': 'Default analytics schema'
});
-- View properties
SELECT * FROM iceberg_schema_properties(my_datalake.default);
┌─────────────┬──────────────────────────┐
│ key │ value │
│ varchar │ varchar │
├─────────────┼──────────────────────────┤
│ owner │ analytics-team │
│ description │ Default analytics schema │
└─────────────┴──────────────────────────┘
-- Remove a property
CALL remove_iceberg_schema_properties(
my_datalake.default,
['description']
);
Properties are written through the Iceberg REST Catalog, so any other Iceberg-aware engine attached to the same catalog will see the updates immediately.
6. Iceberg V3 Spec Support — The Next-Gen Lakehouse Format
The Iceberg v3 specification introduces several major improvements. DuckDB-Iceberg v1.5.3 supports reading and writing V3 tables, with the exception of the GEOGRAPHY and UNKNOWN types (see the note at the end of this article).
6.1 V3 Core Features
| Feature | Description |
|---|---|
| VARIANT type | Native support for semi-structured data storage |
| TIMESTAMP_NS type | Nanosecond-level timestamp precision |
| Column default values | Schema-level default value definitions |
| Binary deletion vectors | Puffin format replaces Parquet delete files |
| Row lineage tracking | Trace data row origins |
6.2 Binary Deletion Vectors
This is arguably V3’s most significant improvement. In V2 tables, DuckDB-Iceberg writes deletes as positional delete files in Parquet format; in V3 tables, the same information is encoded as a much more compact binary deletion vector stored in a Puffin file.
-- Create a V3 table
CREATE TABLE my_datalake.default.v3_table
WITH ('format-version' = 3) AS
FROM (VALUES
(1, {'kind': 'click', 'x': 10}::VARIANT, TIMESTAMP_NS '2026-05-20 12:00:00.123456789'),
(2, {'kind': 'view'}::VARIANT, TIMESTAMP_NS '2026-05-20 12:00:00.987654321')
) t(id, payload, event_time);
-- Delete data (V3 tables auto-write binary deletion vectors)
DELETE FROM my_datalake.default.v3_table
WHERE id = 1;
SELECT * FROM my_datalake.default.v3_table;
┌───────┬──────────────────┬───────────────────────────────┐
│ id │ payload │ event_time │
│ int32 │ variant │ timestamp_ns │
├───────┼──────────────────┼───────────────────────────────┤
│ 2 │ {"kind": "view"} │ 2026-05-20 12:00:00.987654321 │
└───────┴──────────────────┴───────────────────────────────┘
Query the metadata to see that the delete was written as a Puffin file:
SELECT manifest_content, content, file_format
FROM iceberg_metadata(my_datalake.default.v3_table);
┌──────────────────┬──────────────────┬─────────────┐
│ manifest_content │ content │ file_format │
│ varchar │ varchar │ varchar │
├──────────────────┼──────────────────┼─────────────┤
│ DATA │ EXISTING │ parquet │
│ DELETE │ POSITION_DELETES │ puffin │
└──────────────────┴──────────────────┴─────────────┘
DuckDB automatically selects the correct write format based on the table’s format-version.
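As a rough illustration of why a deletion vector is more compact than row-per-delete records, the Python sketch below packs deleted row positions into a bitset. It is a toy model: the Iceberg spec stores deletion vectors as Roaring bitmaps inside Puffin blobs, whereas this sketch uses a plain Python integer, and all function names are hypothetical.

```python
def to_deletion_vector(positions):
    """Pack deleted row positions into an integer bitset (bit i set = row i deleted)."""
    vector = 0
    for pos in positions:
        vector |= 1 << pos
    return vector


def is_deleted(vector, pos):
    return bool((vector >> pos) & 1)


def scan(rows, vector):
    """Return only the rows whose position is not marked in the deletion vector."""
    return [row for pos, row in enumerate(rows) if not is_deleted(vector, pos)]


rows = [(1, "click"), (2, "view"), (3, "click"), (4, "view"), (5, "click"), (6, "view")]
deleted_positions = [0, 3, 4]

vector = to_deletion_vector(deleted_positions)
live_rows = scan(rows, vector)
print(live_rows)

# V2-style positional deletes: one (file_path, pos) record per deleted row
v2_records = [("data-0.parquet", pos) for pos in deleted_positions]
vector_bytes = (vector.bit_length() + 7) // 8
print(len(v2_records), "delete records vs", vector_bytes, "byte(s) of bitmap")
```

Each V2-style record repeats the file path for every deleted row, while the bitmap needs one bit per row position for the whole data file.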
7. Comparison with Traditional Tools
| Feature | DuckDB-Iceberg v1.5.3 | Apache Spark | Delta Lake (Spark) | AWS Glue |
|---|---|---|---|---|
| Deployment Complexity | ⭐ Zero cluster | ⭐⭐⭐⭐⭐ YARN/K8s | ⭐⭐⭐⭐⭐ Spark cluster | ⭐⭐⭐⭐ Cloud-managed |
| MERGE INTO | ✅ Full support | ✅ DataFrame API | ✅ Full support | ⚠️ Limited |
| ALTER TABLE | ✅ Schema evolution | ✅ Schema evolution (metadata-only with Iceberg) | ✅ Full support | ⚠️ Limited |
| Partition Transforms | ✅ bucket/truncate | ✅ All transforms | ✅ All transforms | ⚠️ Limited |
| V3 Support | ✅ Read & write | ⚠️ Partial | ⚠️ Partial | ❌ |
| Binary Deletion Vectors | ✅ Puffin | ⚠️ Recent Iceberg releases only | ✅ Delta's own deletion vector format | ❌ |
| Schema Properties | ✅ Native functions | ⚠️ API | ⚠️ Delta props | ❌ |
| Query Performance | ⭐⭐⭐⭐⭐ Native | ⭐⭐ Startup overhead | ⭐⭐ Startup overhead | ⭐⭐ Cloud latency |
| Learning Curve | ⭐ Pure SQL | ⭐⭐⭐ DataFrame | ⭐⭐⭐ Spark SQL | ⭐⭐⭐ Console |
Note on the star ratings: for Deployment Complexity and Learning Curve, fewer stars means less complexity (better); for Query Performance, more stars means better performance.
8. Monetization Strategies
8.1 Data Service Product Lines
Leveraging DuckDB-Iceberg’s low-ops-cost characteristics, you can build the following high-margin data services:
- Enterprise Data Lake Managed Service — Deploy and manage Iceberg data lakes for clients, using DuckDB’s zero-ops model to reduce costs to under 30% of Spark-based solutions
- Real-Time Data Sync SaaS — Build real-time data sync pipelines for e-commerce/finance using MERGE INTO upsert capabilities
- Data Lake Migration Consulting — Help enterprises migrate from traditional Hive/Parquet to Iceberg V3, using DuckDB’s compatibility for lossless migration
8.2 Tech Stack Recommendations
| Business Scenario | DuckDB + Iceberg + | Target Customer |
|---|---|---|
| Real-Time Data Lake | Debezium + Kafka | E-commerce Platforms |
| BI Analysis Layer | Metabase/Superset | Small/Medium Businesses |
| AI Data Pipeline | DuckDB-Variant + Lance | AI Startups |
| Data Governance | Apache Polaris + Ranger | Financial Institutions |
8.3 Monetization Roadmap
Phase 1 (0-3 months): Build technical demos + blog-driven traffic
└── Publish DuckDB-Iceberg tutorial series to build SEO traffic
Phase 2 (3-6 months): Launch standardized data lake solutions
└── Productize deployment packages based on DuckDB-Iceberg
Phase 3 (6-12 months): Build data lake management platform SaaS
└── Wrap DuckDB-Iceberg management operations into a Web platform
Conclusion
DuckDB-Iceberg in v1.5.3 has evolved from a “read-first” extension into a full-featured write engine for the lakehouse format. MERGE INTO, ALTER TABLE, bucket/truncate partition transforms, V3 spec support, and Schema Properties close gaps that previously separated it from the Spark/Delta ecosystem.
With the official release of the DuckLake v1.0 standard and the Quack client-server protocol, DuckDB is building a complete self-hosted lakehouse ecosystem. For teams seeking low operational overhead and high query performance, DuckDB-Iceberg is now a mature choice.
⚠️ Note:
GEOGRAPHY and UNKNOWN types are not yet supported in DuckDB-Iceberg and are planned for DuckDB v2.0.0.
If you want specific features prioritized, reach out via the DuckDB-Iceberg GitHub repository or contact the DuckDB Labs engineering team directly.
