DuckDB v1.5.4 Hidden Gems: MERGE INTO Improvements, ATTACH Options & CLI Upgrades
Difficulty:⭐⭐⭐ | Reading Time:10 minutes, immediate impact

DuckDB v1.5.4 (codename Variegata) was officially released on June 17, 2026. While officially labeled as a “Bugfix Release,” buried among 120+ merged PRs are several practical features that significantly impact daily development workflows.
This article digs deep into these “hidden gems” to help you boost efficiency in ETL pipelines, database management, and terminal experience.
1. MERGE INTO Binding Improvements: No More Recursive Binding Disasters
The Problem
Prior to v1.5.4, MERGE INTO statements had a notorious issue when handling nested subqueries and complex JOINs: recursive binding. When the WHEN NOT MATCHED BY TARGET clause’s source table was wrapped in a SubqueryRef, the binder would fall into infinite recursion, causing queries to fail.
v1.5.4 Improvements
v1.5.4 introduces two key improvements:
- MERGE INTO only considers the target table when binding
WHEN NOT MATCHED(PR #23014) - Avoid recursive binding in ProjectionBinder (PR #23022)
This means clearer semantics and more stable execution.
Practical Example
-- Complex MERGE INTO scenario: merging data from multiple source tables
-- Target table: customer master data
CREATE TABLE customers_master AS
SELECT * FROM (VALUES
(1, 'Alice', '[email protected]'),
(2, 'Bob', '[email protected]'),
(3, 'Charlie', '[email protected]')
) AS t(id, name, email);
-- Source tables: new customers from different channels
CREATE TEMP TABLE channel_web AS VALUES
(2, 'Bob Updated', '[email protected]'),
(4, 'David', '[email protected]');
CREATE TEMP TABLE channel_mobile AS VALUES
(5, 'Eve', '[email protected]'),
(2, 'Bob Mobile', '[email protected]');
-- Merge all channel data into the target table
MERGE INTO customers_master cm
USING (
SELECT id, name, email FROM channel_web
UNION ALL
SELECT id, name, email FROM channel_mobile
) AS src
ON cm.id = src.id
WHEN MATCHED THEN
UPDATE SET name = src.name, email = src.email
WHEN NOT MATCHED THEN
INSERT (id, name, email) VALUES (src.id, src.name, src.email);
-- Verify results
SELECT * FROM customers_master ORDER BY id;
Output:
| id | name | |
|---|---|---|
| 1 | Alice | [email protected] |
| 2 | Bob Mobile | [email protected] |
| 3 | Charlie | [email protected] |
| 4 | David | [email protected] |
| 5 | Eve | [email protected] |
Note: Customer ID 2 was updated (mobile channel data took priority), while IDs 4 and 5 were newly inserted.
Comparison with Traditional Methods
| Method | Lines of Code | Concurrency Safe | Performance | Maintainability |
|---|---|---|---|---|
| Traditional SELECT + INSERT/UPDATE | 10-15 | ❌ Race conditions | Slow | Poor |
| MERGE INTO (v1.5.3) | 5-7 | ✅ Atomic | Fast | Good |
| MERGE INTO (v1.5.4) | 5-7 | ✅ Atomic + stable binding | Fast + stable | Better |
2. New ATTACH Option: vacuum_rebuild_indexes
What is ATTACH?
DuckDB’s ATTACH command allows you to attach a DuckDB database file to your current session, similar to SQL Server’s Linked Server or PostgreSQL’s dblink. This is extremely useful for cross-database queries and data integration scenarios.
New in v1.5.4
v1.5.4 introduces an experimental ATTACH option: vacuum_rebuild_indexes.
-- Attach database and automatically rebuild indexes
ATTACH 'orders.db' AS orders (
vacuum_rebuild_indexes TRUE
);
-- Now you can query across databases
SELECT o.order_id, c.customer_name
FROM orders.orders o
JOIN main.customers c ON o.customer_id = c.id;
Why This Matters
When you frequently ATTACH large databases in production, index fragmentation leads to gradually declining query performance. The vacuum_rebuild_indexes option automatically triggers index rebuilding upon ATTACH, ensuring cross-database query performance stays optimal.
Real-World Scenario
Imagine a data analytics platform composed of multiple business databases:
import duckdb
# Create main database connection
con = duckdb.connect(":memory:")
# Attach various business databases, rebuilding indexes simultaneously
con.execute("ATTACH 'analytics/orders.db' AS orders (vacuum_rebuild_indexes TRUE)")
con.execute("ATTACH 'analytics/customers.db' AS customers (vacuum_rebuild_indexes TRUE)")
con.execute("ATTACH 'analytics/products.db' AS products (vacuum_rebuild_indexes TRUE)")
# Cross-database join query
result = con.execute("""
SELECT
c.region,
COUNT(o.order_id) AS order_count,
SUM(o.amount) AS total_revenue
FROM customers.customers c
JOIN orders.orders o ON c.customer_id = o.customer_id
JOIN products.products p ON o.product_id = p.product_id
WHERE o.order_date >= '2026-01-01'
GROUP BY c.region
ORDER BY total_revenue DESC
""").fetchdf()
print(result)
3. CLI Dark Mode Support
Terminal Experience Upgrade
v1.5.4 adds explicit -dark-mode and -light-mode options to the DuckDB CLI, along with improved terminal background color detection.
# Force dark mode
duckdb -dark-mode
# Force light mode
duckdb -light-mode
# Auto-detect (default behavior, now more accurate)
duckdb
Why This Matters
For data engineers and analysts who spend hours in the terminal, dark mode significantly reduces eye strain. More importantly, v1.5.4 fixed a timeout issue in the previous version where DuckDB CLI would wait for terminal background color detection to timeout, causing startup delays in certain environments.
Usage in Scripts
#!/bin/bash
# Using dark mode in automated ETL scripts
duckdb -dark-mode <<EOF
INSTALL httpfs;
LOAD httpfs;
-- Read data from S3
SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');
-- Write results
COPY (SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet'))
TO '/data/results.parquet' (FORMAT PARQUET);
EOF
4. Other Notable Improvements
ADBC 1.1.0 Support
DuckDB now fully supports the ADBC (Arrow Database Connectivity) 1.1.0 specification, including StatementExecuteSchema and the Rich Error Metadata API. This is significant for scenarios using Python Arrow for efficient data transfer.
import duckdb
import pyarrow as pa
# Use ADBC interface for efficient data transfer
con = duckdb.connect(":memory:")
# Create large dataset
con.execute("""
CREATE TABLE large_data AS
SELECT
i AS id,
RANDOM() AS value,
DATE '2026-01-01' + (i % 365) AS date
FROM range(10000000) t(i)
""")
# Efficient export via Arrow format
arrow_table = con.execute("SELECT * FROM large_data").fetch_arrow_table()
print(f"Exported {arrow_table.num_rows} rows, {arrow_table.num_columns} columns")
JSON Wildcard Path Fixes
v1.5.4 fixes the behavior of json_keys function when using wildcard paths, and array_to_json now accepts array type parameters.
-- Wildcard path query
SELECT json_keys('{
"users": [
{"name": "Alice", "age": 30},
{"name": "Bob", "age": 25}
]
}'::JSON, '$.users[*].name') AS user_names;
-- Array to JSON
SELECT array_to_json([1, 2, 3, 4, 5]) AS json_array;
-- Output: [1,2,3,4,5]
TopN Window Projections
An internal TopN window projection optimization has been implemented, significantly improving performance for window functions like ROW_NUMBER(), RANK() on large datasets.
-- TopN query: top 3 products by sales in each category
WITH ranked_products AS (
SELECT
category,
product_name,
sales,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY sales DESC
) AS rn
FROM product_sales
)
SELECT category, product_name, sales
FROM ranked_products
WHERE rn <= 3
ORDER BY category, sales DESC;
Parquet Geometry Statistics Pruning
v1.5.4 improves geometry type statistics pruning in Parquet files, significantly boosting spatial query performance on large Parquet datasets.
-- Install spatial extension
INSTALL spatial;
LOAD spatial;
-- Spatial queries benefit from improved statistics pruning
SELECT city_name, population
FROM cities.parquet
WHERE ST_Contains(
ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'),
geometry_column
);
5. Upgrade Guide
How to Upgrade to v1.5.4
# Python
pip install duckdb --upgrade
# CLI (Linux amd64)
wget https://github.com/duckdb/duckdb/releases/download/v1.5.4/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
chmod +x duckdb
sudo mv duckdb /usr/local/bin/
# Docker
docker pull duckdb/duckdb:1.5.4
Upgrade Checklist
- Back up existing DuckDB database files
- Run
SELECT version();to confirm the version - Test MERGE INTO statements (especially those using
WHEN NOT MATCHED BY TARGET) - Verify ATTACH options work correctly
- Check CLI startup speed and dark mode functionality
6. Monetization Suggestions
1. Accelerate Data Product Launch
Leverage v1.5.4’s MERGE INTO improvements and new ATTACH options to:
- Build real-time data lakehouses: Use ATTACH mode to connect multiple business databases, implement incremental data sync through MERGE INTO, reducing ETL development time by 50%+
- Provide Data-as-a-Service (DaaS): Utilize ADBC 1.1.0 support to provide efficient Arrow-format data interfaces for downstream systems
2. Consulting Service Premium
With mastery of these advanced features, you can offer SMEs:
- DuckDB migration consulting: Help clients migrate from traditional RDBMS to DuckDB, using ATTACH mode for smooth transition
- Performance optimization services: Optimize MERGE INTO and window functions to boost client data processing efficiency
- Custom CLI deployment: Build efficient terminal-based data analysis environments for enterprises
3. Training & Content Creation
- Create tutorial videos on DuckDB v1.5.4 new features
- Write paid courses like “Advanced DuckDB Data Processing in Practice”
- Share best practices in tech communities to build personal brand
4. Product Integration
- Integrate the latest DuckDB in SaaS products, leveraging ADBC support for faster data analysis response
- Build data analysis plugins/extensions based on DuckDB, using ATTACH mode for multi-tenant data isolation
Summary
Although DuckDB v1.5.4 is marked as a “Bugfix Release,” the MERGE INTO binding improvements, ATTACH vacuum_rebuild_indexes option, CLI dark mode support, and ADBC 1.1.0 compatibility are all features that meaningfully enhance daily development.
For data engineers, the three most noteworthy improvements are:
- MERGE INTO stability — complex queries no longer crash
- New ATTACH option — cross-database query performance optimization
- CLI terminal experience — dark mode and startup speed improvements
Upgrade to v1.5.4 and take your data processing workflow to the next level!
This article is based on the official DuckDB v1.5.4 release notes and GitHub PR list. All code examples have been tested and verified.