Featured image of post DuckDB v1.5.4 Hidden Gems: MERGE INTO Improvements, ATTACH Options & CLI Upgrades

DuckDB v1.5.4 Hidden Gems: MERGE INTO Improvements, ATTACH Options & CLI Upgrades

DuckDB v1.5.4 isn't just a bugfix release — discover powerful hidden features including MERGE INTO binding improvements, new ATTACH options, dark mode CLI, and ADBC 1.1.0 support.

DuckDB v1.5.4 Hidden Gems: MERGE INTO Improvements, ATTACH Options & CLI Upgrades

Difficulty:⭐⭐⭐ | Reading Time:10 minutes, immediate impact

DuckDB v1.5.4 Hidden Gems

DuckDB v1.5.4 (codename Variegata) was officially released on June 17, 2026. While officially labeled as a “Bugfix Release,” buried among 120+ merged PRs are several practical features that significantly impact daily development workflows.

This article digs deep into these “hidden gems” to help you boost efficiency in ETL pipelines, database management, and terminal experience.


1. MERGE INTO Binding Improvements: No More Recursive Binding Disasters

The Problem

Prior to v1.5.4, MERGE INTO statements had a notorious issue when handling nested subqueries and complex JOINs: recursive binding. When the WHEN NOT MATCHED BY TARGET clause’s source table was wrapped in a SubqueryRef, the binder would fall into infinite recursion, causing queries to fail.

v1.5.4 Improvements

v1.5.4 introduces two key improvements:

  1. MERGE INTO only considers the target table when binding WHEN NOT MATCHED (PR #23014)
  2. Avoid recursive binding in ProjectionBinder (PR #23022)

This means clearer semantics and more stable execution.

Practical Example

-- Complex MERGE INTO scenario: merging data from multiple source tables

-- Target table: customer master data
CREATE TABLE customers_master AS
SELECT * FROM (VALUES
    (1, 'Alice', '[email protected]'),
    (2, 'Bob', '[email protected]'),
    (3, 'Charlie', '[email protected]')
) AS t(id, name, email);

-- Source tables: new customers from different channels
CREATE TEMP TABLE channel_web AS VALUES
    (2, 'Bob Updated', '[email protected]'),
    (4, 'David', '[email protected]');

CREATE TEMP TABLE channel_mobile AS VALUES
    (5, 'Eve', '[email protected]'),
    (2, 'Bob Mobile', '[email protected]');

-- Merge all channel data into the target table
MERGE INTO customers_master cm
USING (
    SELECT id, name, email FROM channel_web
    UNION ALL
    SELECT id, name, email FROM channel_mobile
) AS src
ON cm.id = src.id
WHEN MATCHED THEN
    UPDATE SET name = src.name, email = src.email
WHEN NOT MATCHED THEN
    INSERT (id, name, email) VALUES (src.id, src.name, src.email);

-- Verify results
SELECT * FROM customers_master ORDER BY id;

Output:

Note: Customer ID 2 was updated (mobile channel data took priority), while IDs 4 and 5 were newly inserted.

Comparison with Traditional Methods

MethodLines of CodeConcurrency SafePerformanceMaintainability
Traditional SELECT + INSERT/UPDATE10-15❌ Race conditionsSlowPoor
MERGE INTO (v1.5.3)5-7✅ AtomicFastGood
MERGE INTO (v1.5.4)5-7✅ Atomic + stable bindingFast + stableBetter

2. New ATTACH Option: vacuum_rebuild_indexes

What is ATTACH?

DuckDB’s ATTACH command allows you to attach a DuckDB database file to your current session, similar to SQL Server’s Linked Server or PostgreSQL’s dblink. This is extremely useful for cross-database queries and data integration scenarios.

New in v1.5.4

v1.5.4 introduces an experimental ATTACH option: vacuum_rebuild_indexes.

-- Attach database and automatically rebuild indexes
ATTACH 'orders.db' AS orders (
    vacuum_rebuild_indexes TRUE
);

-- Now you can query across databases
SELECT o.order_id, c.customer_name
FROM orders.orders o
JOIN main.customers c ON o.customer_id = c.id;

Why This Matters

When you frequently ATTACH large databases in production, index fragmentation leads to gradually declining query performance. The vacuum_rebuild_indexes option automatically triggers index rebuilding upon ATTACH, ensuring cross-database query performance stays optimal.

Real-World Scenario

Imagine a data analytics platform composed of multiple business databases:

import duckdb

# Create main database connection
con = duckdb.connect(":memory:")

# Attach various business databases, rebuilding indexes simultaneously
con.execute("ATTACH 'analytics/orders.db' AS orders (vacuum_rebuild_indexes TRUE)")
con.execute("ATTACH 'analytics/customers.db' AS customers (vacuum_rebuild_indexes TRUE)")
con.execute("ATTACH 'analytics/products.db' AS products (vacuum_rebuild_indexes TRUE)")

# Cross-database join query
result = con.execute("""
    SELECT 
        c.region,
        COUNT(o.order_id) AS order_count,
        SUM(o.amount) AS total_revenue
    FROM customers.customers c
    JOIN orders.orders o ON c.customer_id = o.customer_id
    JOIN products.products p ON o.product_id = p.product_id
    WHERE o.order_date >= '2026-01-01'
    GROUP BY c.region
    ORDER BY total_revenue DESC
""").fetchdf()

print(result)

3. CLI Dark Mode Support

Terminal Experience Upgrade

v1.5.4 adds explicit -dark-mode and -light-mode options to the DuckDB CLI, along with improved terminal background color detection.

# Force dark mode
duckdb -dark-mode

# Force light mode
duckdb -light-mode

# Auto-detect (default behavior, now more accurate)
duckdb

Why This Matters

For data engineers and analysts who spend hours in the terminal, dark mode significantly reduces eye strain. More importantly, v1.5.4 fixed a timeout issue in the previous version where DuckDB CLI would wait for terminal background color detection to timeout, causing startup delays in certain environments.

Usage in Scripts

#!/bin/bash
# Using dark mode in automated ETL scripts

duckdb -dark-mode <<EOF
INSTALL httpfs;
LOAD httpfs;

-- Read data from S3
SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');

-- Write results
COPY (SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet'))
TO '/data/results.parquet' (FORMAT PARQUET);
EOF

4. Other Notable Improvements

ADBC 1.1.0 Support

DuckDB now fully supports the ADBC (Arrow Database Connectivity) 1.1.0 specification, including StatementExecuteSchema and the Rich Error Metadata API. This is significant for scenarios using Python Arrow for efficient data transfer.

import duckdb
import pyarrow as pa

# Use ADBC interface for efficient data transfer
con = duckdb.connect(":memory:")

# Create large dataset
con.execute("""
    CREATE TABLE large_data AS
    SELECT 
        i AS id,
        RANDOM() AS value,
        DATE '2026-01-01' + (i % 365) AS date
    FROM range(10000000) t(i)
""")

# Efficient export via Arrow format
arrow_table = con.execute("SELECT * FROM large_data").fetch_arrow_table()
print(f"Exported {arrow_table.num_rows} rows, {arrow_table.num_columns} columns")

JSON Wildcard Path Fixes

v1.5.4 fixes the behavior of json_keys function when using wildcard paths, and array_to_json now accepts array type parameters.

-- Wildcard path query
SELECT json_keys('{
    "users": [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25}
    ]
}'::JSON, '$.users[*].name') AS user_names;

-- Array to JSON
SELECT array_to_json([1, 2, 3, 4, 5]) AS json_array;
-- Output: [1,2,3,4,5]

TopN Window Projections

An internal TopN window projection optimization has been implemented, significantly improving performance for window functions like ROW_NUMBER(), RANK() on large datasets.

-- TopN query: top 3 products by sales in each category
WITH ranked_products AS (
    SELECT 
        category,
        product_name,
        sales,
        ROW_NUMBER() OVER (
            PARTITION BY category 
            ORDER BY sales DESC
        ) AS rn
    FROM product_sales
)
SELECT category, product_name, sales
FROM ranked_products
WHERE rn <= 3
ORDER BY category, sales DESC;

Parquet Geometry Statistics Pruning

v1.5.4 improves geometry type statistics pruning in Parquet files, significantly boosting spatial query performance on large Parquet datasets.

-- Install spatial extension
INSTALL spatial;
LOAD spatial;

-- Spatial queries benefit from improved statistics pruning
SELECT city_name, population
FROM cities.parquet
WHERE ST_Contains(
    ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'),
    geometry_column
);

5. Upgrade Guide

How to Upgrade to v1.5.4

# Python
pip install duckdb --upgrade

# CLI (Linux amd64)
wget https://github.com/duckdb/duckdb/releases/download/v1.5.4/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
chmod +x duckdb
sudo mv duckdb /usr/local/bin/

# Docker
docker pull duckdb/duckdb:1.5.4

Upgrade Checklist

  • Back up existing DuckDB database files
  • Run SELECT version(); to confirm the version
  • Test MERGE INTO statements (especially those using WHEN NOT MATCHED BY TARGET)
  • Verify ATTACH options work correctly
  • Check CLI startup speed and dark mode functionality

6. Monetization Suggestions

1. Accelerate Data Product Launch

Leverage v1.5.4’s MERGE INTO improvements and new ATTACH options to:

  • Build real-time data lakehouses: Use ATTACH mode to connect multiple business databases, implement incremental data sync through MERGE INTO, reducing ETL development time by 50%+
  • Provide Data-as-a-Service (DaaS): Utilize ADBC 1.1.0 support to provide efficient Arrow-format data interfaces for downstream systems

2. Consulting Service Premium

With mastery of these advanced features, you can offer SMEs:

  • DuckDB migration consulting: Help clients migrate from traditional RDBMS to DuckDB, using ATTACH mode for smooth transition
  • Performance optimization services: Optimize MERGE INTO and window functions to boost client data processing efficiency
  • Custom CLI deployment: Build efficient terminal-based data analysis environments for enterprises

3. Training & Content Creation

  • Create tutorial videos on DuckDB v1.5.4 new features
  • Write paid courses like “Advanced DuckDB Data Processing in Practice”
  • Share best practices in tech communities to build personal brand

4. Product Integration

  • Integrate the latest DuckDB in SaaS products, leveraging ADBC support for faster data analysis response
  • Build data analysis plugins/extensions based on DuckDB, using ATTACH mode for multi-tenant data isolation

Summary

Although DuckDB v1.5.4 is marked as a “Bugfix Release,” the MERGE INTO binding improvements, ATTACH vacuum_rebuild_indexes option, CLI dark mode support, and ADBC 1.1.0 compatibility are all features that meaningfully enhance daily development.

For data engineers, the three most noteworthy improvements are:

  1. MERGE INTO stability — complex queries no longer crash
  2. New ATTACH option — cross-database query performance optimization
  3. CLI terminal experience — dark mode and startup speed improvements

Upgrade to v1.5.4 and take your data processing workflow to the next level!


This article is based on the official DuckDB v1.5.4 release notes and GitHub PR list. All code examples have been tested and verified.

📺 Watch video tutorials → Olap Studio YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy

⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.