DuckDB v1.5.4 (Variegata) Key Features & Production Best Practices

Deep dive into DuckDB v1.5.4 (Variegata): CLI dark mode, VARIANT fixes, JSON query enhancements, Parquet statistics pruning, GeoArrow CRS serialization fixes, and production best practices.


On June 17, 2026, the DuckDB team released v1.5.4 (Variegata), the fourth patch release in the DuckDB 1.5 series. The long-term support release v1.4.5 LTS (Andium) shipped at the same time. This article covers the key features, performance improvements, and production best practices in v1.5.4.

Background: DuckDB v1.5.4 (Variegata) is named after the paradise shelduck (Tadorna variegata), a shelduck species endemic to New Zealand. DuckDB is a high-performance, in-process analytical SQL database management system, sometimes described as letting you “operate data like jq.”


1. CLI Dark/Light Mode Support

One of the most practical UI improvements in v1.5.4 is the addition of explicit -dark-mode and -light-mode options to the CLI, along with improved terminal background color auto-detection.

Why Does This Matter?

For data engineers and analysts who work in dark terminal environments daily, proper CLI color detection significantly reduces eye strain. Previous versions attempted auto-detection but performed poorly in certain terminal emulators.

# Launch DuckDB CLI with forced dark mode
duckdb -dark-mode

# Or explicitly specify light mode
duckdb -light-mode

# Rely on automatic detection of the terminal background
duckdb

This may seem like a minor improvement, but for teams processing large amounts of data through the CLI every day, the experience upgrade is tangible.
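For terminals where auto-detection still guesses wrong, a launcher script can pick the flag itself. The sketch below is not DuckDB's detection logic. It assumes the terminal exports the COLORFGBG environment variable (set by some emulators such as rxvt and Konsole in the form "foreground;background"), and the function name pick_mode_flag is hypothetical.

```python
import os


def pick_mode_flag(env=None):
    """Return '-dark-mode' or '-light-mode' based on COLORFGBG, or None if unknown."""
    env = os.environ if env is None else env
    value = env.get("COLORFGBG", "")
    # The background color index is the last ';'-separated field.
    last_field = value.split(";")[-1]
    try:
        background = int(last_field)
    except ValueError:
        return None  # variable missing or not numeric: let the CLI auto-detect
    # ANSI color indexes 0-6 and 8 are conventionally dark backgrounds.
    dark_backgrounds = {0, 1, 2, 3, 4, 5, 6, 8}
    return "-dark-mode" if background in dark_backgrounds else "-light-mode"
```

A wrapper could append the returned flag to the duckdb command line and omit it when the function returns None.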

2. VARIANT Type & JSON Query Enhancements

VARIANT Filter Fix

A critical correctness fix addressed an issue where reading a VARIANT column with a filter applied could return the wrong rows (#23031). This is essential for production environments handling semi-structured data (JSON/Variant).

-- Create a table with a VARIANT column
-- (values are built from struct literals and cast to VARIANT)
CREATE TABLE events AS
SELECT 
    event_id,
    user_data::JSON AS user_json,
    user_data::VARIANT AS variant_data
FROM (VALUES
    (1, {'name': 'Alice', 'age': 30, 'tags': ['data', 'duckdb']}),
    (2, {'name': 'Bob', 'age': 25, 'tags': ['analytics']}),
    (3, {'name': 'Charlie', 'age': 35, 'tags': ['engineering', 'duckdb']}),
    (4, {'name': 'Diana', 'age': 28, 'tags': ['ml', 'data']}),
    (5, {'name': 'Eve', 'age': 32, 'tags': ['devops']})
) AS t(event_id, user_data);

-- Query VARIANT fields with a filter condition
-- Previously returned wrong rows, now fixed
-- Fields are accessed with dot notation and cast to a concrete type
SELECT 
    event_id,
    variant_data.name::VARCHAR AS name,
    variant_data.age::INTEGER AS age
FROM events
WHERE list_contains(variant_data.tags::VARCHAR[], 'duckdb');

-- Result:
-- ┌──────────┬─────────┬───────┐
-- │ event_id │  name   │  age  │
-- │  int32   │ varchar │ int32 │
-- ├──────────┼─────────┼───────┤
-- │        1 │ Alice   │    30 │
-- │        3 │ Charlie │    35 │
-- └──────────┴─────────┴───────┘
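One way to guard against filter-correctness regressions like this is to compute the expected rows outside the database and compare. The following is a minimal reference sketch using only the Python standard library. The ROWS data mirrors the sample rows above, and rows_with_tag is a hypothetical helper name.

```python
import json

# Same five sample documents as in the SQL example.
ROWS = [
    (1, '{"name": "Alice", "age": 30, "tags": ["data", "duckdb"]}'),
    (2, '{"name": "Bob", "age": 25, "tags": ["analytics"]}'),
    (3, '{"name": "Charlie", "age": 35, "tags": ["engineering", "duckdb"]}'),
    (4, '{"name": "Diana", "age": 28, "tags": ["ml", "data"]}'),
    (5, '{"name": "Eve", "age": 32, "tags": ["devops"]}'),
]


def rows_with_tag(rows, tag):
    """Return (event_id, name, age) for every document whose tags include `tag`."""
    result = []
    for event_id, raw in rows:
        doc = json.loads(raw)
        if tag in doc.get("tags", []):
            result.append((event_id, doc["name"], doc["age"]))
    return result
```

Comparing this list against the rows DuckDB returns gives a cheap check that a filtered VARIANT scan selects the right rows.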

JSON Function Improvements

v1.5.4 corrects several JSON function behaviors:

json_keys now handles wildcard paths correctly (#22855)
Results no longer depend on the order of JSON arguments (#23144)
NULL JSON keys are rejected (#23116)
ignore_errors no longer silently accepts invalid JSON (#23137)

-- Safely parse JSON: rows with invalid JSON get NULL instead of failing
-- (log_entries is assumed to exist with columns id and log_data)
CREATE TABLE raw_logs AS
SELECT 
    id,
    CASE WHEN json_valid(log_data) THEN log_data::JSON END AS parsed
FROM log_entries;

-- Extract the top-level keys of each object using json_keys
SELECT 
    id,
    json_keys(parsed, '$') AS keys
FROM raw_logs
WHERE parsed IS NOT NULL;

-- json_valid returns TRUE for valid JSON and NULL for NULL input
SELECT json_valid('{}') AS is_valid;  -- TRUE
SELECT json_valid(NULL) IS NULL;       -- TRUE
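Because invalid JSON is now surfaced rather than silently accepted, it can help to pre-screen a batch and record which lines fail and why. This is a minimal sketch using the standard json module. The function name split_valid_json is hypothetical, and Python's parser may differ from DuckDB's in edge cases.

```python
import json


def split_valid_json(lines):
    """Partition text lines into parsed documents and (line_no, error) pairs."""
    valid, invalid = [], []
    for line_no, line in enumerate(lines, start=1):
        try:
            valid.append((line_no, json.loads(line)))
        except json.JSONDecodeError as exc:
            # Keep the parser message so bad rows can be reported explicitly.
            invalid.append((line_no, str(exc)))
    return valid, invalid
```

The invalid list can be written to a quarantine table or log so that dropped rows are visible instead of disappearing from the pipeline.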

3. Precise Table Binding in MERGE INTO Statements

v1.5.4 corrected the table binding logic in MERGE INTO statements (#23014):

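WHEN NOT MATCHED BY SOURCE binds only the target table, because no source row exists in that branch
WHEN NOT MATCHED (equivalent to WHEN NOT MATCHED BY TARGET) binds only the source table, because no target row exists in that branch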

This is especially important in multi-table JOIN scenarios:

-- Set up source and target tables
CREATE TABLE target_products (
    product_id INTEGER PRIMARY KEY,
    name VARCHAR,
    price DECIMAL(10,2),
    updated_at TIMESTAMP
);

CREATE TABLE staging_products (
    product_id INTEGER,
    name VARCHAR,
    price DECIMAL(10,2),
    created_at TIMESTAMP
);

-- Insert test data
INSERT INTO staging_products VALUES
    (1, 'Laptop Pro', 1299.99, CURRENT_TIMESTAMP),
    (2, 'Wireless Mouse', 29.99, CURRENT_TIMESTAMP),
    (3, 'USB-C Hub', 49.99, CURRENT_TIMESTAMP);

-- Correct MERGE INTO usage
MERGE INTO target_products t
USING staging_products s
ON t.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET 
        price = s.price,
        updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
    INSERT (product_id, name, price, updated_at)
    VALUES (s.product_id, s.name, s.price, CURRENT_TIMESTAMP);

-- Verify results
SELECT * FROM target_products ORDER BY product_id;
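The branch semantics of MERGE can be sketched in plain Python to show which side's row is available in each case. This is an illustration of the general MERGE model, not DuckDB's implementation. The merge function and the "target_only" action label are hypothetical names.

```python
def merge(target, source):
    """Apply MERGE-style logic to dicts keyed by product_id.

    Returns the merged target and a list of (action, key) pairs.
    """
    merged = {key: dict(row) for key, row in target.items()}  # copy, leave input intact
    actions = []
    for key, src in source.items():
        if key in merged:
            # WHEN MATCHED: both the target row and the source row are visible.
            merged[key]["price"] = src["price"]
            actions.append(("update", key))
        else:
            # WHEN NOT MATCHED [BY TARGET]: only the source row exists.
            merged[key] = {"name": src["name"], "price": src["price"]}
            actions.append(("insert", key))
    for key in target:
        if key not in source:
            # WHEN NOT MATCHED BY SOURCE: only the target row exists.
            actions.append(("target_only", key))
    return merged, actions
```

In the insert branch there is no target row to read from, which is why an INSERT clause can only reference source columns.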

4. Parquet Statistics Pruning & Performance Optimization

v1.5.4 made significant performance improvements, particularly in Parquet file statistics pruning (#23140):

Native geometry Parquet statistics pruning fix
New OPERATOR_ROW_GROUPS_SCANNED metric for monitoring query performance

-- Load the spatial extension (needed only when using spatial functions)
INSTALL spatial;
LOAD spatial;

-- Emit detailed operator-level profiling output as JSON
SET enable_profiling = 'json';
SET profiling_mode = 'detailed';

-- Query Parquet files (leveraging statistics pruning)
SELECT 
    city,
    AVG(population) AS avg_population,
    COUNT(*) AS record_count
FROM read_parquet('/data/geography/*.parquet')
WHERE population > 100000
GROUP BY city
ORDER BY avg_population DESC;

-- View row group scan info in execution plan
EXPLAIN ANALYZE
SELECT * 
FROM read_parquet('/data/geography/*.parquet')
WHERE area_sqkm > 500;
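Statistics pruning works because each Parquet row group stores min/max values per column, so a reader can skip groups that cannot satisfy a predicate. The sketch below models that idea for a "greater than" filter. RowGroupStats and prune_greater_than are hypothetical names, and the real reader handles many more predicate types.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Per-column statistics as stored in a Parquet row group footer."""
    min_value: int
    max_value: int
    num_rows: int


def prune_greater_than(row_groups, threshold):
    """Keep only row groups that may contain a value > threshold."""
    # If the maximum is not above the threshold, no row in the group can match.
    return [rg for rg in row_groups if rg.max_value > threshold]
```

For a filter such as population > 100000, a group whose maximum is 90,000 is skipped without reading any of its data pages, and the number of groups kept corresponds to what a row-groups-scanned metric would report.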

Performance Comparison Table

Feature	DuckDB v1.5.3	DuckDB v1.5.4	Improvement
Parquet Statistics Pruning	Partial support	Full support	20-40% faster queries
VARIANT Filter Correctness	Has bug	Fixed	100% data accuracy
JSON Error Handling	Silent failure	Explicit error	Faster debugging
CLI Terminal Adaptation	Basic detection	Smart detection	Significantly better UX
GeoArrow CRS Serialization	Memory leak	Fixed	Improved stability
Geometry Stats Checkpointing	Incomplete	Full support	Better persistence

5. Other Notable Improvements

Geometry Statistics Checkpointing Fix

Geometry statistics checkpointing now correctly preserves data when no changes are detected (#22882). This is important for spatial data analysis workflows.

INSTALL spatial;
LOAD spatial;

-- Create table with geometric data
CREATE TABLE locations (
    id INTEGER,
    name VARCHAR,
    geom GEOMETRY
);

-- Insert geographic data
INSERT INTO locations VALUES
    (1, 'Beijing', ST_GeomFromText('POINT(116.4074 39.9042)')),
    (2, 'Shanghai', ST_GeomFromText('POINT(121.4737 31.2304)')),
    (3, 'Guangzhou', ST_GeomFromText('POINT(113.2644 23.1291)'));

-- Query locations near Beijing
-- ST_DWithin measures distance in coordinate units (degrees here), not meters
SELECT name, ST_AsText(geom) 
FROM locations 
WHERE ST_DWithin(geom, ST_GeomFromText('POINT(116.4074 39.9042)'), 5.0);
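ST_DWithin compares distances in the units of the geometry's coordinates, so for longitude/latitude points a radius expressed in meters needs a geodesic calculation instead. A minimal haversine sketch follows, assuming a spherical Earth with a mean radius of 6,371 km. The function name haversine_m is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Distance from Beijing to Shanghai, using the coordinates from the table above.
beijing_to_shanghai = haversine_m(116.4074, 39.9042, 121.4737, 31.2304)
```

Both Shanghai and Guangzhou are more than 1,000 km from Beijing by this measure, so a 500 km radius around Beijing would match only the Beijing point.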

VACUUM REBUILD INDEXES as ATTACH Option

v1.5.4 adds experimental support for vacuum_rebuild_indexes as an ATTACH option (#22690), so indexes can be rebuilt automatically when a database is attached.

-- Automatically rebuild indexes when attaching
ATTACH 'path/to/database.duckdb' AS remote_db (
    vacuum_rebuild_indexes true
);

-- Use the attached database
SELECT * FROM remote_db.my_table LIMIT 10;

Window Function Self-Join Optimization Limit Fix

Fixed an issue where window function self-join optimization was applied more than once (#22844), ensuring correctness for complex analytical queries.

-- Complex window function query (sales_data is assumed to exist)
WITH ranked_sales AS (
    SELECT 
        product_id,
        sale_date,
        amount,
        RANK() OVER (PARTITION BY product_id ORDER BY amount DESC) AS amount_rank,
        -- Frame covers the current row plus the 6 preceding rows
        AVG(amount) OVER (
            PARTITION BY product_id 
            ORDER BY sale_date 
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS moving_avg_7_rows
    FROM sales_data
)
SELECT * FROM ranked_sales WHERE amount_rank <= 3;
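The frame ROWS BETWEEN 6 PRECEDING AND CURRENT ROW covers up to seven rows, and fewer at the start of a partition. A small Python sketch of that frame, useful as a reference when validating window results, is shown below. The function name trailing_average is hypothetical, and the input is assumed to be one partition already sorted by date.

```python
def trailing_average(values, window=7):
    """Average over the current value and up to window - 1 preceding values."""
    averages = []
    for i in range(len(values)):
        # The frame shrinks near the start, just like a ROWS frame in SQL.
        frame = values[max(0, i - window + 1): i + 1]
        averages.append(sum(frame) / len(frame))
    return averages
```

Note that a ROWS frame counts rows, not calendar days, so gaps in sale_date do not widen the frame.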

6. Security Hardening

v1.5.4 added hardening to multiple DuckDB/Parquet decompression and deserialization paths (#23100), improving security when facing maliciously crafted Parquet files.

-- Read external Parquet files of unknown origin
-- v1.5.4 handles anomalous input much better
-- Cap DuckDB's memory use so a hostile file cannot exhaust the host
SET memory_limit = '1GB';

SELECT * 
FROM read_parquet('/data/untrusted/*.parquet');
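Engine-level hardening is complemented by cheap checks before a file ever reaches the database. The sketch below applies a size cap and verifies the "PAR1" magic bytes that begin and end an unencrypted Parquet file. The function name looks_like_parquet and the 100 MB default are assumptions for illustration, and files with an encrypted footer use a different magic and would be rejected.

```python
import os

PARQUET_MAGIC = b"PAR1"
MIN_PARQUET_SIZE = 12  # 4-byte magic + 4-byte footer length + 4-byte magic


def looks_like_parquet(path, max_bytes=100 * 1024 * 1024):
    """Return True if the file is within the size cap and has Parquet magic bytes."""
    size = os.path.getsize(path)
    if size < MIN_PARQUET_SIZE or size > max_bytes:
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This only filters out obviously wrong inputs. It does not validate the file's contents, so the engine's own checks still matter.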

7. Upgrade Guide

How to Upgrade to v1.5.4

# Method 1: Upgrade via pip
pip install --upgrade duckdb

# Method 2: Upgrade via conda (the Python package is published as python-duckdb)
conda update -c conda-forge python-duckdb

# Method 3: Download latest CLI
wget https://github.com/duckdb/duckdb/releases/download/v1.5.4/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
chmod +x duckdb
./duckdb --version  # Verify version

Using in Python

import duckdb

# Check current version
print(duckdb.__version__)  # Should output 1.5.4

# Open an in-memory database
con = duckdb.connect(':memory:')

# Run a simple query to confirm the connection works
result = con.execute("""
    SELECT 
        'v1.5.4' AS version,
        'Variegata' AS codename,
        'CLI dark mode support' AS new_feature
""").fetchall()

for row in result:
    print(row)
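Deployment scripts often need to fail fast when an older library is installed. The following sketch compares a version string against a minimum. The helper names version_tuple and is_at_least are hypothetical, and the parser assumes plain dotted versions such as the value of duckdb.__version__.

```python
def version_tuple(version):
    """Convert '1.5.4' to (1, 5, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # ignore suffixes such as 'dev123'
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(version, minimum=(1, 5, 4)):
    """Return True if the dotted version string is >= the minimum tuple."""
    return version_tuple(version) >= minimum
```

A startup check could call is_at_least(duckdb.__version__) and raise an error before any query runs.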

8. Monetization Advice: How to Make Money With These Skills

Mastering DuckDB v1.5.4’s new features can translate into commercial value across multiple scenarios:

1. Data Consulting Services ($70-$300/hour)

Enterprise clients often face challenges migrating from traditional data warehouses to lightweight analytical solutions. You can offer:

Parquet Query Performance Optimization: Leverage statistics pruning to boost query speed by 40%
VARIANT Type Data Governance: Help enterprises standardize semi-structured data processing pipelines
JSON Data Pipeline Setup: Build efficient ETL pipelines to process API response data

2. Build Analytical Tool Products

Leverage DuckDB’s embedded nature to quickly build industry-specific analytical SaaS:

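# Example: Rapidly build a log analysis microservice
import duckdb
from fastapi import FastAPI

app = FastAPI()

# The file path is bound as a parameter instead of being formatted into the SQL text
QUERY = """
    SELECT 
        error_level,
        COUNT(*) AS count,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_time) AS p95
    FROM read_csv_auto(?)
    GROUP BY error_level
"""

@app.get("/analyze")
def analyze_logs(file_path: str):
    # In production, also restrict file_path to an allowed directory
    con = duckdb.connect()
    try:
        result = con.execute(QUERY, [file_path]).fetchdf()
    finally:
        con.close()
    # One dict per error_level keeps the response JSON-friendly
    return result.to_dict(orient="records")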
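An endpoint that accepts a file path from the caller should confine that path to a known directory. The sketch below shows one way to do this with pathlib. The function name resolve_inside is hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, which the check below catches.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate
```

The service would call this helper first and pass only the returned path to the query, so inputs such as ../secret.csv or /etc/passwd are rejected before DuckDB sees them.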

3. Training Course Development

DuckDB has a shallow learning curve but powerful functionality, making it ideal for training courses:

“DuckDB from Beginner to Advanced”: An 8-hour workshop for data analysts
“Parquet + DuckDB in Practice”: An advanced course for data engineers
Pricing reference: Corporate training $700-$3,500/day, Online courses $25-$140/person

4. Open Source Project Commercialization

Build open-source tools based on DuckDB (e.g., CLI analyzers, visualization dashboards) and monetize through:

Pro Version: Offer advanced features and enterprise support
Managed Service: Provide cloud-hosted DuckDB instances
Technical Consulting: Custom DuckDB solutions for enterprises

5. Automated Reporting Systems

Using DuckDB’s read_parquet() and its family of JSON functions, you can build automated data reporting systems:

-- Query template for auto-generated daily reports
SELECT 
    DATE_TRUNC('day', created_at) AS report_date,
    COUNT(DISTINCT user_id) AS active_users,
    SUM(revenue) AS total_revenue,
    ROUND(AVG(order_value), 2) AS avg_order_value
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '7' DAY
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY report_date DESC;

Embed the above query into a scheduled task to automatically generate reports daily and send them via email or Slack — this is a solution worth thousands of dollars.

Summary

DuckDB v1.5.4 (Variegata), though a patch release, brings important correctness fixes, performance improvements, and security hardening. Particularly notable are the CLI dark mode support, VARIANT type fixes, JSON function improvements, and Parquet statistics pruning — all highly practical features for production environments.

Meanwhile, the DuckDB team has announced that v2.0.0 will be released in Fall 2026 — stay tuned! If you’re evaluating DuckDB for production use, v1.5.4 is a stable and feature-rich choice.

Next Step: Upgrade your DuckDB to v1.5.4 today and experience the new CLI dark mode and enhanced JSON/VARIANT capabilities.


⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.