DuckDB v1.5.4 (Variegata) Key Features & Production Best Practices
On June 17, 2026, the DuckDB team released v1.5.4 (Variegata), the fourth patch release in the DuckDB 1.5 series. On the same day, the team also released the long-term support version v1.4.5 LTS (Andium). This article covers the key features, performance improvements, and production best practices in v1.5.4.
Background: DuckDB v1.5.4 (Variegata) is named after the paradise shelduck (Tadorna variegata), a species of shelduck endemic to New Zealand. DuckDB is a high-performance, embedded analytical SQL database management system, known for letting users query data files directly, much as jq is used to work with JSON.

1. CLI Dark/Light Mode Support
One of the most practical UI improvements in v1.5.4 is the addition of explicit -dark-mode and -light-mode options to the CLI, along with improved terminal background color auto-detection.
Why Does This Matter?
For data engineers and analysts who work in dark terminal environments daily, proper CLI color detection significantly reduces eye strain. Previous versions attempted auto-detection but performed poorly in certain terminal emulators.
# Launch the DuckDB CLI with dark mode forced
duckdb -dark-mode
# Or explicitly select light mode
duckdb -light-mode
# Rely on automatic background detection
duckdb
This may seem like a minor improvement, but for teams processing large amounts of data through the CLI every day, the experience upgrade is tangible.
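For scripts that wrap the CLI, the mode flag can also be chosen programmatically. The sketch below is not DuckDB's own detection logic. It assumes the terminal exports the COLORFGBG convention (a "fg;bg" pair that some emulators set), and the helper names pick_mode_flag and build_command are hypothetical.

```python
import os

# ANSI colour indexes conventionally treated as light backgrounds.
LIGHT_BACKGROUNDS = {7, 15}

def pick_mode_flag(env=None):
    """Return '-dark-mode', '-light-mode', or None when nothing can be inferred.

    Assumption: COLORFGBG has the form 'fg;bg' (or 'fg;default;bg').
    """
    env = os.environ if env is None else env
    value = env.get("COLORFGBG", "")
    try:
        background = int(value.split(";")[-1])
    except ValueError:
        return None  # unset or unparsable: leave detection to the CLI
    return "-light-mode" if background in LIGHT_BACKGROUNDS else "-dark-mode"

def build_command(env=None):
    """Build the argv list for launching the CLI, adding a flag only if known."""
    flag = pick_mode_flag(env)
    return ["duckdb"] + ([flag] if flag else [])

print(build_command({"COLORFGBG": "15;0"}))
```

Returning None when the variable is missing keeps the wrapper from overriding the CLI's built-in auto-detection.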
2. VARIANT Type & JSON Query Enhancements
VARIANT Filter Fix
A critical correctness fix addressed an issue where VARIANT types would read the wrong rows under filter conditions (#23031). This is essential for production environments handling semi-structured data (JSON/Variant).
-- Create a table with VARIANT type
CREATE TABLE events AS
SELECT
event_id,
user_data::VARCHAR AS user_json,
PARSE_JSON(user_data) AS variant_data
FROM (VALUES
(1, '{"name": "Alice", "age": 30, "tags": ["data", "duckdb"]}'),
(2, '{"name": "Bob", "age": 25, "tags": ["analytics"]}'),
(3, '{"name": "Charlie", "age": 35, "tags": ["engineering", "duckdb"]}'),
(4, '{"name": "Diana", "age": 28, "tags": ["ml", "data"]}'),
(5, '{"name": "Eve", "age": 32, "tags": ["devops"]}')
) AS t(event_id, user_data);
-- Query VARIANT field with FILTER condition
-- Previously returned wrong rows, now fixed
SELECT
event_id,
variant_data:name::VARCHAR AS name,
variant_data:age::INTEGER AS age
FROM events
WHERE variant_data:tags[*] = 'duckdb';
-- Result:
-- ┌──────────┬─────────┬───────┐
-- │ event_id │  name   │  age  │
-- │  int32   │ varchar │ int32 │
-- ├──────────┼─────────┼───────┤
-- │        1 │ Alice   │    30 │
-- │        3 │ Charlie │    35 │
-- └──────────┴─────────┴───────┘
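When validating a correctness fix like this one, it can help to compute the expected rows independently of the engine. The following is a minimal sketch that uses only Python's standard json module on the same five sample rows. The helper rows_with_tag is hypothetical and not part of DuckDB.

```python
import json

# The same five rows that populate the events table above.
ROWS = [
    (1, '{"name": "Alice", "age": 30, "tags": ["data", "duckdb"]}'),
    (2, '{"name": "Bob", "age": 25, "tags": ["analytics"]}'),
    (3, '{"name": "Charlie", "age": 35, "tags": ["engineering", "duckdb"]}'),
    (4, '{"name": "Diana", "age": 28, "tags": ["ml", "data"]}'),
    (5, '{"name": "Eve", "age": 32, "tags": ["devops"]}'),
]

def rows_with_tag(rows, tag):
    """Return (event_id, name, age) for rows whose tags array contains `tag`."""
    matches = []
    for event_id, raw in rows:
        doc = json.loads(raw)
        if tag in doc.get("tags", []):
            matches.append((event_id, doc["name"], doc["age"]))
    return matches

expected = rows_with_tag(ROWS, "duckdb")
print(expected)  # [(1, 'Alice', 30), (3, 'Charlie', 35)]
```

Comparing this list against the query output before and after an upgrade gives a quick regression check for filtered reads.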
JSON Function Improvements
Multiple JSON-related function behaviors were corrected:
- json_keys wildcard path fix (#22855)
- JSON argument order affecting result fix (#23144)
- Reject NULL JSON keys (#23116)
- ignore_errors no longer silently accepts invalid JSON (#23137)
-- Parse JSON defensively: rows that fail to parse become NULL
CREATE TABLE raw_logs AS
SELECT
id,
TRY_CAST(log_data AS JSON) AS parsed
FROM log_entries;
-- Extract keys from nested objects using json_keys
SELECT
id,
json_keys(parsed, '$') AS keys
FROM raw_logs
WHERE parsed IS NOT NULL;
-- Check whether a string is valid JSON
SELECT json_valid('{}') AS is_valid; -- true
SELECT json_valid(NULL) IS NULL AS is_null; -- true (NULL input yields NULL)
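The principle behind the stricter error handling is that invalid rows should be surfaced rather than silently accepted. A minimal sketch of that idea with Python's standard json module follows. It illustrates the behaviour only and is not DuckDB's implementation; partition_json is a hypothetical helper.

```python
import json

def partition_json(lines):
    """Split raw strings into parsed documents and (index, error message) pairs."""
    parsed, rejected = [], []
    for index, text in enumerate(lines):
        try:
            parsed.append(json.loads(text))
        except json.JSONDecodeError as exc:
            # Keep the position and reason so bad rows can be inspected later.
            rejected.append((index, exc.msg))
    return parsed, rejected

logs = ['{"level": "info"}', '{"level": ', '[1, 2, 3]']
good, bad = partition_json(logs)
print(len(good), "valid;", "rejected indexes:", [i for i, _ in bad])
```

Keeping the rejected rows alongside the parsed ones makes it possible to alert on malformed input instead of discovering it later as missing data.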
3. Precise Table Binding in MERGE INTO Statements
v1.5.4 corrected the table binding logic in MERGE INTO statements (#23014):
- WHEN NOT MATCHED BY SOURCE considers only the target table
- WHEN NOT MATCHED BY TARGET considers only the source table
This is especially important in multi-table JOIN scenarios:
-- Set up source and target tables
CREATE TABLE target_products (
product_id INTEGER PRIMARY KEY,
name VARCHAR,
price DECIMAL(10,2),
updated_at TIMESTAMP
);
CREATE TABLE staging_products (
product_id INTEGER,
name VARCHAR,
price DECIMAL(10,2),
created_at TIMESTAMP
);
-- Insert test data
INSERT INTO staging_products VALUES
(1, 'Laptop Pro', 1299.99, CURRENT_TIMESTAMP),
(2, 'Wireless Mouse', 29.99, CURRENT_TIMESTAMP),
(3, 'USB-C Hub', 49.99, CURRENT_TIMESTAMP);
-- Correct MERGE INTO usage
MERGE INTO target_products t
USING staging_products s
ON t.product_id = s.product_id
WHEN MATCHED THEN
UPDATE SET
price = s.price,
updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
INSERT (product_id, name, price, updated_at)
VALUES (s.product_id, s.name, s.price, CURRENT_TIMESTAMP);
-- Verify results
SELECT * FROM target_products ORDER BY product_id;
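To see why each clause can only refer to one side, it helps to simulate the three match states of a MERGE in plain Python. This is a sketch of standard MERGE semantics under simplified assumptions (dictionaries keyed on product_id), not DuckDB's binder. The row with key 9 is made up to show the case where no source row matches.

```python
def merge(target, source):
    """Simulate MERGE keyed on product_id and report the action per key.

    target and source map product_id -> row dict; target is updated in place.
    """
    actions = {}
    for key in sorted(target.keys() | source.keys()):
        t, s = target.get(key), source.get(key)
        if t is not None and s is not None:
            # MATCHED: both the target row and the source row are available.
            t["price"] = s["price"]
            actions[key] = "update"
        elif s is not None:
            # NOT MATCHED BY TARGET: only the source row exists.
            target[key] = {"name": s["name"], "price": s["price"]}
            actions[key] = "insert"
        else:
            # NOT MATCHED BY SOURCE: only the target row exists.
            actions[key] = "keep"
    return actions

target = {
    1: {"name": "Laptop Pro", "price": 1199.99},
    9: {"name": "Old Cable", "price": 4.99},  # made-up row with no source match
}
source = {
    1: {"name": "Laptop Pro", "price": 1299.99},
    2: {"name": "Wireless Mouse", "price": 29.99},
}
actions = merge(target, source)
print(actions)
```

In the insert branch there is no target row to read from, and in the keep branch there is no source row, which is the constraint the binding fix enforces.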
4. Parquet Statistics Pruning & Performance Optimization
v1.5.4 made significant performance improvements, particularly in Parquet file statistics pruning (#23140):
- Native geometry Parquet statistics pruning fix
- New OPERATOR_ROW_GROUPS_SCANNED metric for monitoring query performance
-- Load the spatial extension (needed for geometry columns)
INSTALL spatial;
LOAD spatial;
-- Enable profiling with JSON output and detailed operator metrics
SET enable_profiling = 'json';
SET profiling_mode = 'detailed';
-- Query Parquet files (leveraging statistics pruning)
SELECT
city,
AVG(population) AS avg_population,
COUNT(*) AS record_count
FROM read_parquet('/data/geography/*.parquet')
WHERE population > 100000
GROUP BY city
ORDER BY avg_population DESC;
-- View row group scan info in execution plan
EXPLAIN ANALYZE
SELECT *
FROM read_parquet('/data/geography/*.parquet')
WHERE area_sqkm > 500;
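Statistics pruning works because each row group stores the minimum and maximum of every column, so a reader can skip groups that cannot contain a match. The sketch below simulates that for a "greater than" filter and counts how many groups are read, which is the kind of number a row-groups-scanned metric reports. The RowGroup class and the sample values are illustrative and do not reflect DuckDB internals.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    min_value: int
    max_value: int
    values: list

def make_group(values):
    """Build a row group and record its min/max statistics."""
    return RowGroup(min(values), max(values), values)

def scan_greater_than(row_groups, threshold):
    """Return matching values and the number of row groups actually read."""
    matches, scanned = [], 0
    for group in row_groups:
        if group.max_value <= threshold:
            continue  # statistics prove no row can match: skip the group
        scanned += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, scanned

groups = [
    make_group([1200, 48000, 90000]),
    make_group([75000, 150000, 310000]),
    make_group([500000, 2100000]),
]
matches, scanned = scan_greater_than(groups, 100000)
print(matches, f"{scanned} of {len(groups)} row groups scanned")
```

Pruning is most effective when the data is sorted or clustered on the filtered column, because overlapping min/max ranges force more groups to be read.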
Performance Comparison Table
| Feature | DuckDB v1.5.3 | DuckDB v1.5.4 | Improvement |
|---|---|---|---|
| Parquet Statistics Pruning | Partial support | Full support | 20-40% faster queries |
| VARIANT Filter Correctness | Has bug | Fixed | 100% data accuracy |
| JSON Error Handling | Silent failure | Explicit error | Faster debugging |
| CLI Terminal Adaptation | Basic detection | Smart detection | Significantly better UX |
| GeoArrow CRS Serialization | Memory leak | Fixed | Improved stability |
| Geometry Stats Checkpointing | Incomplete | Full support | Better persistence |
5. Other Notable Improvements
Geometry Statistics Checkpointing Fix
Geometry statistics checkpointing now correctly preserves data when no changes are detected (#22882). This is important for spatial data analysis workflows.
INSTALL spatial;
LOAD spatial;
-- Create table with geometric data
CREATE TABLE locations (
id INTEGER,
name VARCHAR,
geom GEOMETRY
);
-- Insert geographic data
INSERT INTO locations VALUES
(1, 'Beijing', ST_GeomFromText('POINT(116.4074 39.9042)')),
(2, 'Shanghai', ST_GeomFromText('POINT(121.4737 31.2304)')),
(3, 'Guangzhou', ST_GeomFromText('POINT(113.2644 23.1291)'));
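-- Query locations near Beijing. ST_DWithin measures distance in the units of
-- the coordinates (degrees for these lon/lat points), so the threshold is in degrees.
SELECT name, ST_AsText(geom)
FROM locations
WHERE ST_DWithin(geom, ST_GeomFromText('POINT(116.4074 39.9042)'), 12);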
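The points above are longitude/latitude pairs, so a planar distance between them is expressed in degrees rather than metres. The sketch below contrasts the planar value with a great-circle distance for Beijing and Shanghai. It assumes a spherical Earth with a mean radius of 6,371 km, and haversine_m is a hypothetical helper rather than a spatial extension function.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical assumption)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

beijing = (116.4074, 39.9042)
shanghai = (121.4737, 31.2304)

planar_degrees = math.dist(beijing, shanghai)  # Euclidean distance in degree units
metres = haversine_m(*beijing, *shanghai)
print(f"{planar_degrees:.2f} degrees vs {metres / 1000:.0f} km")
```

A threshold meant as metres only makes sense once the geometries are in a projected coordinate system or a spherical distance is used.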
VACUUM REBUILD INDEXES as ATTACH Option
v1.5.4 adds experimental support for vacuum_rebuild_indexes as an ATTACH option (#22690), so indexes can be rebuilt automatically when a database is attached.
-- Automatically rebuild indexes when attaching
ATTACH 'path/to/database.duckdb' AS remote_db (
vacuum_rebuild_indexes true
);
-- Use the attached database
SELECT * FROM remote_db.my_table LIMIT 10;
Window Function Self-Join Optimization Limit Fix
Fixed an issue where window function self-join optimization was applied more than once (#22844), ensuring correctness for complex analytical queries.
-- Window function query: rank sales per product and compute a moving average
-- The frame counts rows, so it spans 7 days only with one row per product per day
WITH ranked_sales AS (
SELECT
product_id,
sale_date,
amount,
RANK() OVER (PARTITION BY product_id ORDER BY amount DESC) AS sales_rank,
AVG(amount) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM sales_data
)
SELECT * FROM ranked_sales WHERE sales_rank <= 3;
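For checking results from queries like this, the two window computations are easy to reproduce by hand. The sketch below implements a moving average over a ROWS BETWEEN 6 PRECEDING AND CURRENT ROW frame (up to seven rows) and RANK() semantics for a single partition. It is plain Python with made-up amounts, and the function names are hypothetical.

```python
from collections import deque

def moving_average(amounts, window=7):
    """Average over the current row and up to window-1 preceding rows."""
    frame, total, result = deque(), 0.0, []
    for amount in amounts:
        frame.append(amount)
        total += amount
        if len(frame) > window:
            total -= frame.popleft()  # drop the row that left the frame
        result.append(total / len(frame))
    return result

def top_n(amounts, n=3):
    """Amounts whose RANK() by descending amount is <= n (ties share a rank)."""
    ordered = sorted(amounts, reverse=True)
    # RANK of a value is 1 plus the number of strictly greater values.
    return [a for a in ordered if sum(1 for b in amounts if b > a) < n]

daily = [10, 20, 30, 40, 50, 60, 70, 80]
print(moving_average(daily)[-2:])  # [40.0, 50.0]
print(top_n([5, 9, 9, 7, 3]))      # [9, 9, 7]
```

Because RANK() leaves gaps after ties, a filter on rank <= 3 can return more or fewer than three rows per partition.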
6. Security Hardening
v1.5.4 added hardening to multiple DuckDB/Parquet decompression and deserialization paths (#23100), improving security when facing maliciously crafted Parquet files.
-- Read external Parquet files with a cap on memory use
-- v1.5.4 handles malformed input more robustly
SET memory_limit = '1GB';
SELECT *
FROM read_parquet('/data/untrusted/*.parquet');
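Engine-side hardening can be complemented by a cheap check before a file ever reaches the database. The sketch below rejects files that exceed a size limit or lack the PAR1 magic bytes that frame an unencrypted Parquet file. It is a pre-filter under those assumptions, not a validator, and precheck_parquet is a hypothetical helper.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # marker at both ends of an unencrypted Parquet file

def precheck_parquet(path, max_bytes=100 * 1024 * 1024):
    """Return (ok, reason) based on file size and leading/trailing magic bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        return False, "file exceeds size limit"
    if size < 2 * len(PARQUET_MAGIC):
        return False, "file too small to be Parquet"
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        return False, "missing PAR1 magic bytes"
    return True, "ok"

# Demonstrate on two throwaway files. The framed one only carries the magic
# bytes; it is not a readable Parquet file.
with tempfile.TemporaryDirectory() as tmp:
    framed = os.path.join(tmp, "framed.parquet")
    bogus = os.path.join(tmp, "bogus.parquet")
    with open(framed, "wb") as handle:
        handle.write(PARQUET_MAGIC + b"\x00" * 16 + PARQUET_MAGIC)
    with open(bogus, "wb") as handle:
        handle.write(b"not a parquet file")
    framed_result = precheck_parquet(framed)
    bogus_result = precheck_parquet(bogus)

print(framed_result, bogus_result)
```

Passing this check says nothing about the file's contents, so untrusted files should still be read with resource limits in place.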
7. Upgrade Guide
How to Upgrade to v1.5.4
# Method 1: Upgrade via pip
pip install --upgrade duckdb
# Method 2: Upgrade via conda
conda update -c conda-forge duckdb
# Method 3: Download latest CLI
wget https://github.com/duckdb/duckdb/releases/download/v1.5.4/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
chmod +x duckdb
./duckdb --version # Verify version
Using in Python
import duckdb
# Check current version
print(duckdb.__version__) # Should output 1.5.4
# Open an in-memory database and run a simple query
con = duckdb.connect(':memory:')
result = con.execute("""
SELECT
'v1.5.4' AS version,
'Variegata' AS codename,
'CLI dark mode support' AS new_feature
""").fetchall()
for row in result:
    print(row)
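In deployment scripts it is often useful to fail fast when the installed version is older than expected. A minimal sketch follows, assuming version strings of the form "1.5.4" with an optional leading "v" or trailing suffix. The helpers parse_version and meets_minimum are hypothetical.

```python
import re

def parse_version(text):
    """Extract (major, minor, patch) from strings like '1.5.4' or 'v1.5.4-dev'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(group) for group in match.groups())

def meets_minimum(installed, minimum="1.5.4"):
    """Compare numerically so that 1.10.0 sorts after 1.5.4."""
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("1.5.3"), meets_minimum("1.10.0"))  # False True
```

With the package installed, the check can be run as meets_minimum(duckdb.__version__). Comparing tuples of integers avoids the string-comparison pitfall where "1.10.0" would sort before "1.5.4".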
8. Monetization Advice: How to Make Money With These Skills
Mastering DuckDB v1.5.4’s new features can translate into commercial value across multiple scenarios:
1. Data Consulting Services ($70-$300/hour)
Enterprise clients often face challenges migrating from traditional data warehouses to lightweight analytical solutions. You can offer:
- Parquet Query Performance Optimization: Leverage statistics pruning to boost query speed by up to 40%
- VARIANT Type Data Governance: Help enterprises standardize semi-structured data processing pipelines
- JSON Data Pipeline Setup: Build efficient ETL pipelines to process API response data
2. Build Analytical Tool Products
Leverage DuckDB’s embedded nature to quickly build industry-specific analytical SaaS:
# Example: Rapidly build a log analysis microservice
import duckdb
from fastapi import FastAPI
app = FastAPI()
@app.get("/analyze")
def analyze_logs(file_path: str):
    con = duckdb.connect()
    # Bind the path as a parameter instead of interpolating it into the SQL text
    result = con.execute("""
        SELECT
            error_level,
            COUNT(*) AS count,
            PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_time) AS p95
        FROM read_csv_auto(?)
        GROUP BY error_level
    """, [file_path]).fetchdf()
    # One dict per row serializes cleanly to JSON
    return result.to_dict(orient="records")
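An endpoint that accepts a file path from the caller should also restrict which files can be read. The sketch below resolves the supplied value against an allowed directory and rejects anything that escapes it. The directory /data/logs and the helper resolve_log_path are assumptions for illustration.

```python
from pathlib import Path

ALLOWED_ROOT = Path("/data/logs")  # hypothetical directory holding uploaded logs

def resolve_log_path(user_value, root=ALLOWED_ROOT):
    """Resolve a user-supplied relative path and ensure it stays under root."""
    root = root.resolve()
    candidate = (root / user_value).resolve()
    if root not in candidate.parents:
        raise ValueError("path escapes the allowed directory")
    if candidate.suffix != ".csv":
        raise ValueError("only .csv files are accepted")
    return candidate

print(resolve_log_path("2026/app.csv"))
```

Resolving before comparing handles both "../" segments and absolute paths, since joining an absolute path onto the root replaces the root entirely.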
3. Training Course Development
DuckDB has a shallow learning curve but powerful functionality, making it ideal for training courses:
- “DuckDB from Beginner to Advanced”: An 8-hour workshop for data analysts
- “Parquet + DuckDB in Practice”: An advanced course for data engineers
- Pricing reference: Corporate training $700-$3,500/day, Online courses $25-$140/person
4. Open Source Project Commercialization
Build open-source tools based on DuckDB (e.g., CLI analyzers, visualization dashboards) and monetize through:
- Pro Version: Offer advanced features and enterprise support
- Managed Service: Provide cloud-hosted DuckDB instances
- Technical Consulting: Custom DuckDB solutions for enterprises
5. Automated Reporting Systems
Using DuckDB’s read_parquet() and JSON functions, you can build automated data reporting systems:
-- Query template for auto-generated daily reports
SELECT
DATE_TRUNC('day', created_at) AS report_date,
COUNT(DISTINCT user_id) AS active_users,
SUM(revenue) AS total_revenue,
ROUND(AVG(order_value), 2) AS avg_order_value
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '7' DAY
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY report_date DESC;
Embed the above query into a scheduled task to automatically generate reports daily and send them via email or Slack — this is a solution worth thousands of dollars.
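Once the query has run, the rows still need to be turned into a message body. A minimal sketch of that step follows, assuming the rows arrive as (report_date, active_users, total_revenue, avg_order_value) tuples. The sample figures are made up, and render_report is a hypothetical helper.

```python
def render_report(rows):
    """Format report rows as a fixed-width text table for email or chat."""
    header = f"{'date':<12}{'users':>8}{'revenue':>12}{'avg order':>12}"
    lines = [header, "-" * len(header)]
    for report_date, users, revenue, avg_order in rows:
        lines.append(f"{report_date:<12}{users:>8}{revenue:>12.2f}{avg_order:>12.2f}")
    return "\n".join(lines)

# Made-up sample rows standing in for the query result.
sample_rows = [
    ("2026-06-16", 1204, 18250.50, 42.17),
    ("2026-06-15", 1187, 17903.25, 41.80),
]
report = render_report(sample_rows)
print(report)
```

The resulting string can be handed to whatever delivery mechanism the scheduled task uses.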
Summary
DuckDB v1.5.4 (Variegata), though a patch release, brings important correctness fixes, performance improvements, and security hardening. Particularly notable are the CLI dark mode support, VARIANT type fixes, JSON function improvements, and Parquet statistics pruning — all highly practical features for production environments.
Meanwhile, the DuckDB team has announced that v2.0.0 will be released in Fall 2026 — stay tuned! If you’re evaluating DuckDB for production use, v1.5.4 is a stable and feature-rich choice.
Next Step: Upgrade your DuckDB to v1.5.4 today and experience the new CLI dark mode and enhanced JSON/VARIANT capabilities.