Featured image of post DuckDB in Action: Database Backup & Migration

DuckDB in Action: Database Backup & Migration

Learn DuckDB's database backup and migration strategies including EXPORT DATABASE, cross-version upgrade procedures, and compressed archiving best practices for production environments.

Introduction

When using DuckDB in production, data safety and portability are unavoidable concerns. Unlike traditional client-server databases, DuckDB is an embedded analytical engine — its data is stored as files. This is both an advantage and a challenge.

DuckDB Backup & Migration Architecture

Figure: DuckDB backup and migration architecture overview

This article systematically covers three core backup and migration capabilities of DuckDB through real-world business scenarios:

  1. EXPORT DATABASE — Export the database as a standardized SQL script
  2. Cross-version upgrades — Smoothly upgrade the DuckDB kernel without data loss
  3. Compressed archiving — Efficiently store historical data and save disk space

1. EXPORT DATABASE: Structured Backups

1.1 What is EXPORT DATABASE?

EXPORT DATABASE is a built-in DuckDB command that exports the entire database (including table schemas, data, and indexes) into a complete SQL script file. The advantages of this approach include:

  • Standardized: The exported SQL can run on any compatible SQL engine
  • Readable: Exported files are plain text, easy to review and version-control
  • Highly portable: Does not depend on a specific file format

1.2 Full Backup Example

Assume we have an e-commerce analytics database ecommerce.duckdb:

DuckDB EXPORT Terminal Example

Figure: DuckDB EXPORT DATABASE terminal operation

-- Connect to DuckDB
$ duckdb ecommerce.duckdb

-- Inspect the current database structure
DuckDB> DESCRIBE ecommerce;

┌───────────┬──────────────────────┬─────────┐
  column          type             null   
├───────────┼──────────────────────┼─────────┤
 order_id   BIGINT                       
 user_id    BIGINT                       
 amount     DECIMAL(10,2)                
 status     VARCHAR                      
 created    TIMESTAMP                    
└───────────┴──────────────────────┴─────────┘

-- Perform a full export
EXPORT DATABASE '/backup/ecommerce_backup' (FORMAT CSV);

After exporting, you’ll see multiple files:

$ ls -lh /backup/ecommerce_backup/
total 1.2G
-rw-r--r-- 1 user user  45K orders.sql      -- DDL statements
-rw-r--r-- 1 user user 890M orders.csv      -- Order data
-rw-r--r-- 1 user user 230M users.csv       -- User data

1.3 Restoring the Database

-- Create a new database and import
$ duckdb restored_db.duckdb

DuckDB> INSTALL csv;
DuckDB> LOAD csv;

-- Restore by executing the exported SQL script
DuckDB> \i /backup/ecommerce_backup/orders.sql

1.4 Incremental Backup Strategy

For large databases, full exports can be expensive. We can leverage DuckDB’s time travel feature for incremental backups:

-- View historical versions
SELECT * FROM orders AS OF VERSION (
    SELECT MAX(version) - 1 FROM duckdb_tables()
) WHERE created > '2026-06-18';

-- Export only changed records
EXPORT (
    SELECT * FROM orders 
    WHERE updated_at > '2026-06-18 00:00:00'
) TO '/backup/incremental/orders_delta.csv' (FORMAT CSV);

2. Cross-Version Upgrades: Seamless Transitions

2.1 DuckDB Version Compatibility

DuckDB performs well in version compatibility, but note the following:

Upgrade DirectionSafe?Notes
Minor version (0.8.x → 0.9.x)✅ Usually safeBackup recommended before upgrade
Major version (0.9.x → 1.0.x)⚠️ Test requiredCheck breaking changes
Downgrade❌ Not supportedCannot read files from newer versions

2.2 Pre-Upgrade Checklist

-- 1. Check current version
SELECT version();
-- ┌──────────────────────────┐
-- │         version()        │
-- ├──────────────────────────┤
-- │ v0.10.3-native           │
-- └──────────────────────────┘

-- 2. Export database as insurance
EXPORT DATABASE '/backup/pre_upgrade_check';

-- 3. Record current extensions
SELECT name, version FROM duckdb_extensions();

2.3 Upgrade Procedure

# 1. Stop all write operations
# 2. Backup the database file
cp ecommerce.duckdb ecommerce.duckdb.bak

# 3. Download the new DuckDB version
wget https://github.com/duckdb/duckdb/releases/download/v1.1.0/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
chmod +x duckdb

# 4. Open the database with the new version (auto-migrates)
./duckdb ecommerce.duckdb

# 5. Verify data integrity
SELECT COUNT(*) FROM orders;
SELECT COUNT(*) FROM users;

Key Tip: The .duckdb file format may not be compatible across major versions. Always follow the principle of backup first, upgrade second.


3. Compressed Archiving: Saving Storage Space

3.1 Using Parquet Compression

DuckDB natively supports the Parquet format, which is the preferred solution for compressed archiving:

-- Export historical data as compressed Parquet
EXPORT (
    SELECT * FROM orders 
    WHERE created < '2026-01-01'
) TO '/archive/orders_2025.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);

-- Verify compression ratio
SELECT 
    ROUND(SUM(file_size) / 1024 / 1024, 2) AS total_mb,
    COUNT(*) AS file_count
FROM (
    SELECT file_size, file_name 
    FROM read_csv_auto('/archive/orders_2025.parquet')
);

3.2 Separating Archive from Query

In production, it’s recommended to separate cold data from hot data:

-- Create an archive view that transparently reads compressed files
CREATE VIEW archived_orders AS
SELECT * FROM read_parquet('/archive/orders_2025_*.parquet');

-- Queries automatically decompress without manual intervention
SELECT 
    DATE_TRUNC('month', created) AS month,
    SUM(amount) AS revenue,
    COUNT(*) AS order_count
FROM archived_orders
GROUP BY month
ORDER BY month;

3.3 Automated Archival Script

import duckdb
import json
from pathlib import Path
from datetime import datetime, timedelta

# Connect to DuckDB
con = duckdb.connect('ecommerce.duckdb')

# Define archival policy: keep 90 days of hot data
cutoff_date = datetime.now() - timedelta(days=90)

# Export old data as Parquet
con.execute(f"""
    EXPORT (
        SELECT * FROM orders 
        WHERE created < '{cutoff_date.isoformat()}'
    ) TO '/archive/orders_pre_{cutoff_date.strftime('%Y%m%d')}.parquet'
    (FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100000)
""")

# Clean up archived data (optional)
# con.execute(f"DELETE FROM orders WHERE created < '{cutoff_date.isoformat()}'")

# Record archival metadata
meta = {
    "timestamp": datetime.now().isoformat(),
    "archived_records": con.execute(
        "SELECT COUNT(*) FROM orders WHERE created < ?",
        [cutoff_date.isoformat()]
    ).fetchone()[0],
    "archive_path": f"/archive/orders_pre_{cutoff_date.strftime('%Y%m%d')}.parquet"
}

with open(f'/archive/meta_{cutoff_date.strftime("%Y%m%d")}.json', 'w') as f:
    json.dump(meta, f, indent=2)

print(f"Archive complete: {meta['archived_records']} records")

4. Real-World Scenario: E-Commerce Data Lifecycle Management

Scenario Description

  • Hot data: Orders from the last 30 days, queried in real-time
  • Warm data: Orders from 30–180 days ago, queried daily
  • Cold data: Orders older than 180 days, used only for monthly reports

Implementation

-- 1. Create layered storage structure
CREATE TABLE hot_orders AS 
SELECT * FROM orders WHERE created >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY);

CREATE TABLE warm_orders AS 
SELECT * FROM orders 
WHERE created >= DATE_SUB(CURRENT_DATE, INTERVAL 180 DAY)
  AND created < DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY);

-- 2. Compress and archive cold data
EXPORT (
    SELECT * FROM orders 
    WHERE created < DATE_SUB(CURRENT_DATE, INTERVAL 180 DAY)
) TO '/cold_archive/orders_legacy.parquet'
(FORMAT PARQUET, COMPRESSION ZSTD);

-- 3. Delete cold data to free space
DELETE FROM orders 
WHERE created < DATE_SUB(CURRENT_DATE, INTERVAL 180 DAY);

-- 4. Create a unified query view
CREATE VIEW all_orders AS
SELECT *, 'hot' AS tier FROM hot_orders
UNION ALL
SELECT *, 'warm' AS tier FROM warm_orders
UNION ALL
SELECT *, 'cold' AS tier FROM read_parquet('/cold_archive/orders_legacy.parquet');

Query Performance Comparison

Query RangeData SourceQuery Time
Last 7 dayshot_orders (in-memory)~50ms
Last 90 dayshot + warm (on-disk)~200ms
All historyIncludes cold archive~1.5s

5. Best Practices Summary

  1. Regular backups: Use EXPORT DATABASE for weekly full backups
  2. Pre-upgrade validation: Test with the new version in a staging environment before production upgrades
  3. Compress first: Always use Parquet + ZSTD for archived data — typically reduces storage by 60%–80%
  4. Monitor disk space: DuckDB is embedded; a full disk causes write failures — set up alerts
  5. Version pinning: Lock the DuckDB version in production to avoid unexpected compatibility issues

Conclusion

While DuckDB’s backup and migration capabilities may not have the complex WAL and replication mechanisms of traditional databases, its simple file-based storage model makes backup intuitive and reliable. Mastering EXPORT DATABASE, version upgrade procedures, and compressed archiving techniques will give you confidence to use DuckDB in production environments.

For more DuckDB in-action tips, follow DuckDB Lab (duckdblab.org).

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy