DuckDB Java 集成概述
DuckDB 提供了原生的 JDBC 驱动,使得 Java 项目可以无缝集成 DuckDB。无论你是做数据分析、ETL 处理,还是构建嵌入式分析应用,DuckDB 的 Java 集成都能提供极佳的开发体验。
DuckDB Java 的核心优势:
- 零配置:不需要安装数据库服务器,Java 应用中直接嵌入
- 标准 JDBC:完全兼容 JDBC 4.0 规范,学习成本极低
- 列式存储:分析查询比 H2 和 SQLite 快 10-100 倍
- 全功能 SQL:支持窗口函数、CTE、Pivot 等高级分析功能
第一步:Maven / Gradle 配置
Maven 依赖
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<version>1.2.0</version>
</dependency>
Gradle 依赖
implementation 'org.duckdb:duckdb_jdbc:1.2.0'
验证依赖是否正确
# Maven
mvn dependency:tree | grep duckdb
# Gradle
gradle dependencies | grep duckdb
第二步:JDBC 连接与基本操作
建立连接
DuckDB Java 支持两种连接模式:内存数据库和持久化文件数据库。
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class DuckDBConnect {
public static void main(String[] args) throws Exception {
// 方式一:内存数据库(数据不持久化)
Connection inMemConn = DriverManager.getConnection("jdbc:duckdb:");
// 方式二:持久化文件数据库
Connection fileConn = DriverManager.getConnection(
"jdbc:duckdb:/path/to/mydb.duckdb"
);
System.out.println("DuckDB Java 连接成功!");
}
}
创建表和插入数据
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
Statement stmt = conn.createStatement()) {
// 创建表
stmt.execute("CREATE TABLE employees (" +
"id INTEGER, " +
"name VARCHAR, " +
"department VARCHAR, " +
"salary DECIMAL(10,2)" +
")");
// 插入数据
stmt.executeUpdate("INSERT INTO employees VALUES " +
"(1, 'Alice', 'Engineering', 120000), " +
"(2, 'Bob', 'Marketing', 95000), " +
"(3, 'Charlie', 'Engineering', 110000), " +
"(4, 'Diana', 'Sales', 85000)");
System.out.println("数据插入成功!");
}
查询数据
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
Statement stmt = conn.createStatement()) {
// 创建测试数据
stmt.execute("CREATE TABLE sales AS " +
"SELECT * FROM (VALUES " +
"('2026-01-01'::DATE, 'Product A', 1200.00), " +
"('2026-01-02'::DATE, 'Product B', 850.00), " +
"('2026-01-03'::DATE, 'Product A', 1500.00)" +
") AS t(sale_date, product, amount)");
// 执行分析查询
ResultSet rs = stmt.executeQuery(
"SELECT product, COUNT(*) AS orders, SUM(amount) AS total " +
"FROM sales GROUP BY product ORDER BY total DESC"
);
while (rs.next()) {
System.out.printf(
"Product: %s, Orders: %d, Total: $%.2f%n",
rs.getString("product"),
rs.getInt("orders"),
rs.getDouble("total")
);
}
}
第三步:高级 JDBC 用法
使用 PreparedStatement
String sql = "SELECT department, AVG(salary) AS avg_salary " +
"FROM employees WHERE salary > ? GROUP BY department";
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
PreparedStatement pstmt = conn.prepareStatement(sql)) {
pstmt.setDouble(1, 90000);
ResultSet rs = pstmt.executeQuery();
while (rs.next()) {
System.out.printf("%s: $%.2f%n",
rs.getString("department"),
rs.getDouble("avg_salary"));
}
}
直接查询文件
DuckDB Java 最大的优势是直接查询外部文件,无需导入。
// 查询 CSV 文件
ResultSet rs = stmt.executeQuery(
"SELECT region, SUM(revenue) AS total " +
"FROM read_csv_auto('/data/sales_2026.csv') " +
"GROUP BY region"
);
// 查询 Parquet 文件
ResultSet rs2 = stmt.executeQuery(
"SELECT date_trunc('month', order_date) AS month, " +
" COUNT(*) AS orders " +
"FROM '/data/orders.parquet' " +
"GROUP BY month"
);
批量插入
conn.setAutoCommit(false);
PreparedStatement pstmt = conn.prepareStatement(
"INSERT INTO logs VALUES (?, ?, ?)"
);
for (LogEntry entry : logBatch) {
pstmt.setInt(1, entry.getId());
pstmt.setString(2, entry.getLevel());
pstmt.setString(3, entry.getMessage());
pstmt.addBatch();
}
int[] results = pstmt.executeBatch();
conn.commit();
第四步:DuckDB vs H2 对比分析
对于 Java 开发者来说,DuckDB 和 H2 是最常见的两个嵌入式数据库。以下是关键对比:
性能对比
| 特性 | DuckDB | H2 |
|---|---|---|
| 存储引擎 | 列式存储 | 行式存储 |
| 分析查询 | ⚡ 快 10-100x | 较慢 |
| 单行查询 | 较慢 | ⚡ 快 |
| 并发写入 | 单写入器 | 多写入器 |
| 文件查询 | 原生支持 CSV/Parquet/JSON | 需要导入 |
适用场景对比
// DuckDB 适合:分析查询、大文件处理
String analyticsSQL = """
SELECT category,
SUM(amount) AS total,
AVG(amount) AS avg,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median
FROM read_csv_auto('sales_large.csv')
GROUP BY category
""";
// H2 适合:OLTP 场景、Web 应用事务
String oltpSQL = """
UPDATE users SET last_login = NOW()
WHERE user_id = ?
""";
迁移建议
- 如果你的应用需要复杂分析查询、大文件处理、列式存储 → 选择 DuckDB
- 如果你的应用需要高并发事务、行级更新、Web 应用后端 → 选择 H2
- 最佳实践:H2 处理事务 + DuckDB 处理分析,两者可以共存
完整示例:数据分析应用
import java.sql.*;
import java.util.Properties;
public class DuckDBAnalytics {
public static void main(String[] args) throws Exception {
// 配置 DuckDB
Properties props = new Properties();
props.setProperty("threads", "4"); // 并行度
try (Connection conn = DriverManager.getConnection(
"jdbc:duckdb:", props);
Statement stmt = conn.createStatement()) {
// 1. 加载扩展
stmt.execute("INSTALL httpfs");
stmt.execute("LOAD httpfs");
// 2. 从 S3 查询 Parquet 文件
ResultSet rs = stmt.executeQuery("""
SELECT
date_trunc('month', order_date) AS month,
product_category,
SUM(order_amount) AS revenue,
COUNT(*) AS transactions
FROM read_parquet('s3://my-bucket/orders/*.parquet')
WHERE order_date >= '2026-01-01'
GROUP BY ALL
ORDER BY month, revenue DESC
LIMIT 20
""");
while (rs.next()) {
System.out.printf("%s | %s | $%.2f | %d%n",
rs.getDate("month"),
rs.getString("product_category"),
rs.getDouble("revenue"),
rs.getInt("transactions"));
}
}
}
}
常见问题
1. 找不到 JDBC 驱动类
确保依赖版本正确:
<!-- 检查 Maven 依赖是否已下载 -->
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<version>1.2.0</version>
</dependency>
2. DuckDB 线程安全
DuckDB 的 JDBC 连接是线程安全的,但建议每个线程使用独立的连接或使用连接池。
3. 内存限制
// 设置最大内存
props.setProperty("memory_limit", "4GB");

相关文章
📘 博客: https://duckdblab.org #DuckDB #Java #JDBC #数据库