Introduction
If you work with data, GROUP BY and aggregation are the backbone of your analytical queries. Whether you’re calculating total sales per region, counting users per day, or finding average transaction values, DuckDB provides a rich set of aggregation features that go far beyond the SQL standard.
DuckDB is an in-process analytical database designed for data science and analytical workloads. Its GROUP BY implementation includes modern extensions like GROUP BY ALL, GROUPING SETS, CUBE, and ROLLUP that make complex reporting queries dramatically simpler.
In this guide, you’ll learn:
- Basic GROUP BY syntax and common aggregate functions
- Multi-column grouping and the HAVING clause
- DuckDB-specific features: GROUPING SETS, CUBE, ROLLUP
- The convenient
GROUP BY ALLsyntax - Practical examples with real-world data scenarios
This guide builds on concepts from the DuckDB SQL Syntax Guide and the DuckDB Beginner’s Guide 2026. If you’re new to DuckDB, start there first.
1. Basic GROUP BY Syntax
The GROUP BY clause groups rows that have the same values in specified columns, then applies aggregate functions to each group.
SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
GROUP BY column_name;
Let’s start with a sample dataset. Create a sales table and insert some data:
CREATE TABLE sales AS
SELECT * FROM (VALUES
('Electronics', 'North', 1200, '2026-01-15'),
('Clothing', 'North', 450, '2026-01-16'),
('Electronics', 'South', 1800, '2026-01-17'),
('Clothing', 'South', 600, '2026-01-18'),
('Electronics', 'North', 900, '2026-01-19'),
('Clothing', 'North', 300, '2026-02-01'),
('Electronics', 'South', 2100, '2026-02-02'),
('Clothing', 'South', 750, '2026-02-03')
) AS t(category, region, amount, sale_date);
Now, a simple GROUP BY to get total sales by category:
SELECT category, SUM(amount) AS total_sales
FROM sales
GROUP BY category;
Result:
| category | total_sales |
|---|---|
| Electronics | 6000 |
| Clothing | 2100 |
2. Common Aggregate Functions
DuckDB supports all standard SQL aggregate functions. Here are the most commonly used ones:
COUNT — Count Rows
SELECT category, COUNT(*) AS num_orders, COUNT(DISTINCT region) AS regions
FROM sales
GROUP BY category;
| category | num_orders | regions |
|---|---|---|
| Electronics | 4 | 2 |
| Clothing | 4 | 2 |
SUM — Calculate Total
SELECT region, SUM(amount) AS total_revenue
FROM sales
GROUP BY region;
| region | total_revenue |
|---|---|
| North | 2850 |
| South | 5250 |
AVG — Calculate Average
SELECT category, AVG(amount) AS avg_order_value
FROM sales
GROUP BY category;
| category | avg_order_value |
|---|---|
| Electronics | 1500.0 |
| Clothing | 525.0 |
MIN / MAX — Find Extremes
SELECT category,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM sales
GROUP BY category;
| category | smallest_order | largest_order |
|---|---|---|
| Electronics | 900 | 2100 |
| Clothing | 300 | 750 |
Combining Multiple Aggregates
You can mix multiple aggregate functions in a single query:
SELECT category,
COUNT(*) AS num_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
MIN(amount) AS min_order,
MAX(amount) AS max_order
FROM sales
GROUP BY category;
3. GROUP BY with Multiple Columns
Grouping by multiple columns creates a separate group for each unique combination of values:
SELECT category, region,
SUM(amount) AS total_sales,
COUNT(*) AS num_orders
FROM sales
GROUP BY category, region;
| category | region | total_sales | num_orders |
|---|---|---|---|
| Electronics | North | 2100 | 2 |
| Electronics | South | 3900 | 2 |
| Clothing | North | 750 | 2 |
| Clothing | South | 1350 | 2 |
This is extremely useful for hierarchical reporting — you can see performance broken down by multiple dimensions in a single query.
4. The HAVING Clause
HAVING filters groups after aggregation, similar to how WHERE filters rows before aggregation.
SELECT category, SUM(amount) AS total_sales
FROM sales
GROUP BY category
HAVING SUM(amount) > 2500;
| category | total_sales |
|---|---|
| Electronics | 6000 |
Key difference between WHERE and HAVING:
WHEREfilters rows before grouping (can’t use aggregate functions)HAVINGfilters groups after grouping (can use aggregate functions)
-- WHERE filters rows before aggregation
-- HAVING filters groups after aggregation
SELECT region, SUM(amount) AS total_sales
FROM sales
WHERE amount > 500 -- Exclude small orders before grouping
GROUP BY region
HAVING SUM(amount) > 2000; -- Only show regions over $2K total
| region | total_sales |
|---|---|
| South | 4650 |
(Note: North’s small orders over $500 total only $1350, so North is excluded by HAVING.)
5. GROUPING SETS, CUBE, and ROLLUP
This is where DuckDB’s GROUP BY capabilities really shine. These features generate subtotals and grand totals in a single query.
GROUPING SETS
GROUPING SETS lets you specify multiple grouping levels explicitly:
SELECT category, region, SUM(amount) AS total_sales
FROM sales
GROUP BY GROUPING SETS (
(category, region), -- detail level
(category), -- subtotal by category
(region), -- subtotal by region
() -- grand total
);
| category | region | total_sales |
|---|---|---|
| Electronics | North | 2100 |
| Electronics | South | 3900 |
| Clothing | North | 750 |
| Clothing | South | 1350 |
| Electronics | NULL | 6000 |
| Clothing | NULL | 2100 |
| NULL | North | 2850 |
| NULL | South | 5250 |
| NULL | NULL | 8100 |
The GROUPING() function helps distinguish NULLs from actual data values vs. subtotal markers:
SELECT
CASE WHEN GROUPING(category) = 0 THEN category ELSE 'ALL' END AS category,
CASE WHEN GROUPING(region) = 0 THEN region ELSE 'ALL' END AS region,
SUM(amount) AS total_sales
FROM sales
GROUP BY GROUPING SETS ((category, region), (category), (region), ());
ROLLUP
ROLLUP creates a hierarchy of subtotals — perfect for time-series and hierarchical data:
-- Hierarchical: category → region → grand total
SELECT category, region, SUM(amount) AS total_sales
FROM sales
GROUP BY ROLLUP (category, region);
This is equivalent to:
GROUP BY GROUPING SETS ((category, region), (category), ())
ROLLUP is ideal for reporting by year → quarter → month, or department → team → employee.
CUBE
CUBE generates all possible combination subtotals:
-- All combinations: category × region
SELECT category, region, SUM(amount) AS total_sales
FROM sales
GROUP BY CUBE (category, region);
This is equivalent to:
GROUP BY GROUPING SETS ((category, region), (category), (region), ())
For N dimensions, CUBE generates 2^N grouping sets, while ROLLUP generates N+1.
6. GROUP BY ALL — DuckDB’s Productivity Booster
Introduced in DuckDB 0.8.0, GROUP BY ALL is a game-changer for writing quick analytical queries. It automatically groups by all non-aggregated columns in the SELECT list — you don’t need to manually list them.
Without GROUP BY ALL:
SELECT category, region, SUM(amount) AS total_sales
FROM sales
GROUP BY category, region; -- Must repeat column names
With GROUP BY ALL:
SELECT category, region, SUM(amount) AS total_sales
FROM sales
GROUP BY ALL; -- Automatically groups by category and region
This seems simple, but it saves enormous time in exploratory analysis where you’re rapidly iterating on queries:
-- Add more dimensions — GROUP BY ALL handles it automatically
SELECT category, region, sale_date, SUM(amount) AS daily_sales
FROM sales
GROUP BY ALL;
-- Complex expressions also work
SELECT category,
year(sale_date) AS sale_year,
SUM(amount) AS total_sales
FROM sales
GROUP BY ALL;
GROUP BY ALL follows a simple rule: every column in the SELECT list that is not wrapped in an aggregate function becomes a grouping column. This is especially powerful when you have many columns and don’t want to type them all twice.
Pro tip:
GROUP BY ALLis unique to DuckDB among major databases. It’s one of the features that makes DuckDB exceptionally pleasant for interactive data exploration.
7. Practical Examples with Real Data
Example 1: E-commerce Order Analysis
-- Create an orders table representing real e-commerce data
CREATE TABLE orders AS
SELECT * FROM (VALUES
('ORD-001', 'Alice', 'Widget', 3, 15.99, '2026-01-05'),
('ORD-002', 'Bob', 'Gadget', 1, 49.99, '2026-01-06'),
('ORD-003', 'Alice', 'Widget', 2, 15.99, '2026-01-10'),
('ORD-004', 'Charlie', 'Gadget', 5, 49.99, '2026-01-12'),
('ORD-005', 'Bob', 'Widget', 10, 15.99, '2026-01-15'),
('ORD-006', 'Alice', 'Gadget', 1, 49.99, '2026-01-20'),
('ORD-007', 'Charlie', 'Widget', 4, 15.99, '2026-01-22'),
('ORD-008', 'Bob', 'Premium', 2, 199.99,'2026-02-01'),
('ORD-009', 'Alice', 'Premium', 1, 199.99,'2026-02-05'),
('ORD-010', 'Charlie', 'Premium', 3, 199.99,'2026-02-10')
) AS t(order_id, customer, product, quantity, price, order_date);
-- Total revenue per customer (with order count and average value)
SELECT customer,
COUNT(*) AS num_orders,
SUM(quantity * price) AS total_spent,
AVG(quantity * price) AS avg_order_value,
MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer
ORDER BY total_spent DESC;
-- Monthly revenue with subtotals using ROLLUP
SELECT year(order_date) AS yr,
month(order_date) AS mo,
SUM(quantity * price) AS revenue
FROM orders
GROUP BY ROLLUP (yr, mo)
ORDER BY yr NULLS LAST, mo NULLS LAST;
Example 2: Customer Segmentation
-- Classify customers by spending behavior
SELECT customer,
SUM(quantity * price) AS total_spent,
COUNT(*) AS order_count,
CASE
WHEN SUM(quantity * price) >= 500 THEN 'Premium'
WHEN SUM(quantity * price) >= 200 THEN 'Regular'
ELSE 'Budget'
END AS segment
FROM orders
GROUP BY customer
ORDER BY total_spent DESC;
Example 3: Finding Top Categories Per Region
-- Using GROUP BY with filtering
SELECT region, category, SUM(amount) AS total_sales
FROM sales
GROUP BY region, category
HAVING SUM(amount) > 1000
ORDER BY region, total_sales DESC;
Example 4: Statistical Aggregations
DuckDB also supports statistical aggregate functions:
SELECT category,
AVG(amount) AS mean,
STDDEV(amount) AS std_dev,
VARIANCE(amount) AS variance,
MEDIAN(amount) AS median_value,
MODE(amount) AS most_common_value
FROM sales
GROUP BY category;
8. Performance Tips for GROUP BY in DuckDB
- Order of columns matters — Put high-cardinality columns first in GROUP BY for better hash-table performance.
- Use GROUP BY ALL in exploration — It reduces typos and speeds up iterative querying.
- Prefer ROLLUP over multiple UNION ALL queries — ROLLUP computes subtotals in a single pass.
- Filter early — Use WHERE to reduce row count before aggregation for faster queries.
- Consider materialized views — For repeated aggregations on large datasets, DuckDB’s materialized views can cache results.
Conclusion
DuckDB’s GROUP BY and aggregation capabilities are among the most powerful in any database system. From basic COUNT and SUM to advanced multi-dimensional analysis with GROUPING SETS, CUBE, and ROLLUP, DuckDB provides everything you need for data summarization and reporting.
The GROUP BY ALL syntax is a standout feature that dramatically improves productivity during exploratory data analysis — no other major database offers this convenience. Combined with DuckDB’s excellent performance for analytical workloads, it makes DuckDB an ideal choice for data scientists, analysts, and engineers who need to aggregate and analyze data quickly.
To continue your DuckDB journey, check out the DuckDB SQL Syntax Guide and the DuckDB Beginner’s Guide 2026 for more foundational knowledge.
Last updated: May 30, 2026
