The Pain: Why Does Window Function Filtering Need Two Levels of Nesting?
Every SQL developer has been here — you want to find the top 3 salaries per department, and your code ends up looking like this:
SELECT dept, name, salary
FROM (
SELECT *,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
FROM employees
) sub
WHERE rnk <= 3;
Two levels of nesting. One misplaced parenthesis and it breaks. And this is the simple case — throw in a JOIN, a GROUP BY, or a HAVING clause, and your query becomes a tangled mess.
DuckDB’s answer: the QUALIFY clause.
In one sentence: QUALIFY is a SQL standard extension (introduced in SQL:1999, but rarely implemented) that lets you filter directly on window function results before the final result set is returned — no subquery needed.
SELECT dept, name, salary
FROM employees
QUALIFY RANK() OVER (PARTITION BY dept ORDER BY salary DESC) <= 3;
The difference? QUALIFY is 3 clean lines. The subquery version is 8. Readability improves by at least 2x.
The diagram below shows where QUALIFY fits in SQL execution order:

What Exactly Is QUALIFY?
QUALIFY is syntactic sugar — it sits after WHERE/GROUP BY/HAVING and before ORDER BY/LIMIT in the SQL execution pipeline. The standard SQL execution order is:
FROM+JOINWHEREGROUP BY+ aggregate functionsHAVING- Window function computation ← here
QUALIFY← DuckDB filters hereSELECT(projection)DISTINCTUNION/INTERSECT/EXCEPTORDER BYLIMIT/OFFSET
Key insight: QUALIFY executes after window functions compute and before SELECT projection. This means you can reference window function expressions in QUALIFY, but you cannot reference SELECT aliases.
Core Rules
QUALIFYcan only reference window function expressions (RANK, ROW_NUMBER, SUM OVER, LAG, etc.)- It cannot reference plain columns directly (though the window function itself can)
WHEREandQUALIFYare complementary — WHERE filters rows before aggregation, QUALIFY filters after window computation- Performance-wise, QUALIFY is equivalent to a subquery (the optimizer generates the same plan), but readability is dramatically better
Real-World Scenario 1: Top N Per Group
This is the most classic QUALIFY use case. You have sales data and need the top performers per region:
-- Create sample sales data
CREATE TABLE sales AS
SELECT * FROM (VALUES
('East', 'Alice', '2026-01', 85000),
('East', 'Bob', '2026-01', 92000),
('East', 'Carol', '2026-01', 78000),
('East', 'Dave', '2026-01', 105000),
('East', 'Eve', '2026-02', 88000),
('South', 'Frank', '2026-01', 95000),
('South', 'Grace', '2026-01', 72000),
('South', 'Hank', '2026-01', 110000),
('South', 'Ivy', '2026-02', 87000),
('North', 'Jack', '2026-01', 65000),
('North', 'Kate', '2026-01', 89000),
('North', 'Leo', '2026-01', 92000)
) AS t(region, salesperson, month, amount);
-- Top 2 per region
SELECT region, salesperson, month, amount
FROM sales
QUALIFY RANK() OVER (
PARTITION BY region
ORDER BY amount DESC
) <= 2
ORDER BY region, amount DESC;
Results:
| region | salesperson | month | amount |
|---|---|---|---|
| East | Dave | 2026-01 | 105000 |
| East | Bob | 2026-01 | 92000 |
| South | Hank | 2026-01 | 110000 |
| South | Frank | 2026-01 | 95000 |
| North | Leo | 2026-01 | 92000 |
| North | Kate | 2026-01 | 89000 |
The subquery version would be twice as long and require mental stack-tracing — you’d have to mentally “unfold” the inner query before reading the outer filter. QUALIFY reads linearly, top to bottom.
Real-World Scenario 2: Dedup — Keep Latest Record Per User
A common pattern in data lakes: incremental data produces multiple records per user, and you only want the latest one:
-- User event data
CREATE TABLE user_events AS
SELECT * FROM (VALUES
('user_001', 'login', '2026-05-29 10:30:00'),
('user_001', 'purchase', '2026-05-29 10:35:00'),
('user_001', 'logout', '2026-05-29 11:00:00'),
('user_002', 'login', '2026-05-29 09:00:00'),
('user_002', 'view_item', '2026-05-29 09:15:00'),
('user_002', 'purchase', '2026-05-29 09:20:00'),
('user_003', 'login', '2026-05-28 22:00:00')
) AS t(user_id, event, event_time);
-- Latest event per user
SELECT user_id, event, event_time
FROM user_events
QUALIFY ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY event_time DESC
) = 1;
Results:
| user_id | event | event_time |
|---|---|---|
| user_001 | logout | 2026-05-29 11:00:00 |
| user_002 | purchase | 2026-05-29 09:20:00 |
| user_003 | login | 2026-05-28 22:00:00 |
This pattern is equivalent to PostgreSQL’s DISTINCT ON syntax — but QUALIFY is standard SQL with better portability across databases.
Interview gold: Any time you hear “Top N per group” or “latest record per entity,” QUALIFY is your answer.
Real-World Scenario 3: Anomaly Detection with LAG + QUALIFY
QUALIFY works with all window functions, not just ranking ones. Here’s an anomaly detection use case with LAG:
-- Daily revenue data
CREATE TABLE daily_revenue AS
SELECT * FROM (VALUES
('2026-05-20', 12000),
('2026-05-21', 13500),
('2026-05-22', 11000),
('2026-05-23', 8500),
('2026-05-24', 9000),
('2026-05-25', 14000),
('2026-05-26', 16000),
('2026-05-27', 15500),
('2026-05-28', 17000),
('2026-05-29', 10000)
) AS t(dt, revenue);
-- Detect days with revenue drop > 15%
SELECT dt, revenue,
ROUND((revenue - LAG(revenue) OVER (ORDER BY dt))
/ NULLIF(LAG(revenue) OVER (ORDER BY dt), 0) * 100, 1) AS pct_change
FROM daily_revenue
QUALIFY (revenue - LAG(revenue) OVER (ORDER BY dt))
/ NULLIF(LAG(revenue) OVER (ORDER BY dt), 0) < -0.15;
Results:
| dt | revenue | pct_change |
|---|---|---|
| 2026-05-22 | 11000 | -18.5 |
| 2026-05-23 | 8500 | -22.7 |
| 2026-05-29 | 10000 | -41.2 |
This is one of the most common needs in e-commerce analytics — automatically flag anomalies. With QUALIFY, you don’t need views, CTEs, or subqueries. One SQL statement, done.
Execution Plan: Is QUALIFY Faster?
Let’s check with EXPLAIN:
EXPLAIN
SELECT dept, name, salary
FROM employees
QUALIFY RANK() OVER (PARTITION BY dept ORDER BY salary DESC) <= 3;
DuckDB’s optimizer translates QUALIFY into the same physical plan as a subquery. Performance is identical. So why use it?
Because the human brain is not a compiler. Your reading speed is determined by nesting depth and line count. QUALIFY flattens 3 levels of nesting into 1, reducing cognitive load by at least 50%.
-- ❌ Subquery: mental stack required
SELECT dept, name, salary FROM (
SELECT *, RANK() OVER (...) AS rnk FROM employees
) WHERE rnk <= 3;
-- ✅ QUALIFY: linear reading, no context switching
SELECT dept, name, salary FROM employees
QUALIFY RANK() OVER (...) <= 3;
Comparison: QUALIFY Support Across Databases
| Feature | DuckDB | PostgreSQL | Snowflake | BigQuery | MySQL | SQLite |
|---|---|---|---|---|---|---|
| QUALIFY | ✅ Native | ❌ Use DISTINCT ON or subquery | ✅ Native | ❌ Subquery/CTE only | ❌ Window functions but no QUALIFY | ❌ Not supported |
| DISTINCT ON | ❌ Not supported | ✅ Native | ❌ Not supported | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| Subquery workaround | Supported | Supported | Supported | Supported | Supported | Supported |
| CTE + WHERE filter | Supported | Supported | Supported | Supported | Supported | Supported |
| Execution order | FROM→WHERE→GROUP BY→HAVING→Window→QUALIFY→SELECT | FROM→WHERE→GROUP BY→HAVING→Window→SELECT | FROM→WHERE→GROUP BY→HAVING→Window→QUALIFY→SELECT | FROM→WHERE→GROUP BY→HAVING→Window→SELECT | FROM→WHERE→GROUP BY→HAVING→Window→SELECT | FROM→WHERE→GROUP BY→HAVING→Window→SELECT |
| Code conciseness | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ (DISTINCT ON helps partially) | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ |
| SQL Standard | SQL:1999 extension | PostgreSQL extension | SQL:1999 extension | Google dialect | MySQL dialect | SQLite dialect |
Key takeaway: DuckDB and Snowflake are the two best modern analytic databases for QUALIFY support. PostgreSQL, despite being incredibly powerful, requires 3-5 extra lines for the same task. BigQuery and MySQL don’t support it at all.
QUALIFY Limitations and Pitfalls
1. Cannot Reference Plain Columns
-- ❌ Error: QUALIFY can't contain non-window expressions
SELECT dept, name, salary
FROM employees
QUALIFY salary > 10000 AND RANK() OVER (...) <= 3;
-- ✅ Correct: WHERE for columns, QUALIFY for window functions
SELECT dept, name, salary
FROM employees
WHERE salary > 10000
QUALIFY RANK() OVER (...) <= 3;
2. QUALIFY Placement
QUALIFY must come after WHERE/GROUP BY/HAVING and before ORDER BY/LIMIT:
SELECT ...
FROM ...
WHERE ... -- filter rows first
GROUP BY ... -- then aggregate
HAVING ... -- then filter aggregates
QUALIFY ... -- then filter window results
ORDER BY ... -- finally sort
LIMIT ...;
3. Cannot Use SELECT Aliases in QUALIFY
-- ❌ Error: SELECT aliases are created after QUALIFY
SELECT
RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
QUALIFY salary_rank <= 10;
-- ✅ Correct: write the window expression directly
SELECT
RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
QUALIFY RANK() OVER (ORDER BY salary DESC) <= 10;
4. Performance Is Identical to Subqueries
As mentioned, QUALIFY is syntactic sugar, not a performance optimization. But the readability improvement translates directly into reduced maintenance costs and faster debugging cycles.
Advanced Pattern: QUALIFY + CTE Pipeline
QUALIFY combined with CTEs creates remarkably clean ETL pipelines:
-- 1. Clean data first
WITH cleaned_events AS (
SELECT user_id, event, event_time
FROM raw_events
WHERE event IS NOT NULL
),
-- 2. Dedup — latest event per user (QUALIFY shines here)
latest_events AS (
SELECT user_id, event, event_time
FROM cleaned_events
QUALIFY ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY event_time DESC
) = 1
),
-- 3. Aggregate
user_stats AS (
SELECT
DATE_TRUNC('day', event_time) AS active_date,
COUNT(*) AS active_users
FROM latest_events
GROUP BY active_date
)
-- 4. Final output
SELECT * FROM user_stats
ORDER BY active_date DESC;
This pipeline separates four steps cleanly, each doing one thing. Without QUALIFY, step 2 would require an extra subquery layer, adding unnecessary complexity.
Monetization: Turn Your QUALIFY Skills Into Income
SQL Interview Question Pack — QUALIFY is a blind spot for most data analysts. Create a “DuckDB QUALIFY 50-Question Interview Pack” e-book priced at $5-10. Sell it on Gumroad, Dev.to, or your own site. The “Top N per group” pattern is a guaranteed interview question — QUALIFY reduces it from 8 lines of subquery to 3 lines.
SQL Code Review Consulting — Many organizations still write SQL like it’s 2005. Offer a “SQL Modernization Audit” service: review 50 queries, identify ones that can be simplified with QUALIFY, charge $200-500 per audit. The ROI is immediate — cleaner code = fewer bugs = lower maintenance costs.
E-commerce Monitoring SaaS — Combine QUALIFY + LAG for anomaly detection in your Shopify monitoring service (as described in the companion channel post). The fact that your anomaly detection SQL is 50% shorter means lower maintenance costs, which lets you undercut competitors on price while maintaining higher margins. Target: $50-200/month per client.
YouTube Tutorial Monetization — Publish QUALIFY tutorials on youtube.com/@duckdblab. A well-optimized “DuckDB QUALIFY tutorial” video can attract 5000+ monthly views from data engineers actively searching for this content. Monetize through YouTube ads, channel memberships ($4.99/month), and sponsored segments from DuckDB-related tooling companies.
Summary
| Point | Details |
|---|---|
| What is QUALIFY | SQL:1999 standard extension — filters directly after window function computation |
| SQL Order | FROM → WHERE → GROUP BY → HAVING → Window → QUALIFY → SELECT → ORDER BY → LIMIT |
| All window functions | RANK, ROW_NUMBER, DENSE_RANK, NTILE, LAG/LEAD, SUM/AVG OVER — everything works |
| Key advantage | Eliminates subquery nesting, improves readability 2x-5x |
| Performance | Identical to subquery (pure syntactic sugar) |
| Supported by | DuckDB ✅ Snowflake ✅ PostgreSQL ❌ (DISTINCT ON instead) BigQuery ❌ MySQL ❌ |
Remember QUALIFY in one sentence: When you find yourself writing WHERE but the condition comes from a window function — that’s when you need QUALIFY.
Next time you type RANK() OVER in DuckDB, try adding QUALIFY instead of wrapping it in a subquery. Your code will be cleaner, and your teammates will thank you.
📺 More DuckDB tutorials → youtube.com/@duckdblab