Featured image of post DuckDB QUALIFY Clause: Filter Window Functions Without Subqueries

DuckDB QUALIFY Clause: Filter Window Functions Without Subqueries

DuckDB's QUALIFY clause is a SQL:1999 standard extension that lets you filter window function results directly, without nested subqueries. Complete guide with real-world examples, execution plans, and comparisons with PostgreSQL and Snowflake.

The Pain: Why Does Window Function Filtering Need Two Levels of Nesting?

Every SQL developer has been here — you want to find the top 3 salaries per department, and your code ends up looking like this:

SELECT dept, name, salary
FROM (
  SELECT *,
    RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
  FROM employees
) sub
WHERE rnk <= 3;

Two levels of nesting. One misplaced parenthesis and it breaks. And this is the simple case — throw in a JOIN, a GROUP BY, or a HAVING clause, and your query becomes a tangled mess.

DuckDB’s answer: the QUALIFY clause.

In one sentence: QUALIFY is a SQL standard extension (introduced in SQL:1999, but rarely implemented) that lets you filter directly on window function results before the final result set is returned — no subquery needed.

SELECT dept, name, salary
FROM employees
QUALIFY RANK() OVER (PARTITION BY dept ORDER BY salary DESC) <= 3;

The difference? QUALIFY is 3 clean lines. The subquery version is 8. Readability improves by at least 2x.

The diagram below shows where QUALIFY fits in SQL execution order:

QUALIFY in SQL execution order


What Exactly Is QUALIFY?

QUALIFY is syntactic sugar — it sits after WHERE/GROUP BY/HAVING and before ORDER BY/LIMIT in the SQL execution pipeline. The standard SQL execution order is:

  1. FROM + JOIN
  2. WHERE
  3. GROUP BY + aggregate functions
  4. HAVING
  5. Window function computation ← here
  6. QUALIFY ← DuckDB filters here
  7. SELECT (projection)
  8. DISTINCT
  9. UNION / INTERSECT / EXCEPT
  10. ORDER BY
  11. LIMIT / OFFSET

Key insight: QUALIFY executes after window functions compute and before SELECT projection. This means you can reference window function expressions in QUALIFY, but you cannot reference SELECT aliases.

Core Rules

  • QUALIFY can only reference window function expressions (RANK, ROW_NUMBER, SUM OVER, LAG, etc.)
  • It cannot reference plain columns directly (though the window function itself can)
  • WHERE and QUALIFY are complementary — WHERE filters rows before aggregation, QUALIFY filters after window computation
  • Performance-wise, QUALIFY is equivalent to a subquery (the optimizer generates the same plan), but readability is dramatically better

Real-World Scenario 1: Top N Per Group

This is the most classic QUALIFY use case. You have sales data and need the top performers per region:

-- Create sample sales data
CREATE TABLE sales AS
SELECT * FROM (VALUES
  ('East', 'Alice', '2026-01', 85000),
  ('East', 'Bob', '2026-01', 92000),
  ('East', 'Carol', '2026-01', 78000),
  ('East', 'Dave', '2026-01', 105000),
  ('East', 'Eve', '2026-02', 88000),
  ('South', 'Frank', '2026-01', 95000),
  ('South', 'Grace', '2026-01', 72000),
  ('South', 'Hank', '2026-01', 110000),
  ('South', 'Ivy', '2026-02', 87000),
  ('North', 'Jack', '2026-01', 65000),
  ('North', 'Kate', '2026-01', 89000),
  ('North', 'Leo', '2026-01', 92000)
) AS t(region, salesperson, month, amount);

-- Top 2 per region
SELECT region, salesperson, month, amount
FROM sales
QUALIFY RANK() OVER (
  PARTITION BY region 
  ORDER BY amount DESC
) <= 2
ORDER BY region, amount DESC;

Results:

regionsalespersonmonthamount
EastDave2026-01105000
EastBob2026-0192000
SouthHank2026-01110000
SouthFrank2026-0195000
NorthLeo2026-0192000
NorthKate2026-0189000

The subquery version would be twice as long and require mental stack-tracing — you’d have to mentally “unfold” the inner query before reading the outer filter. QUALIFY reads linearly, top to bottom.


Real-World Scenario 2: Dedup — Keep Latest Record Per User

A common pattern in data lakes: incremental data produces multiple records per user, and you only want the latest one:

-- User event data
CREATE TABLE user_events AS
SELECT * FROM (VALUES
  ('user_001', 'login', '2026-05-29 10:30:00'),
  ('user_001', 'purchase', '2026-05-29 10:35:00'),
  ('user_001', 'logout', '2026-05-29 11:00:00'),
  ('user_002', 'login', '2026-05-29 09:00:00'),
  ('user_002', 'view_item', '2026-05-29 09:15:00'),
  ('user_002', 'purchase', '2026-05-29 09:20:00'),
  ('user_003', 'login', '2026-05-28 22:00:00')
) AS t(user_id, event, event_time);

-- Latest event per user
SELECT user_id, event, event_time
FROM user_events
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id 
  ORDER BY event_time DESC
) = 1;

Results:

user_ideventevent_time
user_001logout2026-05-29 11:00:00
user_002purchase2026-05-29 09:20:00
user_003login2026-05-28 22:00:00

This pattern is equivalent to PostgreSQL’s DISTINCT ON syntax — but QUALIFY is standard SQL with better portability across databases.

Interview gold: Any time you hear “Top N per group” or “latest record per entity,” QUALIFY is your answer.


Real-World Scenario 3: Anomaly Detection with LAG + QUALIFY

QUALIFY works with all window functions, not just ranking ones. Here’s an anomaly detection use case with LAG:

-- Daily revenue data
CREATE TABLE daily_revenue AS
SELECT * FROM (VALUES
  ('2026-05-20', 12000),
  ('2026-05-21', 13500),
  ('2026-05-22', 11000),
  ('2026-05-23', 8500),
  ('2026-05-24', 9000),
  ('2026-05-25', 14000),
  ('2026-05-26', 16000),
  ('2026-05-27', 15500),
  ('2026-05-28', 17000),
  ('2026-05-29', 10000)
) AS t(dt, revenue);

-- Detect days with revenue drop > 15%
SELECT dt, revenue,
  ROUND((revenue - LAG(revenue) OVER (ORDER BY dt)) 
    / NULLIF(LAG(revenue) OVER (ORDER BY dt), 0) * 100, 1) AS pct_change
FROM daily_revenue
QUALIFY (revenue - LAG(revenue) OVER (ORDER BY dt)) 
    / NULLIF(LAG(revenue) OVER (ORDER BY dt), 0) < -0.15;

Results:

dtrevenuepct_change
2026-05-2211000-18.5
2026-05-238500-22.7
2026-05-2910000-41.2

This is one of the most common needs in e-commerce analytics — automatically flag anomalies. With QUALIFY, you don’t need views, CTEs, or subqueries. One SQL statement, done.


Execution Plan: Is QUALIFY Faster?

Let’s check with EXPLAIN:

EXPLAIN
SELECT dept, name, salary
FROM employees
QUALIFY RANK() OVER (PARTITION BY dept ORDER BY salary DESC) <= 3;

DuckDB’s optimizer translates QUALIFY into the same physical plan as a subquery. Performance is identical. So why use it?

Because the human brain is not a compiler. Your reading speed is determined by nesting depth and line count. QUALIFY flattens 3 levels of nesting into 1, reducing cognitive load by at least 50%.

-- ❌ Subquery: mental stack required
SELECT dept, name, salary FROM (
  SELECT *, RANK() OVER (...) AS rnk FROM employees
) WHERE rnk <= 3;

-- ✅ QUALIFY: linear reading, no context switching
SELECT dept, name, salary FROM employees
QUALIFY RANK() OVER (...) <= 3;

Comparison: QUALIFY Support Across Databases

FeatureDuckDBPostgreSQLSnowflakeBigQueryMySQLSQLite
QUALIFY✅ Native❌ Use DISTINCT ON or subquery✅ Native❌ Subquery/CTE only❌ Window functions but no QUALIFY❌ Not supported
DISTINCT ON❌ Not supported✅ Native❌ Not supported❌ Not supported❌ Not supported❌ Not supported
Subquery workaroundSupportedSupportedSupportedSupportedSupportedSupported
CTE + WHERE filterSupportedSupportedSupportedSupportedSupportedSupported
Execution orderFROM→WHERE→GROUP BY→HAVING→Window→QUALIFY→SELECTFROM→WHERE→GROUP BY→HAVING→Window→SELECTFROM→WHERE→GROUP BY→HAVING→Window→QUALIFY→SELECTFROM→WHERE→GROUP BY→HAVING→Window→SELECTFROM→WHERE→GROUP BY→HAVING→Window→SELECTFROM→WHERE→GROUP BY→HAVING→Window→SELECT
Code conciseness⭐⭐⭐⭐⭐⭐⭐⭐ (DISTINCT ON helps partially)⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
SQL StandardSQL:1999 extensionPostgreSQL extensionSQL:1999 extensionGoogle dialectMySQL dialectSQLite dialect

Key takeaway: DuckDB and Snowflake are the two best modern analytic databases for QUALIFY support. PostgreSQL, despite being incredibly powerful, requires 3-5 extra lines for the same task. BigQuery and MySQL don’t support it at all.


QUALIFY Limitations and Pitfalls

1. Cannot Reference Plain Columns

-- ❌ Error: QUALIFY can't contain non-window expressions
SELECT dept, name, salary
FROM employees
QUALIFY salary > 10000 AND RANK() OVER (...) <= 3;

-- ✅ Correct: WHERE for columns, QUALIFY for window functions
SELECT dept, name, salary
FROM employees
WHERE salary > 10000
QUALIFY RANK() OVER (...) <= 3;

2. QUALIFY Placement

QUALIFY must come after WHERE/GROUP BY/HAVING and before ORDER BY/LIMIT:

SELECT ...
FROM ...
WHERE ...              -- filter rows first
GROUP BY ...           -- then aggregate
HAVING ...             -- then filter aggregates
QUALIFY ...            -- then filter window results
ORDER BY ...           -- finally sort
LIMIT ...;

3. Cannot Use SELECT Aliases in QUALIFY

-- ❌ Error: SELECT aliases are created after QUALIFY
SELECT 
  RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
QUALIFY salary_rank <= 10;

-- ✅ Correct: write the window expression directly
SELECT 
  RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
QUALIFY RANK() OVER (ORDER BY salary DESC) <= 10;

4. Performance Is Identical to Subqueries

As mentioned, QUALIFY is syntactic sugar, not a performance optimization. But the readability improvement translates directly into reduced maintenance costs and faster debugging cycles.


Advanced Pattern: QUALIFY + CTE Pipeline

QUALIFY combined with CTEs creates remarkably clean ETL pipelines:

-- 1. Clean data first
WITH cleaned_events AS (
  SELECT user_id, event, event_time
  FROM raw_events
  WHERE event IS NOT NULL
),
-- 2. Dedup — latest event per user (QUALIFY shines here)
latest_events AS (
  SELECT user_id, event, event_time
  FROM cleaned_events
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY user_id 
    ORDER BY event_time DESC
  ) = 1
),
-- 3. Aggregate
user_stats AS (
  SELECT 
    DATE_TRUNC('day', event_time) AS active_date,
    COUNT(*) AS active_users
  FROM latest_events
  GROUP BY active_date
)
-- 4. Final output
SELECT * FROM user_stats
ORDER BY active_date DESC;

This pipeline separates four steps cleanly, each doing one thing. Without QUALIFY, step 2 would require an extra subquery layer, adding unnecessary complexity.


Monetization: Turn Your QUALIFY Skills Into Income

  1. SQL Interview Question Pack — QUALIFY is a blind spot for most data analysts. Create a “DuckDB QUALIFY 50-Question Interview Pack” e-book priced at $5-10. Sell it on Gumroad, Dev.to, or your own site. The “Top N per group” pattern is a guaranteed interview question — QUALIFY reduces it from 8 lines of subquery to 3 lines.

  2. SQL Code Review Consulting — Many organizations still write SQL like it’s 2005. Offer a “SQL Modernization Audit” service: review 50 queries, identify ones that can be simplified with QUALIFY, charge $200-500 per audit. The ROI is immediate — cleaner code = fewer bugs = lower maintenance costs.

  3. E-commerce Monitoring SaaS — Combine QUALIFY + LAG for anomaly detection in your Shopify monitoring service (as described in the companion channel post). The fact that your anomaly detection SQL is 50% shorter means lower maintenance costs, which lets you undercut competitors on price while maintaining higher margins. Target: $50-200/month per client.

  4. YouTube Tutorial Monetization — Publish QUALIFY tutorials on youtube.com/@duckdblab. A well-optimized “DuckDB QUALIFY tutorial” video can attract 5000+ monthly views from data engineers actively searching for this content. Monetize through YouTube ads, channel memberships ($4.99/month), and sponsored segments from DuckDB-related tooling companies.


Summary

PointDetails
What is QUALIFYSQL:1999 standard extension — filters directly after window function computation
SQL OrderFROM → WHERE → GROUP BY → HAVING → Window → QUALIFY → SELECT → ORDER BY → LIMIT
All window functionsRANK, ROW_NUMBER, DENSE_RANK, NTILE, LAG/LEAD, SUM/AVG OVER — everything works
Key advantageEliminates subquery nesting, improves readability 2x-5x
PerformanceIdentical to subquery (pure syntactic sugar)
Supported byDuckDB ✅ Snowflake ✅ PostgreSQL ❌ (DISTINCT ON instead) BigQuery ❌ MySQL ❌

Remember QUALIFY in one sentence: When you find yourself writing WHERE but the condition comes from a window function — that’s when you need QUALIFY.

Next time you type RANK() OVER in DuckDB, try adding QUALIFY instead of wrapping it in a subquery. Your code will be cleaner, and your teammates will thank you.

📺 More DuckDB tutorials → youtube.com/@duckdblab

📺 Watch video tutorials → DuckDB Lab YouTube

Subscribe for more DuckDB & AI automation tutorials

Built with Hugo
Theme Stack designed by Jimmy