Here’s a common coding scenario for SQL Server developers:
“I want to see the oldest amount due for each account, along with the account number and due date, ordered by account number.”
Since the release of SQL Server 2005, the simplest way to do this has been to use a window function like ROW_NUMBER. In many cases, everything you need can be done in a single SELECT statement with your window function. The trouble comes when you want to incorporate that function in some other way. For instance, using it a WHERE clause.
Let’s use this example from AdventureWorks2012:
/* This gives us all orders for each account as determined by DueDate , and the result set is ordered by AccountNumber */ SELECT ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY DueDate) AS rn , AccountNumber , DueDate , TotalDue FROM Sales.SalesOrderHeader AS soh ORDER BY AccountNumber
This works perfectly and gives us every row in the table. Now, let’s modify it to meet the scenario we outlined earlier, and only show the oldest order for each account.
We could try this…
/* This gives us the oldest order for each account as determined by DueDate */ SELECT ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY DueDate) AS rn , AccountNumber , DueDate , TotalDue FROM Sales.SalesOrderHeader AS soh WHERE rn = 1 ORDER BY AccountNumber
…to which, SQL Server gives you a Dikembe Mutumbo finger-wag:
Msg 207, Level 16, State 1, Line 7 Invalid column name 'rn'.
“Invalid column name”?
Maybe we need to not alias it. Let’s try again.
SELECT ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY DueDate) AS rn , AccountNumber , DueDate , TotalDue FROM Sales.SalesOrderHeader AS soh WHERE ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY DueDate) = 1 ORDER BY AccountNumber
Okay, that’s much bett–
Msg 4108, Level 15, State 1, Line 7 Windowed functions can only appear in the SELECT or ORDER BY clauses.
Why can’t we put a window function in a where clause?
Because the WHERE clause already happened.
Logical Query Processing
SQL Server doesn’t process parts of a query in the same order they’re written. Rather than start with SELECT the way we read and write it, here’s the order SQL Server progresses through:
- GROUP BY
- ORDER BY
The first four steps are all about getting the source data and reducing the result set down. Steps 5 & 6 determine which columns are presented and in which order. Step 7 (TOP) is only applied at the end because you can’t say which rows are in the top n rows until the set has been sorted. (You can read Itzik Ben-Gan’s explanation of this process in way more detail here.)
Since the WHERE clause happens before the SELECT, it’s too late in the process to add the window function to the WHERE clause. It’d have to loop back around a second time to re-evaluate WHERE.
There’s No Magic Hack
Unfortunately, the logical query processing model is a fundamental way of doing business for SQL Server, so we can’t just cheat around it. Instead, we have to reference our window function through a subquery or CTE (common table expression). In other words, we have to code in a way that makes SQL Server evaluate a WHERE clause after the SELECT containing the window function:
/* This gives us the oldest order for each account as determined by DueDate */ SELECT rn, AccountNumber, DueDate, TotalDue FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY DueDate) AS rn , AccountNumber , DueDate , TotalDue FROM Sales.SalesOrderHeader AS soh ) as t WHERE t.rn = 1 ORDER BY AccountNumber
By putting the WHERE clause in the outer query and the window function in the subquery (also called an inner query), we get the window function to come before the WHERE.
It’s not a shortcut, but it’s the shortest path to get what we want.
On a related note, window functions can’t be used in an UPDATE statement, either.
Trivia Isn’t Always Trivial
More internal concepts like logical query processing won’t often play into your day-to-day work. However, having at least an awareness of deeper workings like these can not only save you time when troubleshooting T-SQL, but also understand the limitations of the code base you have to work with.
- Using the Same Column Twice in a SQL UPDATE Statement September 13, 2018
- T-SQL Tuesday #96: Three Pivot Points November 14, 2017
- Why You Can’t Use ROW_NUMBER() In Your WHERE Clause September 28, 2017
- Three Magic Letters to Make Your DBA Like You More September 21, 2017
- Stop Using PRINT to Track Query Progress September 12, 2017