A new feature added to SQL Server 2005 for the sake of the windowing functions is the OVER clause. Using this clause, you can specify ordering or partitioning for the windowing functions. For instance, to enumerate the names of all of the products in the AdventureWorks database that have a list price, along with their list prices and the rank of those prices compared to all of the other prices, the following query can now be used:
    SELECT
        P.Name,
        P.ListPrice,
        DENSE_RANK() OVER (ORDER BY P.ListPrice DESC) AS PriceRank
    FROM Production.Product P
    WHERE P.ListPrice > 0
    ORDER BY P.Name ASC

    Name                       ListPrice    PriceRank
    -------------------------------------------------
    All-Purpose Bike Stand     159.0000     44
    AWC Logo Cap               8.9900       98
    Bike Wash - Dissolver      7.9500       99
    Cable Lock                 25.0000      88
    Chain                      20.2400      93
    Classic Vest, L            63.5000      66
    Classic Vest, M            63.5000      66
    Classic Vest, S            63.5000      66
    Fender Set - Mountain      21.9800      91
    Front Brakes               106.5000     55
    Front Derailleur           91.4900      59
    ...
So what does this tell us? All-Purpose Bike Stand has the 44th-highest list price among the products sold by AdventureWorks. AWC Logo Cap has the 98th-highest. And the Vests are tied at 66th. Which is why DENSE_RANK was used for this example: tied rows share a rank, and the next distinct price gets the very next number, with no gaps. But really, this example is only here to demonstrate one use of the OVER clause. And this post isn’t about windowing functions or rankings at all. That’s another post for another day.
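If you want to see the tie behavior without an AdventureWorks install handy, here is a minimal sketch. It is an assumption-laden stand-in, not the real thing: it uses Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions), and the Product table is a hypothetical six-row cut of the listing above. The point is only to put RANK and DENSE_RANK side by side over the same ORDER BY.

```python
import sqlite3

# Hypothetical stand-in for Production.Product: a handful of rows from the
# result listing, held in an in-memory SQLite database (not SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, ListPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [
        ("All-Purpose Bike Stand", 159.00),
        ("Classic Vest, L", 63.50),
        ("Classic Vest, M", 63.50),
        ("Classic Vest, S", 63.50),
        ("Cable Lock", 25.00),
        ("AWC Logo Cap", 8.99),
    ],
)

# Same OVER (ORDER BY ...) clause for both functions; only the ranking
# function differs.
rows = conn.execute(
    """
    SELECT
        P.Name,
        P.ListPrice,
        RANK()       OVER (ORDER BY P.ListPrice DESC) AS PriceRank,
        DENSE_RANK() OVER (ORDER BY P.ListPrice DESC) AS DensePriceRank
    FROM Product P
    ORDER BY P.ListPrice DESC, P.Name ASC
    """
).fetchall()

for row in rows:
    print(row)
```

In this tiny data set the three vests share rank 2 under both functions. The difference shows up on the next row: RANK skips ahead to 5 for Cable Lock, while DENSE_RANK continues with 3.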
What this post is about is normal, non-windowing aggregate functions. Like SUM(). It turns out that the OVER clause can be used for them, too!
Pretend that you’re an employee of AdventureWorks and your manager comes to you with a request: Write a query to return all of the products, their prices, their subcategories, and the average price for all products in the subcategory that any given product belongs to… Why? Perhaps the manager wants to re-categorize products based on whether they fall, percentage-wise, close to the same average price. Or maybe it just makes a good contrived example for showing this feature! Regardless…
Here’s how you can solve this in SQL Server 2000:
    SELECT
        P.Name AS ProductName,
        P.ListPrice,
        PS.Name AS ProductSubCategoryName,
        x.AveragePrice
    FROM Production.Product P
    JOIN Production.ProductSubCategory PS
        ON P.ProductSubCategoryID = PS.ProductSubCategoryID
    JOIN
    (
        SELECT
            P2.ProductSubCategoryID,
            AVG(P2.ListPrice) AS AveragePrice
        FROM Production.Product P2
        WHERE P2.ProductSubCategoryID IS NOT NULL
        GROUP BY P2.ProductSubCategoryID
    ) x
        ON x.ProductSubCategoryID = P.ProductSubCategoryID
    ORDER BY P.Name
I don’t know about you (since I have no clue who you are), but I personally have a difficult time reading this. If I came back to this query in six months, it would take me a few minutes to figure out what was going on. And doesn’t it feel like there should be a more efficient way of expressing it?
…Well, now there is…
    SELECT
        P.Name AS ProductName,
        P.ListPrice,
        PS.Name AS ProductSubCategoryName,
        AVG(P.ListPrice) OVER (PARTITION BY P.ProductSubCategoryID) AS AveragePrice
    FROM Production.Product P
    JOIN Production.ProductSubCategory PS
        ON P.ProductSubCategoryID = PS.ProductSubCategoryID
    ORDER BY P.Name
So what’s going on here? Conceptually, SQL Server computes the average once for each partition defined by the PARTITION BY column of the OVER clause (in this case ProductSubCategoryID) and attaches that value to every row in the partition, much as the derived table did. It’s a little bit less efficient in this case than the derived table approach, but a lot cleaner from a readability standpoint. Personally, I think it’s a really cool feature, although I don’t honestly see myself using it too often.
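To convince yourself that the two forms really return the same rows, here is a rough sketch, again using Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later). The tables are cut-down, hypothetical versions of Production.Product and Production.ProductSubCategory, and the prices are made up so the averages come out to round numbers. It says nothing about SQL Server's query plans or performance; it only checks that the results match.

```python
import sqlite3

# Hypothetical, cut-down copies of the two AdventureWorks tables with
# made-up prices; SQLite in memory, not SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ProductSubCategory (
        ProductSubCategoryID INTEGER PRIMARY KEY,
        Name TEXT
    );
    CREATE TABLE Product (
        Name TEXT,
        ListPrice REAL,
        ProductSubCategoryID INTEGER
    );
    INSERT INTO ProductSubCategory VALUES (1, 'Vests'), (2, 'Locks');
    INSERT INTO Product VALUES
        ('Classic Vest, S', 60.0, 1),
        ('Classic Vest, M', 64.0, 1),
        ('Classic Vest, L', 68.0, 1),
        ('Cable Lock', 25.0, 2),
        ('Uncategorized Widget', 10.0, NULL);
    """
)

# SQL Server 2000 style: join to a derived table of per-subcategory averages.
derived_rows = conn.execute(
    """
    SELECT P.Name, P.ListPrice, PS.Name, x.AveragePrice
    FROM Product P
    JOIN ProductSubCategory PS
        ON P.ProductSubCategoryID = PS.ProductSubCategoryID
    JOIN (
        SELECT P2.ProductSubCategoryID, AVG(P2.ListPrice) AS AveragePrice
        FROM Product P2
        WHERE P2.ProductSubCategoryID IS NOT NULL
        GROUP BY P2.ProductSubCategoryID
    ) x ON x.ProductSubCategoryID = P.ProductSubCategoryID
    ORDER BY P.Name
    """
).fetchall()

# SQL Server 2005 style: aggregate with OVER (PARTITION BY ...).
over_rows = conn.execute(
    """
    SELECT
        P.Name,
        P.ListPrice,
        PS.Name,
        AVG(P.ListPrice) OVER (PARTITION BY P.ProductSubCategoryID) AS AveragePrice
    FROM Product P
    JOIN ProductSubCategory PS
        ON P.ProductSubCategoryID = PS.ProductSubCategoryID
    ORDER BY P.Name
    """
).fetchall()

print(derived_rows == over_rows)
for row in over_rows:
    print(row)
```

Both queries keep every product row and repeat the subcategory average next to it. The product with no subcategory drops out of both, because the inner join to ProductSubCategory filters it before the average is attached.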
More ways to express yourself using SQL Server 2005. Madonna would be proud.