Originally posted
here.
A new feature added to SQL Server 2005 for the sake of the windowing
functions is the OVER clause. Using this clause, you can specify
ordering or partitioning for the windowing functions. For instance, to
enumerate the names of all of the products in the AdventureWorks
database that have a list price, along with their list prices and the
rank of those prices compared to all of the other prices, the following
query can now be used:
SELECT
P.Name,
P.ListPrice,
DENSE_RANK() OVER (ORDER BY P.ListPrice DESC) AS PriceRank
FROM Production.Product P
WHERE
ListPrice > 0
ORDER BY
P.Name ASC
Name List Price PriceRank
-------------------------------------------------
All-Purpose Bike Stand 159.0000 44
AWC Logo Cap 8.9900 98
Bike Wash - Dissolver 7.9500 99
Cable Lock 25.0000 88
Chain 20.2400 93
Classic Vest, L 63.5000 66
Classic Vest, M 63.5000 66
Classic Vest, S 63.5000 66
Fender Set - Mountain 21.9800 91
Front Brakes 106.5000 55
Front Derailleur 91.4900 59
...
So what does this tell us? All-Purpose Bike Stand is the 44th most
expensive item sold by AdventureWorks. AWC Logo Cap is the 98th most
expensive item. And the Vests are tied for 66th most expensive. Which
is why DENSE_RANK was used for this example! But really, this example
is only here to demonstrate one use of the OVER clause. And this post
isn't about windowing functions or rankings at all. That's another post
for another day.
What this post is about is normal, non-windowing aggregate functions. Like SUM(). It turns out that the OVER clause can be used for them, too!
Pretend that you're an employee of AdventureWorks and your
manager comes to you with a request: Write a query to return all of the
products, their prices, their subcategories, and the average price for
all products in the subcategory that any given product belongs to...
Why? Perhaps the manager wants to re-categorize products based on
whether they fall, percentage-wise, close to the same average price. Or
maybe it just makes a good contrived example for showing this feature!
Regardless...
Here's how you can solve this in SQL Server 2000:
SELECT
P.Name AS ProductName,
P.ListPrice,
PS.Name AS ProductSubCategoryName,
x.AveragePrice
FROM Production.Product P
JOIN Production.ProductSubCategory PS ON P.ProductSubCategoryID = PS.ProductSubCategoryID
JOIN
(
SELECT
P2.ProductSubCategoryID,
AVG(P2.ListPrice) AS AveragePrice
FROM Production.Product P2
WHERE
P2.ProductSubCategoryID IS NOT NULL
GROUP BY
P2.ProductSubCategoryID
) x ON x.ProductSubCategoryID = P.ProductSubCategoryId
ORDER BY
P.Name
I don't know about you (since I have no clue who you are), but I
personally have a difficult time reading this. If I came back to this
query in six months, it would take me a few minutes to figue out what
was going on. And doesn't it feel like there should be a more efficient
way of expressing it?
...Well, now there is...
SELECT
P.Name AS ProductName,
P.ListPrice,
PS.Name AS ProductSubCategoryName,
AVG(P.ListPrice) OVER (PARTITION BY P.ProductSubCategoryID)
FROM Production.Product P
JOIN Production.ProductSubCategory PS ON P.ProductSubCategoryID = PS.ProductSubCategoryID
ORDER BY
P.Name
So what's going on here? Under the covers, SQL Server builds a
subquery for the average, based on the partitioning column of the OVER
clause -- which in this case is ProductSubCategoryID. It's a little bit
less efficient in this case than the derived table approach, but a lot
cleaner from a readability standpoint. Personally, I think it's a
really cool feature, although I don't honestly see myself using it too
often.
More ways to express yourself using SQL Server 2005. Madonna would be proud.