THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Paging in SQL Server 2005

I keep seeing questions on newsgroups about paging in stored procedures, and whether there will be a better way in SQL Server 2005. However, aside from a few answers in newsgroups, I haven't seen any content on how to do it. So I'd like to spend a few minutes and share with you the new features that will make paging stored procedures both easier to build and a lot more performant...

 

But first, since this is a performance-related blog, let's generate a bunch of big test data!

Since I don't have AdventureWorks installed on my test SQL Server 2005 server at the moment and am too lazy to track it down, the data is somewhat ugly... Anyway, start this up and let it run for a while (takes around 35 minutes on my test box):

 

SELECT DISTINCT A.Name + B.Name + C.Name AS SomewhatLargeString
INTO #BigTableOfStrings
FROM master..spt_values A,
master..spt_values B,
master..spt_values C
WHERE a.TYPE NOT IN ('P', 'R', 'F', 'F_U')
AND b.TYPE NOT IN ('P', 'R', 'F', 'F_U')

CREATE CLUSTERED INDEX CI_LargeString ON #BigTableOfStrings(SomewhatLargeString)

Coffee (or preferably, beer) time! See you in 35 minutes.

... All set?

Okay, now let's pretend we're in a web app with a SQL Server 2000 backend and we want the second page of data... Rows 11-20, ordered by SomewhatLarge. How might we do this in SQL Server 2000..? Here's one way:

 

SELECT A.SomewhatLargeString
FROM #BigTableOfStrings A
JOIN #BigTableOfStrings B ON B.SomewhatLargeString <= A.SomewhatLargeString
GROUP BY A.SomewhatLargeString
HAVING COUNT(*) BETWEEN 11 AND 20

... Still waiting for that to return? Keep waiting. Maybe get another coffee, or go home for the night. On my system, that query has quite possibly the biggest estimated cost I've ever seen, a staggering 1,523,110,700. Yeah, that sucks. So let's change it around a bit:

 

SELECT x.SomewhatLargeString
FROM (
SELECT TOP 20 A.SomewhatLargeString, COUNT(*) AS TheCount
FROM #BigTableOfStrings A
JOIN #BigTableOfStrings B ON B.SomewhatLargeString <= A.SomewhatLargeString
GROUP BY A.SomewhatLargeString
ORDER BY A.SomewhatLargeString ) x
WHERE x.TheCount BETWEEN 11 AND 20

Much nicer than before, with an estimated cost on my system of 23.1, and a virtually instant return time. But wait, we have an over-zealous user who wants rows 20,001 - 20,010!

 

SELECT x.SomewhatLargeString
FROM (
SELECT TOP 20010 A.SomewhatLargeString, COUNT(*) AS TheCount
FROM #BigTableOfStrings A
JOIN #BigTableOfStrings B ON B.SomewhatLargeString <= A.SomewhatLargeString
GROUP BY A.SomewhatLargeString
ORDER BY A.SomewhatLargeString ) x
WHERE x.TheCount BETWEEN 20001 AND 20010

Oops, cost skyrocketed to 50516, with a return time of 1:30... Can you feel your users abandoning your sinking ship of an app and heading towards the competitors? Probably not, since they won't actually click through 20,000 rows, but it makes a really good contrived example, so let's roll with it!

So how to solve this problem? In SQL Server 2000, the answer is, find some other paging mechanism, probably using a middle tier. But in SQL Server 2005, we have new and better toys to play with. Allow me to introduce your new paging best friend, the ROW_NUMBER() function. For those readers who are slow on the uptake, this function does exactly what its name implies; it creates a surrogate row number for each row in a result set. So now, instead of the very painfully inefficient COUNT(*) methods, we can let SQL Server do all the work as it builds the result set we actually want...

 

SELECT x.SomewhatLargeString
FROM (
SELECT TOP 20010 A.SomewhatLargeString, ROW_NUMBER() OVER(ORDER BY A.SomewhatLargeString) AS TheCount
FROM #BigTableOfStrings A
ORDER BY A.SomewhatLargeString) x
WHERE x.TheCount BETWEEN 20000 AND 20010

I guess we can call that a tiny improvement. Total estimated cost: 0.202. That's only about a 25 MILLION PERCENT difference.

So why is ROW_NUMBER so much more efficient? It's a combination of the COUNT(*) method itself being inefficient and the query optimizer probably not handling it as well as it should. The COUNT(*) method requires an ordered-forward clustered index scan of the table, followed by correlation of each of the top 20010 rows that we asked for to all of the rows less than or equal to that row in the same table, for the count. Alas, the optimizer chooses a nested loop for that, which causes the cost to shoot up as the operation is repeated over and over.

The ROW_NUMBER method, on the other hand, requires only the scan of the TOP 20010 rows, followed by computation of the row number itself, which is nothing more than a scalar calculation. Then, just filter the rows. Simple, easy on the optimizer, easy for the system, and good for your customers. Definitely a great feature that I'm looking forward to using!


Published Wednesday, July 12, 2006 10:12 PM by Adam Machanic

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

max said:

Great article. Thanks

April 28, 2009 4:59 PM
 

daryl2000 said:

Thank you for posting this article, which greatly facilitated the development of a paging sproc.  While it was easy to incorporate the ROW_NUMBER() function, SQL 2005 did not allow me to use a variable with TOP or a order by clause in the subquery.  I tried to use a variable with SET ROWCOUNT in lieu of TOP, but the results were erroneous (due to the lack of the order by clause in the subquery).  Based upon the query in the article, the paging sproc would be:

CREATE PROCEDURE dbo.PagedResults

   @PAGE_SIZE bigint,

   @PAGE_NUM bigint

AS

BEGIN

   SET NOCOUNT ON;

   SELECT x.SomewhatLargeString

   FROM (

       SELECT A.SomewhatLargeString, ROW_NUMBER() OVER(ORDER BY A.SomewhatLargeString) AS TheCount

       FROM #BigTableOfStrings A

       ) x

   WHERE x.TheCount > ((@PAGE_NUM - 1) * @PAGE_SIZE) AND x.TheCount <= (@PAGE_NUM * @PAGE_SIZE)

   ORDER BY x.TheCount

END

Without the use of dynamic SQL, could the sproc be tweaked to improve performance?  Thanks.

July 14, 2010 11:48 AM
 

Adam Machanic said:

Hi Daryl2000,

There's no reason you shouldn't be able to use TOP + ORDER BY in a derived table. That's been allowed for several versions (at least since 7.0). Not sure what's going on there. However, in the time between when I originally wrote the post and now I've played a lot with this technique and have seen no differences between performance with and without the TOP + ORDER BY -- so it's not needed in any case.

Your query looks good to me as-is. I don't think there is any improvement to be had.

July 14, 2010 4:14 PM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement