<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 't-sql' and 'Query Tuning'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=t-sql,Query+Tuning&amp;orTags=0</link><description>Search results matching tags 't-sql' and 'Query Tuning'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Who's On First? Solving the Top per Group Problem (Part 1: Technique)</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2008/02/08/who-s-on-first-solving-the-top-per-group-problem-part-1-technique.aspx</link><pubDate>Fri, 08 Feb 2008 23:09:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:4992</guid><dc:creator>Adam Machanic</dc:creator><description>&lt;p&gt;Relative comparison is a simple matter of human nature. From early childhood we compare and contrast what we see in the world around us, building a means by which to rate what we experience. And as it turns out, this desire to discover top and bottom, rightmost and leftmost, or best and worst happens to extend quite naturally into business scenarios. Which product is the top seller? How about the one that's simply not moving off the shelves? Which of our customers has placed the most expensive order? What are the most recent orders placed at each of our outlets?&lt;/p&gt;&lt;p&gt;In the world of common business questions, the edge cases are generally of most interest. What's in the middle is unimportant; it's often too difficult for the mind to compare and comprehend when there are hundreds, thousands, or even millions of items, transactions, or facts that are all within a similar range. Instead, we focus on those that stick out in some extraordinary way.&lt;/p&gt;&lt;p&gt;Those of us who work with SQL products on a regular basis are faced with solving this same problem time and again as we work through various business requirements. Over time, I have noticed four basic query patterns that can be used to solve the problem; each are logically equivalent (within certain restrictions -- more on that later), but can have surprisingly different performance characteristics depending on the data being queried. In this first post, I will outline the available patterns/methods. In the following posts, I will show the results of testing each pattern against a variety of scenarios in an attempt to discover where and when each should be used.&lt;/p&gt;&lt;p&gt;The four basic patterns are outlined below. Each of the methods is illustrated using a query to show all customers' names, plus their most recent order date, and the amount of that order. I've included notes that indicate where logic differences can arise among the various methods.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 1: Join to full group and use correlated subquery with a MIN/MAX aggregate to filter&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method we use an inner join to get all required columns, then filter the resultant set using a correlated subquery in the WHERE clause. &lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;JOIN Orders o ON o.CustomerId = c.CustomerId&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate&amp;nbsp; =&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT MAX(o1.OrderDate)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o1&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.CustomerId = o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: With this method ties are automatically included in the output, unless a tiebreaker is specified (which can be tricky given that you only have one column to work with). This method does not allow you to pull back an arbitrary number of rows, such as top 10 per customer; you are limited to the edge and any ties that might exist. &lt;b&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 1a: Join to full group and use correlated subquery with TOP(n) and ORDER BY to filter&lt;/b&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;This method is almost identical to Method 1 (which is why it is classified here as 1a), but the TOP and ORDER BY allow for a bit more flexibility than the aggregates.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;JOIN Orders o ON o.CustomerId = c.CustomerId&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate&amp;nbsp; =&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT TOP(1)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.OrderDate&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o1&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.CustomerId = o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.OrderDate DESC&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;/blockquote&gt;&lt;p&gt;Logic notes: With this method you can more easily integrate a tiebreaker than with Method 1; the comparison column can be anything, including a primary key, and you can still order on whatever column makes most sense. In addition, you can take more rows than with Method 1 by using IN instead of = in the WHERE clause, and increasing the argument value to TOP.&lt;br&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 2: CROSS APPLY to ordered TOP(n)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method, SQL Server 2005's CROSS APPLY operator is used. This operator allows us to essentially create a table-valued correlated subquery -- something that impossible in previous versions of SQL Server. By using TOP in conjunction with ORDER BY we can get as many rows per group as needed.&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;CROSS APPLY&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT TOP(1)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId = c.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate DESC&lt;br&gt;) x&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: This method is almost identical, from a logic point of view, with Method 1a modified to use IN on a primary key column. With both methods WITH TIES can be added to the TOP in order to get ties.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 3: Join to derived table that uses a partitioned, ordered windowing function, and filter in the outer query based on the row number&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method a derived table or CTE is used, in conjunction with a windowing function partitioned based on the required grain of the final query. So for the "most recent order per customer" query, the row number is partitioned based on the customer. This gives us a count starting at 1 for each customer, which can be filtered in the outer query.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;INNER JOIN&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY o.OrderDate DESC&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) AS r&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;) x ON&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.CustomerId = c.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; AND x.r = 1&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: If ties are important, use DENSE_RANK instead of ROW_NUMBER. ROW_NUMBER is good for arbitrary TOP(n), similar to Method 2. Unlike the previously described methods, in conjunction with DENSE_RANK this method can return an arbitrary TOP(n) rows, all of which can include ties. So if you would like to see the three most recent order dates and each happens to have multiple orders, this method will be able to return them all by simply filtering on x.r = 3. This would not be directly possible with any of the other methods described here.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 4: "Carry-along sort"&lt;/b&gt;&lt;/p&gt;&lt;p&gt;This is the only "tricky" method, and not one that I recommend using, except as a last resort. I'm including it here only for completeness and comparison because it happens to be a very high performance method in some cases. This method involves converting each of the required inner columns into a string, concatenating them, then applying an aggregate to the string as a whole. By putting the "sort" column first, the other data is "carried along" -- thus the name for the method. The concatenated data is then "unpacked" in the outer query. This can be surprisingly efficient from an I/O standpoint, but the resultant code is a maintenance nightmare and it is quite easy to introduce errors. In addition, this method can only return the top 1 per group -- no ties or multiple return items are supported.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(DATETIME, SUBSTRING(x.OrderInfo, 1, 8)) AS OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(MONEY, SUBSTRING(x.OrderInfo, 9, 15)) AS OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;INNER JOIN&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MAX&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(CHAR(8), OrderDate, 112) +&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(CHAR(15), SubTotal)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) OrderInfo&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; GROUP BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId&lt;br&gt;) x ON&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.CustomerId = c.CustomerId&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;This post is just the beginning; watch this space in the coming days for a series of performance tests and analysis of these methods, and some results that I personally found to be quite surprising.&lt;br&gt;&amp;nbsp;&lt;/p&gt;</description></item><item><title>Medians, ROW_NUMBERs, and performance</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2006/12/18/medians-row-numbers-and-performance.aspx</link><pubDate>Mon, 18 Dec 2006 19:52:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:437</guid><dc:creator>Adam Machanic</dc:creator><description>A couple of days ago, Aaron Bertrand posted about &lt;a href="http://sqlblog.com/blogs/aaron_bertrand/archive/2006/12/15/428.aspx"&gt;a method for calculating medians in SQL Server 2005&lt;/a&gt; using the ROW_NUMBER function in conjunction with the COUNT aggregate. This method (credited to Itzik Ben-Gan) is interesting, but I discovered an even better way to attack the problem in &lt;a href="http://www.amazon.com/Celkos-Analytics-Kaufmann-Management-Systems/dp/0123695120/sr=8-1/qid=1166482464/ref=sr_1_1/105-6595410-7450029?ie=UTF8&amp;amp;s=books"&gt;Joe Celko's Analytics and OLAP in SQL&lt;/a&gt;.&lt;br&gt;&lt;br&gt;Rather than using a COUNT aggregate in conjunction with the ROW_NUMBER function, Celko's method uses ROW_NUMBER twice: Once with an ascending sort, and again with a descending sort. The output rows can then be matched based on the ascending row number being within +/- 1 of the descending row number.&amp;nbsp; This becomes clearer with a couple of small examples:&lt;br&gt;&lt;br&gt;

&lt;table class="MsoTableGrid" style="border:medium none;border-collapse:collapse;" cellpadding="0" cellspacing="0"&gt;
 &lt;tr&gt;
  &lt;td style="border:1pt solid windowtext;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;A&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;B&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;C&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;D&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
&lt;/table&gt;

&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;

&lt;table class="MsoTableGrid" style="border:medium none;border-collapse:collapse;" cellpadding="0" cellspacing="0"&gt;
 &lt;tr&gt;
  &lt;td style="border:1pt solid windowtext;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;A&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;5&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;B&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;C&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;D&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr style="height:14.35pt;"&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;E&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;5&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
&lt;/table&gt;

&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;br&gt;&lt;/o:p&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;o:p&gt;In the first table (even number of rows), the median rows are B and C. These can be matched based on [Ascending Column] IN ([Descending Column] + 1, [Descending Column] - 1). In the second table (odd number of rows), the median row is C, which is matched where [Ascending Column] = [Descending Column]. Note that in the second table, the match criteria &lt;/o:p&gt;for the first table does not apply -- so the generic expression to match either case is the combination of the two:&amp;nbsp; [Ascending Column] IN ([Descending Column], [Descending Column] + 1, [Descending Column] - 1).&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;We can apply this logic within the AdventureWorks database to find the median of the "TotalDue" amount in the Sales.SalesOrderHeader table, for each customer:&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;/p&gt;&lt;div class="code"&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp; AVG(TotalDue)&lt;br&gt;FROM&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp; SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; TotalDue,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc&lt;br&gt;&amp;nbsp;&amp;nbsp; FROM Sales.SalesOrderHeader SOH&lt;br&gt;) x&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp; RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)&lt;br&gt;GROUP BY CustomerId&lt;br&gt;ORDER BY CustomerId;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;The equivalent logic using Itzik Ben-Gan's method follows:&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;div class="code"&gt;&lt;br&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp; AVG(TotalDue)&lt;br&gt;FROM&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp; SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; TotalDue,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue) AS RowNum,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(*) OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId) AS RowCnt&lt;br&gt;&amp;nbsp;&amp;nbsp; FROM Sales.SalesOrderHeader&lt;br&gt;) x&lt;br&gt;WHERE&lt;br&gt;&amp;nbsp;&amp;nbsp; RowNum IN ((RowCnt + 1) / 2, (RowCnt + 2) / 2)&lt;br&gt;GROUP BY CustomerId&lt;br&gt;ORDER BY CustomerId;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;Taking a look at the estimated execution plans for these two queries, we might believe that Ben-Gan's method is superior: Celko's algorithm requires an expensive intermediate sort operation and has an estimated cost of 4.96, compared to 3.96 for Ben-Gan's. &lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;Remember that these are merely estimates. And as it turns out, this is one of those times that the Query Optimizer's cost estimates are are totally out of line with the reality of what
happens when you actually run the queries. Although the performance
difference is not especially noticeable on a set of data as small as
that in Sales.SalesOrderHeader, check out the STATISTICS IO output. Celko's version does 703 logical reads; Ben-Gan's does an astonishing 140110!&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;There is a good lesson to be learned from this: &lt;i&gt;Cost-based optimization is far from perfect!&lt;/i&gt; Never completely trust what estimates tell you; they've come a long way, but clearly there is still some work to do in this area. The only way to actually determine that one query is better than another is to run it against a realistic set of data and look at how much IO and CPU time is actually used.&lt;br&gt;&lt;/p&gt;&lt;br&gt;In this case, Ben-Gan's query probably should perform better than Celko's. It seems odd that the Query Processor can't collect the row counts at the same time it processes the row numbers. Regardless, as of today this is the best way to solve this problem... Not that I've ever needed a median in any production application I've worked on. But I suppose that's beside the point!&lt;br&gt;</description></item><item><title>Scalar functions, inlining, and performance: An entertaining title for a boring post</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2006/08/03/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx</link><pubDate>Fri, 04 Aug 2006 03:07:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:146</guid><dc:creator>Adam Machanic</dc:creator><description>Scalar.&amp;nbsp; Function.&lt;br&gt;&lt;br&gt;Wow.&lt;br&gt;&lt;br&gt;Could any other combination of words evoke the same feeling of encapsulation, information hiding, and simplification of client code?&amp;nbsp; After years spent developing software in the procedural and OO worlds, it can be difficult--perhaps, even impossible--to migrate over to working with SQL Server and not consider how to architect your data access logic using some of the same techniques you'd use in the application tier.&lt;br&gt;&lt;br&gt;In short: Why would you &lt;i&gt;ever&lt;/i&gt; write the same piece of logic more than once?&amp;nbsp; Answer: &lt;i&gt;You wouldn't (damn it!)&lt;/i&gt;.&amp;nbsp; And so Microsoft bestowed upon the SQL Server community, in SQL Server 2000, the ability to write scalar user-defined functions.&amp;nbsp; And they could have been such beautiful things...&lt;br&gt;&lt;br&gt;But alas, reality can be painful, and as developers tried these new tools they were struck with a strange feeling of sadness as their applications buckled under the weight of what otherwise would have been a wonderful idea. As it turned out, putting all but the simplest of logic into these scalar functions was a recipe for disaster. Why?&amp;nbsp; Because they're essentially cursors waiting to happen (but they don't &lt;i&gt;look &lt;/i&gt;like cursors, so you may not know... until it's too late.)&lt;br&gt;&lt;br&gt;The central problem is that when you wrap logic in a multistatement UDF, the query optimizer just can't unwrap it too easily. And so there's really only one way to evaluate a scalar UDF: call it once per row. And that is really nothing more than a cursor.&lt;br&gt;&lt;br&gt;Seeing this behavior in action is easy enough; consider the following scalar function that some poor sap DBA working for AdventureWorks might be compelled to create:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;CREATE FUNCTION GetMaxProductQty_Scalar&lt;br&gt;(&lt;br&gt;    @ProductId INT&lt;br&gt;)&lt;br&gt;RETURNS INT&lt;br&gt;AS&lt;br&gt;BEGIN&lt;br&gt;    DECLARE @maxQty INT&lt;br&gt;&lt;br&gt;    SELECT @maxQty = MAX(sod.OrderQty)&lt;br&gt;    FROM Sales.SalesOrderDetail sod&lt;br&gt;    WHERE sod.ProductId = @ProductId&lt;br&gt;&lt;br&gt;    RETURN (@maxQty)&lt;br&gt;END&lt;br&gt;&lt;/pre&gt;
&lt;br&gt;Simple enough, right?&amp;nbsp; Let's pretend that AdventureWorks has a bunch of reports, each of which requires maximum quantity sold per product.&amp;nbsp; So the DBA, thinking he can save himself some time and keep everything centralized (and that is a good idea), puts all of the logic into a scalar UDF.&amp;nbsp; Now, when he needs this logic, he can just call the UDF.&amp;nbsp; And if the logic has a bug, or needs to be changed, he can change it in exactly &lt;i&gt;one&lt;/i&gt; place.&amp;nbsp; And so life is great... Or is it?&lt;br&gt;&lt;br&gt;Let's take a look at a sample query:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;SELECT&lt;br&gt;    ProductId,&lt;br&gt;    dbo.GetMaxProductQty_Scalar(ProductId)&lt;br&gt;FROM Production.Product&lt;br&gt;ORDER BY ProductId&lt;br&gt;&lt;/pre&gt;
&lt;br&gt;This query does nothing more than get the max quantity sold for each product in the Productin.Product table. And a look at the execution plan or the STATISTICS IO output might indicate that there's nothing too interesting going on here: The execution plan shows an index scan (to be expected, with no WHERE clause), followed by a compute scalar operation (the call to the UDF). And STATISTICS IO shows a mere 16 reads.&lt;br&gt;&lt;br&gt;So why is this query so problematic? Because the real issue is hiding just beneath the surface.&amp;nbsp;&lt;i&gt; The execution plan and STATISTICS IO didn't consider any of the code evaluated within the UDF!&lt;/i&gt; To see what's &lt;i&gt;really&lt;/i&gt; going on, fire up SQL Server Profiler, turn on the SQL:BatchCompleted event, and make sure you're showing the Reads column. Now run the query again and you'll see that this seemingly-innocent block of T-SQL is, in fact, using 365,247 logical reads. Quite a difference!&lt;br&gt;&lt;br&gt;Each of those "compute scalar" operations is really a call to the UDF, and each of the calls to the UDF is really a new query.&amp;nbsp; And all of those queries (all 504 of them -- the number of products in the Product table) add up to massive I/O.&amp;nbsp; Clearly not a good idea in a production environment.&lt;br&gt;&lt;br&gt;But luckily, we're not done here yet (or this would be a very boring post). Because while the performance penalty is a major turnoff, I really do love the encapsulation afforded by scalar UDFs.&amp;nbsp; I want them (or a similar tool) in my toolbox... And so I got to thinking.&lt;br&gt;&lt;br&gt;The answer to my dilemma, as it turns out, is to not use scalar UDFs at all, but rather to use &lt;i&gt;inline table-valued&lt;/i&gt; UDFs and treat them like scalars. This means that queries get slightly more complex than with scalar UDFs, but because the funtions are inlined (treated like macros) they're optimized along with the rest of the query. Which means, no more under-the-cover cursors.&lt;br&gt;&lt;br&gt;Following is a modified version of the scalar UDF posted above:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;CREATE FUNCTION GetMaxProductQty_Inline&lt;br&gt;(&lt;br&gt;    @ProductId INT&lt;br&gt;)&lt;br&gt;RETURNS TABLE&lt;br&gt;AS&lt;br&gt;    RETURN&lt;br&gt;    (&lt;br&gt;        SELECT MAX(sod.OrderQty) AS maxqty&lt;br&gt;        FROM Sales.SalesOrderDetail sod&lt;br&gt;        WHERE sod.ProductId = @ProductId&lt;br&gt;    )&lt;br&gt;&lt;/pre&gt;&lt;br&gt;This function is no longer actually scalar--in fact, it now returns a table. It just so happens that the table has exactly one column and exactly one row, and uses the same logic as the scalar UDF shown above. So it's still scalar enough for my purposes.&lt;br&gt;&lt;br&gt;The query shown above, used to retrieve the maximum quantity sold for each product, will not quite work with this UDF as-is. Trying to substitute in the new UDF will result in nothing more than a variant on an "object not found" error.&amp;nbsp; Instead, you need actually treat this function like&amp;nbsp; it returns a table (due to the fact that it does).&amp;nbsp; And that means, in this case, a &lt;i&gt;subquery&lt;/i&gt;:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;SELECT&lt;br&gt;    ProductId,&lt;br&gt;    (&lt;br&gt;        SELECT MaxQty&lt;br&gt;        FROM dbo.GetMaxProductQty_Inline(ProductId)&lt;br&gt;    ) MaxQty&lt;br&gt;FROM Production.Product&lt;br&gt;ORDER BY ProductId&lt;br&gt;&lt;/pre&gt;&lt;br&gt;So there it is. We're now treating the table-valued UDF more or less just like a scalar UDF.&amp;nbsp; And the difference in I/O results is really quite astounding: 1267 logical reads in this case. Meaning that the scalar UDF solution is around 288 times more I/O intensive!&lt;br&gt;&lt;br&gt;The question being, of course, was it worth it? The whole thing could have been written as one query, without the need for any UDFs at all. And the final query in this case is quite a bit more complex than the previous version, in addition to the fact that the encapsulation breaks down to some degree by forcing the caller to have some knowledge of how the UDF actually works. But I do feel that this sacrifice is warranted in some cases. Although the "greatest quantity sold" example shown here is simplistic, imagine other situations in which the same code fragments or logic are used over and over, due to lack of a good way of standardizing and centralizing them.&amp;nbsp; I know I've seen that a lot in my work, and some examples I can think of have included complex logic that might very well have been easier to maintain in a UDF.&lt;br&gt;&lt;br&gt;This technique may not be perfect for every case, and it certainly has its tradeoffs. But it may be a useful trick to keep in the back of your mind for a rainy day in the data center when someone's scalar UDF solution starts breaking down and you need a fix that doesn't require a massive code rewrite.&lt;br&gt;&lt;br&gt;</description></item><item><title>Paging in SQL Server 2005</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/paging-in-sql-server-2005.aspx</link><pubDate>Thu, 13 Jul 2006 01:12:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:75</guid><dc:creator>Adam Machanic</dc:creator><description>I keep seeing questions on newsgroups about paging in stored
procedures, and whether there will be a better way in SQL Server 2005.
However, aside from a few answers in newsgroups, I haven't seen any
content on how to do it. So I'd like to spend a few minutes and share
with you the new features that will make paging stored procedures both
easier to build and a lot more performant...
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;
But first, since this is a performance-related blog, let's generate a bunch of big test data!
&lt;/p&gt;&lt;p&gt;Since I don't have AdventureWorks installed on my test SQL
Server 2005 server at the moment and am too lazy to track it down, the
data is somewhat ugly... Anyway, start this up and let it run for a
while (takes around 35 minutes on my test box):
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT DISTINCT A.Name + B.Name + C.Name AS SomewhatLargeString&lt;br&gt;INTO #BigTableOfStrings&lt;br&gt;FROM	master..spt_values A,&lt;br&gt;	master..spt_values B,&lt;br&gt;	master..spt_values C&lt;br&gt;WHERE	a.TYPE NOT IN ('P', 'R', 'F', 'F_U')&lt;br&gt;	AND b.TYPE NOT IN  ('P', 'R', 'F', 'F_U')&lt;br&gt;&lt;br&gt;CREATE CLUSTERED INDEX CI_LargeString ON #BigTableOfStrings(SomewhatLargeString)&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Coffee (or preferably, beer) time!  See you in 35 minutes.
&lt;/p&gt;&lt;p&gt;
... All set?
&lt;/p&gt;&lt;p&gt;Okay, now let's pretend we're in a web app with a SQL Server
2000 backend and we want the second page of data... Rows 11-20, ordered
by SomewhatLarge. How might we do this in SQL Server 2000..? Here's one
way:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT A.SomewhatLargeString&lt;br&gt;FROM #BigTableOfStrings A&lt;br&gt;JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;GROUP BY A.SomewhatLargeString&lt;br&gt;HAVING COUNT(*) BETWEEN 11 AND 20&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;... Still waiting for that to return? Keep waiting. Maybe get
another coffee, or go home for the night. On my system, that query has
quite possibly the biggest estimated cost I've ever seen, a staggering
1,523,110,700. Yeah, that sucks. So let's change it around a bit:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20 A.SomewhatLargeString, COUNT(*) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;	GROUP BY A.SomewhatLargeString&lt;br&gt;	ORDER BY A.SomewhatLargeString ) x&lt;br&gt;WHERE x.TheCount BETWEEN 11 AND 20&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Much nicer than before, with an estimated cost on my system of 23.1,
and a virtually instant return time. But wait, we have an over-zealous
user who wants rows 20,001 - 20,010!
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20010 A.SomewhatLargeString, COUNT(*) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;	GROUP BY A.SomewhatLargeString&lt;br&gt;	ORDER BY A.SomewhatLargeString ) x&lt;br&gt;WHERE x.TheCount BETWEEN 20001 AND 20010&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Oops, cost skyrocketed to 50516, with a return time of 1:30... Can
you feel your users abandoning your sinking ship of an app and heading
towards the competitors? Probably not, since they won't actually click
through 20,000 rows, but it makes a really good contrived example, so
let's roll with it!
&lt;/p&gt;&lt;p&gt;So how to solve this problem? In SQL Server 2000, the answer
is, find some other paging mechanism, probably using a middle tier. But
in SQL Server 2005, we have new and better toys to play with. Allow me
to introduce your new paging best friend, the ROW_NUMBER() function.
For those readers who are slow on the uptake, this function does
exactly what its name implies; it creates a surrogate row number for
each row in a result set. So now, instead of the very painfully
inefficient COUNT(*) methods, we can let SQL Server do all the work as
it builds the result set we actually want...
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20010 A.SomewhatLargeString, ROW_NUMBER() OVER(ORDER BY A.SomewhatLargeString) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	ORDER BY A.SomewhatLargeString) x&lt;br&gt;WHERE x.TheCount BETWEEN 20000 AND 20010&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
I guess we can call that a tiny improvement.  Total estimated cost: 0.202.  That's only about a &lt;i&gt;25 MILLION PERCENT&lt;/i&gt; difference.
&lt;/p&gt;&lt;p&gt;So why is ROW_NUMBER so much more efficient? It's a combination
of the COUNT(*) method itself being inefficient and the query optimizer
probably not handling it as well as it should. The COUNT(*) method
requires an ordered-forward clustered index scan of the table, followed
by correlation of each of the top 20010 rows that we asked for to all
of the rows less than or equal to that row in the same table, for the
count. Alas, the optimizer chooses a nested loop for that, which causes
the cost to shoot up as the operation is repeated over and over.
&lt;/p&gt;&lt;p&gt;The ROW_NUMBER method, on the other hand, requires only the
scan of the TOP 20010 rows, followed by computation of the row number
itself, which is nothing more than a scalar calculation. Then, just
filter the rows. Simple, easy on the optimizer, easy for the system,
and good for your customers. Definitely a great feature that I'm
looking forward to using!&lt;/p&gt;&lt;br&gt;</description></item></channel></rss>