<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'sql' and 'indexing'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=sql,indexing&amp;orTags=0</link><description>Search results matching tags 'sql' and 'indexing'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>The SQL Query Optimizer – when Logical Order can get it wrong</title><link>http://sqlblog.com/blogs/rob_farley/archive/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong.aspx</link><pubDate>Sun, 30 Dec 2012 05:32:25 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46882</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;It’s very easy to get in the habit of imagining the way that a query should work based on the Logical Order of query processing – the idea that the FROM clause gets evaluated first, followed by the WHERE clause, GROUP BY, and so on – finally ending with whatever is in the SELECT clause. We even get in the habit of creating indexes that focus on the WHERE clause, and this is mostly right. &lt;/p&gt;  &lt;p&gt;But it’s only &lt;strong&gt;mostly&lt;/strong&gt; right, and it will often depend on statistics.&lt;/p&gt;  &lt;p&gt;There are other situations where statistics have to play a major part in choosing the right plan, of course. In fact, almost every query you ever run will use statistics to work out the best plan. What I’m going to show you in this post is an example of how the statistics end up being incredibly vital in choosing the right plan. It also helps demonstrate an important feature of the way that Scans work, and how to read execution plans.&lt;/p&gt;  &lt;p&gt;I’m going to use AdventureWorks2012 for this example. I’m going to ask for the cheapest product according to the first letter of the product name. This kind of query:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT MIN(ListPrice)       &lt;br /&gt;FROM Production.Product        &lt;br /&gt;WHERE Name LIKE 'H%';&lt;/font&gt;      &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Don’t run it yet. I want to ask you how you’d solve it on paper.&lt;/p&gt;  &lt;p&gt;Would you prefer I give you a list of the products sorted by name, or would you prefer I give you that list sorted by price?&lt;/p&gt;  &lt;p&gt;If you want the ‘sorted by name’ option, then you’ll have to look through all the products that start with H, and work out which is the cheapest (notice that my predicate is not an equality predicate – if I knew what the name had to be exactly, then I could have an index which ordered by name and then price, and very quickly find the cheapest with that name). This approach could be good if you don’t have many products starting with that particular letter. But if you have lots, then finding them all and then looking for the cheapest of them could feel like too much work. Funnily enough, this is the way that most people would imagine this query being run – applying the WHERE clause first, and applying the aggregate function after that.&lt;/p&gt;  &lt;p&gt;On the other hand, if you have lots of products with that particular letter, you might be better off with your list sorted by price, looking through for the first product that starts with the right letter. &lt;/p&gt;  &lt;p&gt;Let me explain this algorithm a little more.&lt;/p&gt;  &lt;p&gt;If you’re at a restaurant and are strapped for cash, you might want to see what the cheapest thing is. You’d pick the “sorted by price” menu, and go to the first item. But then if you saw it had peanut in, and you have an allergy, then you’d skip it and go to the next one. You wouldn’t expect to have to look far to find one that doesn’t have peanut, and because you’ve got the “sorted by price” menu, you have the cheapest one that satisfies your condition after looking through just a few records.&lt;/p&gt;  &lt;p&gt;It’s clearly not the same algorithm as finding all the things that satisfy the condition first, but it’s just as valid. If you’re only going to have to look through a handful of products before you find one that starts with the right letter, then great! But what if there are none? You’d end up having to look through the whole list before you realised.&lt;/p&gt;  &lt;p&gt;The Query Optimizer faces the same dilemma, but luckily it might have statistics, so it should be able to know which will suit better.&lt;/p&gt;  &lt;p&gt;Let’s create the two indexes – one sorted by Name, one sorted by Price. Both will include the other column, so that the query will only need one of them.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;CREATE INDEX ixNamePrice ON Production.Product (Name) INCLUDE (ListPrice);       &lt;br /&gt;CREATE INDEX ixPriceName ON Production.Product (ListPrice) INCLUDE (Name);&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Now let’s consider two queries. Both queries give the same result – $0.00. But that’s not important, I’m only interested in how they run.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT MIN(ListPrice)       &lt;br /&gt;FROM Production.Product        &lt;br /&gt;WHERE Name LIKE 'I%';&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;&lt;font face="Consolas"&gt;SELECT MIN(ListPrice)       &lt;br /&gt;FROM Production.Product        &lt;br /&gt;WHERE Name LIKE 'H%';&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The two queries are almost identical, but they run quite differently.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_78098898.png" width="575" height="273" /&gt;&lt;/p&gt;  &lt;p&gt;Ok, they’re fairly similar – they both use a Stream Aggregate operator, for example. And they have similar cost. But significantly, one is performing a Seek, while the other is doing a Scan. Different indexes, but nevertheless a Scan and a Seek.&lt;/p&gt;  &lt;p&gt;People will tell you that Scans are bad and Seeks are good, but it’s not necessarily the case. Here, we see that the Scan plan is no more expensive than the Seek plan – it’s just different. We should consider why.&lt;/p&gt;  &lt;p&gt;Those two indexes are the two different stories that I described earlier. There are very few products that start with the letter ‘I’, and quite a number than start with ‘H’, and so the Query Optimizer has chosen differently.&lt;/p&gt;    &lt;p&gt;There are exactly 10 products that start with I. From a total of 504. That’s less than 2% of the products.&lt;/p&gt;  &lt;p&gt;There are 91 products that start with H. That’s 18%. You might not have expected it to be that high, but that’s okay – if SQL has been maintaining statistics for you on this, it hopefully won’t be as surprised as you.&lt;/p&gt;  &lt;p&gt;18% – nearly 1 in 5. So by the time you’ve looked at, oh, a dozen records, you will have almost certainly found one that starts with an H. (Actually, the chance of NOT finding one in the first 12 would be power(.82, 12), which is 0.09. That’s just 9%.) If I do a bit of digging into the internals, I can discover that the pages in my index typically have over a hundred records on them each. The chance of not finding a product that starts with an H on that first page – you’d need lottery-scale luck (1 in 444 million).&lt;/p&gt;  &lt;p&gt;On the other hand, the cost of finding the cheapest value from 91 records is a lot more expensive than finding the cheapest from just 10. And getting all 10 records should be a small number of reads too.&lt;/p&gt;  &lt;p&gt;But a Scan! Really? It has to look through the whole table, right?&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;No. That’s not how it works.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;You see, execution plans go from left to right. If you start reading these plans from the right, you’ll start thinking that the whole index has been scanned, when it’s simply not the case. That Top operator asks for a single row from the index, and that’s all it provides. Once that row has been found, the Scan operation stops.&lt;/p&gt;  &lt;p&gt;For this information, I don’t even need to pull up the Properties window for the Scan (but I would recommend you get in the habit of doing that). No – this is all available in the Tool Tip. Look at the number of “Actual number of rows” – it’s just one.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_743BBDFB.png" width="345" height="559" /&gt;&lt;/p&gt;  &lt;p&gt;A predicate is applied – it looks through the index for rows that start with H – but it’s doing this in Order (see Ordered = True), and it’s stopping after the first row is found. Remember I mentioned that there are actually 91 rows that satisfy the predicate? The Scan doesn’t care – it only needs one and it stops right then.&lt;/p&gt;  &lt;p&gt;You might figure this is because we are using MIN. What if we needed the MAX though? Well, that’s just the same, except that the Direction of the Scan is BACKWARD (you’ll need F4 for that one).&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_63E04642.png" width="291" height="96" /&gt;&lt;/p&gt;  &lt;p&gt;MIN goes forward, because it’s most interested in the ‘smallest’ ones, MAX will go backward because it wants the ‘largest’. (And as you’d probably expect, if you’d created your index to be descending, then it would be reversed.)&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;But again – being able to tell which is the better algorithm depends entirely on your statistics being known&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;I see so many systems have bad statistics for one reason or another, and typically because the data most frequently queried is the newest data, and that makes up such a small percentage of the table. The statistics will think that there is almost no data for ‘today’, as they probably haven’t been updated since at least some number of hours ago.&lt;/p&gt;  &lt;p&gt;When you look at how a query is running, always have a think about you’d solve it on paper, and remember that you might actually have a better (or worse) picture of the statistics than what the Query Optimizer has.&lt;/p&gt;  &lt;p&gt;And remember that a Scan is not necessarily bad. I might do another post on that soon as well.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Joins in single-table queries</title><link>http://sqlblog.com/blogs/rob_farley/archive/2012/12/10/joins-in-single-table-queries.aspx</link><pubDate>Tue, 11 Dec 2012 00:05:17 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46601</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;Tables are only metadata. They don’t store data.&lt;/p&gt;  &lt;p&gt;I’ve written &lt;a href="http://sqlblog.com/blogs/rob_farley/archive/2010/09/14/table-no-such-thing.aspx" target="_blank"&gt;something about this&lt;/a&gt; before, but I want to take a viewpoint of this idea around the topic of joins, especially since it’s the topic for T-SQL Tuesday this month. Hosted this time by &lt;a href="http://sqlity.net/en/1175/t-sql-tuesday-37-invite-to-join-me-in-a-month-of-joins/" target="_blank"&gt;Sebastian Meine&lt;/a&gt; (&lt;a href="http://twitter.com/sqlity" target="_blank"&gt;@sqlity&lt;/a&gt;), who has a whole series on joins this month. Good for him – it’s a great topic. &lt;a href="http://sqlity.net/en/1175/t-sql-tuesday-37-invite-to-join-me-in-a-month-of-joins/" target="_blank"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top:0px;border-right:0px;padding-top:0px;" title="TSQL2sDay150x150" border="0" alt="TSQL2sDay150x150" align="right" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_1CEA56EB.jpg" width="170" height="170" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In that last post I discussed the fact that we write queries against tables, but that the engine turns it into a plan against indexes. My point wasn’t simply that a table is actually just a Clustered Index (or heap, which I consider just a special type of index), but that data access always happens against indexes – never tables – and we should be thinking about the indexes (specifically the non-clustered ones) when we write our queries.&lt;/p&gt;  &lt;p&gt;I described the scenario of looking up phone numbers, and how it never really occurs to us that there is a master list of phone numbers, because we think in terms of the useful non-clustered indexes that the phone companies provide us, but anyway – that’s not the point of this post.&lt;/p&gt;  &lt;p&gt;So a table is metadata. It stores information about the names of columns and their data types. Nullability, default values, constraints, triggers – these are all things that define the table, but the data isn’t stored in the table. The data that a table describes is stored in a heap or clustered index, but it goes further than this.&lt;/p&gt;  &lt;p&gt;All the useful data is going to live in non-clustered indexes. Remember this. It’s important. Stop thinking about tables, and start thinking about indexes.&lt;/p&gt;  &lt;p&gt;So let’s think about tables as indexes. This applies even in a world created by &lt;a href="http://www.lmgtfy.com/?q=someone+who's+bad+at+database+design" target="_blank"&gt;someone else&lt;/a&gt;, who doesn’t have the best indexes in mind for you.&lt;/p&gt;  &lt;p&gt;I’m sure you don’t need me to explain Covering Index bit – the fact that if you don’t have sufficient columns “included” in your index, your query plan will either have to do a Lookup, or else it’ll give up using your index and use one that does have everything it needs (even if that means scanning it). If you haven’t seen that before, drop me a line and I’ll run through it with you. Or go and read &lt;a href="http://msmvps.com/blogs/robfarley/archive/2008/10/12/doing-the-maths-to-understand-sql-optimiser-choices.aspx" target="_blank"&gt;a post I did a long while ago&lt;/a&gt; about the maths involved in that decision.&lt;/p&gt;  &lt;p&gt;So – what I’m going to tell you is that a Lookup is a join.&lt;/p&gt;  &lt;p&gt;When I run &lt;font face="Consolas"&gt;SELECT CustomerID FROM Sales.SalesOrderHeader WHERE SalesPersonID = 285; &lt;/font&gt;against the AdventureWorks2012 get the following plan:&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_31AC299E.png" width="470" height="226" /&gt;&lt;/p&gt;  &lt;p&gt;I’m sure you can see the join. Don’t look in the query, it’s not there. But you should be able to see the join in the plan. It’s an Inner Join, implemented by a Nested Loop. It’s pulling data in from the Index Seek, and joining that to the results of a Key Lookup.&lt;/p&gt;  &lt;p&gt;It clearly is – the QO wouldn’t call it that if it wasn’t really one. It behaves exactly like any other Nested Loop (Inner Join) operator, pulling rows from one side and putting a request in from the other. You wouldn’t have a problem accepting it as a join if the query were slightly different, such as &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT sod.OrderQty       &lt;br /&gt;FROM Sales.SalesOrderHeader AS soh        &lt;br /&gt;JOIN Sales.SalesOrderDetail as sod        &lt;br /&gt;on sod.SalesOrderID = soh.SalesOrderID        &lt;br /&gt;WHERE soh.SalesPersonID = 285;&lt;/font&gt;      &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_5CA417A3.png" width="478" height="223" /&gt;&lt;/p&gt;  &lt;p&gt;Amazingly similar, of course. This one is an explicit join, the first example was just as much a join, even thought you didn’t actually ask for one.&lt;/p&gt;  &lt;p&gt;You need to consider this when you’re thinking about your queries.&lt;/p&gt;  &lt;p&gt;But it gets more interesting.&lt;/p&gt;  &lt;p&gt;Consider this query: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT SalesOrderID       &lt;br /&gt;FROM Sales.SalesOrderHeader        &lt;br /&gt;WHERE SalesPersonID = 276        &lt;br /&gt;AND CustomerID = 29522;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;It doesn’t look like there’s a join here either, but look at the plan.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_3993A633.png" width="513" height="240" /&gt;    &lt;br /&gt;&lt;/p&gt;  &lt;p&gt;That’s not some Lookup in action – that’s a proper Merge Join. The Query Optimizer has worked out that it can get the data it needs by looking in two separate indexes and then doing a Merge Join on the data that it gets. Both indexes used are ordered by the column that’s indexed (one on SalesPersonID, one on CustomerID), and then by the CIX key SalesOrderID. Just like when you seek in the phone book to Farley, the Farleys you have are ordered by FirstName, these seek operations return the data ordered by the next field. This order is SalesOrderID, even though you didn’t explicitly put that column in the index definition. The result is two datasets that are ordered by SalesOrderID, making them very mergeable. &lt;/p&gt;  &lt;p&gt;Another example is the simple query&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT CustomerID       &lt;br /&gt;FROM Sales.SalesOrderHeader        &lt;br /&gt;WHERE SalesPersonID = 276;&lt;/font&gt;      &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_0191B250.png" width="514" height="245" /&gt;&lt;/p&gt;  &lt;p&gt;This one prefers a Hash Match to a standard lookup even! This isn’t just ordinary index intersection, this is something else again! Just like before, we could imagine it better with two whole tables, but we shouldn’t try to distinguish between joining two tables and joining two indexes.&lt;/p&gt;  &lt;p&gt;The Query Optimizer can see (using basic maths) that it’s worth doing these particular operations using these two less-than-ideal indexes (because of course, the best indexese would be on both columns – a composite such as (SalesPersonID, CustomerID – and it would have the SalesOrderID column as part of it as the CIX key still).&lt;/p&gt;  &lt;p&gt;You need to think like this too.&lt;/p&gt;  &lt;p&gt;Not in terms of excusing single-column indexes like the ones in AdventureWorks2012, but in terms of having a picture about how you’d like your queries to run. If you start to think about what data you need, where it’s coming from, and how it’s going to be used, then you will almost certainly write better queries. &lt;/p&gt;  &lt;p&gt;…and yes, this would include when you’re dealing with regular joins across multiples, not just against joins within single table queries.&lt;/p&gt;</description></item><item><title>Covering, schmuvvering – when a covering index is actually rubbish</title><link>http://sqlblog.com/blogs/rob_farley/archive/2011/05/19/covering-schmuvvering-when-a-covering-index-is-actually-rubbish.aspx</link><pubDate>Thu, 19 May 2011 04:11:34 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:35727</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;Take a look at this query plan.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_3B3A3400.png" width="342" height="114" /&gt;&lt;/p&gt;  &lt;p&gt;Yes, that arrow indicates a single row. This is an Index Seek, returning a single row. And yet it’s rubbish. That’s right – it’s rubbish!&lt;/p&gt;  &lt;p&gt;In fact, I had to provide a hint for it to use this index. A table scan would’ve been better, and this is what happens without the index hint.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;margin:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_4767BFF5.png" width="389" height="121" /&gt;&lt;/p&gt;  &lt;p&gt;Let’s look at the query. I can promise you there’s no bug here.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT Name, ProductNumber        &lt;br /&gt;FROM Production.Product         &lt;br /&gt;WHERE DaysToManufacture &amp;lt; 4         &lt;br /&gt;AND ReorderPoint &amp;lt; 100         &lt;br /&gt;AND Color = 'Red';&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;And the covering index is defined as follows:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;CREATE INDEX rf_ix_Covering ON Production.Product(DaysToManufacture)        &lt;br /&gt;INCLUDE (Name, ProductNumber, Size, ReorderPoint, Color)         &lt;br /&gt;WITH (FILLFACTOR=30);&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Try it yourself on AdventureWorks, you should see the same.&lt;/p&gt;  &lt;p&gt;Of course, there’s a bit of trickery going on. For example, I purposely spaced out the NCIX using FILLFACTOR=30. But despite that, &lt;strong&gt;most DBAs would tell you that the first plan is better than the second.&lt;/strong&gt; But they’d be &lt;strong&gt;wrong&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;Let’s look at the tooltip of each of those Index operations. First, the Clustered Index Scan.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_16E115B2.png" width="360" height="539" /&gt;&lt;/p&gt;  &lt;p&gt;There’s no real surprise here. We know there’s only one row being returned, I told you that before. There’s a predicate listed which is tested on every row, and the Name and ProductNumber are the two fields which are being outputted. Estimated Subtree Cost of 0.0127253.&lt;/p&gt;  &lt;p&gt;How about that Seek then, which the Query Optimizer only chose when its hand was forced by a hint?&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_59240828.png" width="340" height="583" /&gt;&lt;/p&gt;  &lt;p&gt;You’ll probably notice immediately that the Estimated Subtree Cost is higher! 0.0141 instead of 0.0127. The reason why is in the Predicate and Seek Predicate sections. Sure, it can do a Seek – it can seek on the first key column of the index, on DaysToManufacture. But having found those records, the data is still very much in DaysToManufacture order. We can’t search on those predicates very easily at all, and we’re left with them being handled as a Residual Predicate.&lt;/p&gt;  &lt;p&gt;To use a phone-book example, this is like searching for people called “Rob F.” – I can easily find people whose last name starts with F, but would then have to go through all of them looking for people called Rob. That residual check could be painful. Actually, this particular example is more like saying “Find me anyone called Rob whose surname is &amp;lt; Z”, as the Seek Predicate involved returns most of the table.&lt;/p&gt;  &lt;p&gt;Unfortunately, the query plan doesn’t show us how many rows go through the Residual Predicate check. If it did, we’d be able to see that it’s over 400 rows (in a table of 504 rows), and we might rethink our indexing strategy.&lt;/p&gt;  &lt;p&gt;In fact, I can easily make an index which causes the same query to perform a full Index Scan, plus Lookup, and which is half the cost of either of the plans we’ve seen so far.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;CREATE INDEX rf_ix_NotCovering ON Production.Product(DaysToManufacture)        &lt;br /&gt;WHERE DaysToManufacture &amp;lt; 4         &lt;br /&gt;AND ReorderPoint &amp;lt; 100         &lt;br /&gt;AND Color = 'Red';&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_70226397.png" width="548" height="212" /&gt;     &lt;br /&gt;&lt;/p&gt;  &lt;p&gt;Of course, now I’m really cheating, by using a Filtered Index which would only contain a single row. And I could’ve eliminated the Lookup by including the other columns, halving the cost yet again.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;CREATE INDEX rf_ix_Covering2 ON Production.Product(DaysToManufacture)        &lt;br /&gt;INCLUDE (Name, ProductNumber)         &lt;br /&gt;WHERE DaysToManufacture &amp;lt; 4         &lt;br /&gt;AND ReorderPoint &amp;lt; 100         &lt;br /&gt;AND Color = 'Red';&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;margin:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_5E8252FF.png" width="351" height="122" /&gt;&lt;/p&gt;  &lt;p&gt;Interestingly, the tooltips for both of these Filtered Index operations don’t show the predicates – the Query Optimizer knows that the index has already done the necessary filtering, and in this case, it doesn’t need to apply any further predicates, whether Seek or Residual.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;margin:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_54D9E4C9.png" width="340" height="470" /&gt;&lt;/p&gt;  &lt;p&gt;So… just because an index is &lt;em&gt;covering&lt;/em&gt; a query, doesn't mean it’s necessarily the right choice. The better question is &lt;strong&gt;“Is this index good for this query?”&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;(Incidentally, this is the kind of thing that I’d go through in my pre-conference seminar at SQLPASS if chosen, and in the talk on “Joins, SARGability and the Evils of Residualiciousness” if that one gets picked)&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Function Invertability for SARGability</title><link>http://sqlblog.com/blogs/rob_farley/archive/2011/03/23/function-invertability-for-sargability.aspx</link><pubDate>Thu, 24 Mar 2011 00:42:07 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:34369</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;My good friend &lt;a href="http://sqlblogcasts.com/blogs/simons" target="_blank"&gt;Simon Sabin&lt;/a&gt; used the term ‘invertability’ on a &lt;a href="https://connect.microsoft.com/SQLServer/feedback/details/653206/the-optimizer-should-be-able-to-understand-the-invertability-of-built-in-intrinsics-on-dates-for-the-purposes-of-index-sargability" target="_blank"&gt;Connect item&lt;/a&gt; he logged today.&lt;/p&gt;  &lt;p&gt;Essentially, Simon’s noticed that there are lots of people that use year(someDate), but that the system doesn’t understand that this function doesn’t affect the order of the items in the index. month(someDate) does, but if you’re already using year(someDate), then the combination of the two doesn’t change.&lt;/p&gt;  &lt;p&gt;This is one of the keys to SARGability, which I’ve written about before, like at &lt;a href="http://bit.ly/sargability"&gt;http://bit.ly/sargability&lt;/a&gt;. I’ve also raised a &lt;a href="https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=526431" target="_blank"&gt;Connect item&lt;/a&gt; myself about it.&lt;/p&gt;  &lt;p&gt;However, the term ‘invertability’ is interesting, and ties into the Inverse Predicates concept that I’ve also used before, like at &lt;a title="http://bit.ly/inversepredicates" href="http://bit.ly/inversepredicates"&gt;http://bit.ly/inversepredicates&lt;/a&gt;. The idea is that you might have applied a function to a column, creating a predicate that isn’t sargable, but if you (or the system) can tell how to invert it, then you can make a predicate that can be easily handled by the Query Optimizer. Currently, the system doesn’t understand the invertability of all the functions (even easy ones like the YEAR function), but it’s something which would make SQL a lot faster if it did.&lt;/p&gt;  &lt;p&gt;I’m going to let you read those various posts yourself, and encourage you to vote for Simon’s connect item. But as well as that, I’m going to encourage you to consider the SARGability of the predicates in your query.&lt;/p&gt;  &lt;p&gt;Edit: Simon’s written a &lt;a href="http://sqlblogcasts.com/blogs/simons/archive/2011/03/24/the-optimiser-should-understand-that-year-datecolumn-is-correlated-to-the-datecolumn.aspx" target="_blank"&gt;post&lt;/a&gt; on this now.&lt;/p&gt;</description></item><item><title>Probe Residual when you have a Hash Match – a hidden cost in execution plans</title><link>http://sqlblog.com/blogs/rob_farley/archive/2011/03/21/probe-residual-when-you-have-a-hash-match-a-hidden-cost-in-execution-plans.aspx</link><pubDate>Mon, 21 Mar 2011 22:58:44 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:34299</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;No, this post has nothing to do with airport security, and nothing to do with marijuana.&lt;/p&gt;  &lt;p&gt;Being honest, this post is only half about Hash Matches and Probe Residuals. It’s more about the types of predicates you can see mentioned in a SQL Server Execution Plan (or Query Plan, but I prefer to call them Execution Plans) – but you may well see some described as a Probe Residual when you look at the properties of a Hash Match operator.&lt;/p&gt;  &lt;p&gt;The main point of this post is: &lt;strong&gt;Some of these predicates can be really bad, even if they’re part of things which seem really good&lt;/strong&gt; (like Seeks or Merge Joins).&lt;/p&gt;  &lt;p&gt;Let’s consider a join. Two streams of data, of which matching rows must be found. These matching rows will be the ones that satisfy the join conditions, expressed through predicates listed in the ON clause and/or the WHERE clause. In fact, these predicates might only involve one side of the join, such as &lt;em&gt;OrderDate &amp;lt;= '2011-03-22T00:00:00.000'&lt;/em&gt;. There are plenty of times when a join condition will incorporate a one-sided predicate like this – imagine a scenario in which matching rows can be easily located in an index, but only those that match an additional condition are allowed to be included.&lt;/p&gt;  &lt;p&gt;In a join (ha!) there are these two streams of data, joined by one of a few different operators. This operator could be a Hash Match; it could be a Merge Join. It could even be a Nested Loop, although then the predicates are (generally) handled in the second (lower) data source. In fact, let’s start with that scenario.&lt;/p&gt;  &lt;p&gt;When a Seek is performed, there are two main kinds of predicates that can be included. One is the Seek Predicate, and one is simply listed as Predicate. I prefer to call this second one the Residual Predicate. It’s the leftover one, after the Seek has been performed. This often happens when SARGability isn’t possible. SARGability is about being able to use an index effectively (with a seek), so if you have a predicate which doesn’t allow SARGability (for example, being able to consider the last letter of a character string.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_18B1C804.png" width="450" height="233" /&gt;&amp;#160;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_083604B1.png" width="314" height="199" /&gt;&lt;/p&gt;  &lt;p&gt;You’ll see here that for each Product we find, we look up the Subcategory name for it. We have [s].[ProductSubcategoryID] before the Seek is done (the Nested Loop calls the Seek operator using each one), but although it can quickly seek to the right row(s) involved, it performs an additional check, making sure that the Product Name ends in a 'y'.&lt;/p&gt;  &lt;p&gt;To get this plan, I used the query: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;CREATE INDEX rf_ix_Product_SubCat_inc_Name ON     &lt;br /&gt;Production.Product(ProductSubcategoryID) INCLUDE (Name);&lt;/p&gt;    &lt;p&gt;SELECT s.Name, COUNT(*)     &lt;br /&gt;FROM Production.ProductSubcategory s      &lt;br /&gt;INNER JOIN Production.Product p      &lt;br /&gt;ON s.ProductSubcategoryID = p.ProductSubcategoryID      &lt;br /&gt;WHERE p.Name like '%y'      &lt;br /&gt;GROUP BY s.Name;      &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The index lets the system immediately search for the rows needed. It can seek to the rows in rf_ix_Product_Subcat_inc that match the SubcategoryID, but this is only half the story. Having applied this Seek Predicate, there’s still the matter of the last letter of the Product Name, which isn’t something that can be checked easily with the index. The values are there, but each row that the Seek finds must be checked individually, with this leftover, or Residual, Predicate.&lt;/p&gt;  &lt;p&gt;Hopefully this helps show why I want to call the Predicate here a Residual Predicate.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;But this Residual Predicate must be tested on every row that is fetched by the Seek. That could well be a lot of rows, if the Seek isn’t particularly selective.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The Seek might feel really nice, and might be returning very few rows. But the effort could be a lot larger than you expect if most of the work is being done in the Residual Predicate.&lt;/p&gt;  &lt;p&gt;The Nested Loop operator pulls rows from the first stream of data, and passes the required values down to the next row of operators, pulling a stream of data which is then simply joined. All the rows that come in from the second stream of data are known to be matches with the row that provided the values, so that there is relatively little work in doing the actual join.&lt;/p&gt;  &lt;p&gt;In a Merge Join or Hash Match, things are slightly different. The predicate checking happens in the actual join operator. However, there are still two types of predicates – the main ones and the residuals.&lt;/p&gt;  &lt;p&gt;In a Merge Join, the two data streams are ordered by columns appropriate for the join, but there could still be a leftover predicate.&lt;/p&gt;  &lt;p&gt;Consider this query (but I’ve dropped the extra index at this point). It’s a very similar query to before, but I’m forcing a Merge Join with a Join Hint, and I’ve thrown in a predicate which is non-SARGable from the perspective of either table. &lt;strong&gt;This is part of the clue to the problem – it’s a non-SARGable predicate.&lt;/strong&gt; Mind you, it could seem perfectly SARGable. It might simply be that the stream of data being used for the Merge Join isn’t ordered by all the columns involved in the join.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;SELECT s.Name, COUNT(*)     &lt;br /&gt;FROM Production.ProductSubcategory s      &lt;br /&gt;INNER MERGE JOIN Production.Product p      &lt;br /&gt;ON s.ProductSubcategoryID = p.ProductSubcategoryID      &lt;br /&gt;WHERE RIGHT(p.Name, LEN(s.Name)) = 'blah'      &lt;br /&gt;GROUP BY s.Name;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;When hovering over the Merge Join operator in this query’s plan, we see this tooltip. You’ll notice that it has a section called “Where (join columns)” which shows that the join is being done on the ProductSubcategoryID.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_77DA8CF7.png" width="332" height="429" /&gt;&lt;/p&gt;  &lt;p&gt;However, the other predicate isn’t mentioned. It’s nowhere in the tooltip. Hitting F4 shows the Properties window, and this is where we find it, in a property called Residual.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Residual: [AdventureWorks].[Production].[Product].[ProductSubcategoryID] as [p].[ProductSubcategoryID]=[AdventureWorks].[Production].[ProductSubcategory].[ProductSubcategoryID] as [s].[ProductSubcategoryID] AND right([AdventureWorks].[Production].[Product].[Name] as [p].[Name],len([AdventureWorks].[Production].[ProductSubcategory].[Name] as [s].[Name]))=N'blah'&lt;/p&gt; &lt;/blockquote&gt;    &lt;p&gt;Interestingly, it re-checks that the ProductSubcategoryID columns match, but the important thing is that it’s here (and only here) that Residual Predicate is tested.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;This Residual Predicate must be tested on every combination of rows that match the ‘Where (join columns)’ predicate. That could well be a lot of rows, if those rows aren’t particularly selective.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;With a Hash Match, the join is done by first applying a Hash function to columns involved in the join, using the resultant Hash Key to populate the data (including all the required columns) into a Hash Table. Once that has been done for one stream of data, the second stream is pulled in, and the Hash function applied to the columns from the second stream. The result of each row from the second stream is used in a Probe of the Hash Table. However, predicates which don’t work nicely with the Hash Key concept (such as the ‘blah’ predicate I used earlier) are considered residual.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;SELECT s.Name, COUNT(*)     &lt;br /&gt;FROM Production.ProductSubcategory s      &lt;br /&gt;INNER JOIN Production.Product p      &lt;br /&gt;ON s.ProductSubcategoryID = p.ProductSubcategoryID      &lt;br /&gt;WHERE RIGHT(p.Name, LEN(s.Name)) = 'blah'      &lt;br /&gt;GROUP BY s.Name;      &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_4A78F727.png" width="333" height="511" /&gt;&lt;/p&gt;  &lt;p&gt;So when candidate rows are identified via the Probe, the Probe Residual still needs to be tested. Just like the Predicate after the Seek Predicate was done, and the Residual after the Where (join columns) are handled.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;This Probe Residual must be tested on every combination of rows that satisfies the Hash Key Probe. That could well be a lot of rows, if that probe isn’t particularly selective.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;When you’re trying to tune your query, you need to consider how many rows are being matched by each section of the join.&lt;/p&gt;  &lt;p&gt;Imagine with me that you have a Merge Join (or a Hash Match), in which you have a predicate such as p1.ListPrice - p2.Listprice = 0&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;SELECT p1.Product     &lt;br /&gt;FROM Production.Product p1      &lt;br /&gt;JOIN Production.Product p2      &lt;br /&gt;ON p1.ProductSubcategoryID = p2.ProductSubcategoryID      &lt;br /&gt;AND p1.ListPrice - p2.ListPrice = 0&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;(Incidentally, this query could use any of the three joins, depending on indexes and other filters. Put an index on ProductSubcategoryID including ListPrice, and then run the query with either no WHERE clause (Hash Match), a WHERE clause for ProductSubcategoryID &amp;lt; 2 (Nested Loop) or &amp;lt; 5 (Merge Join).)&lt;/p&gt;  &lt;p&gt;The predicate featuring the ListPrice column is always going to be treated as residual. It’s something that can only be tested once both values are known, and is considered a non-SARGable predicate.&amp;#160; Regardless of what type of join is done, the ListPrice predicate is handled as a Residual.&lt;/p&gt;  &lt;p&gt;For this query, the answer is hopefully obvious. Rewriting the predicate as p1.ListPrice = p2.ListPrice will resolve it nicely, but an example you have might not be so straightforward. &lt;/p&gt;  &lt;p&gt;Residual predicates can be expensive, and the bottleneck of your query might not be obvious from looking at the plan. The mere fact that a Residual in a Merge Join is not shown in the tooltip could mean you miss it significantly. Don’t worry – you’ll be in good company. Plenty of proper experts miss this.&lt;/p&gt;  &lt;p&gt;Luckily, the answer is simple. Look at your Seek Predicates, your Where (join columns) and your Hash Keys Probes, and compare this to the Residuals. If the Residuals are needing to be checked for a lot more rows that you’d like, then you have a tuning opportunity you can leverage. Ideally, Residuals only need to be applied on a tiny number of rows.&lt;/p&gt;  &lt;p&gt;Remember, a Residual Predicate will feel like a Scan, because it’s not using the Index nicely. Scanning a tiny table might be fine, but scanning a large one could be horrible.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>The SSIS tuning tip that everyone misses</title><link>http://sqlblog.com/blogs/rob_farley/archive/2011/02/17/the-ssis-tuning-tip-that-everyone-misses.aspx</link><pubDate>Thu, 17 Feb 2011 06:19:09 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:33564</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;I know that &lt;em&gt;everyone&lt;/em&gt; misses this, because I’m yet to find someone who doesn’t have a bit of an epiphany when I describe this.&lt;/p&gt;  &lt;p&gt;When tuning Data Flows in SQL Server Integration Services, people see the Data Flow as moving from the Source to the Destination, passing through a number of transformations. What people don’t consider is the Source, getting the data out of a database.&lt;/p&gt;  &lt;p&gt;Remember, the source of data for your Data Flow is not your Source Component. It’s wherever the data is, within your database, probably on a disk somewhere. You need to tune your query to optimise it for SSIS, and this is what most people fail to do.&lt;/p&gt;  &lt;p&gt;I’m not suggesting that people don’t tune their queries – there’s plenty of information out there about making sure that your queries run as fast as possible. But for SSIS, it’s not about how fast your query runs. Let me say that again, but in bolder text:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;The speed of an SSIS Source is not about how fast your query runs.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;If your query is used in a Source component for SSIS, the thing that matters is how fast it starts returning data. In particular, those first 10,000 rows to populate that first buffer, ready to pass down the rest of the transformations on its way to the Destination.&lt;/p&gt;  &lt;p&gt;Let’s look at a very simple query as an example, using the AdventureWorks database:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_479078B4.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_3CEF58FA.png" width="554" height="195" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;We’re picking the different Weight values out of the Product table, and it’s doing this by scanning the table and doing a Sort. It’s a Distinct Sort, which means that the duplicates are discarded.&lt;/p&gt;  &lt;p&gt;It'll be no surprise to see that the data produced is sorted. Obvious, I know, but I'm making a comparison to what I'll do later.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_156866C3.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_59785CA8.png" width="139" height="196" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Before I explain the problem here, let me jump back into the SSIS world...&lt;/p&gt;  &lt;p&gt;If you’ve investigated how to tune an SSIS flow, then you’ll know that some SSIS Data Flow Transformations are known to be Blocking, some are Partially Blocking, and some are simply Row transformations.&lt;/p&gt;  &lt;p&gt;Take the SSIS Sort transformation, for example. I’m using a larger data set for this, because my small list of Weights won’t demonstrate it well enough.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_31B194E4.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_6E464129.png" width="464" height="273" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Seven buffers of data came out of the source, but none of them could be pushed past the Sort operator, just in case the last buffer contained the data that would be sorted into the first buffer. This is a blocking operation.&lt;/p&gt;  &lt;p&gt;Back in the land of T-SQL, we consider our Distinct Sort operator. It’s also blocking. It won’t let data through until it’s seen all of it.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;If you weren’t okay with blocking operations in SSIS, why would you be happy with them in an execution plan?&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The source of your data is not your OLE DB Source. Remember this. The source of your data is the NCIX/CIX/Heap from which it’s being pulled.&lt;/p&gt;  &lt;p&gt;Picture it like this... the data flowing from the Clustered Index, through the Distinct Sort operator, into the SELECT operator, where a series of SSIS Buffers are populated, flowing (as they get full) down through the SSIS transformations.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_3312655E.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_03FBEFBB.png" width="554" height="273" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Alright, I know that I’m taking some liberties here, because the two queries aren’t the same, but consider the visual.&lt;/p&gt;  &lt;p&gt;The data is flowing from your disk and through your execution plan before it reaches SSIS, so you could easily find that a blocking operation in your plan is just as painful as a blocking operation in your SSIS Data Flow.&lt;/p&gt;  &lt;p&gt;Luckily, T-SQL gives us a brilliant query hint to help avoid this.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;OPTION (FAST 10000)&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant.&lt;/p&gt;  &lt;p&gt;First let’s consider a simple example, then we’ll look at a larger one.&lt;/p&gt;  &lt;p&gt;Consider our weights. We don’t have 10,000, so I’m going to use OPTION (FAST 1) instead.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_009A5813.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_5A9481EF.png" width="596" height="165" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;You’ll notice that the query is more expensive, using a Flow Distinct operator instead of the Distinct Sort. This operator is consuming 84% of the query, instead of the 59% we saw from the Distinct Sort. But the first row could be returned quicker – a Flow Distinct operator is non-blocking.&lt;/p&gt;  &lt;p&gt;The data here isn’t sorted, of course. It’s in the same order that it came out of the index, just with duplicates removed.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_24FB7430.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_53FDB007.png" width="130" height="222" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As soon as a Flow Distinct sees a value that it hasn’t come across before, it pushes it out to the operator on its left. It still has to maintain the list of what it’s seen so far, but by handling it one row at a time, it can push rows through quicker. Overall, it’s a lot more work than the Distinct Sort, but if the priority is the first few rows, then perhaps that’s exactly what we want.&lt;/p&gt;  &lt;p&gt;The Query Optimizer seems to do this by optimising the query as if there were only one row coming through:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_27084D2C.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_41F0AAD3.png" width="582" height="196" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This 1 row estimation is caused by the Query Optimizer imagining the SELECT operation saying “Give me one row” first, and this message being passed all the way along. The request might not make it all the way back to the source, but in my simple example, it does.&lt;/p&gt;  &lt;p&gt;I hope this simple example has helped you understand the significance of the blocking operator. Now I’m going to show you an example on a much larger data set.&lt;/p&gt;  &lt;p&gt;This data was fetching about 780,000 rows, and these are the Estimated Plans. The data needed to be Sorted, to support further SSIS operations that needed that.&lt;/p&gt;  &lt;p&gt;First, without the hint.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_045FFAB2.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_54916ABE.png" width="644" height="232" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;...and now with OPTION (FAST 10000):&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_10D96E02.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_4C6C2AC0.png" width="644" height="250" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;A very different plan, I’m sure you’ll agree. In case you’re curious, those arrows in the top one are 780,000 rows in size. In the second, they’re estimated to be 10,000, although the Actual figures end up being 780,000.&lt;/p&gt;  &lt;p&gt;The top one definitely runs faster. It finished several times faster than the second one. With the amount of data being considered, these numbers were in minutes. Look at the second one – it’s doing Nested Loops, across 780,000 rows! That’s not generally recommended at all. That’s “Go and make yourself a coffee” time. In this case, it was about six or seven minutes. The faster one finished in about a minute.&lt;/p&gt;  &lt;p&gt;But in SSIS-land, things are different.&lt;/p&gt;  &lt;p&gt;The particular data flow that was consuming this data was significant. It was being pumped into a Script Component to process each row based on previous rows, creating about a dozen different flows. The data flow would take roughly ten minutes to run – ten minutes from when the data first appeared.&lt;/p&gt;  &lt;p&gt;The query that completes faster – chosen by the Query Optimizer with no hints, based on accurate statistics (rather than pretending the numbers are smaller) – would take a minute to start getting the data into SSIS, at which point the ten-minute flow would start, taking eleven minutes to complete.&lt;/p&gt;  &lt;p&gt;The query that took longer – chosen by the Query Optimizer pretending it only wanted the first 10,000 rows – would take only ten seconds to fill the first buffer. Despite the fact that it might have taken the database another six or seven minutes to get the data out, SSIS didn’t care. Every time it wanted the next buffer of data, it was already available, and the whole process finished in about ten minutes and ten seconds.&lt;/p&gt;  &lt;p&gt;When debugging SSIS, you run the package, and sit there waiting to see the Debug information start appearing. You look for the numbers on the data flow, and seeing operators going Yellow and Green. Without the hint, I’d sit there for a minute. With the hint, just ten seconds. You can imagine which one I preferred.&lt;/p&gt;  &lt;p&gt;By adding this hint, it felt like a magic wand had been waved across the query, to make it run several times faster. It wasn’t the case at all – but it felt like it to SSIS.&lt;/p&gt;</description></item><item><title>Using SQL Execution Plans to discover the Swedish alphabet</title><link>http://sqlblog.com/blogs/rob_farley/archive/2011/01/03/using-sql-execution-plans-to-discover-the-swedish-alphabet.aspx</link><pubDate>Mon, 03 Jan 2011 05:54:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:32237</guid><dc:creator>rob_farley</dc:creator><description>&lt;P&gt;SQL Server is quite remarkable in a bunch of ways. In this post, I’m using the way that the Query Optimizer handles LIKE to keep it SARGable, the Execution Plans that result, Collations, and PowerShell to come up with the Swedish alphabet.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://msmvps.com/blogs/robfarley/archive/2010/01/22/sargable-functions-in-sql-server.aspx" target=_blank&gt;SARGability&lt;/A&gt; is the ability to seek for items in an index according to a particular set of criteria. If you don’t have SARGability in play, you need to scan the whole index (or table if you don’t have an index). For example, I can find myself in the phonebook easily, because it’s sorted by LastName and I can find Farley in there by moving to the Fs, and so on. I can’t find everyone in my suburb easily, because the phonebook isn’t sorted that way. I can’t even find people who have six letters in their last name, because also the book is sorted by LastName, it’s not sorted by LEN(LastName). This is all stuff I’ve looked at before, including in the &lt;A href="http://www.sqlbits.com/Sessions/Event7/Understanding_SARGability_to_make_your_queries_run_faster" target=_blank&gt;talk I gave at SQLBits in October 2010&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;If I try to find everyone who’s names start with F, I can do that using a query a bit like:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;SELECT LastName FROM dbo.PhoneBook WHERE LEFT(LastName,1) = 'F';&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Unfortunately, the Query Optimizer doesn’t realise that all the entries that satisfy LEFT(LastName,1) = 'F' will be together, and it has to scan the whole table to find them.&lt;/P&gt;
&lt;P&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_0BB56CF3.png" width=595 height=143&gt;&lt;/P&gt;
&lt;P&gt;But if I write:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;SELECT LastName FROM dbo.PhoneBook WHERE LastName LIKE 'F%';&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;then SQL is smart enough to understand this, and performs an Index Seek instead.&lt;/P&gt;
&lt;P&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_0F7311C3.png" width=566 height=133&gt;&lt;/P&gt;
&lt;P&gt;To see why, I look further into the plan, in particular, the properties of the Index Seek operator. The ToolTip shows me what I’m after:&lt;/P&gt;
&lt;P&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_14754F72.png" width=302 height=200&gt;&lt;/P&gt;
&lt;P&gt;You’ll see that it does a Seek to find any entries that are at least F, but not yet G. There’s an extra Predicate in there (a Residual Predicate if you like), which checks that each LastName is really LIKE F% – I suppose it doesn’t consider that the Seek Predicate is quite enough – but most of the benefit is seen by its working out the Seek Predicate, filtering to just the “at least F but not yet G” section of the data.&lt;/P&gt;
&lt;P&gt;This got me curious though, particularly about where the G comes from, and whether I could leverage it to create the Swedish alphabet.&lt;/P&gt;
&lt;P&gt;I know that in the Swedish language, there are three extra letters that appear at the end of the alphabet. One of them is ä that appears in the word Västerås. It turns out that Västerås is quite hard to find in an index when you’re looking it up in a Swedish map. I talked about this briefly in my five-minute &lt;A href="http://bit.ly/RFCollation" target=_blank&gt;talk on Collation&lt;/A&gt; from SQLPASS (the one which was slightly less than serious).&lt;/P&gt;
&lt;P&gt;So by looking at the plan, I can work out what the next letter is in the alphabet of the collation used by the column. In other words, if my alphabet were Swedish, I’d be able to tell what the next letter after F is – just in case it’s not G.&lt;/P&gt;
&lt;P&gt;It turns out it is… Yes, the Swedish letter after F is G. But I worked this out by using a copy of my PhoneBook table that used the Finnish_Swedish_CI_AI collation. I couldn’t find how the Query Optimizer calculates the G, and my friend &lt;A href="http://sqlblog.com/blogs/paul_white/" target=_blank&gt;Paul White&lt;/A&gt; (&lt;A href="http://twitter.com/SQL_Kiwi" target=_blank&gt;@SQL_Kiwi&lt;/A&gt;) tells me that it’s frustratingly internal to the QO. He’s particularly smart, even if he is from New Zealand.&lt;/P&gt;
&lt;P&gt;To investigate further, I decided to do some PowerShell, leveraging the &lt;A href="http://sqlblog.com/blogs/rob_farley/archive/2010/12/07/powershell-to-fetch-a-sql-execution-plan.aspx" target=_blank&gt;Get-SqlPlan function&lt;/A&gt; that I blogged about recently (make sure you also have the SqlServerCmdletSnapin100 snap-in added. To do that, run &lt;EM&gt;get-pssnapin -reg&lt;/EM&gt; to see if it's on the machine, and &lt;EM&gt;get-pssnapin -reg | add-pssnapin&lt;/EM&gt; to make the cmdlet available).&lt;/P&gt;
&lt;P&gt;I started by indicating that I was going to use Finnish_Swedish_CI_AI as my collation of choice, and that I’d start whichever letter came straight after the number 9. I figure that this is a cheat’s way of guessing the first letter of the alphabet (but it doesn’t actually work in Unicode – luckily I’m using varchar not nvarchar. Actually, there are a few aspects of this code that only work using ASCII, so apologies if you were wanting to apply it to Greek, Japanese, etc). I also initialised my $alphabet variable.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;$collation = 'Finnish_Swedish_CI_AI'; &lt;BR&gt;$firstletter = '9'; &lt;BR&gt;$alphabet = '';&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now I created the table for my test. A single field would do, and putting a Clustered Index on it would suffice for the Seeks. &lt;EM&gt;[Edit: You may need to insert some rows if you find that it's producing&amp;nbsp;an Index Scan. Theoretically it should prefer to Scan,&amp;nbsp;knowing there's&amp;nbsp;no rows, but my tests showed it would always Seek]&lt;/EM&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;Invoke-Sqlcmd -server . -data tempdb -query "create table dbo.collation_test (col varchar(10) collate $collation primary key);"&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now I get into the looping. &lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;$c = $firstletter; &lt;BR&gt;$stillgoing = $true; &lt;BR&gt;while ($stillgoing) &lt;BR&gt;{&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I construct the query I want, seeking for entries which start with whatever $c has reached, and get the plan for it:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;$query = "select col from dbo.collation_test where col like '$($c)%';"; &lt;BR&gt;[xml] $pl = get-sqlplan $query "." "tempdb";&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;At this point, my $pl variable is a scary piece of XML, representing the execution plan. A bit of hunting through it showed me that the EndRange element contained what I was after, and that if it contained NULL, then I was done.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;$stillgoing = ($pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange -ne $null);&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now I could grab the value out of it (which came with apostrophes that needed stripping), and append that to my $alphabet variable.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;&amp;nbsp; if ($stillgoing) &lt;BR&gt;&amp;nbsp; {&amp;nbsp; &lt;BR&gt;$c=$pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange.RangeExpressions.ScalarOperator.ScalarString.Replace("'",""); &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; $alphabet += $c; &lt;BR&gt;&amp;nbsp; }&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Finally, finishing the loop, dropping the table, and showing my alphabet!&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;} &lt;BR&gt;Invoke-Sqlcmd -server . -data tempdb -query "drop table dbo.collation_test;"; &lt;BR&gt;$alphabet;&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;When I run all this, I see that the Swedish alphabet is ABCDEFGHIJKLMNOPQRSTUVXYZÅÄÖ, which matches what I see at &lt;A href="http://en.wikipedia.org/wiki/Swedish_orthography" target=_blank&gt;Wikipedia&lt;/A&gt;. Interesting to see that the letters on the end are still there, even with Case Insensitivity. Turns out they’re not just “letters with accents”, they’re letters in their own right.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/rob_farley/image_528E6A23.png"&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_thumb_0511C796.png" width=609 height=292&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I’m sure you gave up reading long ago, and really aren’t that&amp;nbsp;grabbed about the idea of doing this using PowerShell. I chose PowerShell because I’d already come up with an easy way of grabbing the estimated plan for a query, and PowerShell does allow for easy navigation of XML.&lt;/P&gt;
&lt;P&gt;I find the most interesting aspect of this as the fact that the Query Optimizer uses the next letter of the alphabet to maintain the SARGability of LIKE. &lt;A href="https://connect.microsoft.com/SQLServer/feedback/details/526431/make-more-functions-sargable" target=_blank&gt;I’m hoping they do something similar for a whole bunch of operations&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Oh, and the fact that you know how to find stuff in the IKEA catalogue.&lt;/P&gt;
&lt;HR&gt;

&lt;P&gt;Footnote: &lt;/P&gt;
&lt;P&gt;If you are interested in whether this works in other languages, you might want to consider the following screenshot, which shows that in principle, it should work with Japanese. It might be a bit harder to run this in PowerShell though, as I’m not sure how it translates. In Hiragana, the Japanese alphabet starts あ, ぃ, ぅ, ぇ, ぉ, ...&lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/rob_farley/image_20F2D6C1.png"&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_thumb_57DC8F2D.png" width=644 height=413&gt;&lt;/A&gt;&lt;/P&gt;</description></item><item><title>Inverse Predicates - look both ways before you cross</title><link>http://sqlblog.com/blogs/rob_farley/archive/2010/11/08/inverse-predicates-look-both-ways-before-you-cross.aspx</link><pubDate>Mon, 08 Nov 2010 21:53:21 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:30273</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;When I’m at the &lt;a href="http://www.sqlpass.org/summit/na2010" target="_blank"&gt;PASS Summit&lt;/a&gt; in Seattle this week, I will need to remember that I can’t just step onto the road if I’m walking along the footpath on the right-hand side. In the UK and Australia, where we drive the correct side, it’s fine (just don’t tell my kids), because the cars that are nearest me are coming towards me – except of course, the idiots who are reading blog posts while they’re driving. They could be anywhere.&lt;/p&gt;  &lt;p&gt;I should also learn to stop picking on left-handed people for being ‘&lt;a href="http://en.wiktionary.org/wiki/sinister" target="_blank"&gt;sinister&lt;/a&gt;’. The word ‘sinister’ comes from the Latin word for left, and is associated with the fact that writing with your left hand is unnatural – some would even use the word ‘evil’. My beautiful wife is left-handed, and she’s definitely not sinister or evil. So really I should just pick on left-handed people for their inability to write on whiteboards, not for their sinisterisity. &lt;/p&gt;  &lt;p&gt;You see, the world is designed for right-handed people. Even cars are designed to be driven by right-handed people, and this is why in normal places, the steering wheel is positioned on the right-hand side of the car so that the right-hand can be in control of it all the time, with the left hand being free to change gears, use the indicator, change the radio stations, drink coffee, tweet, etc – all those things that are less important than controlling the car.&lt;/p&gt;  &lt;p&gt;And so we consider queries – in particular, the joining of tables. We might include something in our FROM clause like:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;FROM dbo.Deliveries AS d        &lt;br /&gt;JOIN dbo.Orders AS o         &lt;br /&gt;ON d.DeliveryDate = DATEADD(day,1,o.OrderDate)&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This aligns nicely with our thoughts around SARGability, which suggests that you leave the left-hand side of predicates untouched. Unfortunately, I don’t tend to write my ON clause predicates this way around, because I like to focus on the table I’ve just introduced into the mix. I write my queries like:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;FROM dbo.Deliveries AS d        &lt;br /&gt;JOIN dbo.Orders AS o         &lt;br /&gt;ON o.OrderDate = DATEADD(day,-1,d.DeliveryDate)&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;You’ll notice that I’ve inverted the predicate, because I still like the idea of keeping the left-hand side untouched, but at this point, I want to ask you to have a think about what we’re trying to achieve.&lt;/p&gt;  &lt;p&gt;I’ve written before about how the point of SARGability is to allow an index to be used nicely. Think of the phone book, and how you can easily find the name you want (also, just for fun, think of the Japanese phone book, and how you can’t find names in it unless you understand how their alphabet is ordered). It’s very similar with our Deliveries and Orders tables – we can find a particular date in it if there is an index on that field.&lt;/p&gt;  &lt;p&gt;But which query should we think about? The first one, where we’re trying to find a Delivery, or the second, where we’re trying to find an Order.&lt;/p&gt;  &lt;p&gt;Naturally, being a data question, the answer is “It depends”. And that’s right – it depends on what your query is actually doing.&lt;/p&gt;  &lt;p&gt;If you’re just creating a query, and have a WHERE clause that contains something like: WHERE o.OrderDate = '20101106', then you’re okay. You know that you can easily find the Orders you want, and need the ON clause in the first example (to help find the Delivery). But if you’re creating a &lt;em&gt;view&lt;/em&gt;, and people can use this query in a variety of ways, then you don’t know which side you’re going to start.&lt;/p&gt;  &lt;p&gt;When SARGability is working nicely, we probably see the execution plan using an Index Seek, and can notice in that Seek’s properties that there is a Seek Predicate being used to support that. You can also see it in the ToolTip, as I’ve shown in the picture below. I ran the query &lt;em&gt;SELECT * FROM Production.Product WHERE ProductID = 316;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_432BF6A4.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:block;float:none;margin-left:auto;border-top:0px;margin-right:auto;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_67CCE84E.png" width="593" height="554" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If there are other predicates involved, that aren’t so SARGable (quite probably nothing to do with functions, rather than a choice of index), then we see this reflected in the Predicate property. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_18EC22EF.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:block;float:none;margin-left:auto;border-top:0px;margin-right:auto;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_2C990C83.png" width="777" height="615" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Here I’m filtering on the SafetyStockLevel as well as the ProductID. It uses the ProductID for the Seek, but checks all the results of this Seek using the SafetyStockLevel value, before returning anything out of the Seek operator. We’re not doing a scan with SafetyStockLevel, but we are checking every value returned by the Seek Predicate. This could be improved with an index across both columns (together, not individually), but if the Seek is effective, then the regular Predicate won’t have to be applied to many rows anyway.&lt;/p&gt;  &lt;p&gt;When thinking about your indexes, do remember that included_columns (rather than key_columns) are not ordered, so any predicates involving them won’t be Seek Predicates. If the phone book were only on LastName, and FirstName was an included columnn, then finding me in there would mean checking all the farleys for Rob, rather than finding the Farleys and then continuing the seek to find the Rob entry.&lt;/p&gt;  &lt;p&gt;Back with our orders and deliveries, we see an interesting phenomenon at play.&lt;/p&gt;  &lt;p&gt;If I create a view using the first FROM clause, and filter the view on OrderDate, I might get a nice plan, involving Seeks.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_1977DFD7.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:block;float:none;margin-left:auto;border-top:0px;margin-right:auto;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_03ADF77A.png" width="764" height="666" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This is an excellent plan for this particular query, but if I start with the DeliveryDate, then it’s not so nice.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_564C61A9.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:block;float:none;margin-left:auto;border-top:0px;margin-right:auto;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_5573FBBF.png" width="833" height="659" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The cost of this is much larger – probably orders of magnitude larger. But if I use the other ON clause, I see the same behaviour with the columns switched (good performance if I start with the DeliveryDate, bad performance if I start with OrderDate).&lt;/p&gt;  &lt;p&gt;Ideally, we can do a Seek to find the records we want, regardless of whether we want to look up the records by DeliveryDate or by OrderDate. This would mean getting a Seek Predicate both times to see SARGability in use.&lt;/p&gt;  &lt;p&gt;Having shown you earlier the example that used both a Seek Predicate AND a Predicate in the same query provides the key here...&lt;/p&gt;  &lt;p&gt;Use both! (Yes, work out the inverse predicate, and include it)&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;FROM dbo.Deliveries AS d        &lt;br /&gt;JOIN dbo.Orders AS o         &lt;br /&gt;ON d.DeliveryDate = DATEADD(day,1,o.OrderDate)        &lt;br /&gt;AND o.OrderDate = DATEADD(day,-1,d.DeliveryDate)&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The predicate which isn’t suitable for the Seek will be used as a regular Predicate, while the SARGable one gets the effective Seek Predicate. Now you can the best of both worlds, but you will need to work out the inverse of some of your predicates. &lt;/p&gt;  &lt;p&gt;It’s not hard, but it may have great reward. In the plans below, I’ve created a view like the notice the fact that when we look up records with a known OrderDate, we can seek for the Order, and then look up the Delivery using the transformed OrderDate. Checking that the OrderDate matches the transformed DeliveryDate falls into the Predicate, not the Seek Predicate.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_6D97661A.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:block;float:none;margin-left:auto;border-top:0px;margin-right:auto;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_28BB1C19.png" width="779" height="696" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Conversely, if we have the DeliveryDate, then we can seek to find the Delivery, and then look up the Order using an appropriate Seek Predicate, with the other becoming the regular Predicate.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_39BF49FC.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:block;float:none;margin-left:auto;border-top:0px;margin-right:auto;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_0DA24D0B.png" width="774" height="705" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;So now when you want to create views for a variety of uses, you can hopefully see some potential in saying the same thing from both perspectives. Don’t just think left-to-right – consider all the different ways you could approach your data. Just maybe don’t try to write your execution plan on a whiteboard.&lt;/p&gt;</description></item><item><title>SARGability isn’t about the left-hand side</title><link>http://sqlblog.com/blogs/rob_farley/archive/2010/10/12/sargability-isn-t-about-the-left-hand-side.aspx</link><pubDate>Tue, 12 Oct 2010 22:10:12 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:29350</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;This month’s T-SQL Tuesday has me caught slightly unawares. I’m in the UK, having come over for &lt;a href="http://www.sqlbits.com" target="_blank"&gt;SQLBits&lt;/a&gt;, and I’m left writing this with only a few hours to go. &lt;a href="http://SankarReddy.com/2010/10/invitation-to-participate-in-t-sql-tuesday-11-misconceptions-in-sql-server/" target="_blank"&gt;&lt;img style="border-bottom:0px;border-left:0px;margin:5px;display:inline;border-top:0px;border-right:0px;" title="TSQL2sDay150x150" border="0" alt="TSQL2sDay150x150" align="right" src="http://SankarReddy.com/wp-content/uploads/2010/10/TSQL2sDay150x150.jpg" width="150" height="150" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The theme this time is on misconceptions in SQL, and this fits nicely with SARGability – one of the topics that I presented on at SQLBits a week and a half ago. I often see people refer to SARGability as making sure that the left-hand side of the predicate is untouched. In other words, suggesting that something like this would be fine:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;WHERE someColumn = someFunction(someValue)&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;whereas something like this would not:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;WHERE someOtherFunction(someColumn) = someValue&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;As with most misconceptions, there’s an element of truth to this, but it definitely doesn’t tell the whole story.&lt;/p&gt;  &lt;p&gt;If you have an index on someColumn, then great – the first predicate should let you find the value you want in that index nicely. Then you have other questions about whether the index is suitable or not, but that’s for a different post. On the other hand, if you don’t have an index, then the system is going to scan the table (or at least some less suitable index) in both scenarios. Applying the function to each column is painful in the second one, but SARGability is about producing seeks instead of scans, and these would both cause scans.&lt;/p&gt;  &lt;p&gt;But how often do you write predicates like this. Normally, we have something more like:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;ON x.someColum = y.someOtherColumn&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;…and this naturally causes a question. Do we want to be looking up the values in x, or in&amp;#160; y? What needs to be SARGable here, the left or the right? It’s very easy to assume the left, but is it really going to be best for you? SQL Server doesn’t care which way around you’ve written your predicate, so don’t hold onto the ‘left-hand side’ idea longer than the end of this post. In fact, think about the way you want your query to be executed, and ask yourself where your SARGability should actually be looking.&lt;/p&gt;  &lt;p&gt;It’s all a matter of perspective…&lt;/p&gt;</description></item><item><title>Table? No such thing…</title><link>http://sqlblog.com/blogs/rob_farley/archive/2010/09/13/table-no-such-thing.aspx</link><pubDate>Mon, 13 Sep 2010 23:48:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:28633</guid><dc:creator>rob_farley</dc:creator><description>&lt;P&gt;No really – hear me out.&lt;A href="http://michaeljswart.com/?p=844" target=_blank&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;MARGIN:5px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=TSQL2sDay150x150 border=0 alt=TSQL2sDay150x150 align=right src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_6D7D4C90.jpg" width=154 height=154&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Of course you create tables, and you query tables, and we say that data is stored in tables. The table is (rightly) a fundamental part of relational theory. But I find that when I think about queries and how they run, I need to approach the system thinking about the indexes that I’m querying, not the tables.&lt;/P&gt;
&lt;P&gt;When you decide to find someone’s phone number, you have to make a decision about where you’re going to look that up. Do you pick up the White Pages, the Yellow Pages, your little black book, your kids’ school’s parent directory, or something else? In fact, this is a question that you ask yourself before you even consider whether the particular information is accessible or not. If you had no other option but to scan the entire White Pages (starting at page 1), you would probably just consider the information unobtainable.&lt;/P&gt;
&lt;P&gt;And yet all these different sources of data are just different non-clustered indexes on the master set. They’re generally filtered, ordered in an appropriate way, and don’t contain all the information. The Yellow Pages is just businesses, ordered by business type. The White Pages doesn’t include the business type at all, and maybe the school’s parent directory doesn’t include the address. Each of them is designed for a particular type of use. The cost of maintaining them is considered (and presumably it’s understood that the benefits of having the index far outweighs of cost).&lt;/P&gt;
&lt;P&gt;I know some of you will be telling me that not all tables have indexes – but every table in SQL Server is either a Heap or a Clustered Index, and if you consider that a Heap is simply an unordered index, then hopefully my points can stand.&lt;/P&gt;
&lt;P&gt;When you write a query, you have the option of forcing a particular index to be used with a Hint. This is the kind of hint where you don’t get a choice. I’m sure you all know times when someone has “hinted” that a particular task needs doing, or that you should avoid telling a particular kind of joke, or etc. The kind of hint that if you can’t obey, an error situation will occur. So if you “hint” that a particular index should be used but that index isn’t there, you’ll get an error. So I try not to use index hints if possible.&lt;/P&gt;
&lt;P&gt;But my dislike for index hints doesn’t stop me from wanting to consider indexes rather than tables. If I think about what data I want, I really ought to be thinking about the ideal index for my query, and expect to see that index being used in the execution plan. If it’s not being used, or it’s being used in a way that I didn’t expect, then I really ought to ask myself why and investigate the situation (and consider whether or not the way that the query is being run is acceptable or not – remember the analogy of not wanting to start on page one of the White Pages). Hopefully I’ll realise that the index I imagined wasn’t actually ideal when considering the statistics involved, or that the Query Optimizer has found some clever way to answer my query in a way that I didn’t expect, but sometimes you find that the way you’ve written the query is actually persuading the Query Optimizer to create a plan that’s not actually what you want. Correct, valid, but not necessarily ideal.&lt;/P&gt;
&lt;P&gt;It’s very easy to just think in terms of tables. You need to reference tables in your query, which ultimately is a logical construct – but the plan that gets produced is what determines the performance, the locking, the cache use, etc, and the plan doesn’t talk about tables (and if it does, the word Table means Heap). By all means you should remember that two indexes that are based on the same ‘table’ may refer to the same data, and a perfectly valid plan could use a different index to what you’re expecting – but this doesn’t mean you shouldn’t give thought to what you consider the ‘correct’ index to be.&lt;/P&gt;
&lt;P&gt;So move on from tables, and think in terms of indexes. You’ll learn more about how your queries get executed, and your indexing techniques will improve wildly. More importantly though, your queries will almost certainly run faster.&lt;/P&gt;
&lt;P&gt;But to complete the story, it's also completely valid to think of every index as a table. Each non-clustered index could be thought of as a table which the key is like the table's clustered index, and the NCIX's included columns are like the rest of the columns in the table. This helps you think about the cost associated with maintaining your indexes, and also helps you plan them. I'm not saying that you should create different tables instead of indexes by any means, but when you consider the idea that a table is an index, also consider that each index is a table - just to complete the picture.&lt;/P&gt;</description></item></channel></rss>