THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

T-SQL Tuesday #4: I/O, You Know

It's time for the fourth T-SQL Tuesday, managed this time by Mike Walsh. I almost missed this deadline completely, since I didn't see the announcement at all. I wrote to Adam to ask if there even was an event this month, since I wasn't able to get into my own blog site (www.SQLBlog.com) for a week, and he pointed me to Mike's site. I'm wondering if it's this hit and miss for everyone. There is no single location where those people interested in T-SQL Tuesday can find out about it. Do you just have to read every possible SQL Server blog to see where the announcement might pop up?

Mike's topic is I/O and in keeping with my past posts, this contribution will include a history lesson. Also, because I just found out about it with only hours to go until the end of Tuesday UTC, it will be very short.

Several of the other bloggers who have already made their I/O posts have mentioned the fact that just looking at the numbers for read operations do not tell the whole story.  Back in the early days, the main tool we had was STATISTICS IO, and we used the numbers it returned for Logical IOs to determine which queries were 'better' than which others. In back then, those of us who did a lot of query tuning knew that the numbers returned did not give the full story, as read operations come in two flavors. There are cache reads and there are physical reads (from disk). The value for logical reads includes both. If all the reads were from cache, there wouldn't be much cost at all, but the value gives us a worst case scenario. Potentially, all the Reads could be physical, if the cache is very small or extremely volatile (or both.) So the Logical Reads value basically sets an upper limit. In many cases, with a large cache and popular tables, all, or most, of the Reads were from cache, so what looked like an expensive query from the STATISTICS IO value was not that expensive at all.

The Logical Reads value could look quite large in the case of a join, where the inner table was repeatedly accessed, over and over, for each qualifying row in the outer table. Most of the repeated accesses would be from cached data, but each access to the same data would be counted as a separate Logical Read.

Still, when comparing two queries, looking at this worst case value could frequently be useful. If I had a choice between two queries which yielded the same results, and if one showed a value for Logical Reads in the tens of thousands, and the other showed a few hundred, it was an easy decision as to which query was better.

That all changed in SQL Server 7, when the optimizer was greatly enhanced to add a couple of new processing techniques, in particular HASH JOIN and HASH AGGREGATION.

HASH operations perform minimal read operations, as each table is scanned only once, at most. SQL Server takes the rows and hashes them, storing the hash 'buckets' in memory (as much as possible). The work of HASH operations is not in IOs, but in the CPU cost of calculating the hash values and managing the hash buckets, and the memory cost of storing the hash buckets.

I just wrote a query that joined two tables, and ran it twice. For the first one I forced a LOOP JOIN, and for the second I forced a HASH JOIN. I enabled STATISTICS IO and STATISTICS TIME. Here are the results:

-- First, for the LOOP JOIN:

Table 't1'. Scan count 4999, logical reads 10773, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't2'. Scan count 1, logical reads 583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 323 ms.

-- Next, for the HASH JOIN:

Table 't1'. Scan count 1, logical reads 149, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't2'. Scan count 1, logical reads 583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 1341 ms.

The LOOP JOIN case has far more Logical Reads, particularly for the inner table (t1) which is accessed multiple times. The HASH JOIN has far fewer reads, but the times, but for actual CPU and for elapsed time, are considerably longer.

I still frequently look at STATISTICS IO values, even with SQL Server 2008 queries, but I try to keep in mind that there is a lot more data to be gathered before I can get the true performance picture.

~Kalen

Published Tuesday, March 09, 2010 3:22 PM by Kalen Delaney

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Mike Walsh said:

Hey Kalen!

Thanks for the post, looking forward to reading through it more. Apologies about the lack of a good notice mechanism. I blogged about it a week ago but I am not as well subscribed to as a blog like yours. Adam also blogged about it and some folks tweeted about it.

I just added an RSS feed to the #tsql2sday twitter hashtag in my feed reader and that shows me all the tweets that mention #tsql2sday. That is one possible way to get notifications. The way the time flies it is pretty tough to remember which Tuesday of the month you are on, let alone the fact that that is the one TSQL Tuesday happens on :-)

I am glad you made it in with some time to spare, though.

March 9, 2010 5:34 PM
 

Kalen Delaney said:

Thanks, Mike. I actually didn't have any time to spare. Deadline was  16:00 PST I posted at 15:22 PST, and I had a dentist appointment at 15:30. So I was a bit late for that.

Your solution is an interesting one, but not realistic for someone who is not into twitter or tweeting.

I'll just have to set up a reminder to look around everywhere on the first Tuesday of the month, and hope to find an announcement somewhere, or at least a link to an announcement.

~Kalen

March 9, 2010 8:02 PM
 

Rob Farley said:

I think one useful method could be to get the previous month's host to mention it. That didn't happen this time though — clearly last month's host is a slacker. Eh? Oh.

Of course, if April's host is already known, Mike could mention it in his roundup...

March 9, 2010 10:33 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement