THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Performance Impact: Bookmark Lookup is Expensive - Even in Memory

It’s well known that bookmark lookup (aka key lookup in case of a clustered index) is not cheap, especially when it comes to retrieving a lot of data. So I’m not going to rehash the pros and cons of bookmark lookup or why bookmark lookup is expensive. But I’ve noticed that when it comes to discussing bookmark lookup, all the literature seems to being focusing on their implications on storage I/Os. There is nothing wrong with that. In practice, it ultimately does come down to the fact that when reading a large amount of data, random I/Os are much more expensive than sequential I/Os, and bookmark lookup tends to incur random I/Os.

 

What I’d like to highlight in this post is that even if all or most of the pages are in memory already, bookmark lookup is still very expensive compared to scan.

 

To show this is the case, I piggybacked on the test setup described in my last post. Let me briefly recap the test setup. A single table was used, and its definition is as follws:

CREATE TABLE test (
      i           int NOT NULL,
      j           int NOT NULL,   
      dt          datetime,
      filler      char(5000) NOT NULL
)

Two indexes were created on the table:

 

CREATE CLUSTERED INDEX cix_test ON test(i);
CREATE INDEX cx_test ON test(j);

And the table was populated with 2,000,000 rows with the following INSERT statement in a loop (local variable @i going from 1 to 2,000,000):

INSERT test(i, j, dt, filler)
SELECT @i,
       CASE WEHN @i % 2 = 0 THEN @i ELSE 2000000 - @i END,
       getdate(),
       'abc'

Then, the following two queries were separately run multiple times on a server with 32GB of RAM (25GB of which was allocated to the SQL2005 instance buffer pool):

 

SELECT max(dt) FROM test;
SELECT max(dt) FROM test WITH (index(cx_test));

 

After the second run, all the pages were cached in memory (the table was about 16GB in size). Notice that with the first query SQL Server used clustered index scan to produce the result, whereas with the second query the index hint forced SQL Server to scan the nonclustered index and then use bookmark lookup to produce the result.

 

I’ll look at the storage I/O implication of these two access paths in a separate post, but for now what do you think might be the difference in the elapsed time of these two query-processing methods when all the pages were cached in memory?

 

The following chart shows the difference:

For this test, the bookmark lookup method took almost three times as long as did the clustered index scan method.

 

Why such a huge difference? When all the pages were cached in the buffer pool, we can’t explain the query elapsed time difference with the difference in storage I/O efficiency. We can, however, explain the difference with the difference in the number of pages SQL Server engine must visit in the buffer pool.

 

The output from SET STATISTICS IO ON for each method is as follows:

 

Clustered Index Scan:

Scan count 1, logical reads 2008931, physical reads 0, read-ahead reads 0 …

 

Bookmark Lookup:

Scan count 1, logical reads 8130363, physical reads 0, read-ahead reads 0 …

 

Clearly, the access method with bookmark lookup was not efficient, visiting four times as many pages as did the access method with clustered index scan.

 

Also note that compared with the size of the table, the size of the nonclustered index was insignificant. There were about 5,000 pages in the leaf pages of the nonclustered index, whereas there were 2,000,000 leaf pages in the clustered index. So the cost of processing the query with the nonclustered index was dominated by the bookmark or key lookups.

 

Of course, what I’ve discussed in this post is probably more academic than practical as you can’t expect all the pages being cached when processing large reporting queries in most real environments, and therefore the real dominate factor is not how many pages SQL Server needs to traverse in memory, but how many pages SQL Server has to bring into memory and how these pages are brought into memory.

 

It's not my intention to leave you an impression that I'm picking on the bookmark lookup operation because it's just a bad method to retrieve data. That's not the case at all because for some queries, it is an excellent method. But that's not the focus of this post, and there are plenty of discussions on the advantages of the bookmark lookup operation you can find elsewhere.

 

Test Environment

SQL Server 2005 Enterprise x64 Edition ran on Windows Server 2003 Enterprise x64 Edition with Service Pack 2. The SQL2005 build was 9.0.3239 (i.e. SQL Server 2005 CU7) with 25GB allocated to the buffer pool. The server was a HP BL680 G5 including four 2.4GHz quad-core Xeon E7340 processors (aka Tigerton) with 2x4MB L2 cache and 32GB of RAM.

 

Published Sunday, July 06, 2008 10:54 PM by Linchi Shea

Attachment(s): inMemory.jpg

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

superhasty said:

good article!

July 24, 2008 6:11 AM
 

David said:

I just want to have a test environment like that!!

July 27, 2008 11:01 PM
 

mario b said:

Great post Linchi, this shows again that not all logical I/O's are equal..

October 14, 2008 7:34 PM
 

DBAAMI said:

Linchi...great article.  I would agree that it is very academic.  It would be more pratical from an operational stand point to discuss how you would identify those queries that are performing those book mark lookups and once you identified them how to correct them.

It becomes more critical to understand the identification process in a very high OLTP database.  

Also, in 2005 and 2008, how does the query plan indicate a book mark.  In 2000, it was very clear.  Conceptually the operation is still the same in 2005 and 2008 but is it identified the same way in 2005 & 2008?

For sure...understanding is the first step in the journey.  Application sustains the life along the journey.

Best Regards

October 22, 2008 1:18 PM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement