SQLblog.com - The SQL Server blog spot on the web

Rob Farley

- Owner/Principal with LobsterPot Solutions (an MS Gold Partner consulting firm), Microsoft Certified Master, Microsoft MVP (SQL Server), APS/PDW trainer and leader of the SQL User Group in Adelaide, Australia. Rob is a former director of PASS, and provides consulting and training courses around the world in SQL Server and BI topics.

Scans are better than Seeks. Really.

Hi! - Great that you've found this page, but it's no longer here! You can find the content over at: http://blogs.lobsterpot.com.au/2014/03/12/scans-are-better-than-seeks-really/

Published Wednesday, March 12, 2014 1:00 AM by Rob Farley

Comments

 

Dave Wentzel said:

Spot on. One of the lesser-known tenets of the Big Data movement is the preference for scans over seeks for batch processing. I even heard a NoSQL vendor say that with the advent of FusionIO and the like, there is no performance difference between seeks and scans, or even between sequential and random IO. Not sure I'm ready to take that leap of faith yet, though.

March 11, 2014 12:13 PM
 

Andomar said:

You write:

"The ‘key data’ is anything that’s defined in the index key, plus enough extra data to make sure that each row is uniquely identifiable (if the index is a ‘unique index’, then it already has enough information, if not, then the clustered index key(s) are included – with uniquifier column if the CIX keys are not unique), and therefore searchable."

Are you sure about that?  I think even a unique index contains the clustered keys.  After all, how could you do a lookup to the clustered index without the clustered key?

March 11, 2014 3:57 PM
 

Rob Farley said:

In a unique index, the CIX keys are included at the leaf level. They're not required on the other levels.

March 11, 2014 4:49 PM
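
As an illustration of the point above (the table and column names here are hypothetical, not from the original post), a unique nonclustered index still carries the clustering key at its leaf level, and that value is what a Key Lookup uses to reach the clustered index:

```sql
-- Hypothetical table: OrderID is the clustered key.
CREATE TABLE dbo.Orders
(
    OrderID     int           NOT NULL CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    OrderNumber char(10)      NOT NULL,
    OrderTotal  decimal(10,2) NOT NULL
);

-- Unique nonclustered index: OrderNumber alone is the key at the
-- non-leaf levels, but the leaf rows also carry OrderID.
CREATE UNIQUE NONCLUSTERED INDEX IX_Orders_OrderNumber
    ON dbo.Orders (OrderNumber);

-- Expected plan shape (once the table has data): a Seek on
-- IX_Orders_OrderNumber, then a Key Lookup on PK_Orders using the
-- OrderID value stored at the nonclustered leaf.
SELECT OrderID, OrderNumber, OrderTotal
FROM dbo.Orders
WHERE OrderNumber = 'SO43659';
```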
 

Uri Dimant said:

Hi Rob

What about when the index is very fragmented? A Seek can get to any row easily no matter how fragmented the table is.

In that case fragmentation has no impact on performance. But if you do a scan (following the logical order) on a fragmented table, you can be jumping all over the place to get the rows you need, instead of accessing them nicely and contiguously. What do you think?

March 12, 2014 1:35 AM
 

Rob Farley said:

Hi Uri,

I'm not suggesting that a Scan is better at locating a single row than a Seek. I'm suggesting that a Scan across a filtered index is better than the RangeScan performed by a Seek, or frequently, than a large number of Seeks performed separately. Remember that a Seek performs a RangeScan which traverses the leaf nodes just like a Scan, only diving in at a particular start point. If the index is filtered down to only the rows of interest, the Scan is less work than the RangeScan performed by a Seek - it doesn't need to search for the start, or keep checking that it's in the range. It also reduces the impact of maintaining the index, as fewer rows need to be inserted / updated.

The jumping around that you can see because of fragmentation occurs just as much with RangeScans as with Scans. The perception that a Seek is immune is wrong. A Seek simply scans a portion of the index. A low-selectivity predicate like "Price > 0" will cause a Seek, which gives an illusion of improvement, but not much more.

March 12, 2014 4:23 AM
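
As a sketch of the filtered-index point above (dbo.SalesOrders and its columns are hypothetical, not from the original post):

```sql
-- Hypothetical table of orders, some of which have not yet shipped.
CREATE TABLE dbo.SalesOrders
(
    SalesOrderID int           NOT NULL CONSTRAINT PK_SalesOrders PRIMARY KEY CLUSTERED,
    CustomerID   int           NOT NULL,
    OrderDate    date          NOT NULL,
    ShippedDate  date          NULL,
    OrderTotal   decimal(10,2) NOT NULL
);

-- Filtered index containing only the rows of interest (unshipped orders).
-- Scanning it reads just those rows: no search for a start point, no
-- per-row range checks, and less index maintenance as orders ship.
CREATE NONCLUSTERED INDEX IX_SalesOrders_Unshipped
    ON dbo.SalesOrders (OrderDate)
    INCLUDE (CustomerID, OrderTotal)
    WHERE ShippedDate IS NULL;

-- Can be satisfied by a Scan of the small filtered index, rather than a
-- Seek (RangeScan) over a full-width index on the whole table.
SELECT OrderDate, CustomerID, OrderTotal
FROM dbo.SalesOrders
WHERE ShippedDate IS NULL;
```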
 

Rainer Unwin said:

Hi Rob,

Nice article, and something that I fully endorse. Neither a seek nor a scan is inherently better; it all depends on the data and the operation at hand. One thing I do disagree with is the suggestion that each row in a seek's range scan needs to be evaluated. While this can be true, it isn't necessarily, to my understanding. If the range is big enough, SQL Server will evaluate the non-leaf level and decide that several pages in the index leaf meet the criteria. The first and last pages require a check for every entry, but as I understand it SQL Server can just consume the intermediate pages (between the first and last) knowing that all records must match. I believe Paul Randal explained this in one of the MCM videos. My apologies to Paul and you if I'm incorrect. Can I get your thoughts on that?

March 14, 2014 1:45 PM
 

Rob Farley said:

Rows can be checked en masse using higher levels, but the point is that the engine needs to be considering whether each row is in the Range or not. In a range of 50k rows, I'm not saying that it's 50k checks, just that it needs to know about each row.

March 14, 2014 10:25 PM
 

Ewald Cress said:

Great post, and a great subject for exploring in the context of knee-jerk "optimisation" based on having just enough knowledge to be dangerous.

March 16, 2014 6:42 AM
