THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul White: Page Free Space

A technical SQL Server blog from New Zealand. See also my articles on SQLperformance.com

Enforcing Uniqueness for Performance

A little while back, I posted a short series on seeks and scans, and one of the things I highlighted was the difference between a singleton seek and a range scan.  You can find that post here, if you want a refresher.

Anyway, the broad point is that a singleton equality seek always retrieves exactly one row, and is guaranteed to do so because a unique index exists to enforce it.  A range scan, on the other hand, seeks down the B-tree to a starting (or ending) point, and scans forward (or backward) from that point using the next or previous page pointers.  The point of today’s short post is to show how much faster a singleton seek is, compared with a range scan, even when both return exactly the same number of records.

Pretty simple test rig today; a 5 million row table with a single BIGINT NOT NULL column with all the values from 1 to 5,000,000:

CREATE TABLE dbo.SeekTest
(
    col1    BIGINT NOT NULL
);
GO
CREATE CLUSTERED INDEX cx ON dbo.SeekTest (col1);
GO
-- 5 million rows numbered 1 to 5,000,000
INSERT dbo.SeekTest WITH (TABLOCKX)
    (col1)
SELECT TOP (5000000)
    ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM 
    master.sys.columns AS c,
    master.sys.columns AS c2,
    master.sys.columns AS c3;

Notice that the clustered index is not defined as unique.  There will not be any uniquifiers (in case you think I am cheating) because SQL Server only adds them when necessary, and there are no duplicates in this example.  The test query simply joins the table to itself, forcing a nested loops join so we get 5 million seek operations:

SELECT 
    COUNT_BIG(*)
FROM dbo.SeekTest AS st WITH (TABLOCK)
INNER LOOP JOIN dbo.SeekTest AS st2 WITH (TABLOCK) ON
    st2.col1 = st.col1
OPTION (MAXDOP 1);

And here’s the query plan:

image

On my machine, typical results are:

image

Notice the Scan Count.  We get one scan for the Clustered Index Scan iterator in the plan, and one scan for every seek operation.  This is one way to see that we are getting range scans here – the lack of a unique index means that SQL Server cannot guarantee that only one row will be returned from each seek, so a range scan is performed to pick up any extra rows.

Now let’s create a second table, where the only difference is that the index is declared as UNIQUE:

CREATE TABLE dbo.SeekTestUnique
(
    col1    BIGINT NOT NULL
);
GO
CREATE UNIQUE CLUSTERED INDEX cuq ON dbo.SeekTestUnique (col1);
GO
-- 5 million rows numbered 1 to 5,000,000
INSERT dbo.SeekTestUnique WITH (TABLOCKX)
    (col1)
SELECT TOP (5000000)
    ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM 
    master.sys.columns AS c,
    master.sys.columns AS c2,
    master.sys.columns AS c3;

As before, we join this table to itself using loops join (the TABLOCKs are just to reduce locking overheads):

SELECT 
    COUNT_BIG(*)
FROM dbo.SeekTestUnique AS stu WITH (TABLOCK)
INNER LOOP JOIN dbo.SeekTestUnique AS stu2 WITH (TABLOCK) ON
    stu2.col1 = stu.col1
OPTION (MAXDOP 1);

And we get the same plan:

image

But very different results:

image

Now there is only one scan, because the seeks are all singleton seeks, and execution time has improved from 9472ms to 6096ms.

There are lots of good reasons to enforce uniqueness where you can (and a few not to).  For further information see:

Is it a Seek or a Scan?
Sneaky Reads with Unique Indexes
Unique Indexes with GROUP BY (Rob Farley)
Scan Count Meaning in STATISTICS IO Output – Part 1 (Amit Banerjee)
Scan Count Meaning in STATISTICS IO Output – Part 2 (Amit Banerjee)

© 2011 Paul White

email: SQLkiwi@gmail.com
twitter: @SQL_Kiwi

Published Friday, July 29, 2011 5:38 AM by Paul White

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Amit Banerjee said:

This is what you call a quick post? It's so short!! :P

Good point. I had written about confusing nature of scan count in the STATISTICS IO output sometime back.

July 28, 2011 12:08 PM
 

Paul White said:

Thanks Amit, I have added links to your MSDN entries - everyone should read them.

Paul

July 28, 2011 12:23 PM
 

Meher said:

Thanks Paul. Very nice post. You explained it in simple lnaguage and I like it. Amit also has a good post. Thanks to both of you.

Meher

July 28, 2011 5:28 PM
 

ALZDBA said:

Rule no 1: Tell your system what you know ! If you know it is unique, declare it as unique ! ( If it must be unique, it is considered a constraint and should be declared to meet the constraint )

Nice read. Good refs.

Thanks

July 28, 2011 6:03 PM
 

WayneS said:

ALZDBA - I really like your rule.

July 30, 2011 2:07 PM
 

Paul White: Page Free Space said:

In my last post , I showed how using a index unique could speed up equality seeks by around 40%. For

August 3, 2011 8:34 AM
 

kristina said:

It has been some time since I visited website with such high quality information. Thank you so much for providing such helpful information. This is really informative and I will for sure refer my friends the same. Thanks.

July 11, 2012 11:43 PM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement