THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Denis Gobo

Has Anyone Succeeded In Creating A Collision Between NEWID and NEWSEQUENTIALID

SQL Server 2005 introduced a new type of function to create a uniqueidentifier; the NEWSEQUENTIALID(). This new function has been created for performance reasons, each new value is greater than the previous value. In theory this means that the value will be inserted at the end of a page and not in the middle which can cause splits.

Let's run this code to see the difference

CREATE TABLE #TableSeqID (ColumnA uniqueidentifier DEFAULT NEWSEQUENTIALID(),

ColumnB uniqueidentifier DEFAULT NEWID())

 

INSERT #TableSeqID DEFAULT VALUES

INSERT #TableSeqID DEFAULT VALUES

INSERT #TableSeqID DEFAULT VALUES

INSERT #TableSeqID DEFAULT VALUES

INSERT #TableSeqID DEFAULT VALUES

GO

 

SELECT * FROM #TableSeqID

Output
-----------
BBF765FE-57BD-DC11-875F-000D5684F8D8 CE51B9E4-1640-47E2-87C6-6ADD46C63A87
BCF765FE-57BD-DC11-875F-000D5684F8D8 CA220BAB-462E-440E-829A-E0037CAF0D1F
BDF765FE-57BD-DC11-875F-000D5684F8D8 01748772-8155-4F83-A58F-CC3253DDD3F3
BEF765FE-57BD-DC11-875F-000D5684F8D8 9C4B2C87-AE33-4432-8310-3BE731179382
BFF765FE-57BD-DC11-875F-000D5684F8D8 1F84B827-F42A-4C47-8A1B-4B672B4402F1


As you can see ColumnB is random (Or at least pseudo-random) while ColumnA is not
Let's say you have a table with a billion rows, this table used NEWID() up till now. What will happen when you change the table to use NEWSEQUENTIALID(), could you get a duplicate?
I tried my best and filled up my 400GB External Seagate drive without success

Have you run into a collision, is it even possible?

Published Monday, January 07, 2008 3:47 PM by Denis Gobo

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

AaronBertrand said:

I have a terabyte external WD drive, if you save me the time of constructing the script by sending it to me, I will try it out this week (once I clear out the space).  :-)

January 7, 2008 5:25 PM
 

AaronBertrand said:

On second thought, I think the script would be quite easy to construct.  :-)

January 7, 2008 5:26 PM
 

Adam Machanic said:

I doubt you'll find a dupe, but you can find one with almost no space needed...

---

create table #x

(

n tinyint null,

x1 uniqueidentifier default (newsequentialid()),

x2 uniqueidentifier default (newid()),

check (x1 <> x2)

)

while 1=1

begin

truncate table #x

insert #x default values 

declare @i int

set @i = 20

while @i > 0

begin

begin try

insert #x (n) select n from #x

end try

begin catch

select 'some problem happened... dupe?'

break

end catch

set @i = @i - 1

end

if @i > 0 --problem?

break

end

---

January 7, 2008 5:44 PM
 

Peter W. DeBetta said:

What about tempdb?

January 7, 2008 5:58 PM
 

Pieter said:

This article may provide some insight into the question posed

http://sqljunkies.com/Article/4067A1B1-C31C-4EAF-86C3-80513451FC03.scuk

January 8, 2008 8:34 AM
 

Denis Gobo said:

Adam,  the check constraint just makes sure that the values are not same on the same row

create table #x2

(x1 int,x2 int,check (x1 <> x2))

insert #x2 values(1,2)

insert #x2 values(2,1)

insert #x2 values(2,1)

select * from #x2

I want to check that the value in x1 does not exists in any row in x2

so basically I created 2 tables, both between 10 and 20 billion rows

then I did select * from table 1 where exists(select * from table 2 where 1.id =2.id)

And nothing

January 8, 2008 12:36 PM
 

Denis Gobo said:

>>What about tempdb?

or the log file? I did a shrink every 10 million inserts

January 8, 2008 12:37 PM
 

AaronBertrand said:

Adam, I think you are checking for a much more obscure possibility... that a NEWID() happens upon a single instance of a NEWSEQUENTIALID().  What Denis is trying to determine is how likely it is that you will ever hit a previously-assigned NEWSEQUENTIALID()... there is a big difference.

January 8, 2008 4:18 PM
 

Adam Machanic said:

Aaron: Good point.  

One thing I've noticed in the past is that the first character of the third "group" is always "4" for GUIDs created with NEWID().  And from what I've seen so far, it's always "D" for GUIDs created with NEWSEQUENTIALID().  If both of these two patterns are always true, there can never be a collision.  I seem to recall that the "4" is done on purpose for some reason, but I'm not sure about the "D".  Anyone know?

January 8, 2008 6:41 PM
 

Denis Gobo said:

>>One thing I've noticed in the past is that the first character of the third "group" is always "4" for GUIDs created with NEWID().  

I just checked one of our table and that looks correct

January 10, 2008 9:53 AM
 

Mike Walbert said:

I've found three newid() calls leading in very different times (over the course of 2 days) to:

1DC674B3-C85D-11DC-98BD-000C29C94123

DF12F3EC-C868-11DC-98BD-000C29C94123

62DB2E89-C869-11DC-98BD-000C29C94123

January 22, 2008 1:11 PM
 

Joshua said:

Mike Walbert appears to be using Windows 9x or NT4. That is the old guid generator algorithm and it can't duplicate (MAC addr + counter + reboot counter).

September 8, 2008 11:59 AM
 

Gerardo said:

Using SQL 2008 on W7 or W2008, the NEWID always generates a uuids with a 4 in the 14th position.

26D96F22-DE61-42A3-9699-0E5587AD4835

3DF1DD17-3213-47D3-9737-0F2B1B0F8FF0

AE287E89-384D-42D0-9764-142FBB2B7CC9

D6049CD0-E96F-43CD-A039-169209D49D37

891D51C0-6567-493F-ACE4-1EB703AEEA54

61FB216E-4245-4730-9790-2F01C1FC5EB8

Creating many temp tables with NEWSEQUENTIALID generates uuids with a E in the 14th position:

09241602-54A6-E011-86A9-00155D016691

AB0CA418-54A6-E011-86A9-00155D016691

58CF5124-54A6-E011-86A9-00155D016691

7DCF5124-54A6-E011-86A9-00155D016691

A824672C-54A6-E011-86A9-00155D016691

So i expect that no collisions would ocurr, ever.

July 4, 2011 1:17 PM
 

joe ho said:

If GUID generation are true randomness:

The GUID is 128-bit but there are include 6 bit variant and version number.  So is actually 122-bit.  From the birthday problem, to get  50% GUID collision is actually only 2^62.

If the world has 2^33 population (in future I would assume 2^38 for earth limit).  Each population hold 2^4 (16 devices).  If Each device has 2^25 ID records (or 2^20 ID records if future is 2^38 population).  then will have 50% collision.

But since generating GUID is not true random but just pseudorandom, the probability of collision is much higher.  Some forum claim that only 1 million record has find GUID collision in real data.

July 13, 2012 11:30 PM

Leave a Comment

(required) 
(required) 
Submit

About Denis Gobo

I was born in Croatia in 1970, when I was one I moved to Amsterdam (and yes Ajax is THE team in Holland) and finally in 1993 I came to the US. I have lived in New York City for a bunch of years and currently live in Princeton, New Jersey with my wife and 3 kids. I work for Dow Jones as a Database architect in the indexes department, one drawback: since our data goes back all the way to May 1896 I cannot use smalldates ;-( I have been working with SQL server since version 6.5 and compared to all the other bloggers here I am a n00b. Some of you might know me from http://sqlservercode.blogspot.com/ or even from some of the newsgroups where I go by the name Denis the SQL Menace If you are a Tek-Tips user then you might know me by the name SQLDenis, I am one of the guys answering SQL Questions in the SQL Programming forum.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement