THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Michael Zilberstein

Partitioning by GUID

Recently I had to use GUID (Global Unique Identifier) as primary key in order to ensure cross-server uniqueness. In many cases surrogate key (some sort of concatenation between server id and intra-server identity) would give a better performance but in my case due to application requirements surrogate key wasn't an option.

Next step is partitioning by GUID column. Say, you want 3 partitions. Ho do you build partition function? Sure, you can perform a calculation and find the exact 2 values that will divide all the GUID range to 3 equal parts. I needed at least 20 partitions, so I used an approximation. If you write down long hexadecimal number, you'll receive something like A09F76... I decided that partitioning by first 2 tabs is enough. Example for decimal: let's divide values from 1 to 10000 to 3 partitions - what would be the ranges? The exact answer is: 1 - 3333, 3334 - 6666, 6667-10000. Using approximation by 2 first digits will give the following ranges: 1 - 3300, 3301 - 6600, 6601 - 10000. 2 orders of magnitude (first 2 digits) are good enough approximation. In hexadecimal first 2 digits (00 - FF) represent 0-255 in decimal. I'm not accustomed to hexadecimal calculations, so first I divide decimal range to 3 parts: 1 - 85, 86 - 170, 171 - 255. Next step is converting it to hexadecimal: 01 - 55, 56 - AA, AB - FF.

OK, done with values, how do I translate them into GUID? For example, 01 in hexadecimal - what is the GUID representation of it (taking into account that those should be first 2 digits - not the last ones). Took me awhile to understand. First I tried "01000000-0000-0000-0000-000000000000" - surprisingly all the rows (more then a million) came to the first partition. OK, another try: "00000001-0000-0000-0000-000000000000". Still the same. The catch is that GUID should be read in a byte groups order from right to left while inside every byte group reading order is from left to right. And every 2 tabs are 1 byte. So the right representation for 01 as first 2 digits in GUID is: "00000000-0000-0000-0000-010000000000".

To finish with, partition function that partitions GUID to 20 range groups:

CREATE PARTITION FUNCTION [pf_GUIDRangePartition](uniqueidentifier) AS RANGE RIGHT 
FOR VALUES 
(
  
N'00000000-0000-0000-0000-0D0000000000',
  
N'00000000-0000-0000-0000-1A0000000000',
  
N'00000000-0000-0000-0000-270000000000',
  
N'00000000-0000-0000-0000-340000000000',
  
N'00000000-0000-0000-0000-410000000000',
  
N'00000000-0000-0000-0000-4E0000000000',
  
N'00000000-0000-0000-0000-5B0000000000',
  
N'00000000-0000-0000-0000-680000000000',
  
N'00000000-0000-0000-0000-750000000000',
  
N'00000000-0000-0000-0000-820000000000',
  
N'00000000-0000-0000-0000-8F0000000000',
  
N'00000000-0000-0000-0000-9C0000000000',
  
N'00000000-0000-0000-0000-A90000000000',
  
N'00000000-0000-0000-0000-B60000000000',
  
N'00000000-0000-0000-0000-C30000000000',
  
N'00000000-0000-0000-0000-D00000000000',
  
N'00000000-0000-0000-0000-DD0000000000',
  
N'00000000-0000-0000-0000-EA0000000000',
  
N'00000000-0000-0000-0000-F70000000000'
) ;
Published Wednesday, June 11, 2008 4:40 AM by Michael Zilberstein
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Doug Brendle said:

Very intersting Micheal. This is coming up at myu workplace now. I didn't doubt that it could be partitioned but more, is there a performance impact in using partitions with GUID data types?

July 8, 2010 4:09 PM
 

Michael Zilberstein said:

Hi Doug,

Partition elimination works efficiently - exactly like with any other datatypes. Performance penalty can come later when you create index or even, forgive me DBA's god for my sin, clustered index on GUID column. The penalty will be in heavy paging that you have no way to avoid. I had no choice because multiple servers assign those IDs (including servers non-existing yet but would be existing later, so ranges per server aren't solution).

July 19, 2010 10:32 AM
 

developer said:

Hi,

I need to do a partition by modulus. Do u know whether it can be performed with  GUID as the partition column??

August 31, 2010 5:45 AM
 

Michael Zilberstein said:

Hi Developer,

SQL Server doesn't support function-based partitioning. So in order to partition by modulus you'll have to add columns that will contain actual modulus values.

August 31, 2010 6:18 AM
 

John said:

You don't get the same paging problem with a clustered index and sequential guid do u?

October 6, 2010 11:54 PM
 

Michael Zilberstein said:

Hi John,

Sequential GUID should allow you to avoid fragmentation from INSERT operations. But if you can afford sequential GUIDs - meaning, I guess, that your DB server is creating them - why not to use INT or BIGINT? Even BIGINT is half the size of GUID. INT is a quarter. In my case many application servers assigned GUIDs, so I couldn't make it sequential.

October 7, 2010 3:13 AM
 

John said:

Thanks for the reply Michael. I use the guid because I'm anticipating have copies of the same application running independently in multiple locations. If they want to swap a record, the GUID would go with the record yet never conflict. The GUID is creating the sequential IDs so that takes the responsibility from the app to the database. Even if the app made a guid, it would be ignored during the insert process and overriden with the GUID the database sends. Even in a dataset scenario, I'd have the datarelations (in .Net land) set to cascase updates of the keys but that another issue. The size doesn't bother me because I work in a scenario where space is cheap.

October 8, 2010 2:26 PM
 

Yuriy Rozhok said:

Great article, thanks Michael!

Really helpful - I've got the same task.

It's so simple!

April 21, 2011 1:20 PM
 

Dylan said:

I just come across the same problem; i want to batch dml on a table clustered on a guid. With the help of this post I wrote this to automate the generation of guid ranges and thought id share it.

I usually use a tvf (dbo.NumbersTable commented out) to generate the table of numbers but replaced it here with a cte to generate the same thing.

Let me know your thoughts;-

declare @MaxHex bigint = (select convert(bigint, 0xffffffffff))

  ,    @Intervals bigint = 10000 /*max 65536*/

declare @EachGroupSize bigint = @MaxHex / @Intervals

;with   two as (select number from (values (1), (1), (1), (1)) v(number))

   ,   eight as (select a.number from two a, two b, two c, two d)

   ,   sixteen as (select a.number from eight a, eight b)

   ,   AllNumbers as (select number = row_number() over (order by getdate()) from sixteen)

   ,   NumbersTable as (select number from AllNumbers where number <= @Intervals)

select GroupLimitGuid = convert(uniqueidentifier, '00000000-0000-0000-0000-'

                       +convert(varchar(20), convert(varbinary(5), n.Number * @EachGroupSize), 2)

                       +'00')

from    NumbersTable n -- dbo.NumbersTable(@Intervals) n

September 18, 2013 10:26 PM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement