
Paul White: Page Free Space

A technical SQL Server blog from New Zealand.

Sequence Tables

It is frequently useful to generate sequences of values within SQL Server, perhaps for use as surrogate keys.  Using the IDENTITY property on a column is the easiest way to automatically generate such sequences:

CREATE  TABLE dbo.SomeTable
        (
        row_id  INTEGER IDENTITY PRIMARY KEY,
        data    SQL_VARIANT NOT NULL
        );

Sometimes though, the database designer finds that she needs a more flexible scheme than is provided by the IDENTITY property.  One alternative is to use a Sequence Table.

What is a Sequence Table?

A Sequence Table is a regular table that stores the next available value for one or more sequences – somewhat like a manual identity value in many respects.  The definition of a Sequence Table might look like this:

CREATE  TABLE dbo.SequenceTable
        (
        sequence_name   NVARCHAR(20) PRIMARY KEY,
        next_value      INTEGER NOT NULL
        );

The table contains one row for each sequence range under management. One Sequence Table can maintain any number of independent sequences. Typically, though, the table only contains a few tens of rows.

The first column represents the sequence name, and the second represents the next unallocated value in the range. A sequence is initialized with a statement like:

INSERT  INTO
        dbo.SequenceTable
        (sequence_name, next_value)
VALUES  (N'Test Sequence', 1);

The values obtained from the sequence table can be used for any purpose, but the most common is to use them as key values for rows in other tables.  Values are allocated from the sequence through a stored procedure, which encapsulates the logic required to extend the sequence, and provides an easy way to manage permissions. Generally, users have no permissions on the Sequence Table itself, to protect the integrity of the sequence.

The Allocation Stored Procedure

The most common use of a Sequence Table is in pre-allocating a range of keys, to make a batch process more efficient. A process that needs to add a large number of new rows to a table can be made to perform better by batching the requests, and submitting each batch in a single transaction. To make this work, we need to know the key values in advance.

Say we want to reserve a contiguous range of 250 keys from the Test Sequence defined above. We might call a stored procedure like this:

DECLARE @RC     INTEGER,    -- Procedure return code
        @Value  INTEGER;    -- The first key value allocated to the caller
        
-- Call the procedure to allocate a range of 250 contiguous keys
EXECUTE @RC =
        dbo.Allocate_TSQL
            @SequenceName = N'Test Sequence',
            @RangeSize = 250,
            @FirstAllocated = @Value OUTPUT;

The procedure returns the first sequence value reserved, so if we receive the value 1 back, we can safely use key values from 1 to 250 inclusive, without calling the procedure again. No other process can obtain key values from our reserved range.
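
In other words, the caller can compute the end of its reserved range locally, without going back to the server. A trivial continuation of the example above (same batch, so @Value is still in scope):

-- @Value = 1 and we asked for 250 keys, so the last reserved key is 250
DECLARE @LastKey INTEGER;
SET     @LastKey = @Value + 250 - 1;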

After this procedure call, the Sequence Table looks like this:

sequence_name     next_value
----------------  ----------
Test Sequence     251

An implementation of the procedure itself might look like this:

CREATE  PROCEDURE dbo.Allocate_TSQL
 
        @SequenceName       NVARCHAR(20),   -- The name of the sequence to allocate keys from
        @RangeSize          INTEGER = 1,    -- The number of keys to allocate
        @FirstAllocated     INTEGER OUTPUT  -- The first key allocated (output)
 
AS
BEGIN
 
    SET XACT_ABORT ON;  -- Most errors will abort the batch
    SET NOCOUNT ON;     -- Suppress 'x row(s) affected' messages
    SET ROWCOUNT 0;     -- Reset rowcount
 
    -- Validate the range size requested
    IF      (@RangeSize IS NULL OR @RangeSize < 1)
    BEGIN
            RAISERROR('@RangeSize must be a positive integer (supplied value = %i)', 16, 1, @RangeSize);
            RETURN  999;
    END;
 
    -- Initialize the output parameter
    SET     @FirstAllocated = NULL;
 
    -- Update the row associated with @SequenceName, returning the 
    -- current value, and then incrementing it by @RangeSize
    UPDATE  dbo.SequenceTable WITH (READCOMMITTEDLOCK)
    SET     @FirstAllocated = next_value,
            next_value = next_value + @RangeSize
    WHERE   sequence_name = @SequenceName;
 
    -- If @FirstAllocated has a non-NULL value, we know we successfully updated a row
    RETURN  CASE WHEN (@FirstAllocated IS NOT NULL) THEN 0 ELSE -999 END;
 
END;

The important part of the procedure is the UPDATE statement. The output parameter @FirstAllocated is set to the pre-update value of the next_value column. The next_value column then has @RangeSize added to its value, thereby reserving a contiguous range of values for the procedure caller.

Recall that the caller checks the value returned in @FirstAllocated, knowing that the @RangeSize keys starting at that value have been reserved exclusively for her use.

Where to Use a Sequence Table

Generally, you will want to use an identity column, if it fits the application requirements. Only consider a Sequence Table where some aspect of the identity property makes it unsuitable.

A Sequence Table might be the right choice where you need:

  1. To generate a custom sequence of a type not directly supported by identity columns; or
  2. To maintain a sequence of key values over more than one table; or
  3. A guarantee that a multi-row insert will receive a contiguous block of sequence values; or
  4. To pre-allocate a range of sequence values for performance reasons

Cases 1 and 2 are perhaps obvious. Between them, they cover custom sequences that cannot conveniently be implemented with an identity column.

Case 3 is quite interesting. If concurrent queries insert multiple rows to the same table with an identity column, SQL Server does not guarantee that the identity values allocated will be contiguous for each caller. In fact, the asynchronous nature of identity value allocations means that the values assigned will typically not be contiguous. This can be important to some applications, and there is more about this later on, in the section on concurrency.

Case 4 covers a class of question often seen on the forums. The identity value assigned to a row is only available after the insert has completed. Frequently, it would be more efficient to know the value before the insert is performed.

Consider a process which must add a number of related objects to the database. One option is to insert one object at a time, retrieve the identity value, and then use that value in subsequent inserts. If the application were able to pre-allocate the values, the insert operations could all be performed in a single batch, and a single transaction. This could prove to have very significant performance benefits.
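
To make the idea concrete, here is a minimal sketch. The dbo.BatchTarget table and the @Staging table variable are hypothetical, invented just for this example; Allocate_TSQL and the Test Sequence are the ones defined above:

-- Stage the rows to be added (hypothetical structures, for illustration only)
DECLARE @Staging TABLE
        (
        staging_id  INTEGER IDENTITY PRIMARY KEY,
        data        SQL_VARIANT NOT NULL
        );

-- ...populate @Staging here...

DECLARE @RC     INTEGER,
        @Value  INTEGER,
        @Rows   INTEGER;

SELECT  @Rows = COUNT(*) FROM @Staging;

-- Reserve one key per staged row
EXECUTE @RC =
        dbo.Allocate_TSQL
            @SequenceName = N'Test Sequence',
            @RangeSize = @Rows,
            @FirstAllocated = @Value OUTPUT;

-- Apply the contiguous range and insert the whole batch in one transaction
BEGIN TRANSACTION;

INSERT  dbo.BatchTarget
        (row_id, data)
SELECT  @Value - 1 + ROW_NUMBER() OVER (ORDER BY s.staging_id),
        s.data
FROM    @Staging AS s;

COMMIT TRANSACTION;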

This blog entry is concerned primarily with case four (pre-allocation of ranges). The Sequence Table method can be used to solve problems in all four cases.

Concurrency and Deadlocking

A common objection to using the Sequence Table design is that it might encounter concurrency problems. Although the process of allocating a new range is atomic and pretty fast, like any data modification it acquires an exclusive row-level lock on the Sequence Table, which will be held until the containing transaction completes.

Imagine a process (A) starting a transaction and requesting a new range of key values from the database via the Sequence Table allocation stored procedure.  The allocation completes very quickly (typically in a fraction of a millisecond), but process (A) goes on to perform another operation before committing its transaction. As luck would have it, this second operation encounters a delay of some sort. Process (A) continues to hold an exclusive lock on a Sequence Table row until it completes its transaction.

Meanwhile, a second process (B) also requests a range of key values, for the same sequence name as process (A). This request needs to update the same row that process (A) has locked exclusively. Process B is therefore blocked until process A completes its transaction.  This limitation means that solutions based on a Sequence Table historically often suffered from an unacceptable level of deadlocks.

In the example above, if process B already held a lock which process A must wait to acquire, we have a deadlock – neither process can make progress since it is waiting for a lock which is incompatible with one already held by the opposing process.
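
To make the blocking scenario concrete, here is a minimal two-session sketch using the objects defined above (run the two parts on separate connections):

-- Session A: allocate a range inside a transaction, and leave it open
DECLARE @A INTEGER;

BEGIN TRANSACTION;

EXECUTE dbo.Allocate_TSQL
            @SequenceName = N'Test Sequence',
            @RangeSize = 10,
            @FirstAllocated = @A OUTPUT;

-- ...session A now does other (possibly slow) work, still holding the row lock...

-- Session B: this call blocks until session A commits or rolls back,
-- because it needs to update the same SequenceTable row
DECLARE @B INTEGER;

EXECUTE dbo.Allocate_TSQL
            @SequenceName = N'Test Sequence',
            @RangeSize = 10,
            @FirstAllocated = @B OUTPUT;

-- Session A, later: releases session B
COMMIT TRANSACTION;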

How SQL Server Does It

At this point, you might be wondering how SQL Server itself avoids these concurrency and deadlocking problems when working with identity columns, given that many users can add rows to the same table concurrently.

The answer is that SQL Server allocates identity values using an internal function that does not participate in user transactions. (This is the reason that the current identity value for a table is not reset when a user transaction rolls back – the change in identity value was not part of the aborted transaction).
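
A quick way to see this behaviour for yourself (the table here is a throwaway, created just for the demonstration):

-- Identity values are not reused after a rollback
CREATE  TABLE dbo.IdentityDemo
        (
        row_id  INTEGER IDENTITY PRIMARY KEY,
        data    INTEGER NOT NULL
        );

INSERT  dbo.IdentityDemo (data) VALUES (100);   -- row_id = 1

BEGIN TRANSACTION;
INSERT  dbo.IdentityDemo (data) VALUES (200);   -- row_id = 2 is allocated...
ROLLBACK TRANSACTION;                           -- ...the insert is undone, but the identity is not reset

INSERT  dbo.IdentityDemo (data) VALUES (300);   -- row_id = 3, not 2

SELECT  row_id, data FROM dbo.IdentityDemo;     -- rows 1 and 3; the value 2 is simply skipped

DROP TABLE dbo.IdentityDemo;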

Another way to look at this is to say that the allocation of identity values is performed using an ‘autonomous’ transaction – one that exists independently of any user transaction.

SQL Server does briefly lock an internal resource while updating the current identity value, but the lock is transient and held just long enough to prevent data corruption. The autonomous transaction is therefore very fast, and does not cause concurrency problems.

This autonomous behaviour is also the reason that two users concurrently executing a multi-row insert to the same table typically do not receive a contiguous range of identity values. Because identity allocations do not block, a single INSERT statement can be given multiple small ranges of identity values, which do not form a single contiguous sequence.

It appears that the way to avoid concurrency problems is to use an autonomous transaction. Unfortunately, SQL Server does not currently offer this facility – but it is possible to simulate it. The next section examines this in detail, since it is essential for a robust implementation of a Sequence Table.

Simulating Autonomous Transactions

The goal here is to find a way to allocate a range from the Sequence Table without holding long-term locks. This implies that we need to acquire our range in the context of a separate transaction. The most obvious way to achieve this is to use a separate database connection.

In SQL Server 2000, we could write an extended stored procedure to make a new connection to the database and update the Sequence Table on that connection. The exclusive lock on the Sequence Table row is held for a very short time on the second connection, and is unaffected by the open transaction on the first connection.

The downside to this is that extended stored procedures are difficult to write, and may affect the stability of the SQL Server instance. Extended stored procedures were deprecated with the release of SQL Server 2005.

From SQL Server 2005, a CLR stored procedure can be used as a direct replacement for an extended stored procedure. The CLR hosted environment makes this option very safe to use within SQL Server, and accessible to anyone with basic .NET coding skills.

In SQL Server 2008, there is a new way to achieve autonomous transactions in pure T-SQL. The method is fully documented and supported, and slightly out-performs the equivalent CLR implementation.

SQL Server 2005

A full implementation of a CLR stored procedure is included in the script at the end of this blog entry. Here, we will just take a quick look at the main features of this solution.

The CLR stored procedure needs to make a new connection to the SQL Server instance, call the T-SQL allocation stored procedure on that connection, and then return the results.

The first problem is that the CLR procedure needs to know the name of the SQL Server instance it was called from, in order to create a new connection to it.

We can find the information we need using the context connection – a built-in feature, which allows the CLR procedure to talk back to the database via the connection it was called on.

It is important to note that we cannot use the context connection to call the T-SQL allocation stored procedure, since the context connection is associated with the still-open original transaction. If we did that, we would not solve any of the concurrency or deadlocking problems.

The code to get the name of the SQL Server instance we need to connect to looks like this:

// Use the context connection to discover the name of the calling SQL Server
string serverName;

using (SqlConnection conn = new SqlConnection("context connection=true"))
{
    conn.Open();
    using (SqlCommand cmd =
        new SqlCommand("SELECT SERVERPROPERTY(N'ServerName')", conn))
    {
        serverName = (string)cmd.ExecuteScalar();
    }
}

Now that we have the name of the SQL Server instance, we can make a new connection to it:

// Construct the connection string for the non-context connection
SqlConnectionStringBuilder csb = new SqlConnectionStringBuilder();
csb.DataSource = serverName;
csb.IntegratedSecurity = true;
csb.Enlist = false;
 
// Use a separate transaction scope to allocate the range
using (SqlConnection conn = new SqlConnection(csb.ConnectionString))

It is important to specify (enlist = false) in the connection string, to ensure that the new connection has no link to the original transaction. The rest of the code in the CLR stored procedure is just concerned with calling the original allocation routine and returning the results.

At first look, it seems as if this method might be inefficient. Each allocation from the Sequence Table requires a call back to the server to find its name, and the creation of a new connection, in addition to the cost of calling the Sequence Table’s allocation procedure.

In practice though, the process is quite fast. The context connection already exists, and the new connection created is usually allocated from a pool. The whole process typically takes around one millisecond to complete.

This method works well, but does require that CLR integration be enabled.
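
For completeness, enabling CLR integration is a one-off, instance-level configuration change (it requires ALTER SETTINGS permission):

-- Enable CLR integration for the instance
EXECUTE sys.sp_configure N'clr enabled', 1;
RECONFIGURE;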

SQL Server 2008

In this release, SQL Server introduces a new option for linked servers, which allows a purely T-SQL approach to autonomous transactions.  To use this new feature, we need to create a loopback linked-server – a linked server that points back to the same server it is created on.

We can then use the sp_serveroption system stored procedure, to set the new option “remote proc transaction promotion” to FALSE (or OFF). This setting prevents a local transaction from being promoted to a distributed transaction when making a call to the loopback linked-server.

The effect is that of an autonomous transaction – exactly what we need. Further details on this method can be found in a blog entry by the SQL Programmability & API Development Team.

Again, a full implementation of this method can be found at the end of this entry. We will now look at the essential features of this approach.  First, we need to create a loopback linked-server, and set the options. The following code snippet takes care of that:

-- Create the loopback linked server
DECLARE @servername SYSNAME;
SET     @servername = CONVERT(SYSNAME, SERVERPROPERTY(N'ServerName'));
 
EXECUTE sys.sp_addlinkedserver
            @server = N'loopback', 
            @srvproduct = N'', 
            @provider = N'SQLNCLI', 
            @datasrc = @servername;
            
EXECUTE sys.sp_serveroption 
            @server = N'loopback', 
            @optname = 'RPC OUT', 
            @optvalue = 'ON';
 
-- Prevent a local transaction escalating to a distributed transaction (SQL Server 2008 only)
EXECUTE sys.sp_serveroption 
        @server = N'loopback', 
        @optname = 'remote proc transaction promotion',
        @optvalue = 'OFF';

Calling the allocation procedure is now very simple:

-- Call the procedure to allocate a range of 250 contiguous keys
EXECUTE @RC = 
        loopback.tempdb.dbo.Allocate_TSQL
            @SequenceName = N'Test Sequence',
            @RangeSize = 250,
            @FirstAllocated = @Value OUTPUT;

Using this method, we can take full advantage of the flexibility the Sequence Table offers, while avoiding concurrency issues and deadlocks. Calling the allocation procedure in this way is typically a little faster than using the CLR implementation.

Testing and Results

A full test rig is included at the end of this entry. This script contains all the code necessary to test the behaviour of each of the three methods presented in this article: the original stored procedure, the CLR stored procedure, and the method using a loopback linked server.

You will need a copy of SQL Server 2008 for best results. The test rig is written so that the SQL Server 2008 method will still work when run on SQL Server 2005, but you will not get the benefits of an autonomous transaction.

When run on SQL Server 2005, the loopback linked server method will hold long-term locks, and will be very much slower. The CLR stored procedure works well for both SQL Server 2005 and 2008 users.

The tests create a single test sequence and allocate a number of key values using each method, displaying the locks held by each open transaction.
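
The lock listings shown below come from the test rig; a simple way to produce something similar for the current session is a query against sys.dm_tran_locks, along these lines:

-- Show the locks held or requested by the current session
SELECT  resource_type,
        resource_description,
        request_mode,
        request_status
FROM    sys.dm_tran_locks
WHERE   request_session_id = @@SPID;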

The transaction wrapping each test is rolled back at the end, to show the effect on the Sequence Table. It is important to note that the non-blocking (CLR and loopback server) methods behave like an identity column in that the next value in the sequence is not affected by a roll back.
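
The rollback behaviour is easy to reproduce by hand. Using the loopback method (the CLR procedure behaves the same way), a sketch looks like this:

-- The allocation survives a rollback of the outer transaction
DECLARE @RC     INTEGER,
        @Value  INTEGER;

BEGIN TRANSACTION;

EXECUTE @RC =
        loopback.tempdb.dbo.Allocate_TSQL
            @SequenceName = N'Test Sequence',
            @RangeSize = 250,
            @FirstAllocated = @Value OUTPUT;

ROLLBACK TRANSACTION;

-- next_value still reflects the allocation made above
SELECT  sequence_name, next_value
FROM    dbo.SequenceTable;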

The output is reproduced and discussed below.

Original Allocation Stored Procedure

This implementation takes no special steps to avoid holding long-term locks. The locks held after allocating values from the test sequence are:

[Image: lock output showing an exclusive (X) lock on the Sequence Table row, plus the usual higher-level intent-exclusive (IX) locks]

An exclusive row-level lock is held on the Sequence Table, together with the normal higher-level intent-exclusive locks. This row-level lock will prevent other concurrent sessions allocating from the same sequence until the transaction commits or is rolled back.

This method is the fastest, but will exhibit poor concurrency, and may result in a high level of deadlocks, depending on the design of the calling application. This method is therefore not recommended.

2005 CLR Stored Procedure

This implementation uses a separate external connection to the database to avoid holding long-term locks. The locks held by this procedure are:

[Image: lock output showing only schema-stability (Sch-S) locks related to the CLR assembly; no locks on the Sequence Table]

The only locks held are schema-stability locks associated with the CLR assembly. These locks prevent the assembly from being dropped or modified while it is in use, and do not affect Sequence Table concurrency or deadlocking. The important fact is that there are no locks held on the Sequence Table itself – therefore no concurrency or deadlocking problems can occur.

This method is well suited to pre-allocating ranges of keys for an application.

2008 Loopback Linked Server

This method holds no locks at all, after the allocation is made, confirming the autonomous transaction behaviour we were seeking. This method is slightly faster than the CLR solution.

Performance

On a test run of ten thousand (arbitrarily large) range allocations, the CLR method averages around 2ms per call, and the loopback server method averages around 1ms per call.

It is important to note though, that both methods are slower than using a column with the IDENTITY property. If you can use a simple identity column, you probably should.

Other Considerations and Alternatives

Page Splits

Pre-allocating key values can be an excellent performance optimization. If, however, you were relying on an identity column to control index page splits, you should be careful to consider the implications of switching to a Sequence Table method.

In general, the potential changes in page-splitting behaviour are not a great concern. As always though, it is important to test any proposed solution thoroughly, and the Sequence Table method is no exception to that rule.

Alternatives

Other potential solutions to the wider problem include using sequential GUIDs, the OUTPUT clause, or maintaining key sequences outside SQL Server.

GUIDs are relatively wide for use as keys, and some people find them less convenient to work with, compared to integers. This method also requires a good way of producing them in sequential order. Of course, there are scenarios where it makes sense to use GUIDs, sequential or otherwise.

Using the OUTPUT clause to return allocated identity values can be problematic in batch scenarios. Using OUTPUT does not guarantee that the range of values assigned will be contiguous. SQL Server also does not typically return rows from the OUTPUT clause in any defined order. This leads to problems mapping the returned identity values to the rows submitted for insertion. Nevertheless, there are occasions where this method can be made to work well.
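
For reference, the OUTPUT pattern being described looks something like the sketch below (using the dbo.SomeTable definition from the start of this post). Nothing here guarantees that the captured values are contiguous, or that they are returned in any particular order:

-- Capture the identity values assigned by a multi-row insert
DECLARE @NewKeys TABLE (row_id INTEGER NOT NULL);

INSERT  dbo.SomeTable (data)
OUTPUT  inserted.row_id INTO @NewKeys (row_id)
SELECT  CONVERT(SQL_VARIANT, v.n)
FROM    (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS v (n);

SELECT  row_id FROM @NewKeys;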

You might also consider maintaining key values outside of SQL Server, possibly in a mid-tier component. One disadvantage is that the key values are then not available to any process running inside SQL Server. It might also be argued that key allocation, as a data entity, logically belongs in the database layer.

Conclusion

Properly implemented, a Sequence Table is a high-performing and robust solution for appropriate applications running on SQL Server 2005 or later. Anyone with a need to pre-allocate key values or to maintain a custom sequence should give a Sequence Table serious consideration.

© Paul White

email: SQLkiwi@gmail.com
twitter: @SQL_kiwi

Published Tuesday, October 19, 2010 5:34 AM by Paul White

Attachment(s): Sequence Table.zip


Comments

 

merrillaldrich said:

A great post - one maybe dumb question: is it not enough to run the allocate proc before the start of your "actual" transaction? i.e.:

EXECUTE dbo.Allocate_TSQL <parameters>

BEGIN TRANSACTION

<do stuff>

COMMIT

October 18, 2010 12:36 PM
 

Robert L Davis said:

I worked on a sequencing project several years ago. The company was using a federated database system, and they wanted all databases on all servers to use the same sequencing generator to ensure that all ID's were unique across all servers. Needless to say, they had concurrency problems and performance problems because all 15 servers were calling back to 1 server for the sequence generator. Lots and lots of cross-server queries.

The system I implemented for them consisted of having a sequence generation master on 1 server that doled out blocks of sequences to individual servers. This allowed the assignment of ID values to be localized on each server, and only when the server was nearing the end of its sequence would it get a new block of sequence ID's.

I also improved their performance by allowing the application to request a block of ID's for inserting multiple rows. Customers could upload Excel spreadsheets with data for bulk processing, and instead of making an individual call for a new ID for every row (row by agonizing row), they could request a block of IDs to make 1 call for the whole batch.

October 18, 2010 12:40 PM
 

Paul White said:

Hi Merrill,

First, thank you!

Yes, you certainly can run the allocation before starting a transaction - at least in the test rig.

The more general problem is that there might already be an active outer transaction.

Paul

October 18, 2010 12:47 PM
 

Paul White said:

Hi Robert,

Yes that's a very neat solution, thank you for taking the time to read and comment.

Paul

October 18, 2010 12:52 PM
 

Gianluca Sartori said:

Great post, Paul.

I've been using the CLR solution during the last 3 years and I have to say I'm very happy with it. It solved all our deadlocking issues.

As a side note, opening a new connection from a CLR assembly requires external access, which is something that could make some DBAs unhappy.

Keep doing the great job!

Gianluca

October 19, 2010 3:50 AM
 

Paul White said:

Hey Gianluca,

Thanks.  You're quite right about the need for the external access permission set.  See you around SSC!

Paul

October 19, 2010 5:13 AM
 

Casey said:

Paul,

I really appreciate your post. I have seen many responses to these questions about sequencing in a database, with very few offering any practical advice beyond saying redesign your database or use built-in database sequencing (IDENTITY, AUTO_NUMBER, and so on). There is a lot of talk about concurrency issues without providing solutions to real problems. Here we have a practical solution and, as good as it seems, you wisely express caution in its use.

I have not ignored all of the warnings about going outside the built-in database support for sequencing, and so I have a question about other techniques to achieve the sequencing that is not quite fulfilled by SQL Server’s identity.

There are places in a database design where you need a set of objects, but not so many are anticipated that the user cannot keep track of them without identifiers. I like to call these unnamed objects. I subscribe to using natural keys rather than surrogates in nearly every situation, and so a table containing the set of unnamed objects will typically have a compound foreign key to a named object. Even though the object is unidentified to the user, it needs to be identified to the database, and this is where a database-generated identifier comes in. I will use an identity as a primary key here, but when the unnamed object table has dependent objects, I am never sure what to do with the foreign key in this other table. Should the primary key, which is an identity, become the foreign key to the new dependent object table? Should I make the new foreign key compound, using both the identity and the foreign key from the unnamed object?

Now I have another seemingly robust and concurrency-safe option to generate a sequence that is unique only to the foreign key, rather than the entire unnamed object table. I would modify the Allocate_TSQL procedure to use the compound primary key of the named object instead of SequenceName, with similar changes to the SequenceTable. Now I can have a more natural key for any table of objects dependent on the unnamed object. I know that the identity will work, but I am not all that comfortable with either of the other solutions for my foreign key dilemma. This new option seems right here, and tables dependent on the unnamed object have a proper foreign key. Even if the unnamed object table is not involved in a query, the table dependent on the unnamed object can be joined to the named object table; the unnamed object table is not needed in these queries. This scenario happens at times, and although it is simple to join in the unnamed object table, it just seems right to me to continue using the natural key developed for the named object.

So, is the sequence table right for this application or is this just a case of a stubborn natural key advocate trying desperately to hold on to what makes sense to him?

November 2, 2010 11:22 AM
 

Paul White said:

Denali (the next major release of SQL Server aka SQL11) will support proper sequences!

See Denis Gobo's excellent early review (CTP 1):

http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/a-first-look-at-sequences-in-sql-server

Thanks to Ted Krueger for tweeting that link.

Paul

November 9, 2010 10:45 PM
 

Brad said:

Correct me if I'm wrong, but the following should guarantee a sequenced identity for the current statement.

INSERT myTableWithIdentity WITH(TABLOCK)

Yes, this will cause deadlock issues under high concurrency, but the sequence will be guaranteed at a statement level. This might work for some where concurrency isn't an issue.

March 28, 2011 2:39 PM
 

Paul White said:

Hi Brad,

INSERT...SELECT...ORDER BY provides some guarantees that the identity values will be allocated to rows in the order specified, but it doesn't solve most of the problems discussed.  Using TABLOCK is, as you say, horrible for concurrency.

Paul

March 28, 2011 8:41 PM
 

Dale Burrell said:

You're probably not even monitoring this blog anymore :) but on the off-chance, I need a sequence as you describe, however if the requested key doesn't exist I need to add it in a concurrency safe way. Currently I am doing this by locking the table (ouch). How would you go about that?

June 27, 2014 10:06 PM
 

Paul White said:

Hey Dale. I do monitor all my old posts for comments, but you're right that I haven't thought about this subject for quite a while :)

To achieve what you're asking, I would replace the UPDATE in the allocation procedure with a MERGE that inserts if the sequence name doesn't exist, and updates otherwise. The target table in the MERGE statement requires a SERIALIZABLE hint to make this pattern safe under high concurrency, but otherwise it's quite straightforward.
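
A minimal sketch of that MERGE pattern, for illustration only, using the SequenceTable from the post (the HOLDLOCK/SERIALIZABLE hint on the target is the important detail):

-- Sketch: allocate a range, creating the sequence row if it does not yet exist
DECLARE @SequenceName   NVARCHAR(20),
        @RangeSize      INTEGER,
        @FirstAllocated INTEGER;

SELECT  @SequenceName = N'New Sequence',
        @RangeSize = 250;

DECLARE @Result TABLE (first_allocated INTEGER NOT NULL);

MERGE   dbo.SequenceTable WITH (HOLDLOCK) AS st
USING   (SELECT @SequenceName) AS src (sequence_name)
ON      st.sequence_name = src.sequence_name
WHEN MATCHED THEN
    UPDATE SET next_value = st.next_value + @RangeSize
WHEN NOT MATCHED THEN
    INSERT (sequence_name, next_value)
    VALUES (src.sequence_name, 1 + @RangeSize)
OUTPUT  inserted.next_value - @RangeSize
INTO    @Result (first_allocated);

SELECT  @FirstAllocated = first_allocated FROM @Result;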

June 28, 2014 6:48 AM
