THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Louis Davidson

How In-Memory Database Objects Affect Database Design: Hybrid Code

In my first attempts at building my code, I strictly went with either native or on-disk code. I specifically wrote the on-disk code to only use features that worked in-memory. This lead to one majorly silly bit of code, used to create system assigned key values. How would I create a customer number that was unique. We can’t use the Max(value) + 1 approach because it will be very hideous with MVCC isolation levels, since 100 connections might see the same value, leading to lots of duplication. You can’t see other connections, so you would duplicate data quickly.  I was also limited to not using sequence objects because they too are not allowed in native code.

So, I used a random number generator, like this:

DECLARE @CustomerNumber CHAR(10)
    SET @customerNumber = 'CU' + RIGHT('00000000' + CAST(CAST(100000000 * RAND() AS INT) AS VARCHAR(8)),8)
    IF NOT EXISTS (SELECT * FROM Customers.Customer WHERE CustomerNumber = @CustomerNumber)


This actually worked quite well for the tens of thousand rows I generated. The likelihood of two connections getting the same random value, and getting past the NOT EXISTS block was very unlikely. Now, obviously the probability of clashes will rise greatly as you approach the limits of the 8 digit number, so you would have to monitor usage and change the CU to something else to increment the value. This is demo code, and as this blog is about, there are easier ways. But it was fun to write and test.

Of course the worst part of this code isn’t the random number generator, or even the looping (oh, the dreaded looping), no this code is not very optimal, because of the NOT EXISTS subquery.  Best case we have to do one probe into the table to see if that value doesn’t exist.  While it was sub optimal in interpreted code, in native code, it got more silly looking because you can’t break out of a while loop, and you can’t use subqueries (nor the RIGHT function). So the code changed to the following awkward (yet operational) bit of code:

DECLARE @customerNumber CHAR(10), @rowcount INT, @keepGoing BIT = 1, @baseNumber VARCHAR(30)
WHILE @keepGoing = 1
    SET @baseNumber = '00000000' + CAST(CAST(100000000 * RAND() AS INT) AS VARCHAR(8))
    SET @customerNumber = 'CU' + SUBSTRING(@baseNumber,LEN(@baseNumber) - 8,8)
    SELECT @rowcount = COUNT(*) FROM Customers.Customer WHERE CustomerNumber = @CustomerNumber
    IF @rowcount = 0
        SET @keepGoing = 0


The inefficiency of this code is the same as the interpreted code: that query on CustomerNumber.  Of course, the more typical solution to the problem of a system generated key would be to use a SEQUENCE object (possibly as a default), and format the number somehow. But you can’t use sequences in native code, so instead of going fully native code, I am using a hybrid approach.

First, I built a procedure that generated CustomerNumbers, it has a simple checkdigit appended to a 7 digit number (preceded by CU) (and sloughing off any customer numbers with 666 in the value from the string just to show what can be done).

CREATE SEQUENCE Customers.Customer$CustomerNumber$Sequence

--use a sequence and format the output a bit to avoid lots of duplication
CREATE  PROCEDURE Customers.Customer$getNextCustomerNumber
    @customerNumber char(10) OUTPUT

    --doing it in a loop is the simplest method when complex requirements.
    WHILE (1=1)
        --Get the base account number, which is just the next value from the stack
        SET @customerNumber = 'CU' + right(replicate ('0',8) +
                    CAST(NEXT VALUE FOR Customers.Customer$CustomerNumber$Sequence as varchar(7)), 6)

        --add a check digit to the account number (take some digits add together, take the first number)
        SELECT @customerNumber = CAST(@customerNumber AS varchar(8)) +
               CAST(SUBSTRING(@customerNumber, 3,1) AS TINYINT) +
               POWER(CAST(SUBSTRING(@customerNumber, 5,1) AS TINYINT),2) +
               CAST(SUBSTRING(@customerNumber, 8,1) AS TINYINT) * 3 +
               CAST(SUBSTRING(@customerNumber, 9,1) AS TINYINT) * 2 +
               CAST(SUBSTRING(@customerNumber, 10,1) AS TINYINT) +
               CAST(SUBSTRING(@customerNumber, 11,1) AS TINYINT) * 3  AS VARCHAR(10)),1)

        --if the number doesn't have these character string in it (including check digit)
        if            @customerNumber NOT LIKE '%00000%'
                AND @customerNumber NOT LIKE '%666%'
            BREAK -- we are done

Pretty simple, just loops until a good number is found. Not looping through rows, but looping through sequence values, which is very fast and can be done by many simultaneous callers with almost no contention.  If contention is an issue, you can choose your caching for a sequence object, which can avoid a couple of writes when it persists that you have burned through the previously cached values.

So now, the code simply says:

DECLARE @CustomerNumber CHAR(10)
EXEC Customers.Customer$getNextCustomerNumber @CustomerNumber OUTPUT

Rather than that loop. But I can’t use that in a natively compiled procedure, so we create an interpreted procedure that calls this procedure, then calls the native procedure:

CREATE PROCEDURE Customers.Customer$CreateAndReturn
@FirstName NVARCHAR(30),
@LastName NVARCHAR(30),
@MiddleName NVARCHAR(30),
@EmailAddress NVARCHAR(200)


   --see if the customer exists… We don’t do updates here
    DECLARE @customerId INT = (SELECT CustomerId
                                FROM  Customers.Customer
                                WHERE EmailAddress = @EmailAddress) --we are assuming validation is done elsewhere

    IF @customerId IS NULL
            DECLARE @customerNumber CHAR(10)
            EXEC  Customers.Customer$getNextCustomerNumber @customerNumber OUTPUT

            EXEC @CustomerId = Customers.Customer$SimpleInMemCreate
            @CustomerNumber = @CustomerNumber,
            @FirstName = @FirstName,
            @LastName = @LastName,
            @MiddleName = @MiddleName,
            @EmailAddress = @EmailAddress


      RETURN COALESCE(@customerId, -100)

I haven’t added error handling just yet, but this is nearly the final version. The procedure to do the actual insert is just a simple insert using native compilation:

CREATE PROCEDURE Customers.Customer$SimpleInMemCreate
@CustomerNumber CHAR(10),
@FirstName NVARCHAR(30),
@LastName NVARCHAR(30),
@MiddleName NVARCHAR(30),
@EmailAddress NVARCHAR(200)
        DECLARE @customerId int
            INSERT INTO Customers.Customer
                    ( FirstName , MiddleName ,LastName ,
                        CustomerNumber ,  EmailAddress,
                        RowCreateTime, RowLastModifiedTime
            VALUES  ( @FirstName , @MiddleName ,@LastName ,
                        @CustomerNumber ,  @EmailAddress,
                       SYSDATETIME(), SYSDATETIME()
            SELECT @customerId = SCOPE_IDENTITY()

      RETURN isnull(@customerId, -100)

So we get the benefits of the compiled procedure (if there is any in the actual case, my demo code is fairly simplistic) coupled with anything in the interpreted code that could not be done in native mode.

Published Friday, August 08, 2014 9:49 PM by drsql

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS


No Comments

Leave a Comment


This Blog


Links to my other sites

Privacy Statement