
Louis Davidson

  • SQL Saturday 489 Tampa-“How to Design a Relational Database” Precon

    On February 26, I will be doing my "How to Design a Relational Database" pre-conference session in Tampa, FL.  You can see the abstract here if you are interested, and you can sign up there too, naturally. Currently priced at $158.19, it is a bargain, and I totally promise it will be around 8 hours of talking about and designing databases (including lunch and break time), which will certainly be better than what you do on a normal work day, even a Friday.

    You can read the abstract, but what should you really expect?  First off, expect not to sit in one spot while I drone on and on for 7 hours of lecture. Honestly, a straight 7 hours of lecture would be easier to prepare for, but the best part of the class time is doing some database design. This means I need requirements that will touch on some of the more interesting design patterns and be achievable enough in around 45 minutes that we can all learn something from the practical experience of doing a design.

    The day is broken up into 3 or 4 modules (depending on where the class wants to go), with several participation exercises along the way. The sections are:

    Section 1: Introduction with a bit of history

    We start with just enough introduction to the materials, the stuff you need to do *before* you design, and an introduction to the history of the craft to make sure we are all on the same page. A big part of this section is getting straight why we design the way we do, establishing some of the terminology, and understanding the process of designing and testing a database.

    Section 2: Modeling and structures

    This section will cover the fundamental building blocks of relational databases, like tables, columns, keys, etc., and how to create a data model of the constructs. This is by far the largest part of the lecture, and by the end we should all be on the same page as to what goes into the database, if not exactly "how" the final product should look.

    We will stop at this point, and I will get out my modeling camera (which sounds a LOT more glamorous than it will turn out to be) and we will do some modeling on paper, asking attendees to provide the parts of a simple database, and we will all decide what should go into the model.

    The document that I base this model on is VERY simple, but the complexities of translating the document to even a primitive database design are always quite interesting to experience as a class, and I get an idea of who the outgoing class members are going to be at this point too.

    Section 3: Model Standardization/Normalization

    In this section, we will look at the kinds of things you need to do to the model to prepare it to be implementable, truly analyzing the structures to see if they make "sense" within the confines of the relational model. It is always interesting to me that all models are normalized to some degree, yet people think that normalizing makes things slower. In fact, normalization generally makes business databases faster. The misconceptions about the higher normal forms are always confusing to me as well, because the preconceived notions that people have are often really wrong. (I think too often the problem is that a properly built database does not lend itself to a simple UI…so you either get messy UIs or messy databases. It is really hard to get a team together that can do both and build the software to turn messy business needs into beautiful, easy to use interfaces.)

    Once we are done with the slide material in this section, we will start a couple of exercises. The first exercise is planned as a full class exercise, where I will man the data model (first on paper, then in a modeling tool) and elicit input from the class, in a manner that makes sure everyone gets a say.

    Then we will break up into small teams and build a final model on paper, which I will bring up to the projector and we will discuss the different solutions.

    Section 4: Physical Modeling Overview

    Assuming we still have time/energy (and we don’t have discussion that lasts the rest of the day), we will take the last part of the class and cover turning the model into a "real" database. Data types, domain implementations, constraints, testing, etc. will be covered.

    Due to the limitations of the time format (in a perfect scenario we might have 3 or 4 days), and a *strong* preference of previous classes towards actually doing some design, there are topics we won't cover. But honestly, if you can get the basic design correct and make the model close to what the final model ought to be, the rest is kind of gravy (and well documented in a lot more websites and books than mine!) When I am building a database for a customer, I spend a lot more time figuring out the shape of the database objects than I do on the implementation aspects. Almost always you will find that the problems with database designs come down to an inability to match the user's requirements to the tables created. Missing an index here and there can be caught in testing and even after implementation (and that is why God created DBAs & Data Programmers).

    What I really love about doing all of the designs is that we really get the flavor of a real design meeting. A few differing opinions, a few ideas that I hadn't planned for, and a few argumentative types who really want their own way. But none of the arguments have gotten out of hand so far, and they have all been very much like the typical data modeling meeting.

    I hope to see you in class, and stay tuned to this blog entry for any more details as they come around. And don’t forget to register for the class: https://www.eventbrite.com/e/sql-saturday-precon-2016-how-to-design-a-relational-database-tickets-21099706791, and definitely don’t miss Saturday’s event http://www.sqlsaturday.com/489/eventhome.aspx. This is my first year to go to Tampa, so I am stoked to see how they do an event!

  • Dynamic Data Masking - Part 1 - The Mechanism

    This is part 1 of my Dynamic Data Masking blogs for my upcoming book project.

    An interesting feature that is being added to SQL Server 2016 is Dynamic Data Masking. What it does is allow you to give a user access to a column but, instead of showing them the actual data, mask it from their view. For example, if you have a table that has email addresses, you might want to mask the data so most users can't see the actual values when they query the table. It falls under the heading of security features in Books Online (https://msdn.microsoft.com/en-us/library/mt130841.aspx), but as we will see, it doesn't behave like classic security features, as you will be adding some code to the DDL of the table, and (as of this writing, in CTP3.2) the ability to fine tune who can and cannot see unmasked data isn't really there.

    I will be executing my code in a SQL Server 2016 database. I am using a database named SimpleDemos, but any database will do. The version I am using is CTP3.2, running on a Hyper-V VM on my laptop:

    SELECT @@version;

    If you haven’t seen it yet, note that they have added a lot of stuff to @@version these days. Great if you are only using it for purposes like this.

    Microsoft SQL Server 2016 (CTP3.2) - 13.0.900.73 (X64)   Dec 10 2015 18:49:31   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)
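
    If you only need specific pieces of information, such as the build number or edition, rather than the whole banner, SERVERPROPERTY is an alternative (a quick sketch; these are standard documented properties, not anything specific to 2016):

    SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion, --the build, e.g. 13.0.900.73
           SERVERPROPERTY('ProductLevel') AS ProductLevel,     --RTM, a service pack, or a CTP designation
           SERVERPROPERTY('Edition') AS Edition;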

    Now, say we have the following simple table structure (note that I would probably use a domain table for StatusCode values in a real system, but I wanted this to be a single table):

    CREATE SCHEMA Demo;
    GO
    CREATE TABLE Demo.Person --warning, I am using very small column datatypes in this example to make formatting of the output easier
    (
        PersonId    int NOT NULL CONSTRAINT PKPerson PRIMARY KEY,
        FirstName    nvarchar(10) NULL,
        LastName    nvarchar(10) NULL,
        PersonNumber varchar(10) NOT NULL,
        StatusCode    varchar(10) CONSTRAINT DFLTPersonStatus DEFAULT ('New')
                                CONSTRAINT CHKPersonStatus CHECK (StatusCode in ('Active','Inactive','New')),
        EmailAddress nvarchar(40) NULL,
        InceptionTime date NOT NULL, --Time we first saw this person. Usually the row create time, but not always
        --a number that I didn't feel could insult anyone of any origin, ability, etc that I could put in this table
        YachtCount   tinyint NOT NULL CONSTRAINT DFLTPersonYachtCount DEFAULT (0)
                                CONSTRAINT CHKPersonYachtCount CHECK (YachtCount >= 0)
    );

    Into which I will be loading in some simple (silly) data:

    INSERT INTO Demo.Person (PersonId,FirstName,LastName,PersonNumber, StatusCode, EmailAddress, InceptionTime,YachtCount)
    VALUES(1,'Fred','Flintstone','0000000014','Active','fred@flintstone@slatequarry.net','1/1/1959',0),
          (2,'Barney','Rubble','0000000032','Active','barneyrubble@aol.com','8/1/1960',1),
          (3,'Wilma','Flintstone','0000000102','Active',NULL, '1/1/1959', 1);

    Next, we want to let anyone in the company see this data, and we have decided that row level security is too cumbersome and (for argument's sake) that a view wasn't the right method to use. (Note: I am not currently sure whether this method is better than a view, but I can see a few advantages, certainly for a reporting database.)

    So we have the following rules:

    We have two users (which in real life would be roles, but I am keeping it simple for the non-relevant details):
        1. Those who can see all data (Test User: "BigHat")
        2. Those who can only see the first and last name of the person (Test User: "SmallHat")
       
    The goal is to set up the system so that this scenario is handled. There are (currently) four different types of masks we can apply to data:

    • Default – Takes the default mask of the datatype (NOT the default of the column)
    • Email – Masks the email so you only see a few meaningful characters
    • Random – Puts a random number in place of an actual number (which is kind of weird, as we will see)
    • Partial – Lets you control which characters to keep and what to replace the rest with

    So let’s take a look at each:

    Default

    Datatypes have different default masks that will be applied; for example, string types become a string of 'x' characters, and integers become 0 (which, as we will see, can be confusing). I will start by giving every column (other than the names) the default mask:

    ALTER TABLE Demo.Person ALTER COLUMN PersonNumber
        ADD MASKED WITH (Function = 'default()');
    ALTER TABLE Demo.Person ALTER COLUMN StatusCode
        ADD MASKED WITH (Function = 'default()');
    ALTER TABLE Demo.Person ALTER COLUMN EmailAddress
        ADD MASKED WITH (Function = 'default()');
    ALTER TABLE Demo.Person ALTER COLUMN InceptionTime
        ADD MASKED WITH (Function = 'default()');
    ALTER TABLE Demo.Person ALTER COLUMN YachtCount
        ADD MASKED WITH (Function = 'default()');

    Note that you can also declare the mask directly in the CREATE TABLE statement if you wish; a minimal sketch (using a hypothetical table name, just to show the syntax) follows.
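
    CREATE TABLE Demo.PersonAlternate --hypothetical table, only to show the inline masking syntax
    (
        PersonAlternateId int NOT NULL CONSTRAINT PKPersonAlternate PRIMARY KEY,
        EmailAddress      nvarchar(40) MASKED WITH (Function = 'default()') NULL
    );

    Now, to check to see what this looks like, execute: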

    SELECT *
    FROM    Demo.Person;

    Which returns:

    PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
    ----------- ---------- ---------- ------------ ---------- ---------------------------------------- ------------- ----------
    1           Fred       Flintstone 0000000014   Active     fred@flintstone@slatequarry.net          1959-01-01    0
    2           Barney     Rubble     0000000032   Active     barneyrubble@aol.com                     1960-08-01    1
    3           Wilma      Flintstone 0000000102   Active     NULL                                     1959-01-01    1

    You are probably thinking, what a ripoff (and eventually you will be thinking, why isn't this changing, and it will drive you a bit crazy…), but as the dbo, you are not going to see the masking. So we need to create the users we initially specified, and grant them rights:

    CREATE USER BigHat WITHOUT LOGIN;
    CREATE USER SmallHat WITHOUT LOGIN;

    We will get a bit deeper into security in the next entry in this series, but to start with we need to give our users the ability to select from the table.

    GRANT SELECT ON Demo.Person TO BigHat;
    GRANT SELECT ON Demo.Person TO SmallHat;
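
    We will save the deeper security discussion for the next entry, but for reference, the permission that lets a user see through the masks is UNMASK. So, giving BigHat full visibility would look something like the following sketch (note that, as of CTP3.2, this permission applies database-wide rather than per column or table):

    GRANT UNMASK TO BigHat;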

    Now we can run the following statements and see the effect:

    EXECUTE AS User='SmallHat';
    GO
    SELECT *
    FROM   Demo.Person;
    GO
    REVERT;

    Which now returns:

    PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
    ----------- ---------- ---------- ------------ ---------- ---------------------------------------- ------------- ----------
    1           Fred       Flintstone xxxx         xxxx       xxxx                                     1900-01-01    0
    2           Barney     Rubble     xxxx         xxxx       xxxx                                     1900-01-01    0
    3           Wilma      Flintstone xxxx         xxxx       NULL                                     1900-01-01    0

    Note that the NULL value still shows up as NULL (meaning ‘UNKNOWN’) and the other values show up as ‘xxxx’, meaning ‘UNKNOWN, other than there IS a value’, which may be all that many users need to actually see. Everyone seems to have started in the database on 1900-01-01 (which may be an issue for some applications, but is at least an unreasonable value for most databases), and YachtCount is all 0s, which was actually the value for Fred, but not the others. Note that the default of ‘New’ I included in the table DDL for StatusCode was ignored in favor of the ‘xxxx’ value, which is mildly disappointing. We will be able to set the masked value to whatever we want later, but it is nice when you can use the table’s structure to make the data easier.

    Email

    Next we have a specific mask for email addresses, so we remove the masking from the EmailAddress column and add it back. (You can currently change the masking without dropping the current mask first, but I have seen it act weird… this is just CTP3.2.)

    ALTER TABLE Demo.Person ALTER COLUMN EmailAddress DROP MASKED;

    ALTER TABLE Demo.Person ALTER COLUMN EmailAddress
        ADD MASKED WITH (Function = 'email()');

    Now query the data:

    EXECUTE AS User='SmallHat';
    go 
    SELECT * 
    FROM   Demo.Person;
    go 
    REVERT;

    And you can see the email is masked by showing the first character of the email address, then XXX@XXXX.com:

    PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
    ----------- ---------- ---------- ------------ ---------- ---------------------------------------- ------------- ----------
    1           Fred       Flintstone xxxx         xxxx       fXXX@XXXX.com                            1900-01-01    0
    2           Barney     Rubble     xxxx         xxxx       bXXX@XXXX.com                            1900-01-01    0
    3           Wilma      Flintstone xxxx         xxxx       NULL                                     1900-01-01    0

    Looking at the base data, notice that Fred’s email address is ‘fred@flintstone@slatequarry.net’, so the only meaningful character you are giving the user is the first character.

    Random

    Random is a very interesting function, mostly because it would take a very specific use (that I can’t think of right now) to make it make sense. I added YachtCount to the model so I could demonstrate (and not offend anyone with a count of body parts, cars, etc. I figure the intersection of Yacht owners and people reading this blog to be low enough to risk it.)

    So, let's try the random(start, end) function, which will replace a numeric value with a random value between the two:

    ALTER TABLE Demo.Person ALTER COLUMN YachtCount
        ADD MASKED WITH (Function = 'random(1,100)'); --make the value between 1 and 100. You could make it always the same value pretty easily by using the same value for start and end

    Now, check out the data:

    EXECUTE AS User='SmallHat';
    GO
    SELECT *
    FROM   Demo.Person;
    GO
    REVERT;

    Which will return something along these lines:

    PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
    ----------- ---------- ---------- ------------ ---------- ---------------------------------------- ------------- ----------
    1           Fred       Flintstone xxxx         xxxx       fXXX@XXXX.com                            1900-01-01    65
    2           Barney     Rubble     xxxx         xxxx       bXXX@XXXX.com                            1900-01-01    92
    3           Wilma      Flintstone xxxx         xxxx       NULL                                     1900-01-01    64

    Not the most useful function, but it is there if you need it. If you are like me (pity), then you are wondering about numeric types (I know I was). So let’s create a quick table, named for the Blackadder fans in the crowd:

    CREATE TABLE demo.bob
    (
        value decimal(15,5) MASKED WITH (Function = 'random(-999999999,999999999)')
    );
    GRANT SELECT ON demo.bob to SmallHat;
    INSERT INTO demo.bob
    VALUES (1),(2),(3);
    GO

    EXECUTE AS User='SmallHat';
    GO
    SELECT *
    FROM   Demo.Bob;
    GO
    REVERT;

    Returns:

    value
    ---------------------------------------
    222199941.87482
    -137196271.70996
    -498355155.12580

    Partial

    The most configurable of the masking functions is partial. It lets you choose which characters to keep and which to mask. For example, take our PersonNumber column: if you want to keep 1 leading character and 2 trailing characters, you create a mask such as:

    ALTER TABLE Demo.Person ALTER COLUMN PersonNumber
        ADD MASKED WITH (Function = 'partial(1,"-------",2)'); --note the double quotes on the text

    Looking at the data as user SmallHat,

    EXECUTE AS User='SmallHat';
    GO
    SELECT *
    FROM   Demo.Person;
    GO
    REVERT;

    you will see:

    PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
    ----------- ---------- ---------- ------------ ---------- ---------------------------------------- ------------- ----------
    1           Fred       Flintstone 0-------14   xxxx       fXXX@XXXX.com                            1900-01-01    99
    2           Barney     Rubble     0-------32   xxxx       bXXX@XXXX.com                            1900-01-01    6
    3           Wilma      Flintstone 0-------02   xxxx       NULL                                     1900-01-01    89

    The number of dashes is controlled by you. If you said (1,"-",2), the first row would be 0-14. If you said (1,"-------------------",14), the first row would be 0---------, as the returned value always fits in the original datatype.

    Finally, we want StatusCode to show up as ‘Unknown’ when the value is masked. You can't pass parameters to default(), so you need to use partial to replace the entire value (slight dig on the name; pattern might have been better, but it is what it is):

    ALTER TABLE Demo.Person ALTER COLUMN StatusCode
        ADD MASKED WITH (Function = 'partial(0,"Unknown",0)');

    Checking out the data:

    EXECUTE AS User='SmallHat';
    GO
    SELECT *
    FROM   Demo.Person;
    GO
    REVERT;
    GO

    You see that the data now says: ‘Unknown’:

    PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
    ----------- ---------- ---------- ------------ ---------- ---------------------------------------- ------------- ----------
    1           Fred       Flintstone 0-------14   Unknown    fXXX@XXXX.com                            1900-01-01    39
    2           Barney     Rubble     0-------32   Unknown    bXXX@XXXX.com                            1900-01-01    18
    3           Wilma      Flintstone 0-------02   Unknown    NULL                                     1900-01-01    12
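
    One last administrative note: if you want to see which columns currently have masks defined, and which function each one uses, the sys.masked_columns catalog view has that information. A quick sketch against the table we have been using:

    SELECT c.name AS ColumnName, c.masking_function
    FROM   sys.masked_columns AS c
    WHERE  c.object_id = OBJECT_ID('Demo.Person');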

    Summary, Blog 1

    Of course, whether that approach, or any of the methods we have seen here, makes sense is really a matter of design. If a value doesn’t look masked, that may signal to a user that it isn’t masked. This particular blog is all about the mechanics of how the data masking feature applies the masks. In the next blog we will leave the data masked as it is, and we will look at what we can do via security. How do we let non-dbo users access the data, and what happens with WHERE clauses and UPDATE statements? What happens when the masked table is accessed through a view or stored procedure? See you real soon with answers to these and other questions.

  • Is It Ever Better to Not Name Constraints?

    This is part 1 of my Tips and Tricks blogs for my upcoming book project.

    For all of my professional writing career, I have been forced to be at the top of my formatting game when writing code. Uppercasing this, spacing that, aligning the other, formatting code just right. In my personal life, I also try to format my code in a readable manner, though I do format it differently (I pretty much LOATHE uppercase characters), and I work really hard to make code readable so my coworkers don’t need maze traversal skills to read it (or Red-Gate SQL Prompt to reformat it).

    It has long been a habit of mine to name my constraints; even if it weren’t useful for database comparisons, it just helps me see the database structure that much easier. The fact that, as I have gained more experience writing SQL and writing about SQL, I have grown to habitually format my code a certain way makes it all the more interesting to me that I had never come across this scenario where it is better not to name constraints.

    Consider the following table structure:

    CREATE TABLE #testName
    (
        testNameId int CONSTRAINT PKTestName PRIMARY KEY
    )

    Create it on one connection, in the context of any database. Seems harmless enough, right? Executes, works just fine. Now, without dropping that table, go to another connection. Run the code again.

    Msg 2714, Level 16, State 5, Line 1
    There is already an object named 'PKTestName' in the database.
    Msg 1750, Level 16, State 0, Line 1
    Could not create constraint or index. See previous errors.

    Well, doesn’t that just beat all. Looking in the system tables (I won’t lie, I had to go back and capitalize SELECT, FROM and WHERE to prove my earlier point about formatting):

    SELECT name, object_id, type
    FROM   tempdb.sys.objects
    WHERE  name = 'PKTestName'

    You will see it right there in the tables:

    name           object_id   type
    -------------- ----------- ----
    PKTestName     -1450612125 PK

    If you want to check to see if the table exists only once, run the following query:

    SELECT name
    FROM   tempdb.sys.objects
    WHERE  name like '#testName%'

    You will get back something like:

    name                                                                                                                           
    --------------------------------------------------------------------------------------------------------------------------------
    #testName___________________________________________________________________________________________________________000000000009

    Now, changing the create table query to:

    CREATE TABLE #testName
    (
        testNameId int PRIMARY KEY
    )

    Re-execute the query on sys.objects for the temp table and you will see two rows with a root name of #testName.

    name
    --------------------------------------------------------------------------------------------------------------------------------
    #testName___________________________________________________________________________________________________________000000000009
    #testName___________________________________________________________________________________________________________00000000000B

    So the answer to the question "is it ever better to not name your constraints?" is "yes", but only when your table will not outlive your connection. Clearly you don’t want to be stuck with names like PK__#testNam__3EE579F50DAE3402, which is the delightful name that was assigned to the primary key object the second time, or the third time: PK__#testNam__3EE579F5088FC982, which is just plain ugly.
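
    If you are curious what generated name your own connection's copy of the table ended up with, one way to look (a quick sketch) is to find the key constraint attached to the temp table in tempdb:

    SELECT name
    FROM   tempdb.sys.key_constraints
    WHERE  parent_object_id = OBJECT_ID('tempdb..#testName');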

    Bonus points if you thought, can’t you just make the constraint name a temporary name by putting a # in front like so?

    CREATE TABLE #testName
    (
        testNameId int CONSTRAINT #PKTestName PRIMARY KEY
    )

    It doesn’t work, but it was a good idea.

    Msg 8166, Level 16, State 0, Line 1
    Constraint name '#PKTestName' not permitted. Constraint names cannot begin with a number sign (#).

  • Less than Two Weeks to SQL Saturday Nashville!

    No sooner does the official holiday season end than SQL Saturday season begins. Next week on January 9 is the Atlanta BI edition, but the week of the 16th is the event here in Nashville (well, technically we will be in Murfreesboro, but it is close!). Tamera Clark (@tameraclark) is again heading up the team that is making it happen, and I will be there again this year (sans walker, thankfully), and for the first year, I will be speaking (so I can’t use Nashville to fulfill #3 on my Pass Year’s Resolutions!)

    If you are interested in attending on Saturday or even one of the awesome looking Pre-Cons, head over to: http://www.sqlsaturday.com/480/eventhome.aspx and get registered immediately. It is sure to be a great time of nerdy fellowship and learning. Check out the schedule and if you are not new to the SQL Community you will see a lot of names you recognize and trust along with some you will hopefully come to know in the future

    My presentation is a new one (that I am finishing up writing in the next few days) on concurrency, and the many different ways you tune and handle concurrent operations that access the same data.  The abstract is:

    Let Me Finish... Isolating Write Operations

    OLTP databases can be constantly written to and reporting databases are written to at least periodically. In order to ensure consistent results, connections must be isolated from one another while executing, ideally with the lowest possible cost to concurrency. How this isolation is handled is based on the isolation level, whether the classic lock based or the newer optimistic scheme of the in-memory OLTP engine is used, or even if both engines are enlisted in the same transaction. In this session we will look at examples of how SQL Server isolates reading and writing operations from other writing operations to explore how this may affect your application through error messages and performance hits.

    I love giving presentations at SQL Saturdays, but it is never as much fun (or as scary) as when presenting a new topic (if you can call concurrency “new”). Every night until the conference I will be editing, and presenting to myself, working to get it just right with just enough material to fill an hour. I hope to see you there, and if I don’t see you, there are 11 other concurrent sessions going on at the same time…thankfully not in the same room.

  • Dragged Kicking and Screaming into the Clouds

    When I first heard about the cloud I rolled my eyes. This will never take off, I thought; it sounds very much like the mainframes we came to know and hate (my first database project was killing a mainframe that was costing our non-profit organization $15,000 a month, which for a non-profit was simply horrible). Even four years ago when I was writing "Pro SQL Server 2012 Relational Database Design and Implementation" I pretty much didn't mention the cloud at all.

    My how times have changed. The cloud, and Azure in particular, is becoming more and more a necessary target for our database work because, in my opinion, it fills two specific scenarios that are hard to provide service for (along with being generally useful for many other purposes):

    1. Very small organizations: Places where they do not have the resources (human and computing) to provide a proper computing platform. The cost is likely far less for these scenarios in most cases
    2. Very large computing needs: If you need massive power, it may be easier to use Microsoft's platform to do your computing, particularly if you don’t need the power for long periods of time.

    What is truly awesome about the way AzureDB works is that it is very, very similar to the on-premises/box product. There are two versions of Azure that apply to us relational database types that I will work with. The first is running the box product on an Azure VM. For the normal database programmer, this will not change very much at all. For some, if not all, of the book, this is how I will be running the code, as I won't be loading SQL Server 2016 on my laptop directly until RTM. It is certainly possible that I will use a local VM, but we will see. I will of course include my system settings wherever I include performance statistics.

    The second is Azure SQL Database. Azure SQL Database is basically where you get a database exposed to the Internet instead of a virtual machine. You get a surface area that is very similar to typical SQL Server databases, with far fewer knobs to adjust. If you want to know much more about the service, the earlier link will give you an overview. I won't cover too much about the Azure SQL Database platform, either here or in the book, but I will provide code that works in either location.

    Both of the products, the on-premises version and the Azure SQL Database version, are part of the relational database family of products. They share a common base and a common purpose: to work with relational data. They look basically the same, operate mostly the same, and serve (at their core) the very same purposes.

    As such I will make sure that all of the scripts that end up in the final book have been validated on the different editions of SQL Server (as I have always done), and have been executed at least once on Azure SQL Database as well. What I won't do is go into many details of how to connect to an Azure SQL Database, mostly because it takes quite a few pages to do so (I just tech edited a book that covers such details, and I will direct readers to that book for details on how to connect… Peter Carter “Pro SQL Server Admin” http://www.springer.com/gp/book/9781484207116).

    As a quick test, I took the code from the previous version of Chapter 6 (which I will be working on after I have written my presentation on Concurrency for SQL Saturday Nashville). Chapter 6 is a simple database create that demonstrates the basic process of changing a normalized base model to a physical database and the finishing touches that make even a 6 table design better.

    All of the code for Chapter 6 executed with no change, except for a few of the fundamental differences between AzureDB and the On-Prem versions:

    1. Dropping and creating the database (so I changed the script to drop all of the objects first, a practice I use for all clean model databases that will be used for unit tests and for compare scripting with active databases; see the sketch after this list).
    2. USE statements to change from master to the user database.
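
    For what it is worth, here is a minimal sketch of the "drop the objects instead of dropping the database" approach; the object names are hypothetical, and DROP ... IF EXISTS requires SQL Server 2016 or Azure SQL Database:

    DROP TABLE IF EXISTS Examples.Invoice;   --drop referencing (child) tables before the tables they reference
    DROP TABLE IF EXISTS Examples.Customer;
    DROP SCHEMA IF EXISTS Examples;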

    I hope this change to the book will be useful to readers as they start to move from on-prem to cloud databases and see where the differences lie. I expect many people, even a lot of seasoned pros, are wary of how much work the cloud versions of database implementations will be. I will cover in later blogs and the book some differences you need to consider when building your databases, and in building your applications to leverage a platform where there may be some exaggerated latency (and some cost) if you don’t do a good job of building the entire system with bandwidth as a concern.

  • Post PASS Summit Resolutions

    Many years I have tried early in the year to list a set of “New Year’s Resolutions”, but one thing I have noted every year is that it isn’t New Year’s Day that is the defining part of my year; rather, it is the PASS Summit that defines my year. Usually, by the time the end of October rolls around, I am pretty spent. The summer has ended, and all of the events of the year are winding down, heading into the holiday season (I like to start the holidays 2 weeks before Halloween and end them on Super Bowl weekend!) So usually I have pretty much gotten to the point where I am thinking “should I do this another year?” about when the Summit rolls around. Admittedly, the past five years have been hard because the hip replacements have made getting around somewhat more difficult at times. But attending the PASS Summit has always been the shot in the arm that gets me thinking and planning. Even last year when I attended PASS strictly via PASS TV, it was where I decided what I wanted to try to do for the year.

    Today I want to put out my list of things I resolve to do differently (or the same) in the PASS Community, based on the renewed energy I have from the two weeks I spent in Seattle with all of my fellow SQL Community, SQL Family, and SQL Friends, learning and sharing (blog on that topic forthcoming).

    1. Don’t submit to speak again next year – As much as I appreciate the surprise from a few people that I wasn’t speaking, and the kind words from a few people about liking my sessions, this was my favorite year. I am not a natural speaker, so I spend a lot of time preparing and I miss a lot. This was the first year I actually ventured out with people for a meal or two, caught a movie, and attended sessions in all but 2 slots (I saw my SQL friend Joe Webb, whom I hadn’t seen in forever, and I started talking to him and 5 other people who kept stopping by. Worth it!)

    2. Don’t stop submitting to speak at SQL Saturdays\User Groups – As much as I didn’t miss speaking at PASS, I do love speaking at smaller venues where I can do most prep at home, and the room size doesn’t give me heart palpitations. I plan to submit to 4-6 more events, and I have 2 or 3 user group presentations planned. I have hopes of possibly doing one pre-con this year as well, and I will submit to a few (it won’t kill me if I don’t get chosen; it is a lot of work!) I also hope to submit for SQL In The City next year if RedGate does it (and I see the call for speakers before it closes).

    3. Possibly attend a SQL Saturday and don’t submit to speak – It might be nice to just come to one and just hang out. I would be more than happy to work a booth for PASS or RedGate or whoever would want me to!

    4. Get a kilt and a costume to wear for PASS next year – Okay, so this one is probably a long shot (consider this my “lose weight” resolution that may not make it, because I will have to lose weight for the first one to occur), but I have a fantastic costume idea that was inspired by Bob Pusateri’s (@SQLBob) this year (nothing to do with Dilbert, but that is all I will say.)

    5. Get well enough to walk without making me and others feel bad – I know that I made some people uncomfortable this year watching me try to stand up after sitting down for a few minutes, but I felt better this year than I have in 6 years or more. It was just that muscles tighten when you sit for a while and it hurt really bad. 

    6. Keep being involved locally – I may not be the primary chapter leader for Nashville this year because there are so many great people to let have a swing. I don’t know what I will be doing for SQL Saturday, but I will certainly be there if I am in town.

    7. Finish my book early so I can bring some to give away at PASS Summit next year – I will have four chapters done by next week, and hopefully will be done with all of the logical design stuff before the start of the year. Then I have to work in columnstore, in-memory, and all of the other database design changes that are in 2014 and 2016. Sounds easy and a ton of work at the same time (thankfully, I use very few screen shots.)

    8. Continue volunteering for the SQL PASS Program Committee – I have greatly enjoyed working with the program committee for the past several years, and last year we started a service to allow submitters to get a review of their abstracts done before they submitted it for the year. This year I hope we can expand the service to help more people who are great speakers to get the writing stuff out of the way.

    9. Blog somewhat more – And not just these post PASS blogs either. I started blogging about new stuff in SQL Server 2016 until I had to get to work on the book (and some other writing projects), and I will, at a minimum, continue that process as SQL Server 2016 matures. After the book is finished, I also hope to get some mastery of the stuff that Azure offers (more about that in the next blog I will publish.)

    10. Mystery Project -  I have started shopping around an idea to some folks to see if I can do something kinda different and interesting in the coming year (or so). I have a lot of details to work out, but I want to wait until some other things have passed before I do too much else. (I may send out a survey in the coming months on twitter to see what more people think too.)

    In some respects, the most interesting part about resolutions is they stare at you and shout “YOU ARE NUTS” because it looks like a lot. But really, it is what I do for fun. I like doing stuff for the community because I learn stuff. Will I do as much as I hope over the next year? Probably not, as I never have. Will I have fun doing it, and will hopefully a few people learn a thing or two from it? Let’s hope, or I could be paying attention to the football game that is on my TV right now instead of editing this list for a third time before hitting submit.

  • PASS/MVP Summits Reflections

    Wow, it was great actually getting out to the summits this year. Sitting here in the SEATAC airport, waiting for it to be time to get in a flying tin can, I wanted to jot down a few reflections I had from the two weeks. Some of the things I noticed (that aren’t under some sort of NDA, at least) this year over my two weeks are:

    1. The Cloud is really taking hold – The first MVP Summit where they started pushing the cloud was really hard on most of our eyes (rolling your eyes isn’t bad once, but hundreds of times starts to hurt!) but it has really matured. With services like Azure Database,  Data Warehouse, Data Lake, Blob Storage, VM, Machine Learning and more, be prepared to hear more about Azure not just from me, but from other bloggers, tweeters, and don’t be surprised when your boss says: “I think we need to use Azure for _______.” It won’t be perfect for everyone and every need for a long time, but the use cases will be growing. One of my next blog posts will be on the effect that Azure will be having on the fifth edition of my database design book.
    2. In Memory OLTP is going to be really great in SQL Server 2016. Check out CTP 3.0 (or whatever is latest when you have finally gotten around to reading this) for more details. But with support for stuff that the classic SQL programmer is going to love (Foreign Keys, > 1 Uniqueness Constraint, Check Constraints, 2TB per database, better Compiled code surface area) and stuff that only a few people will truly need (Updatable Columnstore Indexes on In Memory tables), those features plus blistering speed will change the feature from niche to generally useful for a larger segment of users. There are still limitations and architectural things one must do to ensure success, but it is getting closer to a solution that anyone might logically employ.
    3. Microsoft’s BI tooling is exploding. PowerBI, Reporting Services, R running near the engine, the aforementioned Azure tools, and I am sure even more stuff that I didn’t pay attention to (my boss was at PASS this year to look at reporting stuff; I am still an engine man myself). I did get to play around with several of the Azure tools, which were pretty impressive/scary, I must admit.
    4. The PASS community is an interesting society. I am generally an introvert, and for my first few years I was oblivious to people; even when I was a speaker I would speak, then keep to myself. As years have passed, I have become more clued in to what goes on, and have a ton of people I am acquainted with. There is a moniker, #sqlfamily, that is used frequently to describe it, but I think family is too simple a term to use for it all. Much like the word “love” has four classic meanings, plus a few others that we have added over the years (“I love cake!”). Another of my future blogs will discuss this super important topic (hmmm, where is that sarcasm font?)
    5. The Surface 3 is a GREAT travel computer - Don’t try to run Visual Studio on it (I did, and it took thirty minutes to be half installed!), but it is super light, with great battery life (I easily got a day of note taking and moderate VM/Azure DB use out of it). The kickstand made it perfect for me to use on my lap at times, and the tearaway keyboard made it work fine for me on the airplane for watching some football on the flight here. Even better was that I could charge it from a portable battery that kept me going for 2 days without hitting a plug…
    6. I am still out of shape – Every night, after a mere 10 or so hours of walking, attending, and talking, I was worn the heck out. I went to bed around 10 every night and got up at 7, having slept most of the night. Then I would see on twitter that people I had been sitting near (or who were up on stage) had tweeted not that long ago that they were going to bed, having left the gathering I was at and headed to another. I am getting old, but so are many of the people I was hanging with at one part of the day. While it probably didn’t look like it, I was getting around a LOT better than I ever have at PASS. I had never walked past my hotel (the Homewood Suites on Pike) before, taking taxis everywhere. It took its toll near the end, and especially this morning at the airport I was barely upright, but it is still progress.
    7. Redgate throws an excellent conference – SQL In The City is a lot of fun, and not too commercial at all (yes, the only vendor is RedGate, duh, but the sessions were not marketing and only sometimes leaned heavily on the tools). I will be back again next year if at all possible.
    8. SQL Conferences are fun – If you are reading this blog and do not share some connection on my family tree, then you would enjoy the PASS conference and/or a SQL Saturday. User group meetings are great, and I rarely miss ours in Nashville, but the variety of people you meet, and the selection of sessions makes it a wonderful, almost vacation-like atmosphere at times (in case the aforementioned boss is reading, I said ALMOST).
    9. @SQLJen and @mark_AzureCAT are awesome – They showed us a great time along with the many other people that work with them. The MVP Summit was awesome due to their work.

    Well, I actually have been away from work so long that I am starting to miss it, so time to join our daily standup meeting that I have missed for the past two weeks. Next week back to reality as we start to upgrade our 2008 R2 SSIS packages to use the package deployment model… See you all next year or at some SQLSaturday!

  • Thank *you* for my SQL Server MVP Award…

    Today, I was awarded (at least) one more time with the Microsoft MVP Award. This makes 12 years, and with this, my 11th renewal, I have yet to feel like it is something I can take for granted (you don’t want to be around me the last week of September!). It has been a great time so far, and I have no idea how many years I will be back (admittedly, renewal day is the day I am least concerned about getting renewed.) Today though, I wanted to take a few minutes and thank some of the people who have helped me get to where I am today. I will not be naming specific people, because, well, I said a few minutes not a few days.

    You might think that my largest debt of gratitude is to Microsoft for giving me the award. While I am effusively appreciative to Microsoft, my various MVP leads throughout the years, the wonderful people who have run the SQL Server MVP program, and quite a few specific Microsoft employees…not to mention those folks who put on the MVP Summit…Microsoft is the least of the people I want to thank. Just because they are least is not meant to minimize my appreciation for what they have done for me, but I want to thank the people who have given me the opportunity to serve so I could earn a chance at the award.

    The following is a list of the people I want to thank, and many of them are also MVPs (and many, many others are nearly there, or will be there soon).

    • My Hip Surgeon – He went out of his way to fix me up last year, and spent 6 long hours doing emergency surgery after a day of doing his long scheduled surgeries. He spent extra time with me, and helped me through the process in a very different way the second time. I was able to get back out at events quickly and in a lot less pain than even the first time.
    • SQL Saturday Organizers – I spoke at 6 events last year (and worked at another), which meant seven groups of dedicated folks put together an entire conference… without getting paid a nickel. All of the events went off perfectly, and there were like 1500+ people at all of these events put together. And there were TONS more than 6  events in total, heck I skipped two events when I was recovering. Without these events, it would be a lot harder for people to get speaking experience, and a lot harder for a lot of people to attend a conference.
    • Our Nashville SQL Server User Group Leadership – This was my first (and maybe last) year as the primary chapter leader, but I was certainly not doing it alone. I shared the lead with Robert Verell (@SQLCowbell), with a lot of support from Jim Houchin (@JimHouchin) and Jersey Ross, and we all did a share, and have had a solid year. I say maybe last because the people mentioned and more may take over and swap out over time. 
    • People on the Program Committee (and especially those who helped me with the Abstract Review process) – I violated my rule about naming names in the previous (and first) bullets, but not here. There were a lot of people who made the process work, and work smoothly. The plan for next year is to give more and more advice on your abstracts before you have to press <submit>.
    • My editors – I have two sets of editors at Simple-Talk and Apress that have helped me to become a better and better writer (even if I don’t always get it purfect in blogs!) Without them, I would not be half as good as I have become, and they still teach me new writing concepts constantly.
    • The SQL Community/Family/Friends – Yeah, this is the catch-all; if I didn’t mention you somehow previously, you fall in here. But I have a lot of people out there who have been a great help in getting me where I am today. When I don’t know something, I read your blogs, your tweets, and your forum posts, or sometimes ask you stuff over twitter or some forum. Without your materials, I couldn’t write mine (and it is my hope of hopes that you sometimes use my material in my books and blogs the very same way).
    • My employer – They give me time to get out and go to events, and support me all the way. I know it doesn’t hurt that I get to share new stuff with my coworkers, but I suppose it wouldn’t be smart if they gave me time to go to Disney World instead of SQL events, now would it?
    • SQLBlog.com – Adam Machanic (@adammachanic) and Peter Debetta (@debettap) – I haven’t seen Peter in years, but I still appreciate that they have kept the blog running here so I and other bloggers can keep posting new material on a solid blogging platform.
    • The readers of this and my other blogs – If no one paid attention to what I wrote, it would not be half as fun to write…

    Even after thanking Microsoft as a corporation and the entire SQL community, I still don’t feel like I have thanked enough people. I feel very blessed to have worked with everyone listed in the past, and to work with them again in the future. This is a fabulous community of people supporting each other to do what is essentially their jobs. I don’t know too many people outside of technology who come home and keep doing their day job as a hobby, much less spend their own money on occasion to travel out and get or give training. I am very much looking forward to the upcoming PASS and MVP summits at the end of this month; I still haven’t recovered from missing the events last year!

    Will I get another MVP Award? I don’t know, and for today at least, I don’t care (if you are not reading this on 2015-10-01, I do care!). But if I don’t get it next year, I do know that there are 100s of people represented in my list who I figure deserve a shot as well. To you I say good luck, and I selfishly hope they add more awardees so we can both be in… Thank you!

  • Heading to Orlando for SQL Saturday again this year

    Last year, I attended SQL Saturday Orlando again after missing a few years (I did my first database design precon there a few years back). I love the way they run their event, and as speakers, we did some fun stuff like being the ones who served lunch to the attendees. This year, I wasn’t sure if I had a chance to go, because I had scheduled a family vacation to Disney World in early October, having no idea when their event might be. When they announced the date, I realized it was a win-win-win situation for me, as my original vacation plans were from Oct 3-7, and their event was on the 10th. Two more days at Disney, AND some SQL learning? A chance to hang out with Andy, Karla, Bradley, and Kendall (not to mention the other speakers)? Sold!

    The session I will be doing is the same one I have been doing at the past few events:

    “How In-Memory Database Objects Affect Database Design

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.”

    And it will be the last time I do this session in its current form, since SQL Server 2016 is coming out next year, and the changes it is making are going to make a tremendous difference to the conclusions I will draw when I rework it. However, the basic material in the session won’t change, as the logical implications of the In-Memory architecture will not change significantly. It is just that some major features, like the additional constraint types, make the use cases for In-Memory explode, because some types of data protection that were possible in code using a pessimistic (lock based) concurrency model are completely impossible using an optimistic (version based) concurrency model, and how collisions are handled is really quite different as well.

    Next year, along with an update to this presentation, I plan on putting together a session completely on the different concurrency models and collisions, because the differences in the concurrency models are the primary differences in how you will need to tailor your code.

  • Snapshot Isolation Level and Concurrent Modification Collisions - On Disk and In Memory OLTP

    This is part 1 of my In Memory OLTP blogs for my upcoming book project.

    In this blog I want to take a (reasonably) quick look at how concurrent modifications are handled using SNAPSHOT isolation level. There are two different flavors of SNAPSHOT isolation level you can deal with, and while they are similar, there are some very big differences internally, as well as in how they behave when you interact with them.

    Generally speaking, SNAPSHOT isolation level is meant to be an implementation of an optimistic concurrency control system. Optimistic concurrency control resolves isolation issues during the execution of a transaction by letting potentially conflicting operations occur and then checking for cases where one transaction may have been affected by another, either at COMMIT time or, in some cases, by cancelling a transaction during its execution. This is unlike the lock based isolation mechanism we have grown to love/hate.

    In SQL Server 2005, Microsoft implemented SNAPSHOT isolation level for on disk tables using a limited version of optimistic concurrency control, in that it still used locks to isolate certain operations from one another. Now, in SQL Server 2014 and beyond, they have implemented a more complete version of SNAPSHOT based isolation that allows for no locks or latches to be involved, but significantly changes how you as a programmer need to handle possible collisions.

    In this blog, I will demonstrate how isolation is implemented for simple inserts and updates that try to insert duplicated data or update the same row. In SQL Server 2014, the lone unique index/primary key was the only real place a collision could occur, but in 2016 they are hopefully adding FOREIGN KEYs and more than one uniqueness constraint per table. Collisions that arise from these constraints will almost certainly follow the patterns shown by the current crop of collisions.

    The scripts were executed on:

    SELECT @@version

    Microsoft SQL Server 2014 - 12.0.2269.0 (X64);     Jun 10 2015 03:35:45 ; Copyright (c) Microsoft Corporation; Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 10240: )

    Using a database that has been configured for in memory OLTP (my script can be found here), I start by creating the following two tables. If you need to configure such a database yourself, the setup looks something like this sketch (the file path is hypothetical, and the database name matches the one you will see in an error message later in this post):
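
    create database snapshotDemo;
    go
    alter database snapshotDemo
        add filegroup snapshotDemo_inmem CONTAINS MEMORY_OPTIMIZED_DATA;
    alter database snapshotDemo
        add file (name = 'snapshotDemo_inmem', filename = 'C:\SQL\snapshotDemo_inmem') --a folder path, created for you
        to filegroup snapshotDemo_inmem;
    --the on disk demo also requires snapshot isolation to be allowed in the database
    alter database snapshotDemo set ALLOW_SNAPSHOT_ISOLATION ON;

    With the database in place, on to the two tables: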

    create schema demo;
    go

    create table demo.OnDisk
    (
        onDiskId int CONSTRAINT PKOnDisk PRIMARY KEY,
        value varchar(20)
    );
    go

    create table demo.InMem
    (
        InMemId int NOT NULL,
        value varchar(20) NOT NULL,
        CONSTRAINT XPKInMem PRIMARY KEY NONCLUSTERED HASH (InMemId) WITH (BUCKET_COUNT = 20) --small table!
    ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

    On Disk SNAPSHOT

    To show how collisions are handled using on disk tables in snapshot isolation level, I will execute the following code on one connection. Note that in all cases I will be using an explicit transaction for the demonstrations to "pause" time. The same effects are seen in implicit and very fast running transactions, but they are MUCH harder to demo. If you want to see the effects of concurrency in "real time", build an example using Adam Machanic's SQLQueryStress (http://www.datamanipulation.net/SQLQueryStress/). I used it when developing my in memory presentation, and given data sets with duplications and lots of concurrency, you can see the effects occur:

    --Connection1
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    go

    BEGIN TRANSACTION;
    insert into demo.OnDisk (onDiskId,value)
    values (1, 'Connection 1');

    And then on a second connection, execute:

    --Connection2
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    go

    BEGIN TRANSACTION;
    insert into demo.OnDisk (onDiskId,value)
    values (1, 'Connection 2');
    GO

    Connection 2 is blocked, waiting to see what happens in the other connection. This is a very major difference between the on disk snapshot isolation level and what we will see with in memory's lockless version. COMMIT the transaction on connection 1, and in the second connection you will see the error you would expect to see:

    Msg 2627, Level 14, State 1, Line 6
    Violation of PRIMARY KEY constraint 'PKOnDisk'. Cannot insert duplicate key in object 'demo.OnDisk'. The duplicate key value is (1).
    The statement has been terminated.

    Be sure to clear all transactions on both connections. The data in the table should look like:

    select @@TRANCOUNT as trancount, *
    from demo.OnDisk;

    trancount   onDiskId    value
    ----------- ----------- --------------------
    0           1           Connection 1

    Now we will try to update the row in two different connections, and since the insert was blocked, it should be fairly obvious what is going to happen here too.

    Execute:

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT
    go

    BEGIN TRANSACTION
    update demo.OnDisk 
    set    value = 'Still Connection 1'
    where  onDiskId = 1

    On one connection, and then afterwards, the following in another:

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT
    go

    BEGIN TRANSACTION
    update demo.OnDisk 
    set    value = 'I said Connection 2'
    where onDiskId = 1

    Blocked. Now, commit the transaction on the first connection, and you will see:

    Msg 3960, Level 16, State 2, Line 5
    Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table 'demo.OnDisk' directly or indirectly in database 'snapshotDemo' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.

    The key bit of information in there is the word "conflict". As you will see in the next section of the blog on in memory OLTP tables, how and when a transaction is canceled due to a modification collision is directly related to whether it is a resource conflict or a conflict with the constraints of the system.

    In Memory SNAPSHOT

    Turning our attention to the other SNAPSHOT isolation level, let's take a look at how it does conflict resolution. The biggest difference you will notice is that no connection ever waits, unlike with on disk. Anywhere you might otherwise see a wait, the check is either deferred until COMMIT time (if there are no resource conflicts) or fails immediately (if there are). When the next iteration of my database design book is released, I will have some discussion of the internals of how in memory objects are stored, but I will not go into that here just yet. In both the book and these blogs I will include the following statement: "do you want to know more? Check out Kalen Delaney's book here: http://www.red-gate.com/community/books/sql-server-internals-in-memory-oltp", largely because she is a great friend and a great writer/teacher, and secondarily because in both places my plan is to show the structures conceptually, strictly as a mental guide to help you understand how your designs are affected by the internals, and I don't want to go any deeper.

    Starting with the insert case from before, using the demo.InMem table, on one connection, execute:

    --Connection 1
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 1')

    The first thing you will probably notice is a really annoying error message (later you will need to use a table hint to allow the table to be accessed in SNAPSHOT isolation level instead). Since we had set the session isolation level to SNAPSHOT earlier, the following error is thrown:

    Msg 41332, Level 16, State 0, Line 4
    Memory optimized tables and natively compiled stored procedures cannot be accessed or created when the session TRANSACTION ISOLATION LEVEL is set to SNAPSHOT.

    So we change the batch to:

    --Connection1
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 1')

    And then in a second connection:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 2')

    Both complete immediately. To look at the data, you will naturally want to try:

    select *
    from demo.inMem

    But this will result in an error message that will also end the transaction:

    Msg 41368, Level 16, State 0, Line 46
    Accessing memory optimized tables using the READ COMMITTED isolation level is supported only for autocommit transactions. It is not supported for explicit or implicit transactions. Provide a supported isolation level for the memory optimized table using a table hint, such as WITH (SNAPSHOT).

    So don't do that! Instead, use:

    select *
    from demo.inMem (SNAPSHOT)

    On connection 1, this returns:

    inMemId     value
    ----------- --------------------
    1           Connection 1

    Connection 2 believes otherwise:

    inMemId     value
    ----------- --------------------
    1           Connection 2

    Next COMMIT Connection 1, and you will see it succeeded. Going to Connection 2, check the data (and the @@trancount) again:

    select @@trancount as tranCount, *
    from demo.inMem (SNAPSHOT)

    On my first try, I really expected this to fail... But it did not (because the duplicate value only violates a constraint at the data level; it is not a resource conflict), and Connection 2 still sticks with its expectation that what it knows about the state of the data is correct:

    tranCount   InMemId     value
    ----------- ----------- --------------------
    1           1           Connection 2

    Now commit the transaction on Connection 2, and you will see the following error:

    Msg 41325, Level 16, State 1, Line 17
    The current transaction failed to commit due to a serializable validation failure.

    This could be considered annoying, as it doesn't initially make sense. Why a serializable validation? Because modifications are done at the serializable isolation level (just like in READ COMMITTED, where modifications use exclusive locks regardless of your trying to use NOLOCK!). In this case, the other row with the PK of 1 fails serializable's "no phantoms" criterion, hence the failure. Try to insert the row again, and you get:

    Msg 2627, Level 14, State 1, Line 3
    Violation of PRIMARY KEY constraint 'XPKInMem'. Cannot insert duplicate key in object 'InMem'. The duplicate key value is (1).
    The statement has been terminated.

    I won't go through it here, but if you are trying out my code, do the same experiment with different primary key values to see that inserting 1 for the PK on one connection and 2 on the other does not cause a collision; a quick sketch follows.
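
    For reference, here is a minimal sketch of that non-colliding variation (my own illustration of the steps just described, assuming the table has been cleared first):

    --clear the table first in an autocommit batch
    delete from demo.inMem;

    --Connection 1
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 1')

    --Connection 2
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (2, 'Connection 2')

    --COMMIT both transactions, in either order; neither fails, because the rows do not
    --share a primary key value, so there is nothing for the commit-time validation to reject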

    What is interesting is to start over, but this time, on Connection 2 insert the row, fetch it, and then delete it. When you commit the transaction, what happens? (I just decided to try this, and I am not 100% sure myself… I guess it succeeds)

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 2')
    GO

    select *
    from demo.inMem WITH (SNAPSHOT)
    GO

    delete from demo.inMem with (SNAPSHOT)
    where InMemId = 1

    So far so good, this executes, now go back to connection 1 and COMMIT. Go back to Connection 2, and you will see:

    Msg 41325, Level 16, State 1, Line 12
    The current transaction failed to commit due to a serializable validation failure.

    So I was wrong (and I started to suspect I was wrong as I kept reasoning through the problem, which is a lot easier to do when blogging than when trying to solve a customer issue on a high throughput system!). Even though the net effect is that I did nothing to the data, we touched the same resources.

    Now let's do the same with an update statement, after clearing the table:

    rollback -- may not be necessary
    go

    delete from demo.inMem;

    insert into demo.inMem(inMemId, value)
    values (1,'First Row'),(2,'Second Row');

    On connection 1, change the value to '1st Row':

    --Connection 1
    BEGIN TRANSACTION;

    update demo.inMem with (SNAPSHOT)
    set value = '1st Row'
    where inMemId = 1;

    Now, on a different connection, first set the second row's value:

    --Connection2
    BEGIN TRANSACTION;

    update demo.inMem with (SNAPSHOT)
    set value = '2nd Row'
    where inMemId = 2;

    Now, inside of the same transaction, try to update row 1…

    update demo.inMem with (SNAPSHOT)
    set value = 'Not the 2ndRow'
    where inMemId = 1;

    Immediately you are given an error (well, errors):

    Msg 41302, Level 16, State 110, Line 3
    The current transaction attempted to update a record that has been updated since this transaction started. The transaction was aborted.

    Msg 3998, Level 16, State 1, Line 1
    Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
    The statement has been terminated.

    Now, on connection 2 you can see that the data has reverted and the transaction has been rolled back (as the error message said it was):

    select @@trancount as tranCount, *
    from demo.inMem (SNAPSHOT)

    tranCount   InMemId     value
    ----------- ----------- --------------------
    0           1           First Row
    0           2           Second Row

    Conclusion

    Concurrency handling/isolation using optimistic concurrency controls (or semi-optimistic, as in the on disk version of SNAPSHOT) for data modifications is far trickier than using the basic pessimistic, lock based methods. In my previous books, I have suggested that you would likely want to shy away from SNAPSHOT for connections that are doing data modifications, mostly because those connections still take locks, yet still fail if another connection has touched their resources, making the performance gain not nearly worth the effort.

    However, In Memory OLTP can EASILY be worth your time, as it uses a lock-less/latch-less mechanism that makes any waiting negligible. The side effect is that you can easily get resource collisions, depending on how well you tune your queries and how likely it is that multiple users will access the same data for modification (and more, as I will show whenever I get to how retrieval works with modification statements someday). Add to this things like FOREIGN KEY constraints, multiple UNIQUE constraints, and multiple queries in a stored procedure, and things can get tricky.

    The one major change to your thinking is that you need to see messages like:

    Msg 41325, Level 16, State 1, Line 17
    The current transaction failed to commit due to a serializable validation failure.

    Not as error messages, but merely as messages akin to a deadlock:

    Transaction (Process ID %d) was deadlocked on {%Z} resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

    Warnings to try again. That is, in fact, what the deadlock message states explicitly in its text: you were tangled up with another user. An occasional deadlock (if handled by the code as a simple retry of the batch/transaction) is nothing to worry about. Large numbers of them are signs of issues, and I expect that if you see tons of serializable validation failure messages (not as fun to say as "deadlock"!), the same will be true: you have concurrency issues.
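
    Since retry is the name of the game, here is a minimal retry sketch (my own illustration, not part of the demos above) that treats the validation and conflict errors, like a deadlock, as a signal to simply try the transaction again:

    DECLARE @retriesLeft int = 3;

    WHILE @retriesLeft > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;

            update demo.inMem with (SNAPSHOT)
            set    value = 'Retried Value'
            where  inMemId = 1;

            COMMIT TRANSACTION;
            SET @retriesLeft = 0; --success, stop retrying
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

            --41325 = serializable validation failure; 41302 = write/write conflict
            IF ERROR_NUMBER() NOT IN (41325, 41302)
                THROW; --anything else is a real problem, not a retry situation

            SET @retriesLeft -= 1; --tangled up with another user; loop around and try again
        END CATCH;
    END;
    --in real code, you would log or raise an error if the retries run out without success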

     

  • Utility to reset a sequence next value based on a table/sequence relationship

    So I was building a database the other day, and I used sequence objects for all of the defaults because it was FAR easier to load data into the table not needing to set identity_insert on and off over and over and over. Worked great. Went through all of my tests, everything was great. Got out Redgate Data Generator to load up a few hundred thousand noise rows and boom, primary key error. “Blerg,” I said, realizing that sequences don’t manage themselves like identities. So I wrote some code that looked a lot like this:

    DECLARE @nextValue numeric(38) = COALESCE((SELECT MAX(singleSequenceId)
                                               FROM   dbo.singleSequence),0) +1
                                           
    DECLARE @statement NVARCHAR(1000) = CONCAT('ALTER SEQUENCE dbo.singleSequence_sequence RESTART WITH ',@nextValue)
    EXEC (@statement)

    Because, like every good programmer, I wanted to at least somewhat automate the task. But, like a slightly better programmer than that, I wanted something a bit more like DBCC CHECKIDENT (the identity equivalent shown just after the list below) so I didn't have to do that work… And what if the sequence was not started with 1, or had an increment other than 1? So you know what comes next, right? We automate that.  In the end, I created three procedures that could serve the purpose.

    1. Utility.Sequence$ResetBasedOnSpecificTable – Which simply lets you pass in the table and column, along with a sequence, and it resets it based on the values of the table and the metadata of the sequence.

    2. Utility.Sequence$ResetBasedOnSequence – Takes the sequence object, looks to see if it is used in one default constraint, not zero, which would be useless, not two, as that would be too many, and three is right out.

    3. Utility.Sequence$ResetBasedOnTableAndColumn – Takes the name of a table and column, and if it has a default that is for a single table, it uses that.
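
    For comparison, this is the sort of one-liner that already exists for identity columns (the table name here is just a placeholder); it checks the current identity value against the maximum value in the column and corrects it if needed, which is roughly what these procedures emulate for sequences:

    --identity columns get this for free; sequences have no built-in equivalent
    DBCC CHECKIDENT ('dbo.someIdentityTable');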

    If you want the code as a downloadable package, it is available here: http://www.drsql.org/Pages/DownloadablePackages.aspx under the name “sequence reset objects.sql”. As usual, when downloading code from the internet, use at your own risk, test, test, test, test and test again before trusting my code in your production environment and I will do the same with yours.

    The second two procedures use the first one, but just do the work of fetching and verifying the metadata. Here is the first procedure in which you specify table and sequence:

    create procedure  Utility.Sequence$ResetBasedOnSpecificTable
        @tableSchema sysname,
        @tableName    sysname,
        @columnName sysname,
        @sequenceSchema sysname,
        @sequenceName sysname,
        @ignoreDataTypeFlag bit = 0 --allow the user to ignore the check for a proper datatype
                                    --note that it uses the metadata for restarts, but it does not care if your default is
                                    --NEXT VALUE FOR sequenceName * 100, so you need to determine if this fits your needs..
    AS
        SET NOCOUNT ON

        --make sure the datatype of the target column is numeric (as you can cast the NEXT VALUE FOR in the query)
        if @ignoreDataTypeFlag = 0 AND
            not exists (
                        select *
                        from   sys.columns
                                 join sys.types   
                                    on columns.system_type_id = types.system_type_id
                        where  columns.scale = 0
                         and   types.name in ('tinyint','smallint','int','bigint','numeric','decimal')
                         and   columns.object_id = object_id(concat(@tableschema,'.',@tableName))
                         and   columns.name = @columnName
                      )
          begin
            throw 50000,'Either column does not exist, or is not of the base type for a sequence',1
            return -100
          end

        --fetch the metadata for the sequence
        declare @increment numeric(38), @current_value numeric(38), @startWith numeric(38)
        select @increment = cast(increment as numeric(38)),
                @startWith = cast(start_value as numeric(38))
        from   sys.sequences
                 join sys.schemas   
                    on schemas.schema_id = sequences.schema_id
        where  sequences.name = @sequenceName
          and  schemas.name = @sequenceSchema


        --no checking for real object names, because the following statement will take care of it
        DECLARE @statement NVARCHAR(4000) =

    --I am pretty sure my new favorite function is CONCAT! Note that if your input is bad, this statement will fail, so it will be
    --up to you to make sure that doesn't occur, and check for errors.
        CONCAT('DECLARE @nextValue numeric(38) = COALESCE((SELECT MAX(',@columnName,')
                                           FROM   ',@tableSchema,'.',@tableName,'),',@startWith - 1,') + ',@increment,'
                                          
               DECLARE @statement NVARCHAR(1000) = CONCAT(''ALTER SEQUENCE ',@sequenceSchema,'.',@sequenceName,' RESTART WITH '',@nextValue)
               EXEC (@statement)
               '
                )

        EXEC (@Statement)
       
    GO

    Using it is simple, just take one sequence and one table (they needn’t be linked to use this first procedure, but it will be useful in later examples):

    create sequence singleSequence_sequence as int start with 20 increment by 2
    go
    create table singleSequence
    (
        singleSequenceId    int constraint pkSingleSequence primary key default (next value for singleSequence_sequence)
    )
    go

    Then run/test it:

    insert into singleSequence
    values (20),(22),(24),(26) --The first four values of the sequence, so default value inserts will be primary key violations until the sequence reaches 28
    go

    Then, try to insert the value using the defaults:

    insert into singleSequence
    default values
    go

    You will get the following error. Execute the insert again, and you will see another duplicate for 22, and it will keep failing until the default value gets to 28 if you go that route.

    Msg 2627, Level 14, State 1, Line 56
    Violation of PRIMARY KEY constraint 'pkSingleSequence'. Cannot insert duplicate key in object 'dbo.singleSequence'. The duplicate key value is (20).
    The statement has been terminated.

    But, instead of trying again, execute the reset, and you will see it works:

    exec Utility.Sequence$ResetBasedOnSpecificTable 'dbo','singleSequence','singleSequenceId','dbo','singleSequence_sequence'
    go
    insert into singleSequence
    default values
    go

    You will see that a new row has been created:

    select *
    from   singleSequence

    singleSequenceId
    ----------------
    20
    22
    24
    26
    28

    Next, let’s create the procedure that looks up the sequence based on the sequence name.

    create procedure Utility.Sequence$ResetBasedOnSequence
        @sequenceSchema sysname,
        @sequenceName sysname
    AS
        SET NOCOUNT ON

        declare @tableSchema sysname,
        @tableName    sysname,
        @columnName sysname,
        @referenceCount int

        --count to see that this sequence is referenced by exactly one default constraint (note that it could also be used in a procedure,
        --and there is no reason why you can't use the sequence along with the table, as in DECLARE @keyValue ... = NEXT VALUE FOR ...;
        --insert ... values (@keyValue), which is a better pattern than using scope_identity() ever was; a short sketch follows this procedure)
        SELECT @referenceCount = (SELECT COUNT(*)
                                  FROM   sys.dm_sql_referencing_entities (CONCAT(@sequenceSchema,'.',@sequenceName), 'Object') as dsre
                                           join sys.default_constraints
                                            on dsre.referencing_id = default_constraints.object_id)
        --note too that you may have cases where the sequence is referenced twice in the same table. This is another fringe case I would relegate
        --to manually specifying table and column along with sequence.

        if @referenceCount = 0
          begin
            throw 50000,'Sequence either does not exist, or is not referenced in a default',1
            return -100
          end

        if @referenceCount > 1
          begin
            throw 50000,'Sequence is referenced by more than one default constraint',1
            return -100
          end

        --since we know there is one table and column using the sequence object, we can use it.
        SELECT @tableSchema = schemas.name,
                @tableName = tables.name,
                @columnName = columns.name
        FROM
            sys.dm_sql_referencing_entities (concat(@sequenceSchema,'.',@sequenceName), 'Object') as dsre
                join sys.default_constraints
                    on dsre.referencing_id = default_constraints.object_id
                join sys.columns
                    on default_constraints.parent_object_id = columns.object_id
                       and default_constraints.parent_column_id = columns.column_id
                join sys.tables
                    on columns.object_id = tables.object_id       
                join sys.schemas
                    on schemas.schema_id = tables.schema_id

        execute Utility.Sequence$ResetBasedOnSpecificTable
                                        @tableSchema =@tableSchema,
                                        @tableName    = @tableName,
                                        @columnName = @columnName,
                                        @sequenceSchema = @sequenceSchema,
                                        @sequenceName = @sequenceName


    GO
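
    As an aside, the comment in that procedure mentions fetching the key with NEXT VALUE FOR instead of relying on SCOPE_IDENTITY(). Here is a minimal sketch of that pattern (my own illustration, using the dbo.singleSequence table and sequence created just below):

    --fetch the key from the sequence first, then use it in the insert
    DECLARE @keyValue int = NEXT VALUE FOR dbo.singleSequence_sequence;

    INSERT INTO dbo.singleSequence (singleSequenceId)
    VALUES (@keyValue);

    --the caller already knows the key value; no SCOPE_IDENTITY() lookup required
    SELECT @keyValue AS newSingleSequenceId;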

    To test this, we use the same code as previously, but just specify the sequence name:

    insert into singleSequence
    values (20),(22),(24),(26) --The first four values of the sequence, so default value inserts will be primary key violations until the sequence reaches 28
    go

    insert into singleSequence
    default values
    go

    Same error:

    Msg 2627, Level 14, State 1, Line 56
    Violation of PRIMARY KEY constraint 'pkSingleSequence'. Cannot insert duplicate key in object 'dbo.singleSequence'. The duplicate key value is (20).
    The statement has been terminated.

    But, instead of trying again, execute the reset, and you will see it works:

    exec Utility.Sequence$ResetBasedOnSequence 'dbo','singleSequence_sequence'
    go
    insert into singleSequence
    default values
    go

    And again, you will see that a new row has been created:

    select *
    from   singleSequence

    singleSequenceId
    ----------------
    20
    22
    24
    26
    28

    Finally, the procedure that lets you specify the table and column, which is what you probably want to do realistically (again with the caveat that sequences are independent objects, and as such can be used for multiple purposes, so use with care).

    create procedure Utility.Sequence$ResetBasedOnTableAndColumn
        @tableSchema sysname,
        @tableName   sysname,
        @columnName sysname
    AS
        SET NOCOUNT ON

        declare @sequenceSchema sysname,
        @sequenceName sysname,
        @referenceCount int

        --using sys.sql_expression_dependencies, check to see if the column has a default that references
        --the sequence
        select @sequenceSchema = referenced_schema_name, @sequenceName = referenced_entity_name
        from   sys.schemas
                join sys.tables
                    on schemas.schema_id = tables.schema_id
                join sys.columns
                        on columns.object_id = tables.object_id       
                    join sys.default_constraints
                        on default_constraints.parent_object_id = columns.object_id
                           and default_constraints.parent_column_id = columns.column_id
                    join sys.sql_expression_dependencies
                        on default_constraints.object_id = sql_expression_Dependencies.referencing_id
        where schemas.name = @tableSchema
          and tables.name= @tableName
          and columns.name = @ColumnName

        --then use the referencing entities DMV, and see if it is being referenced > 1 time. It is important to be careful or you may
        --have the case where you are referencing it from two or more tables, and resetting it to the next value for one won't be the next
        --value for another. This is a very fringe case, and would be way too much work to automate for the few use cases.
        SELECT @referenceCount = (SELECT COUNT(*)
                                  FROM   sys.dm_sql_referencing_entities (CONCAT(@sequenceSchema,'.',@sequenceName), 'Object') as dsre
                                               join sys.default_constraints
                                                on dsre.referencing_id = default_constraints.object_id)


        if @referenceCount = 0
          begin
            throw 50000,'Sequence either does not exist, or is not referenced in a default',1
            return -100
          end

        if @referenceCount > 1
          begin
            throw 50000,'Sequence used by column is referenced by more than one default constraint',1
            return -100
          end

        execute Utility.Sequence$ResetBasedOnSpecificTable
                                        @tableSchema =@tableSchema,
                                        @tableName    = @tableName,
                                        @columnName = @columnName,
                                        @sequenceSchema = @sequenceSchema,
                                        @sequenceName = @sequenceName

    go

    I won’t repeat that test code, as it is just repetitive of the others, but suffice it to say that you replace the call for resetting the sequence with:

    exec Utility.Sequence$ResetBasedOnTableAndColumn 'dbo','singleSequence','singleSequenceId'

    And you get the same results. Finally, let’s test a few fringe cases, so I can prove that I tested them:

    What if the sequence is used in multiple tables? It would be bad to reset based on one table but not the other. So the procedures (other than the first one, where you explicitly provide the object names to be used) check the metadata of the sequence object to make sure you aren't shooting yourself in the foot.

    create sequence doubleSequence_sequence as int start with 1 increment by 2
    go

    create table doubleSequence1
    (
        doubleSequence1Id    int constraint pkdoubleSequence1 primary key default (next value for doubleSequence_sequence)
    )
    GO
    create table doubleSequence2
    (
        doubleSequence2Id    int constraint pkdoubleSequence2 primary key default (next value for doubleSequence_sequence)
    )
    GO

    Trying to use the reset procedures:

    exec Utility.Sequence$ResetBasedOnSequence 'dbo','doubleSequence_sequence'

    Gives you this error:

    Msg 50000, Level 16, State 1, Procedure Sequence$ResetBasedOnSequence, Line 249
    Sequence is referenced by more than one default constraint

    And the other, a slightly different wording of the same error:

    exec Utility.Sequence$ResetBasedOnTableAndColumn 'dbo','doubleSequence1','doubleSequence1Id'

    Msg 50000, Level 16, State 1, Procedure Sequence$ResetBasedOnTableAndColumn, Line 272
    Sequence used by column is referenced by more than one default constraint


    Finally, let’s take the case where you have the same sequence referenced twice in the same table:

    create sequence oneMoreDouble_sequence as int start with 1
    go
    create table oneMoreDouble
    (
        oneMoreDoubleId    int constraint pkoneMoreDouble primary key default (next value for oneMoreDouble_sequence),
        anotherColumn    varchar(10) default ('T' + cast(next value for oneMoreDouble_sequence as varchar(9)))
    )
    GO

    The first usage is a surrogate key value, and the other is a formatted version of that value (a ‘T’ prepended to it, fancy!). The calls used in the last double test will give you the same errors:

    exec Utility.Sequence$ResetBasedOnSequence 'dbo','oneMoreDouble_sequence'

    exec Utility.Sequence$ResetBasedOnTableAndColumn 'dbo','oneMoreDouble','oneMoreDoubleId'

    But what if you specify the column? Specifying the text column will not work:

    exec Utility.Sequence$ResetBasedOnSpecificTable  'dbo','oneMoreDouble','anotherColumn', 'dbo','oneMoreDouble_sequence'

    Which will return the following error:

    Msg 50000, Level 16, State 1, Procedure Sequence$ResetBasedOnSpecificTable, Line 269
    Either column does not exist, or is not of the base type for a sequence

    But this call will work just fine, as it technically would have for the doubleSequence tables too, though in this case it is a safe operation, since the sequence is referenced by only the one table.

    exec Utility.Sequence$ResetBasedOnSpecificTable  'dbo','oneMoreDouble','oneMoreDoubleId', 'dbo','oneMoreDouble_sequence'

  • Upcoming Speaking Engagements in Indianapolis and Louisville

    Mercy, July is flying past at a breakneck pace, isn't it? The best part of July passing into the history books is that August has me speaking at two SQL Saturdays, one of them being a first time for me.

    On August 8 I will travel up to Indianapolis for SQL Saturday #402, a city I have been to a few times previously, but not for a SQL Saturday. My most notable trip up included a stop at the Murat Theater to catch Elvis Costello back in 2002, with Laura Cantrell opening (when he gets an opening act, he does it right). Coincidentally, Elvis will be in Nashville on the same day as SQL Saturday, and I will miss it, because, as I say when it fits my mood: SQL First!

    Then I will be up in Louisville for SQL Saturday #403, which has become an annual favorite for several reasons. Firstly, the conference is always great, and Mala and team always do a good job. But the speaker dinner has always been excellent too, and it introduced at least one new favorite guilty pleasure, the yummy hot brown.

    At both conferences I will be giving my presentation on In Memory OLTP:

    How In-Memory Database Objects Affect Database Design

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.

    I hope to see everyone who reads this blog at both events, though you don't need to come to my session both times; only I will be forced to do that.

    After these events, September will be the last month of the year with any rest, as I have Orlando SQL Saturday #442 (with a Disney vacation with the children attached, naturally), PASS, the MVP Summit (hopefully!), and the holidays. Plus, I start the revision of the design book with hopes to have the first half done by the time we take down the Christmas tree! SQL First!

  • Temporal Tables – Part 2 – Changing history

    After my post yesterday, I had planned to take a break for the Independence Day holiday, but a commenter asked a question that piqued my interest: how might you edit history? Well, the answer is simple, but the process isn’t exactly simple. Basically, you have to turn off system versioning, update history, and reapply system versioning. Not difficult, but not something you can do just by editing a row in history, as the temporal structures protect themselves.

    select @@version
    go
    Microsoft SQL Server 2016 (CTP2.0) - 13.0.200.172 (X64)   May 21 2015 10:53:07   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

    First warning, this is pre-release code, and is subject to change. I will try to make sure it works sooner or later with RTM, probably as I am doing my edits for the book.

    To see the metadata, let me create a table on which I will not turn on system versioning:

    create table notTemporal
    (
        notTemporalId int primary key
    )

    Now, sys.tables has a few new columns for temporal, including temporal_type_desc and history_table_id which I will use to list the tables and their temporal usage (filtering out history tables).

    select CONCAT(schemas.name,'.',tables.name) as table_name,
           historyTableSchema.name + '.' + historyTable.name as history_table_name,
           tables.temporal_type_desc
    from   sys.tables
             join sys.schemas
                on tables.schema_id = schemas.schema_id
             left outer join sys.tables as historyTable
                join sys.schemas as historyTableSchema
                    on historyTable.schema_id = historyTableSchema.schema_id
                on historyTable.object_id = tables.history_table_id
    where tables.temporal_type_desc <> 'HISTORY_TABLE'

    go

    table_name           history_table_name                          temporal_type_desc
    -------------------- ------------------------------------------- -------------------------------------------
    dbo.company          dbo.MSSQL_TemporalHistoryFor_565577053      SYSTEM_VERSIONED_TEMPORAL_TABLE
    dbo.notTemporal      NULL                                        NON_TEMPORAL_TABLE

    The dbo.company table is the one from part 1, and you can see its history table. We will see that the history table contains all of the historical changes, but not the current row, still with a gap between the top two rows because I deleted the row temporarily.

    select *
    from   dbo.MSSQL_TemporalHistoryFor_565577053

    This returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045

    And the current row is in the primary table:

    select *
    from   dbo.company

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999

    OK, the next thing I am going to try is to see what happens if you monkey with the data in this table as is, here in CTP 2.0. Use cases might be to remove bad rows, errored data, etc. (But mischief is also kind of fun when you come upon a new technology.)

    update dbo.company
    set   SysEndTime = '2015-06-30 00:37:08'

    Cool, it protects this data

    Msg 13537, Level 16, State 1, Line 55
    Cannot update SYSTEM_TIME PERIOD columns in table 'testTemporal.dbo.company'.

    What about the SysStartTime?

    update dbo.company
    set   SysStartTime = '2015-06-30 00:37:08'

    Same deal

    Msg 13537, Level 16, State 1, Line 62
    Cannot update SYSTEM_TIME PERIOD columns in table 'testTemporal.dbo.company'.

    Not allowed either. And modifying the history table directly is also not allowed. Here I want to make the row appear to have existed since the 15th of June, rather than the 30th:

    update dbo.MSSQL_TemporalHistoryFor_565577053
    set SysStartTime = '2015-06-15'
    where  companyId = 1
    and   sysEndTime = '2015-06-30 00:16:19.1614451'

    Wah, wah, wah….

    Msg 13561, Level 16, State 1, Line 70
    Cannot update rows in a temporal history table 'testTemporal.dbo.MSSQL_TemporalHistoryFor_565577053'.

    However, if you want to make a change to the data, or load in old history data, you can do this by turning off versioning, building the table as it is needed, and then turning versioning back on, making sure to specify the table to use.

    alter table dbo.company
        set (SYSTEM_VERSIONING = OFF);

    Now you can run your update query to set the start time back to June 15, so this row existed longer in history than it really did.

    update dbo.MSSQL_TemporalHistoryFor_565577053
    set SysStartTime = '2015-06-15'
    where  companyId = 1
    and   sysEndTime = '2015-06-30 00:16:19.1614451'

    It succeeds. Now we turn system versioning back on, being sure to specify the history table, or you will get another new table (this is the second time I have said it, but it is important, or you will end up with a lot of these tables around). If the table is large, you may not want to do the consistency check, but I generally would myself, because structures like this can be tricky to get right, so it is safer to let it check the structure.

    ALTER TABLE dbo.Company
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.MSSQL_TemporalHistoryFor_565577053, DATA_CONSISTENCY_CHECK = ON));

    Now looking at the entire history, you see the row started at 6-15, not 6-30.

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by SysStartTime desc

    Which returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-15 00:00:00.0000000 2015-06-30 00:16:19.1614451

    Now you want to see the row as of 6-15:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-15' as priorCompany

    It is there!

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-15 00:00:00.0000000 2015-06-30 00:16:19.1614451

    Still not on 6-14:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-14' as priorCompany

    Update: Security and Concurrency

    A commenter asked about how this works with concurrency, and I also wanted to note something about security. Generally speaking, this is definitely not something that just anyone should do. Turning off system versioning requires the user to have rights to alter the table’s structure. Not general purpose rights, to be sure, though you could wrap the functionality in a stored procedure if desired (a rough sketch of that idea follows).
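
    To be clear, no such procedure ships with SQL Server; the following is just a rough sketch of what that wrapper might look like (hypothetical name, error handling omitted), using EXECUTE AS OWNER so the caller only needs EXECUTE rights rather than ALTER rights on the table, plus the transaction technique described next:

    CREATE PROCEDURE dbo.Company$AdjustHistoryStartTime
        @companyId    int,
        @oldStartTime datetime2,
        @newStartTime datetime2
    WITH EXECUTE AS OWNER --carries the ALTER TABLE permission for the caller
    AS
    BEGIN
        SET NOCOUNT ON;

        BEGIN TRANSACTION; --the schema lock taken by the ALTER blocks other access until COMMIT

        ALTER TABLE dbo.company SET (SYSTEM_VERSIONING = OFF);

        UPDATE dbo.MSSQL_TemporalHistoryFor_565577053
        SET    SysStartTime = @newStartTime
        WHERE  companyId = @companyId
          AND  SysStartTime = @oldStartTime;

        ALTER TABLE dbo.company
            SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MSSQL_TemporalHistoryFor_565577053,
                                         DATA_CONSISTENCY_CHECK = ON));

        COMMIT TRANSACTION;
    END;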

    For concurrency, a concern is that while you have system versioning turned off, you will not be gathering history. A way to prevent other connections from messing with the table while you are altering history is to use a transaction. Start the transaction, and turn off system versioning:

    BEGIN TRANSACTION

    alter table dbo.company
        set (SYSTEM_VERSIONING = OFF);

    Now go to another connection and try to execute the INSERT shown in the sketch below, and on a third connection the NOLOCK SELECT that follows it.
    You will be blocked on both connections due to the exclusive schema lock from the table alter; yes, even NOLOCK has to wait for a schema lock. So you can make the history change safely, while everyone else waits patiently.
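
    Spelled out as separate batches (the connection numbering is just for illustration), the two statements that will sit blocked look like this; after issuing them, return to the original connection for the history update that follows:

    --Connection 2 (blocked until the schema lock from the ALTER is released)
    INSERT INTO dbo.company (name, companyNumber)
    VALUES ('Company 2','00002');

    --Connection 3 (also blocked; even NOLOCK still needs a schema stability lock)
    SELECT *
    FROM   dbo.company (NOLOCK);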

    update dbo.MSSQL_TemporalHistoryFor_565577053
    set SysStartTime = '2015-06-14'
    where  companyId = 1
    and   sysStartTime = '2015-06-15 00:00:00.0000000'

    Now, turn back on system versioning:

    ALTER TABLE dbo.Company
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.MSSQL_TemporalHistoryFor_565577053, DATA_CONSISTENCY_CHECK = ON));

    Now check the history structure:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by SysStartTime desc

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-14 00:00:00.0000000 2015-06-30 00:16:19.1614451

    Now commit the transaction, which lets the other connections proceed, and run the statement again:

    COMMIT TRANSACTION
    GO
    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by CompanyNumber, SysStartTime desc
    go

    You can see that the new row was added from your other connection:


    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-14 00:00:00.0000000 2015-06-30 00:16:19.1614451
    2           Company 2                      00002         2015-07-01 22:42:10.3875967 9999-12-31 23:59:59.9999999

    Clearly not something you want to do in a highly concurrent system too often, but it will be data safe in any case.

    Very cool stuff… More to come… Standard caveat: I won’t be doing any guessing in these blogs because if I guess right and NDA information is out there that I missed, I could get in trouble accidentally. But I will try to make sure if there are big changes in syntax that I note it, or redo the blog syntax so I don’t end up with bad information out there.

  • Temporal Tables – Part 1 - Simple Single Table Example

    In my first entry in my series of posts on DB Design Book Topics, I want to start with a pretty interesting topic: temporal tables. It is a very cool new feature that takes something that used to be very painful (capturing the history of changes to a table, allowing the user to see the data at a point in time) and makes it very easy.

    So if you have a row in a table, and it is created, updated, and then deleted, knowing how the row looked at a given point in time can be very useful. I wanted to start with a very basic example to show how things work, and later entries in this series will expand to multiple rows and tables.

    select @@version --Features are apt to change. Test if you are using a later CTP

    ---------------------------------------------------------------------------------------------------------------------------
    Microsoft SQL Server 2016 (CTP2.0) - 13.0.200.172 (X64)   May 21 2015 10:53:07   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

    First off, we need to create a workspace. I will just call the database testTemporal:

    create database testTemporal
    go
    use testTemporal
    go

    Nothing needs to be done to enable temporal, just create a database on the 2016 instance. The table does need a few new things, shown in the next example: the two GENERATED ALWAYS period columns, the PERIOD FOR SYSTEM_TIME clause, and the SYSTEM_VERSIONING setting.

    create table dbo.company
    (
        companyId    int identity(1,1) primary key,
        name        varchar(30) unique,
        companyNumber char(5) unique,
        SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL, --the time when this row becomes in effect
        SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,   --the time when this row becomes no longer in effect
        PERIOD FOR SYSTEM_TIME (SysStartTime,SysEndTime)  
    )
    WITH (SYSTEM_VERSIONING = ON); --Note that you can use a table of your own. More on that in a later blog entry…
    go

    Simple enough, and if you want to see more about the create table syntax, check BOL here (https://msdn.microsoft.com/en-us/library/ms174979.aspx) as this is a CTP edition and subject to change.

    So let’s create a row in the table:

    insert into dbo.company (name, companyNumber)
    values ('Company1','00001')

    select SCOPE_IDENTITY(); --If you don't mess up, this will be 1. We will use this in our examples
    go

    Now we change something in the table a few times to let us have a few changes to see in the example:

    update company
    set    name = 'Company Name 1'
    where  companyId = 1

    And update it again:

    update company
    set    name = 'Company Name 2'
    where  companyId = 1

    This time update with no changes:

    update company
    set    name = 'Company Name 2'
    where  companyId = 1

    To see the row exactly as it currently exists, just use a normal select statement:

    select *
    from   company
    where  companyId = 1

    You will see that looks exactly as you expect:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-05 02:15:32.7938411 9999-12-31 23:59:59.9999999

    To see all versions of the rows, use the FOR SYSTEM_TIME clause with CONTAINED IN (read more here about temporal and FOR SYSTEM_TIME): https://msdn.microsoft.com/en-us/library/dn935015(v=sql.130).aspx):

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order by SysEndTime Desc

    This returns all of the row versions that have been created:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451

    There are a few things of note. The first SysStartTime value will be the time the row was inserted. The current row's SysEndTime will be 9999-12-31 23:59:59.9999999. Note too that when we updated the row with no actual data changes, we still got a new version.

    Also, when working with the times and the FOR SYSTEM_TIME clause, be careful to include the time down to the fractional seconds or you may not get what you expect. When using CONTAINED IN, if you don’t put the nines out to all seven decimal places, you won't get the current row due to rounding:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.999999') --Only six decimal places
    order by SysEndTime Desc

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451


    The more interesting use is to work with a row (or rows) at a certain point in time. For example, to get the second version, the one starting at 2015-06-30 00:16:19.1614451, we will use FOR SYSTEM_TIME AS OF, which takes a datetime2 value and returns rows where SysStartTime <= PassedValue < SysEndTime. (The PassedValue can also be a variable; a short sketch follows the next result set.)

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany

    Note that the time was from the SysStartTime, which also was the SysEndTime for a different row. This returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
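
    And since the passed value can be a variable, the same query can be driven from a datetime2 variable; a minimal sketch (my own illustration, same data as above):

    DECLARE @asOfTime datetime2 = '2015-06-30 00:16:19.1614451';

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF @asOfTime as priorCompany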

    You can also use FOR SYSTEM_TIME in a JOIN criteria and see multiple versions of the row in your query:

    select company.Name, priorCompany.Name as PriorName
    from   company
             join company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany
                on company.companyId = priorCompany.companyId

    This will return:

    Name                           PriorName
    ------------------------------ ------------------------------
    Company Name 2                 Company Name 1

    I expect that may be a pattern that gets used in a later blog to calculate changes since a point in time! Finally, let’s take a quick look at what happens on a delete:
           
    delete from company
    where  companyId = 1

    Now, checking the data, we see that there is no 9999 row version:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')

    --There are the same four rows, but now all rows have an end time in the same millennium as the one we live in:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045

    So looking for a row at a past time, the row did still exist:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany

    With the only difference being that the SysEndTime isn’t what it once was:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199

    But looking at the table currently, no row:

    select *
    from   company

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------

    So finally, what happens when we replace the row using the same surrogate key value? (Not discussing here if this is a good idea, or bad idea…And this has led me to wonder if we can adjust history if the delete was accidental… Ah, fodder for later)

    set identity_insert dbo.company on
    go
    insert into dbo.company (companyId, name, companyNumber)
    values (1, 'Company1','00001')
    go
    set identity_insert dbo.company off
    go

    And then look at all of the row versions that exist now?

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by SysStartTime desc

    --You can see that the row now exists, but there is now a gap between the top two rows:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451

    Looking at the data at the current row’s SysStartTime:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:37:07.1375063' as priorCompany

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999

    But a very, very small amount of time before that, it is not there:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:37:07.1375062' as priorCompany

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------

    So there is a quick look at some of the basic functionality that we can expect with temporal data in 2016.

    Note: I won’t be doing any guessing in these blogs because if I guess right and NDA information is out there that I missed, I could get in trouble accidentally. But I will try to make sure if there are big changes in syntax that I note it, or redo the blog syntax so I don’t end up with bad information out there.

    Continue to Part 2 – Changing History

  • Speaking this week at #sqlsatChattanooga and for the PASS Performance Virtual Chapter

    The topic for both sessions will be How In-Memory Database Objects Affect Database Design, and the following is the abstract:

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.

    The PASS Performance Virtual Chapter session will be 1:00 PM Central Time on Thursday the 25th. You can see more details and register here: http://performance.sqlpass.org/

    The SQL Saturday Chattanooga session will be at 3:00 PM Eastern Time on Saturday the 27th. You can see details about the entire conference here: http://www.sqlsaturday.com/410/EventHome.aspx 

    I am pretty excited about talking about SQL Server In-Memory technologies, but it is also a bit interesting because of how much SQL Server 2016 is going to change things. I will do my best to make notes on how it will change the usage and design aspects of In-Memory as I go. You can see a list of how things are changing in Aaron Bertrand’s blog entry. In 2016, the In-Memory OLTP feature still won’t be a general purpose tool, but the changes certainly will affect how effectively it can be used in certain scenarios.

    I do wish I had more time to play with 2016 before my presentations, but I don’t believe it would change any of my current opinions on usage (other than there being quite a bit more data integrity built in), but I certainly will be testing out the new stuff in the coming months as I work on my set of pre-book blogs.
