
Louis Davidson

  • Dragged Kicking and Screaming into the Clouds

    When I first heard about the cloud I rolled my eyes. This will never take off, I thought; it sounds very much like the mainframes we came to know and hate (my first database project was killing a mainframe that was costing our non-profit organization $15,000 a month, which for a non-profit was simply horrible). Even four years ago when I was writing "Pro SQL Server 2012 Relational Database Design and Implementation" I pretty much didn't mention the cloud at all.

    My how times have changed. The cloud, and Azure in particular, is increasingly becoming a necessary target for our database work because, in my opinion, it fills two specific scenarios that are hard to provide service for (along with being generally useful for many other purposes):

    1. Very small organizations: Places that do not have the resources (human and computing) to provide a proper computing platform. The cost is likely far less for these scenarios in most cases.
    2. Very large computing needs: If you need massive power, it may be easier to use Microsoft's platform to do your computing, particularly if you don’t need the power for long periods of time.

    What is truly awesome about the way AzureDB works is that it is very, very similar to the on-premises/box product. There are two versions of Azure that apply to us relational database types that I will work with. The first is running the box product on an Azure VM. For the normal database programmer, this will not change very much at all. For some, if not all of the book, this is how I will be running the code, as I won't be loading SQL Server 2016 on my laptop directly until RTM. It is certainly possible that I will use a local VM, but we will see. I will of course include my system settings wherever I include performance statistics.

    The second is Azure SQL Database. Azure SQL Database is basically where you get a database exposed to the Internet instead of a virtual machine. You get a surface area that is very similar to typical SQL Server databases, with far fewer knobs to adjust. If you want to know much more about the service, the earlier link will give you an overview. I won't cover too much about the Azure SQL Database platform, either here or in the book, but I will provide code that works in either location.

    Both of the products, the on-premises versions and the Azure SQL Database versions, are part of the relational database family of products. They share a common base and a common purpose: to work with relational data. They look basically the same, operate mostly the same, and serve (at their core) the very same purposes.

    As such I will make sure that all of the scripts that end up in the final book have been validated on the different editions of SQL Server (as I have always done), and have been executed at least once on Azure SQL Database as well. What I won't do is go into many details of how to connect to an Azure SQL Database, mostly because it takes quite a few pages to do so (I just tech edited a book that covers such details, and I will direct readers to that book for details on how to connect… Peter Carter “Pro SQL Server Admin” http://www.springer.com/gp/book/9781484207116).

    As a quick test, I took the code from the previous version of Chapter 6 (which I will be working on after I have written my presentation on Concurrency for SQL Saturday Nashville). Chapter 6 is a simple database create that demonstrates the basic process of changing a normalized base model to a physical database and the finishing touches that make even a 6 table design better.

    All of the code for Chapter 6 executed with no change, except for a few of the fundamental differences between AzureDB and the On-Prem versions:

    1. Dropping and creating the database (so I changed the script to drop all objects first, a practice I use for all clean model databases that will be used for unit tests and compare scripting with active databases; a rough sketch of this pattern appears just after this list).
    2. USE statements to change from master to the user database
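
    A minimal sketch of what the "drop all objects first" pattern from item 1 might look like for tables only (this is my illustration of the general idea, not the actual script from the book, and it ignores every other object type):

    DECLARE @sql nvarchar(max) = N'';

    --drop foreign key constraints first so the tables can then be dropped in any order
    SELECT @sql += CONCAT('ALTER TABLE ', QUOTENAME(OBJECT_SCHEMA_NAME(parent_object_id)), '.',
                          QUOTENAME(OBJECT_NAME(parent_object_id)),
                          ' DROP CONSTRAINT ', QUOTENAME(name), ';')
    FROM   sys.foreign_keys;

    --then drop the tables themselves
    SELECT @sql += CONCAT('DROP TABLE ', QUOTENAME(OBJECT_SCHEMA_NAME(object_id)), '.',
                          QUOTENAME(name), ';')
    FROM   sys.tables;

    EXEC (@sql);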

    I hope this change to the book will be useful to readers as they start to move from on-prem to cloud databases and see where the differences lie. I expect many people, even a lot of seasoned pros, are wary of how much work the cloud versions of database implementations will be. In later blogs and in the book I will cover some differences you need to consider when building your databases, and when building your applications, to leverage a platform where there may be some exaggerated latency (and some cost) if you don't do a good job of building the entire system with bandwidth as a concern.

  • Post PASS Summit Resolutions

    Many years I have tried, early in the year, to list a set of "New Year's Resolutions," but one thing I have noted every year is that it isn't New Year's Day that is the defining part of my year; rather, it is the PASS Summit that defines my year. Usually, by the time the end of October rolls around, I am pretty spent. The summer has ended, and all of the events of the year are winding down, heading into the holiday season (I like to start the holidays 2 weeks before Halloween and end them on Super Bowl weekend!) So usually I have pretty much gotten to the point where I am thinking "should I do this another year?" about when the Summit rolls around. Admittedly, the past five years have been hard because the hip replacement has made getting around somewhat more difficult at times. But attending the PASS Summit has always been the shot in the arm that gets me thinking and planning. Even last year, when I attended PASS strictly via PASS TV, it was where I decided what I wanted to try to do for the year.

    Today I want to put out my list of things I resolve to do differently (or the same) in the PASS Community, based on the renewed energy I have from the two weeks I spent in Seattle with all of my fellow SQL Community, SQL Family, and SQL Friends, learning and sharing (blog on that topic forthcoming).

    1. Don’t submit to speak again next year – As much as I appreciate the surprise from a few people that I wasn’t speaking, and the kind words from a few people about liking my sessions, this was my favorite year. I am not a natural speaker, so I spend a lot of time preparing and I miss a lot. This was the first year I actually ventured out with people for a meal or two, caught a movie, and attended sessions in all but 2 slots (I saw my SQL Friend Joe Webb, whom I hadn’t seen in forever, and I started talking to him and 5 other people who kept stopping by. Worth it!)

    2. Don’t stop submitting to speak at SQL Saturdays\User Groups – As much as I didn’t miss speaking at PASS, I do love speaking at smaller venues where I can do most of my prep at home, and the room size doesn’t give me heart palpitations. I plan to submit to 4-6 more events, and I have 2 or 3 user group presentations planned. I have hopes of possibly doing one pre-con this year as well, and I will submit to a few (it won’t kill me if I don’t get chosen; it is a lot of work!) I also hope to submit for SQL In The City next year if RedGate does it (and I see the call for speakers before it closes).

    3. Possibly attend a SQL Saturday and don’t submit to speak – It might be nice to just come to one and just hang out. I would be more than happy to work a booth for PASS or RedGate or whoever would want me to!

    4. Get a kilt and a costume to wear for PASS next year – Okay, so this one is probably a long shot (consider this my “lose weight” resolution that may not make it, because I will have to lose weight for the first one to occur), but I have a fantastic costume idea that was inspired by Bob Pusateri’s (@SQLBob) this year (nothing to do with Dilbert, but that is all I will say.)

    5. Get well enough to walk without making me and others feel bad – I know that I made some people uncomfortable this year watching me try to stand up after sitting down for a few minutes, but I felt better this year than I have in 6 years or more. It was just that muscles tighten when you sit for a while and it hurt really bad. 

    6. Keep being involved locally – I may not be the primary chapter leader for Nashville this year because there are so many great people to let have a swing. I don’t know what I will be doing for SQL Saturday, but I will certainly be there if I am in town.

    7. Finish my book early so I can bring some to give away at PASS Summit next year – I will have four chapters done by next week, and hopefully will be done with all of the logical design stuff before the start of the year. Then I have to work in columnstore, in-memory, and all of the other database design changes that are in 2014 and 2016. Sounds easy and like a ton of work at the same time (thankfully, I use very few screen shots.)

    8. Continue volunteering for the SQL PASS Program Committee – I have greatly enjoyed working with the program committee for the past several years, and last year we started a service to allow submitters to get a review of their abstracts done before they submitted them for the year. This year I hope we can expand the service to help more people who are great speakers get the writing stuff out of the way.

    9. Blog somewhat more – And not just these post-PASS blogs either. I started blogging about new stuff in SQL Server 2016 until I had to get to work on the book (and some other writing projects), and I will minimally continue that process as SQL Server 2016 matures. After the book is finished, I do hope to get some mastery of the stuff that Azure offers as well (more about that in the next blog I will publish.)

    10. Mystery Project – I have started shopping around an idea to some folks to see if I can do something kinda different and interesting in the coming year (or so). I have a lot of details to work out, but I want to wait until some other things have passed before I do too much else. (I may send out a survey in the coming months on Twitter to see what more people think too.)

    In some respects, the most interesting part about resolutions is they stare at you and shout “YOU ARE NUTS” because it looks like a lot. But really, it is what I do for fun. I like doing stuff for the community because I learn stuff. Will I do as much as I hope over the next year? Probably not, as I never have. Will I have fun doing it, and will hopefully a few people learn a thing or two from it? Let’s hope, or I could be paying attention to the football game that is on my TV right now instead of editing this list for a third time before hitting submit.

  • PASS/MVP Summits Reflections

    Wow, it was great actually getting out to the summits this year. Sitting here in the SEATAC airport, waiting for it to be time to get in a flying tin can, I wanted to jot down a few reflections I had from the two weeks. Some of the things I noticed (that aren’t under some sort of NDA, at least) this year over my two weeks are:

    1. The Cloud is really taking hold – The first MVP Summit where they started pushing the cloud was really hard on most of our eyes (rolling your eyes isn’t bad once, but hundreds of times starts to hurt!), but it has really matured. With services like Azure Database, Data Warehouse, Data Lake, Blob Storage, VMs, Machine Learning, and more, be prepared to hear more about Azure not just from me, but from other bloggers and tweeters, and don’t be surprised when your boss says: “I think we need to use Azure for _______.” It won’t be perfect for everyone and every need for a long time, but the use cases will be growing. One of my next blog posts will be on the effect that Azure will be having on the fifth edition of my database design book.
    2. In Memory OLTP is going to be really great in SQL Server 2016. Check out CTP 3.0 (or whatever is latest when you have finally gotten around to reading this) for more details. But with support for stuff that the classic SQL programmer is going to love (Foreign Keys, > 1 Uniqueness Constraint, Check Constraints, 2TB per database, better Compiled code surface area) and stuff that only a few people will truly need (Updatable Columnstore Indexes on In Memory tables), those features plus blistering speed will change the feature from niche to generally useful for a larger segment of users. There are still limitations and architectural things one must do to ensure success, but it is getting closer to a solution that anyone might logically employ.
    3. Microsoft’s BI tooling is exploding. PowerBI, Reporting Services, R running near the engine, the aforementioned Azure tools, and I am sure even more stuff that I didn’t pay attention to (my boss was at PASS this year to look at reporting stuff; I am still an engine man myself.) I did get to play around with several of the Azure tools, which were pretty impressive/scary, I must admit.
    4. The PASS community is an interesting society. I am generally an introvert, and for my first few years I was oblivious to people; even when I was a speaker I would speak, then keep to myself. As years have passed, I have become more clued in to what goes on, and have a ton of people I am acquainted with. There is a moniker, #sqlfamily, that is used frequently to describe it, but I think family is too simple a term to use for it all. Much like the word “love” has four classic meanings, plus a few others that we have added over the years (“I love cake!”). Another of my future blogs will discuss this super important topic (hmmm, where is that sarcasm font?)
    5. The Surface 3 is a GREAT travel computer – Don’t try to run Visual Studio on it (I did, and it took thirty minutes to be half installed!), but it is super light with great battery life (I easily got a day of note taking and moderate VM/Azure DB use out of it). The kickstand made it perfect for me to use on my lap at times, and the tearaway keyboard made it work fine for me on the airplane for watching some football on the flight here. Even better was that I could charge it from a portable battery that kept me going for 2 days without hitting a plug…
    6. I am still out of shape – Every night, after a mere 10 or so hours of walking, attending, and talking, I was worn the heck out. I went to bed around 10 every night and got up at 7, having slept most of the night. Then I would see on Twitter that people I had been sitting near (or who were up on stage) had tweeted not that long ago that they were going to bed, having left the gathering I was at and headed to another. I am getting old, but so are many of the people I was hanging with at one part of the day. While it probably didn’t look like it, I was getting around a LOT better than I ever have at PASS. Before, I had never walked past my hotel (the Homewood Suites on Pike), taking taxis everywhere. It took its toll near the end, and especially this morning at the airport I was barely upright, but it is still progress.
    7. Redgate throws an excellent conference – SQL In The City is a lot of fun, and not too commercial at all (yes, the only vendor is RedGate, duh, but the sessions were not marketing and only sometimes leaned heavily on the tools). I will be back again next year if at all possible.
    8. SQL Conferences are fun – If you are reading this blog and do not share some connection on my family tree, then you would enjoy the PASS conference and/or a SQL Saturday. User group meetings are great, and I rarely miss ours in Nashville, but the variety of people you meet and the selection of sessions make it a wonderful, almost vacation-like atmosphere at times (in case the aforementioned boss is reading, I said ALMOST).
    9. @SQLJen and @mark_AzureCAT are awesome – They showed us a great time along with the many other people that work with them. The MVP Summit was awesome due to their work.

    Well, I actually have been away from work so long that I am starting to miss it, so time to join our daily standup meeting that I have missed for the past two weeks. Next week back to reality as we start to upgrade our 2008 R2 SSIS packages to use the package deployment model… See you all next year or at some SQLSaturday!

  • Thank *you* for my SQL Server MVP Award…

    Today, I was awarded (at least) one more time with the Microsoft MVP Award. This makes 12 years, and with this, my 11th renewal, I have yet to feel like it is something I can take for granted (you don’t want to be around me the last week of September!). It has been a great time so far, and I have no idea how many years I will be back (admittedly, renewal day is the day I am least concerned about getting renewed.) Today though, I wanted to take a few minutes and thank some of the people who have helped me get to where I am today. I will not be naming specific people, because, well, I said a few minutes not a few days.

    You might think that my largest debt of gratitude is to Microsoft for giving me the award. While I am effusively appreciative of Microsoft, my various MVP leads throughout the years, the wonderful people who have run the SQL Server MVP program, and quite a few specific Microsoft employees…not to mention those folks who put on the MVP Summit; Microsoft is the least of the people I want to thank. That they are least is not meant to minimize my appreciation for what they have done for me, but I want to thank the people who have given me the opportunity to serve so I could earn a chance at the award.

    The following is a list of people I want to thank, and many of them are also MVPs (and many, many others are nearly there, or will be there soon.)

    • My Hip Surgeon – He went out of his way to fix me up last year, and spent 6 long hours doing emergency surgery after a day of doing his long scheduled surgeries. He spent extra time with me, and helped me through the process in a very different way the second time. I was able to get back out at events quickly and in a lot less pain than even the first time.
    • SQL Saturday Organizers – I spoke at 6 events last year (and worked at another), which meant seven groups of dedicated folks put together an entire conference… without getting paid a nickel. All of the events went off perfectly, and there were like 1500+ people at all of these events put together. And there were TONS more than 6 events in total; heck, I skipped two events when I was recovering. Without these events, it would be a lot harder for people to get speaking experience, and a lot harder for a lot of people to attend a conference.
    • Our Nashville SQL Server User Group Leadership – This was my first (and maybe last) year as the primary chapter leader, but I was certainly not doing it alone. I shared the lead with Robert Verell (@SQLCowbell), with a lot of support from Jim Houchin (@JimHouchin) and Jersey Ross, and we all did a share, and have had a solid year. I say maybe last because the people mentioned and more may take over and swap out over time. 
    • People on the Program Committee (and especially those who helped me with the Abstract Review process) – I violated my rule of not naming names in the previous (and first) bullets, but not here. There were a lot of people who made the process work, and work smoothly. The plan for next year is to give more and more advice on your abstracts before you have to press: <submit>
    • My editors – I have two sets of editors at Simple-Talk and Apress that have helped me to become a better and better writer (even if I don’t always get it purfect in blogs!) Without them, I would not be half as good as I have become, and they still teach me new writing concepts constantly.
    • The SQL Community/Family/Friends – Yeah, this is the catch-all: if I didn’t mention you somehow previously, you fall in here. But I have a lot of people out there who have been a great help in getting me where I am today. When I don’t know something, I read your blogs, your tweets, and your forum posts, or sometimes ask you stuff over Twitter or some forum. Without your materials, I couldn’t write mine (and it is my hope of hopes that you sometimes use my material in my books and blogs the very same way.)
    • My employer – They give me time to get out and go to events, and support me all the way. I know it doesn’t hurt that I get to share new stuff with my coworkers, but I suppose it wouldn’t be smart if they gave me time to go to Disney World instead of SQL events, now would it?
    • SQLBlog.com – Adam Machanic (@adammachanic) and Peter Debetta (@debettap). I haven’t seen Peter in years, but I still appreciate that they have kept the blog running here so I and other bloggers can keep posting new material on a solid blogging platform.
    • The reader of this and my other blogs – If no one paid attention to what I wrote, it would not be half as fun to write…

    Even after thanking Microsoft as a corporation and the entire SQL community, I don’t feel like I have thanked enough people. I feel very blessed to have worked with everyone listed in the past, and I look forward to doing so again in the future. This is a fabulous community of people supporting each other to do what is essentially their jobs. I don’t know too many people outside of technology who come home and keep doing their day job as a hobby, much less spend their own money on occasion to travel out and get or give training. I am very much looking forward to the upcoming PASS and MVP summits at the end of this month; I still haven’t recovered from missing the events last year!

    Will I get another MVP Award? I don’t know, and for today at least, I don’t care (if you are not reading this on 2015-10-01, I do care!). But if I don’t get it next year, I do know that there are 100s of people represented in my list who I figure deserve a shot as well. To you I say good luck, and I selfishly hope they add more awardees so we can both be in… Thank you!

  • Heading to Orlando for SQL Saturday again this year

    Last year, I attended SQL Saturday Orlando again after missing a few years (I did my first database design precon there a few years back). I love the way they run their event, and as speakers, we did some fun stuff like being the ones who served lunch to the attendees. This year, I wasn’t sure if I would have a chance to go, because I had scheduled a family vacation in early October to Disney World, having no idea when their event might be. When they announced the date, I realized it was a win-win-win situation for me, as my original vacation plans were from Oct 3-7, and their event was on the 10th. Two more days at Disney, AND some SQL learning? A chance to hang out with Andy, Karla, Bradley, and Kendall (not to mention the other speakers)? Sold!

    The session I will be doing is the same one I have been doing at the past few events:

    “How In-Memory Database Objects Affect Database Design

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.”

    And it will be the last time I do this session in its current form, since SQL Server 2016 is coming out next year, and the changes it is making are going to make a tremendous difference to the conclusions I will draw when I rework it. However, the basic material in the session won’t change, as the logical implications of the In-Memory architecture will not change significantly. It is just that some major features, like the different constraint types, make the use cases for In-Memory explode, because some types of data protection that were possible in code using a pessimistic (lock based) concurrency model are completely impossible using an optimistic (version based) concurrency model, and how collisions are handled is really quite different as well.

    Next year, along with an update to this presentation, I plan on making a session completely on the different concurrency models and collisions as well because the differences in concurrency models are the primary differences in how you will need to tailor your code.

  • Snapshot Isolation Level and Concurrent Modification Collisions - On Disk and In Memory OLTP

    This is part 1 of my In Memory OLTP blogs for my upcoming book project.

    In this blog I want to take a (reasonably) quick look at how concurrent modifications are handled using SNAPSHOT isolation level. There are two different flavors of SNAPSHOT Isolation level you can deal with, and while they are similar, there are some very big differences internally, as well as how they work when you interact with them.

    Generally speaking, SNAPSHOT isolation level is meant to be an implementation of an optimistic concurrency control system. Optimistic concurrency control resolves isolation issues during the execution of a transaction by letting the work occur, watching for cases where one transaction may be affected by another, and checking during the COMMIT of the transaction (or, in some cases, cancelling the transaction during its execution). This is unlike the lock based isolation mechanism we have grown to love/hate.

    In SQL Server 2005, Microsoft implemented SNAPSHOT isolation level for on disk tables using a limited version of optimistic concurrency control, in that it still used locks to isolate certain operations from one another. Now, in SQL Server 2014 and beyond, they have implemented a more complete version of SNAPSHOT based isolation that behaves in a manner that allows for no locks or latches to be involved, but it significantly changes how you as a programmer need to handle possible collisions.

    In this blog, I will demonstrate how isolation is implemented for simple inserts and updates that try to insert duplicated data or update the same row. In SQL Server 2014, the lone unique index/primary key was the only real location where a collision could occur, but in 2016, they are hopefully adding FOREIGN KEYs and more than one uniqueness constraint per table. Collisions that arise from these constraints will almost certainly follow the patterns that are shown from the current crop of collisions.

    The scripts were executed on:

    SELECT @@version

    Microsoft SQL Server 2014 - 12.0.2269.0 (X64);     Jun 10 2015 03:35:45 ; Copyright (c) Microsoft Corporation; Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 10240: )
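
    If you don't have such a database handy, a minimal setup looks something like the following sketch (the filegroup name and file path are just placeholders of my own to adapt, not the contents of the linked script; the database name matches the one you will see in an error message later in this post):

    CREATE DATABASE snapshotDemo;
    GO
    --add a filegroup and container for the memory optimized data (path is a placeholder)
    ALTER DATABASE snapshotDemo
        ADD FILEGROUP snapshotDemo_inMemFiles CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE snapshotDemo
        ADD FILE (NAME = 'snapshotDemo_inMem', FILENAME = 'C:\SQLData\snapshotDemo_inMem')
        TO FILEGROUP snapshotDemo_inMemFiles;
    --needed for the on disk SNAPSHOT demos that follow
    ALTER DATABASE snapshotDemo SET ALLOW_SNAPSHOT_ISOLATION ON;
    GO
    USE snapshotDemo;
    GO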

    Using a database that has been configured for in memory OLTP (my script can be found here), I start by creating the following two tables:

    create schema demo;
    go

    create table demo.OnDisk
    (
        onDiskId int CONSTRAINT PKOnDisk PRIMARY KEY,
        value varchar(20)
    );
    go

    create table demo.InMem
    (
        InMemId int NOT NULL,
        value varchar(20) NOT NULL,
        CONSTRAINT XPKInMem PRIMARY KEY NONCLUSTERED HASH
        (InMemId) WITH (BUCKET_COUNT = 20) --small table!
    ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

    On Disk SNAPSHOT

    To show how collisions are handled using on disk tables in SNAPSHOT isolation level, I will execute the following code on one connection. Note that in all cases I will be using an explicit transaction for the demonstrations to "pause" time. The same effects are seen in implicit and very fast running transactions, but they are MUCH harder to demo. If you want to see the effects of concurrency in "real time", build an example using Adam Machanic's SQLQueryStress (http://www.datamanipulation.net/SQLQueryStress/). I used it when developing my in memory presentation, and given data sets with duplications and lots of concurrency, you can see the effects occur:

    --Connection1
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    go

    BEGIN TRANSACTION;
    insert into demo.OnDisk (onDiskId,value)
    values (1, 'Connection 1');

    And then on a second connection, execute:

    --Connection2
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    go

    BEGIN TRANSACTION;
    insert into demo.OnDisk (onDiskId,value)
    values (1, 'Connection 2');
    GO

    Connection 2 is blocked, waiting to see what happens in the other connection. This is a very major difference between the on disk snapshot isolation level and what we will see in the in memory, lockless version. COMMIT the transaction on connection 1, and in the second connection you will see the error you would expect to see:

    Msg 2627, Level 14, State 1, Line 6
    Violation of PRIMARY KEY constraint 'PKOnDisk'. Cannot insert duplicate key in object 'demo.OnDisk'. The duplicate key value is (1).
    The statement has been terminated.

    Be sure to clear all transactions on both connections. The data in the table should look like:

    select @@TRANCOUNT as trancount, *
    from demo.OnDisk;

    trancount   onDiskId    value
    ----------- ----------- --------------------
    0           1           Connection 1

    Now we will try to update the row in two different connections, and as you should expect since the insert was blocked, it should be obvious what is going to happen here too.

    Execute:

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT
    go

    BEGIN TRANSACTION
    update demo.OnDisk 
    set    value = 'Still Connection 1'
    where  onDiskId = 1

    On one connection, and then afterwards, the following in another:

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT
    go

    BEGIN TRANSACTION
    update demo.OnDisk 
    set    value = 'I said Connection 2'
    where onDiskId = 1

    Blocked. Now, commit the transaction on the first connection, and you will see:

    Msg 3960, Level 16, State 2, Line 5
    Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table 'demo.OnDisk' directly or indirectly in database 'snapshotDemo' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.

    The key bit of information in there is the word "conflict". As you will see in the next section of the blog on in memory OLTP tables, how and when a transaction is canceled due to a modification collision is directly related to whether it is a resource conflict or a conflict with the constraints of the system.

    In Memory SNAPSHOT

    Turning our attention to the other SNAPSHOT isolation level, let's take a look at how it does conflict resolution. The biggest difference you will notice is that there is never any connection waiting, unlike with on disk. In any location where you might otherwise see a waiting condition, it will either just defer the check until COMMIT time (if there are no resource conflicts) or fail immediately if there are. When the next iteration of my database design book is released, I will have some discussion of the internals of how in memory objects are stored, but I will not go into it here just yet. In either case, I will state that I will in both places include the following statement: "do you want to know more? Check out Kalen Delaney's book here: http://www.red-gate.com/community/books/sql-server-internals-in-memory-oltp", largely because she is a great friend and a great writer/teacher, and secondarily because my plan is to show structures conceptually, strictly as a mental guide to help you understand how your designs are affected by the internals, and I don't want to go any deeper.

    Starting with the insert case from before, using the demo.InMem table, on one connection, execute:

    --Connection 1
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 1')

    First thing you will probably notice is a really annoying error message (because later you will need to use a hint to allow the table to be accessed in SNAPSHOT isolation level). Since we had set the isolation level to SNAPSHOT earlier, the following error is thrown:

    Msg 41332, Level 16, State 0, Line 4
    Memory optimized tables and natively compiled stored procedures cannot be accessed or created when the session TRANSACTION ISOLATION LEVEL is set to SNAPSHOT.

    So we change the batch to:

    --Connection1
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 1')

    And then in a second connection:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 2')

    Both complete immediately. Looking at the data, you will immediately want to try:

    select *
    from demo.inMem

    But this will result in an error message that will also end the transaction:

    Msg 41368, Level 16, State 0, Line 46
    Accessing memory optimized tables using the READ COMMITTED isolation level is supported only for autocommit transactions. It is not supported for explicit or implicit transactions. Provide a supported isolation level for the memory optimized table using a table hint, such as WITH (SNAPSHOT).

    So don't do that! Instead, use:

    select *
    from demo.inMem (SNAPSHOT)

    On connection 1, this returns:

    inMemId     value
    ----------- --------------------
    1           Connection 1

    Connection 2 believes otherwise:

    inMemId     value
    ----------- --------------------
    1           Connection 2

    Next COMMIT Connection 1, and you will see it succeeded. Going to Connection 2, check the data (and the @@trancount) again:

    select @@trancount as tranCount, *
    from demo.inMem (SNAPSHOT)

    On my first try, I really expected it to fail... But it did not (because the value only violates a constraint at a data level, as it is not a resource conflict), and still sticks with its expectation that what it knows about the state of the data is correct:

    tranCount   InMemId     value
    ----------- ----------- --------------------
    1           1           Connection 2

    Now commit the transaction on Connection 2, and you will see the following error:

    Msg 41325, Level 16, State 1, Line 17
    The current transaction failed to commit due to a serializable validation failure.

    This could be considered annoying, as it doesn't initially make sense. Why a serializable validation? Modifications are done in a serializable isolation level (just like in read committed, where exclusive locks are used regardless of you trying to use NOLOCK!). In this case, the other row with the PK of 1 fails the no-phantom criteria of serializable, hence the failure. Try again to insert the row, and you get:

    Msg 2627, Level 14, State 1, Line 3
    Violation of PRIMARY KEY constraint 'XPKInMem'. Cannot insert duplicate key in object 'InMem'. The duplicate key value is (1).
    The statement has been terminated.

    I won't go through it in detail, but if you are trying out my code, do the same experiment with different primary key values to see that inserting 1 for the PK on one connection and 2 on the other does not collide.
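
    If you want to see it, the second connection's batch would look roughly like this (my sketch, not code from the original demo); with connection 1 inserting inMemId = 1 and this connection inserting 2, both transactions COMMIT without error:

    --Connection 2, after clearing the table and all open transactions
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (2, 'Connection 2')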

    What is interesting is to start over, but this time, on Connection 2 insert the row, fetch it, and then delete it. When you commit the transaction, what happens? (I just decided to try this, and I am not 100% sure myself… I guess it succeeds)

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    insert into demo.inMem (inMemId,value)
    values (1, 'Connection 2')
    GO

    select *
    from demo.inMem WITH (SNAPSHOT)
    GO

    delete from demo.inMem with (SNAPSHOT)
    where InMemId = 1

    So far so good; this executes. Now go back to connection 1 and COMMIT. Go back to Connection 2, and you will see:

    Msg 41325, Level 16, State 1, Line 12
    The current transaction failed to commit due to a serializable validation failure.

    So I was wrong (and started to guess that I was wrong as I kept reasoning through the problem, which is a lot easier when blogging than when trying to solve a customer issue on a high throughput system!). Even though the net effect is that I did nothing to data, we touched the same resources.

    Now let's do the same with an update statement, after clearing the table:

    rollback -- may not be necessary
    go

    delete from demo.inMem;

    insert into demo.inMem(inMemId, value)
    values (1,'First Row'),(2,'Second Row');

    On connection 1, change the value to '1st Row':

    --Connection 1
    BEGIN TRANSACTION;

    update demo.inMem with (SNAPSHOT)
    set value = '1st Row'
    where inMemId = 1;

    Now, on a different connection, first set the second row's value:

    --Connection2
    BEGIN TRANSACTION;

    update demo.inMem with (SNAPSHOT)
    set value = '2nd Row'
    where inMemId = 2;

    Now, inside of the same transaction, try to update row 1…

    update demo.inMem with (SNAPSHOT)
    set value = 'Not the 2ndRow'
    where inMemId = 1;

    Immediately you are given an error (well, errors):

    Msg 41302, Level 16, State 110, Line 3
    The current transaction attempted to update a record that has been updated since this transaction started. The transaction was aborted.

    Msg 3998, Level 16, State 1, Line 1
    Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
    The statement has been terminated.

    Now, on connection 2 you can see that the data has reverted and the transaction has been rolled back (as the error message said it was!)

    select @@trancount as tranCount, *
    from demo.inMem (SNAPSHOT)

    tranCount   InMemId     value
    ----------- ----------- --------------------
    0           1           First Row
    0           2           Second Row

    Conclusion

    Concurrency handling/isolation using optimistic concurrency controls (or semi-optimistic, as in the on disk version of SNAPSHOT) for data modifications is far trickier than using the basic pessimistic, lock based methods. In my previous books, I have suggested that you would likely want to shy away from SNAPSHOT for connections that are doing data modifications, mostly because they still use locks, but then fail if another connection has touched their resources, making the performance gain not nearly worth the effort.

    However, In Memory OLTP can EASILY be worth your time as it uses a lock-less/latch-less mechanism that makes any waiting done negligible. The side effect is that you can easily get resource collisions depending on how well you tune your queries, and the likelihood of multiple users accessing the same data for modification (and more, as I will show whenever I get to how retrieval works with modification statements someday.) Add to this things like FOREIGN KEY constraints, multiple UNIQUE constraints, and multiple queries in a stored procedure, and things can get tricky.

    The one major change to your thinking is that you need to see messages like:

    Msg 41325, Level 16, State 1, Line 17
    The current transaction failed to commit due to a serializable validation failure.

    Not as hard error messages, but merely as messages, much like the deadlock message:

    Transaction (Process ID %d) was deadlocked on {%Z} resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

    Warnings to try again. The fact is, that is what the deadlock message states explicitly in the text. You were tangled up with another user. An occasional deadlock (if handled by the code as a simple retry of the batch/transaction) is nothing to worry about. Large numbers of them are signs of issues, and I expect that if you see tons of serializable validation failure messages (not as fun to say as "deadlock"!) the same will be true: you have concurrency issues.
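
    To make that concrete, here is a minimal sketch (my illustration, not code from the demos above) of the retry pattern being described, using the demo table from earlier:

    DECLARE @retriesLeft int = 3;
    WHILE @retriesLeft > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;
            update demo.inMem with (SNAPSHOT)
            set    value = 'Retried Row'
            where  inMemId = 1;
            COMMIT TRANSACTION;
            SET @retriesLeft = 0; --success, stop looping
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
            --41325 = serializable validation failure, 41302 = update conflict, 1205 = deadlock
            IF ERROR_NUMBER() IN (41325, 41302, 1205) AND @retriesLeft > 1
                SET @retriesLeft -= 1; --transient collision, so just try the work again
            ELSE
                THROW; --anything else (or out of retries) is a real error
        END CATCH;
    END;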

     

  • Utility to reset a sequence next value based on a table/sequence relationship

    So I was building a database the other day, and I used sequence objects for all of the defaults because it was FAR easier to load data into the table without needing to set identity_insert on and off over and over and over. Worked great. Went through all of my tests; everything was great. Got out Redgate Data Generator to load up a few hundred thousand noise rows and boom, primary key error. “Blerg,” I said, realizing that sequences don’t manage themselves like identities do. So I wrote some code that looked a lot like this:

    DECLARE @nextValue numeric(38) = COALESCE((SELECT MAX(singleSequenceId)
                                               FROM   dbo.singleSequence),0) +1
                                           
    DECLARE @statement NVARCHAR(1000) = CONCAT('ALTER SEQUENCE dbo.singleSequence_sequence RESTART WITH ',@nextValue)
    EXEC (@statement)

    Because, like every good programmer, I wanted to at least somewhat automate the task. But, like an even slightly better programmer, I wanted something a bit more like DBCC CHECKIDENT so I didn’t have to do that work… And what if the sequence was not started with 1, or has an increment other than 1? So you know what comes next, right? We automate that. In the end, I created three procedures that could serve the purpose.

    1. Utility.Sequence$ResetBasedOnSpecificTable – Which simply lets you pass in the table and column, along with a sequence, and it resets it based on the values of the table and the metadata of the sequence.

    2. Utility.Sequence$ResetBasedOnSequence – Takes the sequence object, looks to see if it is used in one default constraint, not zero, which would be useless, not two, as that would be too many, and three is right out.

    3. Utility.Sequence$ResetBasedOnTableAndColumn – Takes the name of a table and column, and if it has a default that is for a single table, it uses that.

    If you want the code as a downloadable package, it is available here: http://www.drsql.org/Pages/DownloadablePackages.aspx under the name “sequence reset objects.sql”. As usual, when downloading code from the internet, use at your own risk, test, test, test, test and test again before trusting my code in your production environment and I will do the same with yours.

    The other two procedures use the first one, but just do the work of fetching and verifying the metadata first. Here is the first procedure, in which you specify both the table and the sequence:

    create procedure  Utility.Sequence$ResetBasedOnSpecificTable
        @tableSchema sysname,
        @tableName    sysname,
        @columnName sysname,
        @sequenceSchema sysname,
        @sequenceName sysname,
        @ignoreDataTypeFlag bit = 0 --allow the user to ignore the check for a proper datatype
                                    --note that it uses the metadata for restarts, but it does not care if your default is
                                    --NEXT VALUE FOR sequenceName * 100, so you need to determine if this fits your needs..
    AS
        SET NOCOUNT ON

        --make sure the datatype of the target column is numeric (as you can cast the NEXT VALUE FOR in the query)
        if @ignoreDataTypeFlag = 0 AND
            not exists (
                        select *
                        from   sys.columns
                                 join sys.types   
                                    on columns.system_type_id = types.system_type_id
                        where  columns.scale = 0
                         and   types.name in ('tinyint','smallint','int','bigint','numeric','decimal')
                         and   columns.object_id = object_id(concat(@tableschema,'.',@tableName))
                         and   columns.name = @columnName
                      )
          begin
            throw 50000,'Either column does not exist, or is not of the base type for a sequence',1
            return -100
          end

        --fetch the metadata for the sequence
        declare @increment numeric(38), @current_value numeric(38), @startWith numeric(38)
        select @increment = cast(increment as numeric(38)),
                @startWith = cast(start_value as numeric(38))
        from   sys.sequences
                 join sys.schemas   
                    on schemas.schema_id = sequences.schema_id
        where  sequences.name = @sequenceName
          and  schemas.name = @sequenceSchema


        --no checking for real object names, because the following statement will take care of it
        DECLARE @statement NVARCHAR(4000) =

        --I am pretty sure my new favorite function is CONCAT! Note that if your input is bad, this statement will fail, so it will be
        --up to you to make sure that doesn't occur, and to check for errors.
        CONCAT('DECLARE @nextValue numeric(38) = COALESCE((SELECT MAX(',@columnName,')
                                           FROM   ',@tableSchema,'.',@tableName,'),',@startWith - 1,') + ',@increment,'
                                          
               DECLARE @statement NVARCHAR(1000) = CONCAT(''ALTER SEQUENCE ',@sequenceSchema,'.',@sequenceName,' RESTART WITH '',@nextValue)
               EXEC (@statement)
               '
                )

        EXEC (@Statement)
       
    GO

    Using it is simple, just take one sequence and one table (they needn’t be linked to use this first procedure, but it will be useful in later examples):

    create sequence singleSequence_sequence as int start with 20 increment by 2
    go
    create table singleSequence
    (
        singleSequenceId    int constraint pkSingleSequence primary key default (next value for singleSequence_sequence)
    )
    go

    Then run/test it:

    insert into singleSequence
    values (20),(22),(24),(26) --The first four values in the sequence, so default value inserts will be primary key violations until the sequence passes 26
    go

    Then, try to insert the value using the defaults:

    insert into singleSequence
    default values
    go

    You will get the following error. Execute the insert twice, and you will see another duplicate, this time for 22, and it will keep failing until the sequence gets to 28 if you go that way.

    Msg 2627, Level 14, State 1, Line 56
    Violation of PRIMARY KEY constraint 'pkSingleSequence'. Cannot insert duplicate key in object 'dbo.singleSequence'. The duplicate key value is (20).
    The statement has been terminated.

    But, instead of trying again, execute the reset, and you will see it works:

    exec Utility.Sequence$ResetBasedOnSpecificTable 'dbo','singleSequence','singleSequenceId','dbo','singleSequence_sequence'
    go
    insert into singleSequence
    default values
    go

    You will see that a new row has been created:

    select *
    from   singleSequence

    singleSequenceId
    ----------------
    20
    22
    24
    26
    28

    Next, let’s create the procedure that looks up the table and column based on the sequence name.

    create procedure Utility.Sequence$ResetBasedOnSequence
        @sequenceSchema sysname,
        @sequenceName sysname
    AS
        SET NOCOUNT ON

        declare @tableSchema sysname,
        @tableName    sysname,
        @columnName sysname,
        @referenceCount int

        --count to see that this sequence is referenced by exactly one default constraint (note that it could also be used in a procedure,
        --and there is no reason why you can't use the sequence along with the table as @keyValue = NEXT VALUE FOR... insert ... values (@keyValue),
        --as this is a better pattern than using scope_identity() ever was)
        SELECT @referenceCount = (SELECT COUNT(*)
                                  FROM   sys.dm_sql_referencing_entities (CONCAT(@sequenceSchema,'.',@sequenceName), 'Object') as dsre
                                           join sys.default_constraints
                                            on dsre.referencing_id = default_constraints.object_id)
        --note too that you may have cases where the sequence is referenced twice in the same table. This is another fringe case I would relegate
        --to manually specifying table and column along with sequence.

        if @referenceCount = 0
          begin
            throw 50000,'Sequence either does not exists, or is not referenced in a default',1
            return -100
          end

        if @referenceCount > 1
          begin
            throw 50000,'Sequence is referenced by more than one default constraint',1
            return -100
          end

        --since we know there is one table and column using the sequence object, we can use it.
        SELECT @tableSchema = schemas.name,
                @tableName = tables.name,
                @columnName = columns.name
        FROM
            sys.dm_sql_referencing_entities (concat(@sequenceSchema,'.',@sequenceName), 'Object') as dsre
                join sys.default_constraints
                    on dsre.referencing_id = default_constraints.object_id
                join sys.columns
                    on default_constraints.parent_object_id = columns.object_id
                       and default_constraints.parent_column_id = columns.column_id
                join sys.tables
                    on columns.object_id = tables.object_id       
                join sys.schemas
                    on schemas.schema_id = tables.schema_id

        execute Utility.Sequence$ResetBasedOnSpecificTable
                                        @tableSchema =@tableSchema,
                                        @tableName    = @tableName,
                                        @columnName = @columnName,
                                        @sequenceSchema = @sequenceSchema,
                                        @sequenceName = @sequenceName


    GO

    To test this, we use the same code as previously, but just specify the sequence name:

    insert into singleSequence
    values (20),(22),(24),(26) --The first four values in the sequence, so default value inserts will be primary key violations until the sequence passes 26
    go

    insert into singleSequence
    default values
    go

    Same error:

    Msg 2627, Level 14, State 1, Line 56
    Violation of PRIMARY KEY constraint 'pkSingleSequence'. Cannot insert duplicate key in object 'dbo.singleSequence'. The duplicate key value is (20).
    The statement has been terminated.

    But, instead of trying again, execute the reset, and you will see it works:

    exec Utility.Sequence$ResetBasedOnSequence 'dbo','singleSequence_sequence'
    go
    insert into singleSequence
    default values
    go

    And again, you will see that a new row has been created:

    select *
    from   singleSequence

    singleSequenceId
    ----------------
    20
    22
    24
    26
    28

    Finally, here is the procedure that lets you specify the table and column, which is what you probably want to do realistically (again with the caveat that sequences are independent objects, and as such can be used for multiple purposes; use with care).

    create procedure Utility.Sequence$ResetBasedOnTableAndColumn
        @tableSchema sysname,
        @tableName   sysname,
        @columnName sysname
    AS
        SET NOCOUNT ON

        declare @sequenceSchema sysname,
        @sequenceName sysname,
        @referenceCount int

        --using sys.sql_expression_dependencies, check to see if the column has a default that references
        --the sequence
        select @sequenceSchema = referenced_schema_name, @sequenceName = referenced_entity_name
        from   sys.schemas
                join sys.tables
                    on schemas.schema_id = tables.schema_id
                join sys.columns
                        on columns.object_id = tables.object_id       
                    join sys.default_constraints
                        on default_constraints.parent_object_id = columns.object_id
                           and default_constraints.parent_column_id = columns.column_id
                    join sys.sql_expression_dependencies
                        on default_constraints.object_id = sql_expression_Dependencies.referencing_id
        where schemas.name = @tableSchema
          and tables.name= @tableName
          and columns.name = @ColumnName

        --then use the referencing entities DMV to see if the sequence is referenced > 1 time. It is important to be careful, or you may
        --have the case where it is referenced by two or more tables, and resetting it to the next value for one won't be the next
        --value for another. This is a very fringe case, and would be way too much work to automate for the few use cases.
        SELECT @referenceCount = (SELECT COUNT(*)
                                  FROM   sys.dm_sql_referencing_entities (CONCAT(@sequenceSchema,'.',@sequenceName), 'Object') as dsre
                                               join sys.default_constraints
                                                on dsre.referencing_id = default_constraints.object_id)


        if @referenceCount = 0
          begin
            throw 50000,'Sequence either does not exists, or is not referenced in a default',1
            return -100
          end

        if @referenceCount > 1
          begin
            throw 50000,'Sequence used by column is referenced by more than one default constraint',1
            return -100
          end

        execute Utility.Sequence$ResetBasedOnSpecificTable
                                        @tableSchema =@tableSchema,
                                        @tableName    = @tableName,
                                        @columnName = @columnName,
                                        @sequenceSchema = @sequenceSchema,
                                        @sequenceName = @sequenceName

    go

    I won’t repeat the test code, as it is just repetitive of the others; suffice it to say that you replace the call for resetting the sequence with:

    exec Utility.Sequence$ResetBasedOnTableAndColumn 'dbo','singleSequence','singleSequenceId'

    And you get the same results. Finally, let’s test a few fringe cases, so I have proved that I tested them:

    What if the sequence is used in multiple tables? It would be bad to reset based on one table, but not the other. So the procedures (other than the first one, where you explicitly provide the object names to be used) check the metadata of the sequence object to make sure you aren't shooting yourself in the foot.

    create sequence doubleSequence_sequence as int start with 1 increment by 2
    go

    create table doubleSequence1
    (
        doubleSequence1Id    int constraint pkdoubleSequence1 primary key default (next value for doubleSequence_sequence)
    )
    GO
    create table doubleSequence2
    (
        doubleSequence2Id    int constraint pkdoubleSequence2 primary key default (next value for doubleSequence_sequence)
    )
    GO

    Trying to use the reset procedures:

    exec Utility.Sequence$ResetBasedOnSequence 'dbo','doubleSequence_sequence'

    Gives you this error:

    Msg 50000, Level 16, State 1, Procedure Sequence$ResetBasedOnSequence, Line 249
    Sequence is referenced by more than one default constraint

    And the other, a slightly different wording of the same error:

    exec Utility.Sequence$ResetBasedOnTableAndColumn 'dbo','doubleSequence1','doubleSequence1Id'

    Msg 50000, Level 16, State 1, Procedure Sequence$ResetBasedOnTableAndColumn, Line 272
    Sequence used by column is referenced by more than one default constraint


    Finally, let’s take the case where you have the same sequence referenced twice in the same table:

    create sequence oneMoreDouble_sequence as int start with 1
    go
    create table oneMoreDouble
    (
        oneMoreDoubleId    int constraint pkoneMoreDouble primary key default (next value for oneMoreDouble_sequence),
        anotherColumn    varchar(10) default ('T' + cast(next value for oneMoreDouble_sequence as varchar(9)))
    )
    GO

    The first usage is a surrogate key value, and the other is a formatted version of that value (a ‘T’ prepended to it, fancy!). The calls used in the last double-sequence test will give you the same errors:

    exec Utility.Sequence$ResetBasedOnSequence 'dbo','oneMoreDouble_sequence'

    exec Utility.Sequence$ResetBasedOnTableAndColumn 'dbo','oneMoreDouble','oneMoreDoubleId'

    But what if you specify the column? Specifying the text column will not work:

    exec Utility.Sequence$ResetBasedOnSpecificTable  'dbo','oneMoreDouble','anotherColumn', 'dbo','oneMoreDouble_sequence'

    Which will return the following error:

    Msg 50000, Level 16, State 1, Procedure Sequence$ResetBasedOnSpecificTable, Line 269
    Either column does not exist, or is not of the base type for a sequence

    But this call will work just fine, as it technically would have for the doubleSequence tables; in this case it is a safe operation, as the sequence is referenced by only the one table.

    exec Utility.Sequence$ResetBasedOnSpecificTable  'dbo','oneMoreDouble','oneMoreDoubleId', 'dbo','oneMoreDouble_sequence'
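
    If you want to verify the effect of a reset, a quick check of the sys.sequences catalog view shows the sequence's current state (a sketch; the column list is just the interesting subset):

    select start_value, increment, current_value
    from   sys.sequences
    where  object_id = object_id('dbo.oneMoreDouble_sequence')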

  • Upcoming Speaking Engagements in Indianapolis and Louisville

    Mercy, July is flying past at a breakneck pace, isn't it? The best part of July passing into the history books is that August has me speaking at two SQL Saturdays, one of them a first for me.

    On August 8 I will travel up to Indianapolis for SQL Saturday #402, a city I have been to a few times previously, but not for a SQL Saturday. My most notable trip up included a stop at the Murat Theater to catch Elvis Costello back in 2002, with Laura Cantrell opening (when he gets an opening act, he does it right). Coincidentally, Elvis will be in Nashville on the same day as SQL Saturday, and I will miss it because, as I say when it fits my mood: SQL First!

    Then I will be up in Louisville for SQL Saturday #403 for what has become an annual favorite for several reasons. Firstly, the conference is always great, and Mala and team always do a good job. But the speaker dinner has always been excellent, and introduced at least one new favorite guilty pleasure, the yummy hot brown.

    At both conferences I will be presenting my session on In-Memory OLTP:

    How In-Memory Database Objects Affect Database Design

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.

    I hope to see everyone who reads this blog at both events, though you don't need to come to my session both times, only I will be forced to do that.

    After these events, September will be the last month of the year with any rest, as I have Orlando SQL Saturday #442 (with a Disney vacation with children attached, naturally), PASS, the MVP Summit (hopefully!), and the holidays. Plus, I start the revision of the design book with hopes of having the first half done by the time we take down the Christmas tree! SQL First!

  • Temporal Tables – Part 2 – Changing history

    After my post yesterday, I had planned to take a break for the Independence Day holiday, but a commenter asked a question that piqued my interest: how might you edit history? Well, the answer is simple, but the process isn’t exactly a single step. Basically, you have to turn off system versioning, update history, and reapply system versioning. Not difficult, but not something you can do just by editing a row in history, as the temporal structures protect themselves.

    select @@version
    go
    Microsoft SQL Server 2016 (CTP2.0) - 13.0.200.172 (X64)   May 21 2015 10:53:07   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

    First warning, this is pre-release code, and is subject to change. I will try to make sure it works sooner or later with RTM, probably as I am doing my edits for the book.

    To see the metadata, let me create a table on which I will not turn on system versioning:

    create table notTemporal
    (
        notTemporalId int primary key
    )

    Now, sys.tables has a few new columns for temporal, including temporal_type_desc and history_table_id, which I will use to list the tables and their temporal usage (filtering out history tables).

    select CONCAT(schemas.name,'.',tables.name) as table_name,
           historyTableSchema.name + '.' + historyTable.name as history_table_name,
           tables.temporal_type_desc
    from   sys.tables
             join sys.schemas
                on tables.schema_id = schemas.schema_id
             left outer join sys.tables as historyTable
                join sys.schemas as historyTableSchema
                    on historyTable.schema_id = historyTableSchema.schema_id
                on historyTable.object_id = tables.history_table_id
    where tables.temporal_type_desc <> 'HISTORY_TABLE'

    go

    table_name           history_table_name                          temporal_type_desc
    -------------------- ------------------------------------------- -------------------------------------------
    dbo.company          dbo.MSSQL_TemporalHistoryFor_565577053      SYSTEM_VERSIONED_TEMPORAL_TABLE
    dbo.notTemporal      NULL                                        NON_TEMPORAL_TABLE

    The dbo.company table is the one from Part 1, and you can see its history table. Querying the history table shows all of the historical changes, but not the current row; there is still a gap between the top two rows because I temporarily deleted the row.

    select *
    from   dbo.MSSQL_TemporalHistoryFor_565577053

    This returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045

    And the current row is in the primary table:

    select *
    from   dbo.company

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999

    Ok, the next thing I am going to try is to see what happens if you monkey with the data in this table as is, here in CTP 2.0. Use cases might be to remove bad rows, errored data, etc. But mischief is also kind of fun when you come upon a new technology.

    update dbo.company
    set   SysEndTime = '2015-06-30 00:37:08'

    Cool, it protects this data:

    Msg 13537, Level 16, State 1, Line 55
    Cannot update SYSTEM_TIME PERIOD columns in table 'testTemporal.dbo.company'.

    What about the SysStartTime?

    update dbo.company
    set   SysStartTime = '2015-06-30 00:37:08'

    Same deal:

    Msg 13537, Level 16, State 1, Line 62
    Cannot update SYSTEM_TIME PERIOD columns in table 'testTemporal.dbo.company'.

    Not allowed either. And modifying the history table directly is also not allowed. Here I want to make the row appear to have existed since the 15th of June, rather than the 30th:

    update dbo.MSSQL_TemporalHistoryFor_565577053
    set SysStartTime = '2015-06-15'
    where  companyId = 1
    and   sysEndTime = '2015-06-30 00:16:19.1614451'

    Wah, wah, wah….

    Msg 13561, Level 16, State 1, Line 70
    Cannot update rows in a temporal history table 'testTemporal.dbo.MSSQL_TemporalHistoryFor_565577053'.

    However, if you want to make a change to the data, or load in old history data, you can do this by turning off versioning, building the table as it is needed, and then turning versioning back on, making sure to specify the table to use.

    alter table dbo.company
        set (SYSTEM_VERSIONING = OFF);

    Now you can run your update query to set the start time back to June 15, so this row appears to have existed longer in history than it really did.

    update dbo.MSSQL_TemporalHistoryFor_565577053
    set SysStartTime = '2015-06-15'
    where  companyId = 1
    and   sysEndTime = '2015-06-30 00:16:19.1614451'

    This succeeds. Now we turn system versioning back on, being sure to specify the history table, or you will get another new history table (I am repeating myself, but it is important, or you will end up with a lot of these tables around). If the table is large, you may not want to do the consistency check, but I generally would, because structures like this can be tricky to get right, so it is safer to let it check the structure.

    ALTER TABLE dbo.Company
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.MSSQL_TemporalHistoryFor_565577053, DATA_CONSISTENCY_CHECK = ON));

    Now looking at the entire history, you see the row started at 6-15, not 6-30.

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by SysStartTime desc

    Which returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-15 00:00:00.0000000 2015-06-30 00:16:19.1614451

    Now you want to see the row as of 6-15:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-15' as priorCompany

    It is there!

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-15 00:00:00.0000000 2015-06-30 00:16:19.1614451

    Still not on 6-14:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-14' as priorCompany

    Update: Security and Concurrency

    A commenter asked about how this works with concurrency, and I also wanted to add a note about security. Generally speaking, this is definitely not something that just anyone should do. Turning off system versioning requires the user to have rights to alter the table’s structure. Not general purpose rights, to be sure, though you could wrap the functionality in a stored procedure if it were desired (a sketch follows)…
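
    To illustrate the stored procedure idea, here is one possible shape. This is just a sketch: the procedure name and parameters are hypothetical, it hard-codes the dbo.company table from this example, and it assumes the owner (typically dbo) has rights to ALTER the table. It also wraps the steps in a transaction, for the concurrency reasons covered next.

    create procedure dbo.Company$SetHistoryStartTime
    (
        @companyId         int,
        @currentSysEndTime datetime2(7),
        @newSysStartTime   datetime2(7)
    )
    with execute as owner --caller only needs EXECUTE rights; the ALTER runs as the owner
    as
    begin
        set nocount on;
        set xact_abort on; --any error rolls back the whole operation

        begin transaction;

        --no history is captured while versioning is off, so keep this window short
        alter table dbo.company set (SYSTEM_VERSIONING = OFF);

        update dbo.MSSQL_TemporalHistoryFor_565577053
        set    SysStartTime = @newSysStartTime
        where  companyId = @companyId
          and  SysEndTime = @currentSysEndTime;

        alter table dbo.company
            set (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MSSQL_TemporalHistoryFor_565577053,
                                         DATA_CONSISTENCY_CHECK = ON));

        commit transaction;
    end;
    go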

    For concurrency, a concern is that while you have system versioning turned off, you will not be gathering history. A way to prevent other connections from messing with the table while you are altering history is to use a transaction. Start the transaction, and turn off system versioning:

    BEGIN TRANSACTION

    alter table dbo.company
        set (SYSTEM_VERSIONING = OFF);

    Now go to two other connections: on one, try to execute an INSERT into dbo.company; on another, a SELECT from dbo.company with NOLOCK (both statements are shown below). You will be blocked on both connections due to the exclusive schema modification lock from the table alter; yes, even NOLOCK has to wait for a schema lock.
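
    Here are those two statements, one per connection:

    --Connection 2 (blocked until the original connection commits)
    INSERT INTO dbo.company (name, companyNumber)
    VALUES ('Company 2','00002')

    --Connection 3 (also blocked; even NOLOCK waits on the schema lock)
    SELECT *
    FROM   dbo.company (NOLOCK)

    With everyone else waiting patiently, you can make the history change safely back on the original connection: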

    update dbo.MSSQL_TemporalHistoryFor_565577053
    set SysStartTime = '2015-06-14'
    where  companyId = 1
    and   sysStartTime = '2015-06-15 00:00:00.0000000'

    Now, turn back on system versioning:

    ALTER TABLE dbo.Company
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.MSSQL_TemporalHistoryFor_565577053, DATA_CONSISTENCY_CHECK = ON));

    Now check the history structure:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by SysStartTime desc

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-14 00:00:00.0000000 2015-06-30 00:16:19.1614451

    Now commit the transaction and run the statement again:

    COMMIT TRANSACTION
    GO
    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by CompanyNumber, SysStartTime desc
    go

    You can see that the new row was added from your other connection:


    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-14 00:00:00.0000000 2015-06-30 00:16:19.1614451
    2           Company 2                      00002         2015-07-01 22:42:10.3875967 9999-12-31 23:59:59.9999999

    Clearly not something you want to do too often in a highly concurrent system, but the data will be safe in any case.

    Very cool stuff… More to come… Standard caveat: I won’t be doing any guessing in these blogs, because if I guess right about NDA information that is out there and that I missed, I could accidentally get in trouble. But if there are big changes in syntax, I will try to note them, or redo the blog's code so I don’t end up with bad information out there.

  • Temporal Tables – Part 1 - Simple Single Table Example

    In my first entry in my series of posts on DB Design Book Topics, I want to start with a pretty interesting topic, Temporal Tables. It is a very cool new feature that takes something that used to be very painful (capturing history of changes to a table, allowing the user to see the data at a point in time), and makes it very easy.

    So if you have a row in a table, and it is created, updated, and then deleted, knowing how the row looked at a given point in time can be very useful. I wanted to start with a very basic example to show how things work, and later entries in this series will expand to multiple rows and tables.

    select @@version --Features are apt to change. Test if you are using a later CTP

    ---------------------------------------------------------------------------------------------------------------------------
    Microsoft SQL Server 2016 (CTP3.3) - 13.0.1000.281 (X64)   Jan 28 2016 15:11:40   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)

    First off, we need to create a workspace. I will just call the database testTemporal:

    create database testTemporal
    go
    use testTemporal
    go

    Nothing needs to be done to allow temporal; just create a database on the 2016 instance. The table needs to have a few new things, noted in the comments in the next example:

    create table dbo.company
    (
        companyId    int identity(1,1) primary key,
        name        varchar(30) unique,
        companyNumber char(5) unique,
        SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL, --the time when this row becomes in effect
        SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,   --the time when this row becomes no longer in effect
        PERIOD FOR SYSTEM_TIME (SysStartTime,SysEndTime)  
    )
    WITH (SYSTEM_VERSIONING = ON); --Note that you can use a table of your own. More on that in a later blog entry…
    go

    Simple enough, and if you want to see more about the create table syntax, check BOL here (https://msdn.microsoft.com/en-us/library/ms174979.aspx) as this is a CTP edition and subject to change.
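
    As an aside, you can name the history table yourself rather than getting the generated MSSQL_TemporalHistoryFor_ name, using the same HISTORY_TABLE option used in Part 2 of this series when re-enabling versioning. This is just a sketch with made-up table names:

    create table dbo.companyNamed
    (
        companyNamedId int identity(1,1) primary key,
        name varchar(30) unique,
        SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.companyNamedHistory));
    go

    The examples that follow continue with the dbo.company table created above.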

    So let’s create a row in the table:

    insert into dbo.company (name, companyNumber)
    values ('Company1','00001')

    select SCOPE_IDENTITY(); --If you don't mess up, this will be 1. We will use this in our examples
    go

    Now we change the row a few times so we have some changes to see in the example:

    update company
    set    name = 'Company Name 1'
    where  companyId = 1

    And update it again:

    update company
    set    name = 'Company Name 2'
    where  companyId = 1

    This time update with no changes:

    update company
    set    name = 'Company Name 2'
    where  companyId = 1

    To see the row exactly as it currently exists, just use a normal select statement:

    select *
    from   company
    where  companyId = 1

    You will see that it looks exactly as you expect:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-05 02:15:32.7938411 9999-12-31 23:59:59.9999999

    To see all versions of the rows, use the FOR SYSTEM_TIME clause with CONTAINED IN (read more about temporal tables and FOR SYSTEM_TIME here: https://msdn.microsoft.com/en-us/library/dn935015(v=sql.130).aspx):

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order by SysEndTime Desc

    This returns all of the row versions that have been created:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451

    There are a few things of note. The first SysStartTime value will be when the row was inserted. The current row's SysEndTime will be 9999-12-31 23:59:59.9999999. Note too that when we updated the row with no actual data changes, we still got a new version.
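
    If those no-change versions bother you, one way to avoid them (a sketch; the feature does not do this for you) is to filter the update so it only touches the row when a value actually changes:

    update company
    set    name = 'Company Name 2'
    where  companyId = 1
      and  name <> 'Company Name 2' --no row is modified, so no new version is created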

    Also, when working with the times and the FOR SYSTEM_TIME clause, be careful to include the time all the way out to the fractional seconds or you may not get what you expect. When using CONTAINED IN, if you don’t put the nines out to all seven decimal places, you won't get the current row, because its SysEndTime of 9999-12-31 23:59:59.9999999 falls just outside the boundary:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.999999') --Only six decimal places
    order by SysEndTime Desc

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451


    The more interesting use will be to work with a row (or rows) at a certain point in time. For example, to get the second version, which started at 2015-06-30 00:16:19.1614451, we will use FOR SYSTEM_TIME AS OF, which takes a datetime2 value and returns rows where SysStartTime <= PassedValue < SysEndTime. (The PassedValue can also be a variable, as sketched after the next result.)

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany

    Note that the time was from the SysStartTime, which also was the SysEndTime for a different row. This returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
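
    Since the PassedValue can also be a variable, the same query can be parameterized (a minimal sketch; the variable name is just for illustration):

    declare @asOfTime datetime2(7) = '2015-06-30 00:16:19.1614451'

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF @asOfTime as priorCompany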

    You can also use FOR SYSTEM_TIME in a JOIN criteria and see multiple versions of the row in your query:

    select company.Name, priorCompany.Name as PriorName
    from   company
             join company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany
                on company.companyId = priorCompany.companyId

    This will return:

    Name                           PriorName
    ------------------------------ ------------------------------
    Company Name 2                 Company Name 1

    I expect that may be a pattern that gets used in a later blog to calculate changes since a point in time! Finally, let’s take a quick look at what happens on a delete:
           
    delete from company
    where  companyId = 1

    Now, checking the data, we see that there is no 9999 row version:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')

    --There are the same four rows, but now all rows have an end time in the same millennium as the one we live in:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045

    So, looking for the row at a past time, it did still exist:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany

    With the only difference being that the SysEndTime isn’t what it once was:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199

    But looking at the table currently, no row:

    select *
    from   company

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------

    So finally, what happens when we replace the row using the same surrogate key value? (I'm not discussing here whether this is a good idea or a bad one… and this has led me to wonder if we can adjust history if the delete was accidental… ah, fodder for later.)

    set identity_insert dbo.company on
    go
    insert into dbo.company (companyId, name, companyNumber)
    values (1, 'Company1','00001')
    go
    set identity_insert dbo.company off
    go

    Then look at all of the row versions that exist now:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by SysStartTime desc

    --You can see that the row now exists, but there is now a gap between the top two rows:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451

    Looking at the data at the current row’s SysStartTime:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:37:07.1375063' as priorCompany

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999

    But a very, very small amount of time before that, it is not there:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:37:07.1375062' as priorCompany

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------

    So there is a quick look at some of the basic functionality that we can expect with temporal data in 2016.

    Note: I won’t be doing any guessing in these blogs, because if I guess right about NDA information that is out there and that I missed, I could accidentally get in trouble. But if there are big changes in syntax, I will try to note them, or redo the blog's code so I don’t end up with bad information out there.

    Continue to Part 2 – Changing History

  • Speaking this week at #sqlsatChattanooga and for the PASS Performance Virtual Chapter

    The topic for both sessions will be: How In-Memory Database Objects Affect Database Design and the following is the abstract:

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.

    The PASS Performance Virtual Chapter session will be 1:00 PM Central Time on Thursday the 25th. You can see more details and register here: http://performance.sqlpass.org/

    The SQL Saturday Chattanooga session will be at 3:00 PM Eastern Time on Saturday the 27th. You can see details about the entire conference here: http://www.sqlsaturday.com/410/EventHome.aspx 

    I am pretty excited about talking about SQL Server In-Memory technologies, but it is also a bit interesting because of how much SQL Server 2016 is going to change things. I will do my best to make notes on how it will change the usage and design aspects of In-Memory as I go. You can see a list of how things are changing in Aaron Bertrand’s blog entry. 2016 won’t turn In-Memory OLTP into a general purpose feature, but it certainly will change how effectively it can be used in certain scenarios.

    I do wish I had more time to play with 2016 before my presentations, but I don’t believe it would change any of my current opinions on usage (other than having quite a bit more data integrity support built in). I certainly will be testing out the new stuff in the coming months as I work on my set of pre-book blogs.

  • Time to Start Plotting My 2016 Database Design Book

    Note: A new version has been talked about for years, but it is not a guarantee. This post is my way of introducing the new features of SQL Server, and follow-on posts will be code-filled examples of how the features work. I am making no promises at this time, but when I can, I will.

    With the recent release of a CTP version of what is called SQL Server 2016, it is time for me to start thinking about what to add to the next edition of the SQL Server Design Book. In 2014, we decided that there really wasn’t enough change to merit a new edition of the book, with just a few key features being added that changed how one might design their database application. One major feature cropped up, that being In Memory OLTP databases, and while it is a game changer, it was very new and we decided to wait. I went to Kalen Delaney’s precon, as well as producing a presentation on the feature, doing my best to get a feel for how it changed database design.

    SQL Server 2016, even in the CTP2 version I am basing this blog post on, contains quite a few new features that will change how we may implement databases, including one that is going to be a tremendous change for the better. In this first post, I want to list the features I think are important to address in my book, and ask you to use the comments to tell me if I am wrong and what other features I missed. As I progress, I will come back and try to include links to the actual posts (to make it easy for me to use this as a reference later!)

    New Features To Possibly Cover

    This should be the easiest list to create. Here are some particularly interesting features that are coming in 2016, or that arrived in 2014, with a brief introduction to what they are and how they fit in my vision. Each topic will become the header of a linked list of blogs as I write new content on each feature.

    • In-Memory OLTP – Allows you to house your database in RAM, not just caching the existing structures in the way on-disk tables are, but using completely new plumbing that is lock and latch free. It is a MAJOR difference architecturally, and may turn out to be the most work making all of the terminology fit (what used to be a table will quite often need to be an “on-disk” table.) Existed in 2014, and enhanced in 2016.
    • Temporal Data – This is probably the most exciting feature I have seen. It takes the scenario of keeping up with changes in your tables over time and automates it. Queries using the tables will be able to be executed in the context of a time period (a VERY simplified explanation). New to 2016, and very exciting.
    • Columnstore Index Improvements – There is a chapter on reporting structures written by Jessica Moss that touched on Columnstore indexes, at the time a new feature that had just begun to be understood. At this point, with four more years of experience, I know I have a great admiration for what they can do. 2014 gave us Clustered Columnstores, and 2016 enhances how they work to make managing them easier.
    • Row Level Security – Allows you to limit access to rows based on the details of the query executor. Not DDL-esque, like GRANT EXECUTE TO Rows Like This, but more like a security policy, where you build a function that is bound and executed on access to the table. It works very similarly to the more manual (always go through a stored procedure!) methods that have existed in the past. New for 2016.
    • JSON Support –  While I have purposefully avoided JSON in the same manner I avoided XML, I will at least need to cover the existence of this feature.
    • Stretch Database – Allows you to put some of your data on the Microsoft Azure Cloud storage (or something along these lines). Allows you to put infrequently used data in the cloud, with very active data locally. Probably not going to be covered heavily, but will bear a mention in the physical design sections of the book.
    • Security DDL Enhancements – There were a few server level permissions added in 2014 that were the bomb: CONNECT ANY DATABASE, SELECT ALL USER SECURABLES, and IMPERSONATE ANY LOGIN. The first two allow you to say: Login X can access all databases and see all data. Something that a support person may be allowed to do. In the past, the login would have been given sysadmin rights because, you know, it is just too much work. In 2016, a few new security bits are added to support the new features.
    • Always Encrypted – Allows for SQL Server columns to be encrypted based on a key that does not reside on the SQL Server.
    • Dynamic Data Masking – A feature that makes column level security palatable by defining a masked output that occurs when a user without certain permissions accesses the column (even though they do have select rights).
    • Durability Changes - In SQL Server 2014, In-Memory OLTP allowed for non-durable tables. In addition to that In-Memory durability choice, 2014 also included a feature at the database level called delayed durability that makes log writes asynchronous.
    • Incremental Statistics - Lets you update stats on a single partition of a table, rather than the entire table (assuming it is partitioned). Probably only gets a brief mention, as I don't dig into partitioning that much, but it is definitely something that might affect your physical design.

    New Material To Add To the Book

    This section is about concepts that I need to add that are not based on new features, but rather just concepts that I feel like I missed or didn’t cover enough last time.

    • Design Testing – Basically making sure that the architect of a system, as they progress through the requirements, starts thinking about testing before code is created, and applies the tests to the conceptual, logical, and physical models to make sure they meet the requirements. Too often you see designs that just don’t make sense because the entire process wasn’t walked through before the customer had a UI in their web browser.
    • Hierarchies – Enhance the sections about hierarchies a bit to include performance testing characteristics of hierarchy code patterns. This is what my presentation on hierarchies is based on, as I had the idea as the last edition went to print.
    • General Tips and Tricks - Other bits and pieces of information about SQL Server and T-SQL programming that come into play.

    Stuff to Not Cover

    Everyone needs limits, and unless I truly get more space in the book, the following topics will probably stay only the briefest of mentions in this edition.

    • Hadoop integration with SQL Server - From what I currently understand, this will allow you to access Hadoop data from SQL Server. Probably will end up in the list of features that may not fit the "Relational Database Design Book" paradigm.

    Stuff to Remove

    For now, this section will be blank. I personally hope to just add new material, and leave most of the existing stuff intact (unless it only pertains to 2005 or earlier, as they are out of support now, even by me!). But last edition, we were getting extremely close to the 700 page limit I have had for most editions of the book. If you are a reader, or have ideas even from the list above about what needn’t be covered, please hit the comments. I am very interested to know. I have ideas about how to expand the book, but it will not be easy to see too much of an expansion, and it will be very hard to shrink what is there without a LOT of editing/rewriting, which is also not a beloved idea.

    One idea I have is to remove the internals material about how indexes and data are stored, as other people cover it better, and with In-Memory to contend with, it would really need to expand. It is, however, something people have told me is a selling point, in that it provides a deep enough dive to give the reader a picture of internals, even if not nearly as deep as quite a few other resources go.

    I look forward to comments from you as much as I hope you look forward to the months of upcoming blog posts with queries that show how each of these features work as I figure them out myself.

    Note: fixed a few issues with formatting.. I hate HTML :).
    Update: Added Hadoop Integration as a do not cover topic. 
     
    Update: Added Delayed Durability. 
    Update: Added link to Temporal Topic 
    Update: Incremental Statistics
    Update: Added link to In Memory OLTP article(s)
     
    Update: Added Tips and Tricks along with a link to first Tips and Tricks blog 
    Update: Added Dynamic Data Masking link 
    Updated: Added Row Level Security link 

  • ANY and ALL, Two Keywords I Had Never Taken the Time to Understand

    As I am starting to prepare for an upcoming blog series on database designs and associated query patterns (stay tuned!) I was thinking about what query keywords I don't really know well enough. ANY and ALL were two keywords that immediately popped into my mind that I had never really used (and I probably should have.)

    ANY and ALL are used as modifiers on scalar comparison operators like =, !=, >, etc., to allow them to be used with multi-row subqueries.

    You know that if you do something like the following:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId = ( select SalesOrderId
                           from Sales.SalesOrderDetail
                           where SalesOrderId in (43659, 43660, 43661))

    You will get the following error:

    Msg 512, Level 16, State 1, Line 11
    Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

    However, you can change this to be:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId = ANY (  select SalesOrderId
                                from Sales.SalesOrderDetail
                                where SalesOrderId in (43659, 43660, 43661))

    And now each value in the left input will be checked against the set on the right.

    Admittedly, this isn't that useful, as = ANY is basically equivalent to IN, as in the following statement. Heck, I used IN in the example to get the 3 rows for the test for a reason:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId in (43659, 43660, 43661)

    One big difference between IN and = ANY is that ANY only works with subqueries, so this will not work:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId = ANY (43659, 43660, 43661)

    It results in:

    Msg 102, Level 15, State 1, Line 3
    Incorrect syntax near '43659'.

    The next thing you might think (I did) is that if = ANY is the same as IN, then != ANY must be the same as NOT IN, right? No, because != ANY is a single operator, and you have to think about what “does not equal ANY” means. Actually, the following query will return every row in the table:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId != ANY (select SalesOrderId
                               from Sales.SalesOrderDetail
                               where SalesOrderId in (43659, 43660, 43661))

    Why? What the != ANY operator says is to return a row where the value doesn't match at least one of the subquery's values, so if even one subquery value doesn't match, the predicate is true. Take 43659: it does match 43659, but it doesn't match the other two values. Or, as this page on Technet puts it, NOT IN says != value1 and != value2 and != value3, where != ANY is != value1 or != value2 or != value3.

    In this case, you could use the ALL keyword, which says the comparison must hold against every value the subquery returns (not tremendously interesting with an equality operator, but essential to understanding the operators). If any returned value fails the comparison, the predicate is false; when the subquery returns no rows at all, the predicate is true. So the following (with the subquery negated by the 1=2) will return all rows in the table:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId = ALL (select SalesOrderId                
                              from Sales.SalesOrderDetail
                              where SalesOrderId in (43659, 43660, 43661)
                                and 1=2)

    But if that is the case, then != ALL against the same empty subquery should return no rows, right? Wrong.

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId != ALL (select SalesOrderId
                               from Sales.SalesOrderDetail
                               where SalesOrderId in (43659, 43660, 43661)
                                 and /* again */ 1=2)

    Both return the exact same set of rows, because any comparison ALL makes against an empty set is true. The problem (and why I couldn't wrap my head around these operators at first) is that the operator is truly = ALL or != ALL. So = ALL says the value must match ALL of the values, while != ALL means the value must be different from every individual value. So the following will return all rows in the table except the ones whose SalesOrderId values come back from the subquery:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId != ALL (select SalesOrderId
                               from Sales.SalesOrderDetail
                               where SalesOrderId in (43659, 43660, 43661))
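
    For comparison, here is the NOT IN form, which returns the same rows in this case (a sketch; the equivalence assumes the subquery returns no NULLs):

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId not in (select SalesOrderId
                               from Sales.SalesOrderDetail
                               where SalesOrderId in (43659, 43660, 43661))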

    What I particularly like about the ANY and ALL keywords is that they are named in a very straightforward manner once you get the idea of what they do, but as noted, it wasn’t necessarily intuitive for me to start with. On the practical side, what if we want to compare one value to all of the values in a related set? In my example (using ye olde AdventureWorks database), I want to see which orders do not have any sales order items where the UnitPrice is more than 500. A classic way of making this comparison would be to use an aggregate on the SalesOrderDetail rows to get the max UnitPrice for each order's items and use it as a filter:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId in (    select SalesOrderId
                               from Sales.SalesOrderDetail
                               group by SalesOrderId
                               having max(UnitPrice) <= 500)

    It is a technique I have used many times over. But really, what would be easier would be to check each item against the scalar value directly, rather than doing an aggregate. Using the >= ALL operator, we can do this in a direct manner; each row the subquery returns is compared to the value on the left side of the operator.

    select *
    from Sales.SalesOrderHeader
    where 500 >= ALL (    select UnitPrice
                          from Sales.SalesOrderDetail
                          where SalesOrderHeader.SalesOrderId = SalesOrderDetail.SalesOrderId)

    Using AdventureWorks2014 as it comes, freshly downloaded, the performance of both queries is very evenly matched, based on the plan:

    [execution plan image]

    And STATISTICS IO and TIME are very similar as well. In my tries, the IN version took a few milliseconds more than the >= ALL version. But note that the >= ALL query suggested an index. This is the index (with a slightly better name!):

    CREATE NONCLUSTERED INDEX unitPrice_include_salesOrderId
                          ON [Sales].[SalesOrderDetail] ([UnitPrice]) INCLUDE ([SalesOrderID])

    From there, you will see a fairly large (though not necessarily amazing) improvement:

    [execution plan image]

    The CPU and reads were quite a bit better, with the >= ALL version needing:

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, ...
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, ...
    Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 0, ...
    Table 'SalesOrderDetail'. Scan count 1, logical reads 96, physical reads 0, ...

    SQL Server Execution Times:
       CPU time = 78 ms, elapsed time = 318 ms.

    While the IN version needed:

    Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 0, ...
    Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 0, ...

    SQL Server Execution Times:
       CPU time = 125 ms, elapsed time = 361 ms.

    The elapsed times were still similar, but where the two had been pretty much equal, the >= ALL version now had a clear edge in CPU and reads.

    To make sure we have the correct answer, we can use the following query to look at the values that are returned. It takes the basic query and looks at the data for each of the SalesOrderDetail rows for a sales order:

    select SalesOrderId, max(UnitPrice) as MaxUnitPrice, min(UnitPrice) as MinUnitPrice
    from   Sales.SalesOrderDetail
    where  SalesOrderId = ANY (select SalesOrderId
                               from Sales.SalesOrderHeader
                               where 500 >= ALL (select UnitPrice
                                                 from Sales.SalesOrderDetail
                                                 where SalesOrderHeader.SalesOrderId =
                                                             SalesOrderDetail.SalesOrderId))
    group by SalesOrderId
    order by MaxUnitPrice desc, MinUnitPrice desc

    Which will return a result set like:

    SalesOrderId MaxUnitPrice          MinUnitPrice
    ------------ --------------------- ---------------------
    46641        469.794               469.794
    46934        469.794               469.794
    46963        469.794               469.794
    47022        469.794               469.794
    47048        469.794               469.794

    73273          2.29                  2.29
    73040          2.29                  2.29
    51782         1.374                  1.374
    53564         1.374                  1.374
    65214         1.374                  1.374

    As you can see, all of the max values are < 500 (so 500 is greater than all of the values).
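
    As an aside, the same filter can also be written with NOT EXISTS, which many people find reads more naturally (a sketch; I have not compared its performance here):

    select *
    from   Sales.SalesOrderHeader
    where  not exists (select *
                       from   Sales.SalesOrderDetail
                       where  SalesOrderDetail.SalesOrderId = SalesOrderHeader.SalesOrderId
                         and  SalesOrderDetail.UnitPrice > 500)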

    I haven't done a tremendous amount of performance testing (as you can tell), but it is clear to me that ANY and ALL have the propensity to be of some value in queries on occasion (and not just to get a blog entry posted so I don't vanish from the SQLBlog roll again this month).

  • Classic Music, Attractions, and Presentations: Presenting on Database Design at SQL Saturday Atlanta 2015

    These next two weeks are going to be awesome. May 11, I get to see The Who for probably the last time ever. Saturday I am speaking on Database Design at SQL Saturday in Atlanta, then heading to Disney World. It rarely gets even close to this good for a two-week span, and SQL Saturday is a big part of that. I love going to SQL Saturdays, seeing all of the people who I usually only see as a handle in a twitter feed or blog post.

     

    As I considered this week, I couldn't help but try to find a somewhat interesting way to blog about it. The presentation I am doing is my oldest, and truly most favorite presentation. I have done it many times over, and it only changes a little bit here and there every time I give it.

     

    Baba O'Riley, Carousel of Progress, and Database Design?

     

    The Who is celebrating 50 years, and a few songs they sing will be close to that age… and they are still relevant today and connect with the audience, young and old. A (begrudging) favorite attraction of mine at Disney World is the Carousel of Progress, largely because it is charmingly old but the message is still relevant (if dated), particularly to a computer scientist who wants to affect technology for the future. Relational database design is a fundamental tool for producing software, even 30+ years after it got its start. Much like you won't appreciate music and theme parks without knowing the classics, understanding relational database concepts will help you design any solution using a database.

     

    As a speaker and writer, you want to produce new and interesting stuff, but it may be the oldies that people want to hear. I love to do this presentation largely because of the topic, and there are plenty of people who still need to learn the fundamentals of database design. Of course, I won't entertain you like The Who, and the classroom will almost certainly not rotate after each section of the presentation, but whether you are hearing about design for the first or fiftieth time, I feel certain you will get something from it.

     

    If Database Design interests you, I hope to see you in my session this Saturday! If you are still on the fence, here is the abstract:

     

    Database Design Fundamentals

    Data should be easy to work with in SQL Server if the database has been organized as close as possible to the standards of normalization that have been proven for many years, but are often thought of as old-fashioned. Many common T-SQL programming "difficulties" are the result of struggling against these standards and can be avoided by understanding the requirements, applying normalization, as well as a healthy dose of simple common sense. In this session I will give an overview of how to design a relational database, allowing you to work with the data structures instead of against them. This will let you use SQL naturally, enabling the query engine internals to optimize your output needs without you needing to spend a lot of time thinking about it. This will mean less time trying to figure out why SUBSTRING(column,3,1) = 'A' is killing your performance, and more time for solving the next customer problem.

  • Off to Richmond for SQL Saturday

    I got the email prodding speakers to blog about our upcoming sessions, so I got myself up and started to write this blog. It has been such a long time since I have done much side SQL work (other than doing quite a bit of tech editing, along with doing some work for PASS leading up to speaker submissions), that my blog didn't even show up in the blogs list on sqlblog.com. My last blog was right after PASS when I had attended the Summit from my hospital bed.

    Since then, it has been quite a bumpy road. For a person who usually travels as much as I do for fun (SQL and Disney) and work, not having left the Nashville area since vacation in September has been weird. But all sorts of stuff has gotten in the way, mostly that I just haven't felt like blogging (heck, I haven't had an entry on my simple-talk blog since then either, though a few editorials I wrote pre-surgery were posted on sqlservercentral.com by my editor).

    But now, finally, it is time to wake the heck up. I am leaving Nashville this Sunday, heading for a week of work in Virginia Beach, not coincidentally the same week as SQL Saturday Richmond where I will be talking about In-Memory OLTP tables and how they affect your database design.  It is also pretty cool that Jessica Moss will be presenting down in Virginia Beach while I am in town, so a stop at the Hampton Roads SQL Server User Group is definitely in order for most of the people I work with.

    Here is the abstract for my presentation (if you want Jessica’s go here):

     

    How In-Memory Database Objects Affect Database Design

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.

     

    Does this mean I am fully back and over my funk? Good grief, I don't know. But I have submitted for 4 other SQL Saturdays over the rest of this year, and I have projects that are just waiting for me to get started. Some days I want to just lay down and not get up until it is time to go back to bed. Others I want to write a new book, travel to the ends of the earth and talk about SQL Server. The fact is, I am taking this one task at a time, and I look forward to talking about SQL Server for you at 9:45 on the 21st of March. And when that is over, I am going to Dollywood's opening weekend and let the Tennessee Tornado spin some sense into my head. Hope I see you there (SQL Saturday or Dollywood, either way we can have some fun!)
