
Louis Davidson

  • Pro SQL Server 2012 Practices Chapter 8: Release Management Review

    This past year, I contributed a chapter to an anthology of best practices for working with SQL Server 2012 entitled Pro SQL Server 2012 Practices (http://www.apress.com/9781430247708). For publicity, we authors decided to write summary reviews of one another's chapters. There are lots of great technical-sounding chapters, but I picked one that I hoped would help me learn more about a process that is outside my favorite areas of design and coding techniques. Release management is definitely one of the parts of the software development process I despise. As an architect, my primary love in software development starts with design and really drops off during testing. And I certainly did learn more about the process… TJay Belt (https://twitter.com/tjaybelt) wrote his chapter on release management. (I should also divulge that I have been friends with TJay through SQL PASS for quite some time, along with many of the other authors of the book.)

    TJay does a great job of describing the process of release management, talking about the process he uses and even admitting mistakes he and his teams have made along the way. The focus of the chapter is very much from the point of view of the releasing DBA role in the process (most of the book covers very DBA-centric topics), and it contains a lot of tremendously good advice about getting release management right, starting with having great documentation and a rollback plan so you can restart or put a release on hold if things go awry. In addition, he covers many of the topics around the entire process of coding and releasing software, including version control, proper (and quite reasonable) documentation, coding standards, and most of all a set of well-defined processes that all of the varied players in the process have agreed to and work on as a team.

    My favorite part of the chapter was the approximately four pages of thought-provoking questions that should be covered when doing a release, ranging from understanding the databases that will be affected to capacity planning, tuning, standards, code, jobs, etc. Great food for thought for defining or refining your own release process.

    Of course, one of the concerns with a book covering lots of different topics is that you don't get tremendously deep coverage of any subject (and this is also true of my chapter on database design). However, in some ways this liberates the writer from having to cover every detail and instead provide a thoughtful discussion of the overall release management process. This is very much a blessing, because every organization is different and already has some process in place. Maybe your defined process is awesome or awful, but this chapter can help you think of ways to refine it. You are left to find your own tools and processes to meet your company's needs, but what you get is quite thought provoking and will likely be useful whether this is your first release or your hundredth.

  • A wee bit exhausted… time to reenergize

    I admit it. I am tired and I have not blogged nearly enough. This has been a crazy year, with the book I finished writing, the pre-cons I have done (teaching is NOT my primary profession, so I do a lot more prep than some others probably do), lots of training on Data Warehousing topics (from Ralph Kimball, Bob Becker, and Stacia Misner, to name three of the great teachers I have had), SQL Rally, SQL PASS, and SQL Saturdays, and I have gotten a lot more regular with my simple-talk blog as well… Add to this the fact that my daughter added a new grandchild to the family, and my mother has gotten so weak she is starting to fall down quite often (I am writing this blog entry from a spare bedroom at my mother-in-law’s house while my mom is in rehab!), and I am getting exhausted.

    Am I whining? Probably, but it is my blog! No, seriously, I figure that occasionally you have to poke your head out from under the covers and write something, and this is my something until after the New Year (other than posting a few already written and edited simple-talk blogs). I am on vacation from work for 2.5 weeks, and I don’t plan to do much with this laptop of mine during that time unless the spirit hits me with an idea for a blog that I just have to write, but usually most of my blogs that have any technical or artistic merit take weeks to complete.  On the second of January, I hope to be back at it, analyzing my resolutions from last year and making good on a few of them, particularly “Blog about my other (computer) love occasionally,” and reviewing some of the gadgets I have acquired as they pertain to doing my job as a writer/data architect. (Hint: My mother-in-law does not have Internet access, so some of the devices I have here are instrumental in my ability to work untethered for weeks on end.)

    So until next year, Merry Christmas, Happy Holidays, and Happy New Year!  I hope your holidays are restful and fun.  I know part of mine will be, because I intend to replicate this picture at least one or two more times next week, hopefully with a turkey leg in the hand that isn’t holding the camera taking the picture (all with my Windows Phone set on Battery Saver Mode, which delightfully turns off all syncing :)

    [image]

  • PASS Precon Countdown… See some of you Monday, and others on Tuesday Night

    As I finish up the plans for Monday’s database design precon, I am getting pretty excited for the day. This is the third time I have done this precon, and while the base slides are very similar, I have a few new twists in mind. One of my big ideas for my Database Design Workshop precon has always been to give people time to actually do some design. So I am even now trying to go through and whittle down the slides to make sure that we have the time for design.

    If you are attending, be prepared to be a team player. I have 3 team exercises that you will do in teams. When we reach the first exercise, we will break up into 8 individual teams. Don’t try to figure out who to sit by, because I am going to randomly choose how to split up into teams when I see how the tables are (and I know that there will be at least one person there that I would want on my team :). The teams will be slightly important because the most enthusiastic teams will get the first crack at the pile of swag, of which I have a lot. I have 20 physical and 15 ebook copies of my new database design book, 15 8GB SD cards with the PowerPoint and code on them, 3 Joe Celko books, the Apress Applied Mathematics for Database Professionals book, and a very nice Lego set. And if this blog entices more people to show up than I have giveaways for, well, then I will pick up some gift cards to even out the swag.

    While the lecture will take up a lot of time, the exercises will be the most fun part of the day. The exercises I have planned are of the following genre:

    1.  Modeling from requirements: Taking a set of written requirements and producing the initial database design (20 minutes)

    2.  Normalizing a set: Taking a flat file and normalizing it into a set (~20 minutes)

    3.  Applying what we have discussed: Taking a set of requirements and producing a normalized database design (45 minutes)

    For the first two exercises, every team will have the same requirements, but for the third I will hand out 4 separate sets of requirements, so we will have 4 different designs to discuss and review. I am bringing my camera along to display the teams’ work on the screen. After I print the requirement packs for the teams, I plan to go through and do my own design for comparison. It will be interesting to see how different each team’s design is, and to see what I might miss when I do the design. I am going to encourage people to go beyond the specific requirements and build the system they think will be awesome while meeting the requirements.

    If all works out, my hope is to do a series of blogs next year using the requirements and designs that we produce as a result. I also (finally remembered to) put a request on the slide that students could do one of a couple of design ideas and I would review them (yes, with plans to turn that into a blog someday too.)

    So I hope to see you Monday… And if I don’t see you in the class Monday, see you Tuesday night when we do our annual Quiz Bowl. Tim has come up with a slew of interesting questions, including another round of Before and After questions to blow the minds of several SQL Server professionals…

  • 24 Hours of PASS next week, pre-con preview style

    I will be doing my Characteristics of a Great Relational Database session, which I haven’t done since last PASS. When I was asked about doing this Summit Preview version of 24 Hours of PASS, I decided that I would do this session, largely because it is kind of light and fun, but also because it is either going to be the basis of the end section of my pre-con at the Summit, or it is going to be the section of the pre-con we don’t get to because we are so involved in working out designs that we forget the time and the next day’s precon people start arriving and kick us out.

    The session is basically a discussion about the finishing touches that make a database better than average: something you can rave about, something you can brag to your significant other about, something your company will run a Super Bowl ad just thanking you for… Well, ok, seriously, a database that won’t cause you and your coworkers to ralph each time you use it is a solid step toward the way you will want to develop your databases.

    The goal is to keep things reasonably light and a little bit fun, since I am doing the presentation at 11:00 PM Central Time in the US, and well, that isn’t exactly prime SQL time for most people. In Europe it will be the middle of the night, and in half of the US I will be competing with the national news and the end of the football game between the New York Giants and Carolina Panthers. If the game is close, I will be happy to share your attention, and heck, if my internet connection would support streaming video and the sharing client, I would probably be watching the game myself (as it is, I will probably TiVo it and watch it on my phone via SlingBox when we are done… yes, I have a little bit of a football problem.)

    If you want to attend my session, click here and register. Even if database design isn’t your thing, 24 Hours of PASS has (hold on to your hat) 24 different sessions in a 24-hour period to choose from. So click on over to the 24HOP Speaker/Session list, pick your sessions, and register for them. I look forward to seeing you (well, your name in the list) at the event.

    But if db design is your thing (or you want it to be!) and you want a full-day dose on the Monday before PASS, try my pre-con on Relational Database Design. It is going to be a great day; there will be plenty of learning, lots of swag (including at least 30 copies of my book to give away), and some practical experience doing a bit of team-based design. In any case, it will be better than a normal Monday at the office.

  • Utility Queries–Structure of Tables with Identity Column

    Edit: At the suggestion of a much more knowledgeable commenter who shall remain named Aaron, I changed from using the schema_name() function to using the sys.schemas catalog view. When writing code that is expected to be reused, it can be safer to use the catalog views rather than the functions, because the views work in the context of the database named in the FROM clause. So if you changed the code to database1.sys.tables because you wanted the tables from database1, and you were executing the code in database2, the columns of the views would give you the answer you expected, but the functions would use the context of database2.
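
    As a quick illustration of the difference (a minimal sketch; the database names database1 and database2 are hypothetical), consider running the following while connected to database2:

    --joining to database1.sys.schemas resolves the schema name in database1, as expected
    SELECT  schemas.name + '.' + tables.name AS tableName
    FROM    database1.sys.tables
              JOIN database1.sys.schemas
                 ON tables.schema_id = schemas.schema_id

    --SCHEMA_NAME(), however, resolves the schema_id in the current database (database2),
    --so the names returned may be wrong or NULL
    SELECT  SCHEMA_NAME(tables.schema_id) + '.' + tables.name AS tableName
    FROM    database1.sys.tables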

    I have been doing a presentation on sequences of late (the last planned version of that presentation was last week, but you should be able to get the gist of things from the slides and the code posted here on my presentation page), and as part of that process, I started writing some queries to interrogate the structure of tables. I started with tables using an identity column for some purpose because they are considerably easier to deal with than sequences, specifically because the limitations of identity columns make determining how they are used easier.

    In the future (which will probably be after PASS, since I have a lot of prep and 3 more presentations to do before PASS), I will start trying to discern the different cases where you might want to use a sequence and write queries to make sure the table structures are as I desire. The queries presented here are really the first step in this direction, as in most cases I foresee a mixture of identity- and sequence-based surrogate keys as a typical setup even once people get to SQL Server 2012. The queries I am presenting here will look for tables that meet certain conditions, including:

    • Tables with no primary key – A very common scenario: no idea about uniqueness, or sometimes an assumption that the identity property alone makes the table adequate.
    • Tables with no identity column – Absolutely nothing wrong with this scenario, as the pattern of using an identity-based primary key is just a choice\preference.  However, if you expect all of your tables to have identity columns, running this query can show you where you are wrong.  I usually use this sort of query as part of a release, making sure that the tables I expected to have a surrogate actually do.
    • Tables with identity column and PK, identity column in AK – This query is interesting for looking at other people’s databases sometimes.  Not everyone uses the identity value as a surrogate primary key, and finding cases where it is in a non-key usage can help you find “interesting” cases.
    • Tables with an identity based column in the primary key along with other columns – In this case, the key columns are illogical. The identity value should always be unique and be a sufficient surrogate key on its own.  By putting other columns in the key, you end up with a false sense of uniqueness. Ideally, you want your tables to have at least one key where all of the values are created outside of SQL Server. Sometimes people will use this for an invoice line item and make the pk the invoiceId and an identity value like invoiceLineItemId.

      I can’t say that this is “wrong,” but if the only key includes a system-generated value, it means that you can have duplicated data alongside the system-generated value. So you need to monitor the data more carefully.
    • Tables with a single column identity based primary key but no alternate key – This is the classic ‘bad’ case of surrogate key abuse. Just drop a surrogate key on the table and voila!, uniqueness. If you can’t see why this isn’t a desirable case, it is like the previous case, except the only uniqueness criterion is a monotonically increasing value.

    You can download the code directly from here  or you can see all my downloadable queries on my downloadable package page: DownloadablePackages.

    The queries:

    --Tables with no primary key

    SELECT  schemas.name + '.' + tables.name AS tableName
    FROM    sys.tables
              JOIN sys.schemas
                 ON tables.schema_id = schemas.schema_id
    WHERE   tables.type_desc = 'USER_TABLE'
            --no PK key constraint exists
        AND NOT EXISTS ( SELECT *
                            FROM   sys.key_constraints
                            WHERE  key_constraints.type = 'PK'
                                AND key_constraints.parent_object_id = tables.object_id )


    --Tables with no identity column

    SELECT  schemas.name + '.' + tables.name AS tableName
    FROM    sys.tables
               JOIN sys.schemas
                   ON tables.schema_id = schemas.schema_id
    WHERE   tables.type_desc = 'USER_TABLE'
    --no column in the table has the identity property
        AND NOT EXISTS ( SELECT *
                         FROM   sys.columns
                         WHERE  tables.object_id = columns.object_id
                           AND is_identity = 1 )

    --Tables with identity column and PK, identity column in AK

    SELECT schemas.name + '.' + tables.name AS tableName
    FROM   sys.tables
            JOIN sys.schemas
                ON tables.schema_id = schemas.schema_id
    WHERE tables.type_desc = 'USER_TABLE'
            -- table does have identity column 
      AND   EXISTS (    SELECT *
                        FROM   sys.columns
                        WHERE  tables.object_id = columns.object_id
                            AND is_identity = 1 ) 
            -- table does have primary key 
      AND   EXISTS (    SELECT *
                        FROM   sys.key_constraints
                        WHERE  key_constraints.type = 'PK'
                          AND key_constraints.parent_object_id = tables.object_id )
            -- but it is not the PK 
      AND   EXISTS (    SELECT *
                        FROM   sys.key_constraints
                            JOIN sys.index_columns
                                ON index_columns.object_id = key_constraints.parent_object_id
                                    AND index_columns.index_id = key_constraints.unique_index_id
                            JOIN sys.columns
                                ON columns.object_id = index_columns.object_id
                                    AND columns.column_id = index_columns.column_id
                        WHERE  key_constraints.type = 'UQ'
                            AND key_constraints.parent_object_id = tables.object_id
                            AND columns.is_identity = 1 )

    --Tables with an identity based column in the primary key along with other columns

    SELECT schemas.name + '.' + tables.name AS tableName
    FROM   sys.tables
             JOIN sys.schemas
                ON tables.schema_id = schemas.schema_id
    WHERE tables.type_desc = 'USER_TABLE'
            -- table does have identity column
      AND   EXISTS ( SELECT *
                     FROM   sys.columns
                     WHERE  tables.object_id = columns.object_id
                       AND is_identity = 1 )
            --any PK has identity column
      AND   EXISTS( SELECT  *
                    FROM    sys.key_constraints
                               JOIN sys.index_columns
                                    ON index_columns.object_id = key_constraints.parent_object_id
                                       AND index_columns.index_id = key_constraints.unique_index_id
                               JOIN sys.columns
                                    ON columns.object_id = index_columns.object_id
                                       AND columns.column_id = index_columns.column_id
                    WHERE    key_constraints.type = 'PK'
                      AND    key_constraints.parent_object_id = tables.object_id
                      AND    columns.is_identity = 1 )
        --and there are > 1 columns in the PK constraint
        AND (  SELECT  COUNT(*)
               FROM    sys.key_constraints
                          JOIN sys.index_columns
                              ON index_columns.object_id = key_constraints.parent_object_id
                                 AND index_columns.index_id = key_constraints.unique_index_id
                WHERE   key_constraints.type = 'PK'
                  AND   key_constraints.parent_object_id = tables.object_id
            ) > 1


    --Tables with a single column identity based primary key but no alternate key

    SELECT schemas.name + '.' + tables.name AS tableName
    FROM sys.tables
             JOIN sys.schemas
                 ON tables.schema_id = schemas.schema_id
    WHERE tables.type_desc = 'USER_TABLE'
            --a PK key constraint exists 
      AND   EXISTS ( SELECT * 
                     FROM   sys.key_constraints 
                     WHERE  key_constraints.type = 'PK' 
                       AND key_constraints.parent_object_id = tables.object_id )
        --any PK only has identity column 
      AND ( SELECT COUNT(*) 
            FROM   sys.key_constraints 
                      JOIN sys.index_columns 
                          ON index_columns.object_id = key_constraints.parent_object_id 
                             AND index_columns.index_id = key_constraints.unique_index_id 
                      JOIN sys.columns 
                          ON columns.object_id = index_columns.object_id 
                             AND columns.column_id = index_columns.column_id 
            WHERE  key_constraints.type = 'PK' 
              AND  key_constraints.parent_object_id = tables.object_id 
              AND columns.is_identity = 0
            ) = 0 --must have > 0 columns in pkey, can only have 1 identity column 

      --but no Unique Constraint Exists 
      AND NOT EXISTS ( SELECT * 
                       FROM   sys.key_constraints 
                       WHERE  key_constraints.type = 'UQ' 
                         AND key_constraints.parent_object_id = tables.object_id )
      

    --Test Cases

    --The following are some sample tables that can be built to test these queries. If you have other ideas
    --for cases (or find errors), email louis@drsql.org

    IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoPrimaryKey'))
            DROP TABLE dbo.NoPrimaryKey;
    IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoIdentityColumn'))
            DROP TABLE dbo.NoIdentityColumn;
    IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.IdentityButNotInPkey'))
            DROP TABLE dbo.IdentityButNotInPkey;
    IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.TooManyColumnsInPkey'))
            DROP TABLE dbo.TooManyColumnsInPkey;
    IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.MultipleColumnsInPkeyOk'))
            DROP TABLE dbo.MultipleColumnsInPkeyOk;
    IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoAlternateKey'))
            DROP TABLE dbo.NoAlternateKey;
    IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.IdentityInAlternateKey'))
            DROP TABLE dbo.IdentityInAlternateKey;

    --very common scenario, assuming identity makes the table great
    CREATE TABLE NoPrimaryKey
    (
        NoPrimaryKeyId int not null identity,
        AnotherColumnId int not null
    )
    go

    --absolutely nothing wrong with this scenario, unless you expect all of your
    --tables to have identity columns, of course...
    CREATE TABLE NoIdentityColumn
    (
        NoIdentityColumnId int primary key,
        AnotherColumnId int not null
    )
    go

    --absolutely nothing wrong with this scenario either, as this could be desired.
    --usually it is some form of mistake in a database using surrogate keys though
    CREATE TABLE IdentityButNotInPkey
    (
        IdentityButNotInPkeyId int primary key,
        AnotherColumnId int identity not null
    )
    go

    --absolutely nothing wrong with this scenario either, as this could be desired.
    --usually it is some form of mistake in a database using surrogate keys though
    CREATE TABLE IdentityInAlternateKey
    (
        IdentityInAlternateKeyId int primary key,
        AnotherColumnId int identity not null unique
    )
    go


    --In this case, the key columns are illogical. The identity value should always be unique and
    --be a sufficient primary surrogate key. I definitely want to know why this is built this
    --way.  Sometimes people will use this for an invoice line item and make the pk the
    --invoiceId and an identity value like invoiceLineItemId. I generally prefer the surrogate key
    --to stand alone and have the multi-part key to be something that makes sense for the user
    CREATE TABLE TooManyColumnsInPkey
    (
        TooManyColumnsInPkeyId int identity,
        AnotherColumnId int,
        primary key (TooManyColumnsInPkeyId,AnotherColumnId)
    )
    go

    CREATE TABLE MultipleColumnsInPkeyOk
    (
        TooManyColumnsInPkeyId int not null,
        AnotherColumnId int not null,
        primary key (TooManyColumnsInPkeyId,AnotherColumnId)
    )
    go

    --this is my pet peeve, and something that should be avoided. You could end up having
    --duplicate rows that are not logical.
    CREATE TABLE NoAlternateKey
    (
        NoAlternateKeyId int not null identity primary key,
        AnotherColumnThatShouldBeUnique int not null
    )
    go

  • SQLPASS DB Design Precon Preview

    It is just a few months before SQLPASS and I am pushing to get my precon prepped for you. While it will be the second time I have produced it this year, I have listened to the feedback and positive comments I have heard from potential attendees, so I am making a couple of big changes to fit what people really liked.

    1. Lots more design time. We will do more designs in some form, as a group, in teams, and individually, depending on the room and the people in attendance. (Figure on a lot of databases centered around toys, theme parks, and other situations that are not exactly serious, since they provide a limited enough case that no one will get hung up on how their company does it, but one broad enough to mimic real business cases.)
    2. Pattern and code walkthroughs. I have a set of patterns (like uniqueness, hierarchies, and data driven design) that we can walk through to see how to translate from a database design to a physical implementation; a minimal taste of one such pattern follows this list. It is based on the presentations I have done for the Data Architecture Virtual Chapter and at Devlink this year, but we will not blast through any of it; we will cover the code and designs at a deliberate pace and then consider designs where each pattern would make sense.
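
    To give a flavor of what I mean by a pattern walkthrough, here is a minimal sketch of the hierarchy pattern (the table and constraint names are hypothetical, not taken from the precon material): a self-referencing “adjacency list” table where each row points to its parent row.

    CREATE TABLE dbo.OrganizationUnit
    (
        OrganizationUnitId int NOT NULL IDENTITY
            CONSTRAINT PKOrganizationUnit PRIMARY KEY,
        Name varchar(60) NOT NULL
            CONSTRAINT AKOrganizationUnit_Name UNIQUE,
        --NULL for the root of the hierarchy, otherwise the parent row's key
        ParentOrganizationUnitId int NULL
            CONSTRAINT FKOrganizationUnit$contains$OrganizationUnit
                REFERENCES dbo.OrganizationUnit (OrganizationUnitId)
    )
    go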

    So if you want an interactive experience where you get a chance to think for yourself (at least part of the time), come join me on November 5 in Seattle, Washington. Don't think it, just do it: http://www.sqlpass.org/summit/2012/Registration.aspx (Not responsible if you actually get carried away and your employer won't cover your expenses.)

    As a bonus, I have at least 30 books I will give away to attendees, 15 electronic and 15 ex-tree versions to share with you. The precon is largely taken from the book, but it would take me more than 7 hours just to read the book to you, and I am afraid that would not impress anyone at all.

    If you want to see more of my thoughts on the pre-con, check out the interview PASS did for the pre-con here (it is wordy, but if it wasn't it would be mathematics): http://www.sqlpass.org/summit/2012/Sessions/PreConferenceSessions/PreConPreviews/LouisDavidson.aspx

    If you have any questions to ask me about what we will cover, or what you want to cover, or want to know why I think I am funny, don't hesitate to go ahead and register… and send me an email at drsql@hotmail.com.

  • An interview, an online session, a long drive and a SQL Saturday… This week!

    Later this week I will be doing an episode of Greg Low’s excellent SQL Down Under podcast (http://www.sqldownunder.com/Resources/Podcast.aspx), something I did once before back in 2006.  If you haven’t listened to any of the previous editions, there are some amazing people who have been on his podcast.

    On Thursday at 12:00 Central Time, I will be doing a presentation entitled Designing for Common Problems in SQL Server for the PASS Data Architecture Virtual Chapter.

    Friday I will be driving up to Cleveland, OH for SQL Saturday 164. I will be doing the Designing for Common Problems in SQL Server session, along with the Sequences session that I have done at several SQL Saturdays so far.  Saturday I will give away two copies of my brand new book, one in each session, so if you want to be the first person I give one to, be there!

    Right now, the biggest issue is that the Designing for Common Problems session is WAY too long. In my prep so far, I have gotten halfway through the patterns and code in one and a half hours. So who knows what I will do to cut down the time: either limit the patterns, or perhaps split the session? I will figure something out… at least on Saturday, when I have real people, I can poll the audience to see what they want to see in detail. Online, pretty much all you see are people’s names and the clock ticking away.

    I have a few other things coming up, including picking speakers for Nashville’s SQL Saturday, shipping out books to my SQL Rally attendees, and Devlink at the end of the month (when I will have a bit longer to do the Common Problems session, thankfully), but more on that after this weekend.

  • Louisville SQL Saturday…

    One more week until we get to SQL Saturday 122 in Louisville KY. I have a couple of sessions on the calendar this time. First, the one on Sequences:

    ------------------------------------

    What Sequence objects are (and are not)

    SQL Server 2012 adds a new object to our arsenal called a sequence that will give us the capability to implement automatically incrementing values. However, it cannot replace a lot of the functionality that we have used a numbers table and windowing functions for (though they can be complementary to one another). In this session I will demonstrate the uses and performance characteristics of sequences, including how they complement the use of numbers tables and windowing functions to create surrogate key and sorting values, and more.

    ------------------------------------
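
    As a quick taste of the kind of code that session covers (a minimal sketch; the object names are hypothetical and not taken from the session materials), a sequence can be used directly or as a column default:

    CREATE SEQUENCE dbo.DemoSequence AS int
        START WITH 1 INCREMENT BY 1 CACHE 100
    go

    --fetch a value directly
    SELECT NEXT VALUE FOR dbo.DemoSequence AS nextValue

    --or use the sequence as a default, something an identity column cannot do,
    --since the same sequence can feed surrogate key values to more than one table
    CREATE TABLE dbo.DemoTable
    (
        DemoTableId int NOT NULL
            CONSTRAINT DFLTDemoTable_DemoTableId DEFAULT (NEXT VALUE FOR dbo.DemoSequence)
            CONSTRAINT PKDemoTable PRIMARY KEY,
        Value varchar(30) NOT NULL
    )
    go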

    The second session is my professional development session that goes along with my What Counts for a DBA blog series on SimpleTalk. Come with your ideas about what makes a great DBA so we can all get into the conversation (and not just so I won’t have to create too many slides). I will have my topic spinning wheel with me, so who knows exactly what we will discuss; not even I know.

    ------------------------------------
    What Counts For a DBA

    The world of a DBA can be daunting, whether you are new or old, because not only do you need to keep up with new and emerging technologies, but also with the code and designs of your coworkers. In this highly participation-driven session, we will employ a random topic chooser to pick several of these traits for discussion as a group. Possible topics include past blog topics such as Logic, Curiosity, Failure, Humility, Skill and Passion, as well as any other topics that might be added for that day. So come prepared to participate and voice your opinion about what counts for a DBA.

    ------------------------------------

    Hope to see you there! (And if you can’t make it to Louisville, I will be in Cleveland, OH for SQL Saturday #164 on August 16, and then in Chattanooga, TN for Devlink the last week of August.)

  • Utility Queries–Database Files, (and Filegroups)

    It has been a while since I last posted a utility query, so today, to avoid other work I am supposed to be doing, I decided to go ahead and work on another post.  This time I worked on a server configuration type query. One query I find I use pretty often is the following one that lists the files in the database. In this blog I will include two queries.  The first deals with files across all databases, and the second runs in a database to see the files and their filegroups (if there is an easy way to get the filegroups at a server level, I am not sure of it… let me know).

    Database Files, All Databases – File Level (Download source)

    It is a pretty simple query, and it returns the following columns. (A value of '@TOTAL' indicates that the row is a summary row, and some file types will not report a file size.)

    • database_name – The name of the database
    • database_file_name – The file name that was set when the file was added to the database (the logical name, not the physical name)
    • size_in_kb – The size of the file in kilobytes, such that it matches the file size shown in Windows Explorer
    • size_in_mb – The size of the file in megabytes, a size that is more typically what people want to see
    • size_in_gb – The size of the file in gigabytes, useful when looking at really large files
    • file_type – How the file is used in the server
    • filesystem_drive_letter – The drive letter where the file is located
    • filesystem_file_name – The name of the physical file
    • filesystem_path – The path where the file is located

    --Get the files and total size of files for all databases

    SELECT  --the name of the database
            CASE WHEN GROUPING(DB_NAME(database_id)) = 1 THEN '@TOTAL'
                 ELSE DB_NAME(database_id)
            END AS database_name ,

            --the logical name of the file
            CASE WHEN GROUPING(master_files.name) = 1 THEN '@TOTAL'
                 ELSE master_files.name
            END AS database_file_name ,

            --the size of the file is stored in # of pages
            SUM(master_files.size * 8.0) AS size_in_kb,
            SUM(master_files.size * 8.0) / 1024.0 AS size_in_mb,
            SUM(master_files.size * 8.0) / 1024.0 / 1024.0 AS size_in_gb,

            --the type of the file (how it is used by the server)
            CASE WHEN GROUPING(master_files.name) = 1 THEN ''
                 ELSE MAX(master_files.type_desc)
            END AS file_type , 
           
            --the drive letter where the file is located
            CASE WHEN GROUPING(master_files.name) = 1 THEN ''
                 ELSE MAX(UPPER(SUBSTRING(master_files.physical_name, 1, 1)))
            END AS filesystem_drive_letter ,              


           --thanks to Phillip Kelley from http://stackoverflow.com/questions/1024978/find-index-of-last-occurrence-of-a-sub-string-using-t-sql
           --for the REVERSE code to get the filename and path.

            --the physical filename only
            CASE WHEN GROUPING(master_files.name) = 1 THEN ''
                 ELSE MAX(REVERSE(LEFT(REVERSE(master_files.physical_name),
                         CHARINDEX('\', REVERSE(physical_name)) - 1)))
            END AS filesystem_file_name ,

            --the path of the file only
           CASE WHEN GROUPING(master_files.name) = 1 THEN ''
                 ELSE MAX(REPLACE(master_files.physical_name,
                    REVERSE(LEFT(REVERSE(master_files.physical_name),
                                 CHARINDEX('\', REVERSE(physical_name)) - 1)), ''))
                 END AS filesystem_path

    FROM    sys.master_files
    GROUP BY DB_NAME(database_id) , --the database and the file (all of the parts)
             master_files.name WITH rollup
    ORDER BY database_name, database_file_name

    Single Database By Filegroup (Download source)

    Edited: Added code based on one of the comments here: http://www.sqlblog.lv/2011/05/ka-apskatit-datu-bazes-failu-izmeru-un.html.  His post does all databases with sizing, but I preferred to have this query only work on one database. I added columns for available space, used space, as well as on-disk space.

    The second query will, for one database, list all of the row and log filegroups and their files. Like the previous query, it may list filegroups that have a 0 size for types like full text. It uses sys.database_files for the files. This has one downside: if the database is read only, it is possible that the results will not be correct and will reflect a previous version of the metadata. Use sys.master_files if you want to get current values, but there are no guarantees that it will match the filegroups.

    It will return:

    • filegroup_name – The name of the filegroup in the database
    • database_file_name – The file name that was set when the file was added to the database (the logical name, not the physical name)
    • size_in_kb – The size of the file in kilobytes, such that it matches the file size shown in Windows Explorer
    • size_in_mb – The size of the file in megabytes, a size that is more typically what people want to see (commented out)
    • size_in_gb – The size of the file in gigabytes, useful when looking at really large files
    • used_size_in_kb – The amount of the file that has data allocated, in kilobytes
    • used_size_in_mb – The amount of the file that has data allocated, in megabytes, a size that is more typically what people want to see (commented out)
    • used_size_in_gb – The amount of the file that has data allocated, in gigabytes, useful when looking at really large files
    • available_size_in_kb – The amount of free space in the file, in kilobytes
    • available_size_in_mb – The amount of free space in the file, in megabytes, a size that is more typically what people want to see (commented out)
    • available_size_in_gb – The amount of free space in the file, in gigabytes, useful when looking at really large files
    • size_on_disk_kb – The amount of space the file takes in the file system (reported from the DMVs)
    • file_type – How the file is used in the server
    • filesystem_drive_letter – The drive letter where the file is located
    • filesystem_file_name – The name of the physical file
    • filesystem_path – The path where the file is located

    SELECT

                   --the name of the filegroup (or Log for the log file, which doesn't have a filegroup)
                   CASE WHEN GROUPING(filegroups.name) = 1 THEN '@TOTAL'
                             WHEN filegroups.name IS NULL THEN 'LOGS'
                             ELSE filegroups.name
                    END AS filegroup_name ,
           
                   --the logical name of the file
                   CASE WHEN GROUPING(database_files.name) = 1 THEN '@TOTAL'
                            ELSE database_files.name
                   END AS database_file_name ,

                   --the size of the file is stored in # of pages
                   SUM(database_files.size * 8.0) AS size_in_kb,
                   --SUM(database_files.size * 8.0) / 1024.0 AS size_in_mb,
                   SUM(database_files.size * 8.0) / 1024.0 / 1024.0 AS size_in_gb,
                  
                   SUM(FILEPROPERTY(database_files.NAME,'SpaceUsed') * 8.0) AS used_size_in_kb,
                   --SUM(FILEPROPERTY(database_files.NAME,'SpaceUsed') * 8.0)/ 1024.0  AS used_size_in_mb,
                   SUM(FILEPROPERTY(database_files.NAME,'SpaceUsed') * 8.0) / 1024.0 / 1024.0 AS used_size_in_gb,                             

                   SUM((database_files.size - FILEPROPERTY(database_files.NAME,'SpaceUsed')) * 8.0) AS available_size_in_kb,
                   --SUM((database_files.size - FILEPROPERTY(database_files.NAME,'SpaceUsed')) * 8.0)/ 1024.0  AS available_size_in_mb,
                   SUM((database_files.size - FILEPROPERTY(database_files.NAME,'SpaceUsed')) * 8.0) / 1024.0 / 1024.0 AS available_size_in_gb,  

                   SUM(DIVFS.size_on_disk_bytes/1024.0) AS size_on_disk_kb,
                  
                  --the type of the file (how it is used by the server)
                  CASE WHEN GROUPING(database_files.name) = 1 THEN ''
                            ELSE MAX(database_files.type_desc)
                   END AS file_type , 

                   --the drive letter where the file is located
                   CASE WHEN GROUPING(database_files.name) = 1 THEN ''
                            ELSE MAX(UPPER(SUBSTRING(database_files.physical_name, 1, 1)))
                   END AS filesystem_drive_letter ,        
           
                  --thanks to Phillip Kelley from http://stackoverflow.com/questions/1024978/find-index-of-last-occurrence-of-a-sub-string-using-t-sql

                   --the physical filename only
                   CASE WHEN GROUPING(database_files.name) = 1 THEN ''
                             ELSE MAX(REVERSE(LEFT(REVERSE(database_files.physical_name), CHARINDEX('\', REVERSE(database_files.physical_name)) - 1)))
                    END AS filesystem_file_name ,

                    --the path of the file only
                   CASE WHEN GROUPING(database_files.name) = 1 THEN ''
                             ELSE MAX(REPLACE(database_files.physical_name, REVERSE(LEFT(REVERSE(database_files.physical_name),
                                                                                                                       CHARINDEX('\', REVERSE(database_files.physical_name)) - 1)), ''))
                    END AS filesystem_path
    FROM    sys.database_files --use sys.master_files instead if the database is read only and you want the current metadata
                 --log files do not have a filegroup
                         LEFT OUTER JOIN sys.filegroups
                                 ON database_files.data_space_id = filegroups.data_space_id
                        LEFT JOIN sys.dm_io_virtual_file_stats(DB_ID(), DEFAULT) DIVFS
                                ON database_files.file_id = DIVFS.file_id                        
    GROUP BY  filegroups.name ,
                     database_files.name WITH ROLLUP
    ORDER BY     --the name of the filegroup (or Log for the log file, which doesn't have a filegroup)
                     CASE WHEN GROUPING(filegroups.name) = 1 THEN '@TOTAL'
                              WHEN filegroups.name IS NULL THEN '@TOTAL-SortAfter'
                              ELSE filegroups.name
                      END,
                      database_file_name

    Hope these queries help out sometime.  More on the way as I finish up other projects!

  • Speaking at PASS 2012… Exciting and Scary… As usual…

    Edit: As I reread this, I felt I should clarify… “As usual” refers mostly to the “Scary” part. I have a lot of stage fright that I have to work through. And it is always exciting to be picked.

    I have been selected this year at the PASS Summit 2012 to do two sessions, and they are both going to be interesting.

    • Pre-Con: Relational Database Design Workshop - Abstract
    • Triggers: Born Evil or Misunderstood? - Abstract

    The pre-con session entitled Relational Database Design Workshop will be (at least) the third time I will have done this pre-con session, and I am pretty excited to take it to a bit larger scale. The one big change that I am forcing this time is a limit on the lecture time. Each of the first two times I have done this session, the biggest disappointment has been that we didn't get nearly enough time for the exercises. If people get tired of the exercises, I will certainly have a lot of extra material to cover, but the focus will be on getting at least three hours of design time in: some as a full group on the screen, and some broken up into groups. (Of course, we will adjust the schedule based on the size of the group and whatever they are throwing at me verbally… and physically! I will have material to pad out at least an hour if people start looking bored (or if the group members start screaming at each other… you know, like a real database design session would be if people weren't concerned with losing their jobs.))

    The triggers session is the one that I have been mulling over for years now, and it is going to be interesting at a minimum, and probably a lot of fun (particularly if Hugo Kornelis (@Hugo_Kornelis) and Tom LaRock (@SQLRockstar) (Tom is not a fan of triggers! http://thomaslarock.com/2009/03/sql-database-triggers/) show up to offer their opinions). Triggers are probably the most controversial of SQL Server objects, and for good reason. There are server and database settings that affect how they work, and it is not trivial to write them in a manner that doesn't harm performance. Worse yet, they can cause side effects that (if not coded correctly) really harm performance, data integrity, and the sanity of the developers who don't realize they exist. But for all of the negatives, there are some tasks that just fit the trigger to a T; a minimal sketch of one such task follows. I might be trying to do too much in a 1:15 session, but it would not be the first time!
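
    To give a flavor of the kind of task I mean (a minimal sketch; the schema, table, and column names are hypothetical and not taken from the session), here is an AFTER trigger that maintains a last-modified audit column for every row touched by an UPDATE:

    --a minimal sketch: assumes a Sales.[Order] table with OrderId and RowLastModifiedTime columns
    CREATE TRIGGER Sales.Order$UpdateRowModifiedTimeTrigger
    ON Sales.[Order]
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON --keep the trigger from changing the rowcount the caller sees

        --join to the inserted virtual table so multi-row updates are handled correctly
        UPDATE Sales.[Order]
        SET    RowLastModifiedTime = SYSDATETIME()
        FROM   Sales.[Order]
                 JOIN inserted
                    ON [Order].OrderId = inserted.OrderId
    END
    go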

    So I hope to see a good number of you there, for the pre-con, and certainly for the Trigger session. Just leave the potential projectiles in your hotel room...

  • PASS Nomination Committee

    Edit: Well, I didn't get elected, and at this point, seeing who won (great choices in Angela, Stephanie, and Jason) and what else I have going on between now and the end of the year (presentations, grandchildren being born, etc.), I couldn't be more pleased with the outcome.  Usually the phrase "maybe next year" is cold comfort to the people who were not (as the election site so eloquently puts it) top vote getters, but in this case, I think we can all be happy that the top vote getters will do a great job for PASS and that we have more time to do other community activities. So, see you maybe next year.

     -------------------------------------------

    Starting today you can vote for your choice for who should be on the Nomination Committee for SQL PASS (or perhaps some previous day if you aren't reading this on the 13th, and if it is after noon on the 23rd, well, tough). There are 12 people who are up for vote with 3 slots available, so this should be fun. There are a few people I know quite well on the list, and a few I don't really know at all. Looking at their qualifications, I think one thing stands out: you probably can't make a bad choice. One of them is myself, and this is a job I did once in the past, and it is something I feel I am somewhat uniquely qualified to do from 11 years of working with PASS members and board members alike.

    The job is helping to pick who will get the chance to run for the Board of Directors for PASS by vetting the people who apply (according to the PASS By-Laws, Section VIII.3, assuming I am reading it right.) So, the fact is, your choice ought to be somewhat based on how your idea of a board member matches up with our ideas. Of course, it will be a team effort and there is always compromise.

    So let me tell you the criteria that I personally have always hoped to find in board members.

    Professional-level technical skills (primarily SQL). The last thing we need is people who know nothing of what we do, simply trying to make a name for themselves with the concept of a professional group like ours. We need people who understand what it is they will be representing. They don't have to be experts by any means, just have solid skills, probably demonstrated in the community.

    Strong leadership qualifications. Ideally, we will get people who have led groups of people before. Not everyone who would be a great Board Member will live near enough to a SQL user group to be involved, so this could mean technical management, local government, volunteer work for charities, church leadership, or even being a team leader. A big plus is that the person should be able to communicate well publicly to unify the community of PASS through the good times and the times that are opposite to the good times.

    History of volunteerism. Possibly just with PASS, or possibly with some other type of organization. The person ought to know what it means to work for something other than a buck.

    History with PASS. Even if for just a reasonably short period of time, the person should know enough about PASS that they aren't just blindly trying to join at the top. Attendance at a conference or two would be a good bonus so they know what it is about but not everyone can afford to attend, even if they were comp'd the entrance fee.

    Ideas on what they think PASS should be like and how they can help get it there. (Perhaps) obviously we all expect that candidates aren't just campaigning because they think it sounds cool. They have a desire to move PASS toward some better future.

    Over the years I have put my name in the hat several times for the Board of Directors; twice I was cut by the Nomination Committee, and the one time I was a candidate I lost. The funniest part of that year's Summit for me was that I was sincerely relieved when Kevin Kline sat me down and told me I had lost. I did not (and still do not, really) have the strong leadership qualifications needed to be on the Board of Directors. However, like most any longtime volunteer, I feel like I have a solid understanding of what kind of person inspires me as a volunteer. (Hence the list!)

    If you look at the 5 qualities, all of them are solidly focused on community. If you agree with my vision of what a PASS leader should be, vote for me. I promise to make sure the candidates are up to the high quality that you expect. If you don't agree (or perhaps just didn't find my black and white photo to your liking), vote for someone else; everyone on the list seems quite solid as a member of this committee.

    If I don't get elected... there is less work for me to do, and I will have more time to serve PASS and the SQL Community in any other way I can.

  • Crazy Week Next Week–SQLRally and SQL Saturday Birmingham

    A lot of my time lately has been spent getting ready for next week, and I hope to see you there.  First up, SQL Rally.

    On Wednesday, I will be doing a pre-con session on Database Design. I have already blogged about it here, and I am definitely excited about it.  There are still seats available, and I will be giving away a few goodies, including my new design book (though it won’t yet be printed, I will get it to attendees), several of my, Tim Ford’s, and Glenn Berry’s DMV books from RedGate, and who knows what else I might have in my bag (Apress is sending me a lot of swag to give away during the conference too).

    Friday (last session of the conference), I will be presenting my new session entitled Sequences (What They Are and Are Not).  This session is a great departure from my typical presentation.  I typically do 95% slides, and sometimes 100% slides.  In this presentation I currently have 17 slides, and that will only change to include a bit more contact information.  I have a lot of code using sequences in multiple ways, including racing them against identity columns and trying different caching levels.

    After the rally ends, I will hop in my yellow Focus and make the following delightful journey:

    [image]

    Assuming traffic is kind to me and Bing’s time estimates are reasonably correct, a bit over 9 hours of driving and a night in a Hampton Inn to be named later and I should arrive at Birmingham’s SQL Saturday event.

    Sounds insane, right?  Well, yeah, it kinda does.  But I wanted to make their event again this year, and it was just a couple of hours out of my way back to Nashville. You may also think it sounds like a bit of vanity, and it isn’t really.  Assuming traffic is favorable, my hope is that I can arrive for the 2:00 sessions by Chris Price on Data Quality (or Kevin Boles on Column Store Indexes), and Sven Aelterman’s session on File Tables as well, before my 4:30 session on Sequences.  Obviously, if traffic is bad it might turn out that I only make my own session, but even then I plan to hang out for the after event, so it won’t be a total loss.

    In any case, hope to see you all in Dallas or Birmingham next week!

  • Utility Queries–Column Metadata

    Very often, I find myself wanting to query system metadata for columns. Some of the metadata is fairly easy to deal with; for example, if you want to find column names, simply querying sys.columns is very straightforward.  But if you want to query data types, it gets stickier.  There are two types listed: one is the declared type, and the other is the base data type, which differ if you are using CREATE TYPE to create alias data types. So I started working on a query based on the INFORMATION_SCHEMA views, because they are a lot easier to work with. In my design book, I used that version (and will link this version in the errata/book updates).
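
    For example (a minimal sketch; the type and table names are hypothetical), a column declared with an alias type created by CREATE TYPE reports the alias as its declared type, while the base type is the intrinsic type the alias maps to:

    --create an alias type over varchar(30)
    CREATE TYPE dbo.NameType FROM varchar(30) NOT NULL
    go

    CREATE TABLE dbo.AliasTypeDemo
    (
        AliasTypeDemoId int NOT NULL IDENTITY CONSTRAINT PKAliasTypeDemo PRIMARY KEY,
        Name dbo.NameType --declared type: dbo.NameType; base type: varchar(30)
    )
    go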

    But in the Books Online entry for the COLUMNS object (and others too), I discovered the following warning, a bit too late to change course: “Do not use INFORMATION_SCHEMA views to determine the schema of an object. The only reliable way to find the schema of an object is to query the sys.objects catalog view.” So I rewrote it to use the catalog views.

    The query is fairly complex looking, but overall is pretty simple.  I put comments through the code that explain what it does.  I will note that I have tested it in a database using a case-sensitive collation, but I haven’t tried it with CLR types.  The query returns the following:

    • table_type – The type from sys.objects. Could be a user table, system table, user view, or internal table
    • schema_name – The name of the schema of the table that the column is in
    • table_name – The name of the table
    • column_name – The name of the column
    • nullability – Whether that column is nullable, plus if the column is an identity column
    • declared_datatype – The data type as declared. For alias types (like sysname), this will be the name the user specified.
    • base_datatype – The data type as is implemented. For alias types this will be the base intrinsic type (like for sysname this will be nvarchar(128))
    • primary_key_column – A bit value, indicating that the column participates in a primary key constraint
    • column_id – The internal key of the column, which is used for ordering the columns
    • default_value – If there is a default constraint on the column, this will be the text of the declaration
    • column_check_constraint – If there is a check constraint that only deals with the one column, it will be considered a column check constraint. This will contain the text of the check constraint
    • table_check_constraint_reference – A bit value that indicates that there is one or more table level check constraints that reference the column

    The query text follows.  It is pretty unwieldy, so instead of the derived table you might want to consider building it into a view or even a procedure.  I keep an SSMS project with all of my metadata queries that I maintain over time to call in any place I need them, so I use this one ad hoc.  You can download the query from my website’s downloadable packages page here: http://www.drsql.org/Documents/ColumnMetadataQuery.sql

    SELECT  *
    FROM    (
    SELECT    REPLACE(LOWER(objects.type_desc), '_', ' ') AS table_type, schemas.name AS schema_name, objects.name AS table_name,
                        columns.name AS column_name, CASE WHEN columns.is_identity = 1 THEN 'IDENTITY NOT NULL'
                                                          WHEN columns.is_nullable = 1 THEN 'NULL'
                                                          ELSE 'NOT NULL'
                                                     END AS nullability,
                        --types that have an ascii character or binary length
                        CASE WHEN columns.is_computed = 1 THEN 'Computed'
                             WHEN types.name IN ( 'varchar', 'char', 'varbinary' ) THEN types.name +
                                              CASE WHEN columns.max_length = -1 THEN '(max)'
                                                   ELSE '(' + CAST(columns.max_length AS VARCHAR(4)) + ')'
                                              END
                             --types that have a unicode character type that requires length to be halved
                             WHEN types.name IN ( 'nvarchar', 'nchar' ) THEN types.name +
                                              CASE WHEN columns.max_length = -1 THEN '(max)'
                                                   ELSE '(' + CAST(columns.max_length / 2 AS VARCHAR(4)) + ')'
                                              END
                             --types with a datetime precision
                             WHEN types.name IN ( 'time', 'datetime2', 'datetimeoffset' ) THEN types.name +
                                              '(' + CAST(columns.scale AS VARCHAR(4)) + ')'
     

                             --types with a precision/scale
                             WHEN types.name IN ( 'numeric', 'decimal' )
                             THEN types.name + '(' + CAST(columns.precision AS VARCHAR(4)) + ',' +
                                                                        CAST(columns.scale AS VARCHAR(4)) + ')'

                            --timestamp should be reported as rowversion
                             WHEN types.name = 'timestamp' THEN 'rowversion'
                             --and the rest. Note, float is declared with a bit length, but is
                             --represented as either float or real in types 
                             ELSE types.name
                        END AS declared_datatype,

                       --types that have an ASCII character or binary length
                        CASE WHEN baseType.name IN ( 'varchar', 'char', 'varbinary' )
                                      THEN baseType.name +
                                                  CASE WHEN columns.max_length = -1 THEN '(max)'
                                                       ELSE '(' + CAST(columns.max_length AS VARCHAR(4)) + ')'
                                                  END
                   
                             --types that have a Unicode character type that requires the length to be halved
                             WHEN baseType.name IN ( 'nvarchar', 'nchar' ) 
                                      THEN baseType.name +
                                                  CASE WHEN columns.max_length = -1 THEN '(max)'       
                                                        ELSE '(' + CAST(columns.max_length / 2 AS VARCHAR(4)) + ')'       
                                                  END

                             --types with a datetime precision
                             WHEN baseType.name IN ( 'time', 'datetime2', 'datetimeoffset' )
                                       THEN baseType.name + '(' + CAST(columns.scale AS VARCHAR(4)) + ')'

                             --types with a precision/scale
                             WHEN baseType.name IN ( 'numeric', 'decimal' )
                             THEN baseType.name + '(' + CAST(columns.precision AS VARCHAR(4)) +
                                                     ',' +  CAST(columns.scale AS VARCHAR(4)) + ')'

                             --timestamp should be reported as rowversion
                             WHEN baseType.name = 'timestamp' THEN 'rowversion'
                             --and the rest. Note, float is declared with a bit length, but is
                             --represented as either float or real in types
                             ELSE baseType.name
                        END AS base_datatype,
                        CASE WHEN EXISTS ( SELECT *
                                           FROM   sys.key_constraints
                                                     JOIN sys.indexes
                                                          ON key_constraints.parent_object_id = indexes.object_id
                                                             AND key_constraints.unique_index_id = indexes.index_id
                                                     JOIN sys.index_columns
                                                          ON index_columns.object_id = indexes.object_id
                                                             AND index_columns.index_id = indexes.index_id
                                           WHERE  key_constraints.type = 'PK'
                                             AND  columns.column_id = index_columns.column_id
                                             AND  columns.object_id = index_columns.object_id )
                                  THEN 1
                                  ELSE 0
                        END AS primary_key_column,
                        columns.column_id, default_constraints.definition AS default_value,
                        check_constraints.definition AS column_check_constraint,
                        CASE WHEN EXISTS ( SELECT   *
                                           FROM     sys.check_constraints AS cc
                                           WHERE    cc.parent_object_id = columns.OBJECT_ID
                                                    AND cc.definition LIKE '%~[' + columns.name + '~]%' ESCAPE '~'
                                                    AND cc.parent_column_id = 0 ) THEN 1
                             ELSE 0
                        END AS table_check_constraint_reference
              FROM      sys.columns
                        JOIN sys.types
                            ON columns.user_type_id = types.user_type_id
                        JOIN sys.types AS baseType
                            ON columns.system_type_id = baseType.system_type_id
                               AND baseType.user_type_id = baseType.system_type_id
                        JOIN sys.objects
                                JOIN sys.schemas
                                       ON schemas.schema_id = objects.schema_id
                            ON objects.object_id = columns.OBJECT_ID
                        LEFT OUTER JOIN sys.default_constraints
                            ON default_constraints.parent_object_id = columns.object_id
                                  AND default_constraints.parent_column_id = columns.column_id
                        LEFT OUTER JOIN sys.check_constraints
                            ON check_constraints.parent_object_id = columns.object_id
                                 AND check_constraints.parent_column_id = columns.column_id ) AS rows
    WHERE   table_type = 'user table'
                  AND schema_name LIKE '%'
                  AND table_name LIKE '%'
                  AND column_name LIKE '%'
                  AND nullability LIKE '%'
                  AND base_datatype LIKE '%'
                  AND declared_datatype LIKE '%'
    ORDER BY table_type, schema_name, table_name, column_id
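
    If you do save the derived table as a view, the placeholder LIKE '%' filters in the outer WHERE clause become simple predicates that you change per question. Here is a minimal usage sketch; the view name Utility.ColumnMetadata and the filter values are only illustrative assumptions, not part of the downloadable script:

    --Assumes the derived table above has been saved as a view named Utility.ColumnMetadata (hypothetical name)
    SELECT  schema_name, table_name, column_name, declared_datatype, nullability, primary_key_column
    FROM    Utility.ColumnMetadata
    WHERE   table_type = 'user table'
      AND   schema_name = 'Sales'                  --limit the output to one schema
      AND   declared_datatype LIKE 'nvarchar%'     --only Unicode string columns
    ORDER BY schema_name, table_name, column_id;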

  • SQL Rally Relational Database Design Pre-Con Preview

    On May 9, 2012, I will be presenting a pre-con session at the SQL Rally in Dallas, TX on relational database design. The fact is, database design is a topic that demands more than a simple one-hour session to really do it right. So in my Relational Database Design Workshop, we will have seven times the amount of time of a typical session, giving us time to cover our topics in a bit more detail, look at a lot more designs/code, and even get some time to do some design as a group. Our topics will be:

    • Theory - More or less the foundational principles and processes that will presumably help you understand the "why" behind the rest of the material.
    • General Data Modeling - The basic concepts behind database design and data modeling, both graphical and semantic. Terminology and concepts will be covered to make sure that when I say a word, you know what I mean. The field of computer science is littered with confusing terminology that needs to be made clear.
    • Normalization - Basically the process of making your database work well with the relational engine, from both a performance and a data integrity point of view.
    • Physical Modeling - The actual process of creating a working database by choosing proper data types, protecting the data integrity, etc. We will discuss the steps, as well as take a look at a model that is implemented.
    • Implementation Patterns - For the most part, the primary pattern that we use in a relational database is normalization. Databases, tables, and columns are rather easily mapped to requirements, and using normalization, we usually arrive at a solution. However, there are certain types of solutions that crop up in common implementations. Examples include uniqueness, hierarchies, files, user-specified schemas, data-driven design, and more.
    • Other Miscellaneous Advice - Time permitting, I have a set of slides that cover some basic performance/security/implementation related material.

    A lot of the material/examples/demos come from my 2012 book (SQL Server 2012 Relational Database Design and Implementation) that will be shipping just before the Rally, so I am making arrangements to get copies of the book for most if not all of the attendees. So the day will introduce the material to you in a rapid format, and then you can take the material home and read it again (and my email is always available for further questions on the material as well.)

  • Utility Objects–Waitfor Delay Coordinator (SQL Server 2008+)

    Finally… took longer than I had expected when I wrote this a while back, but I had to move my website and get DNS moved before I could post code…

    When I write code, I do my best to test that code in as many ways as necessary. One of the last types of tests that is necessary is concurrency testing. Concurrency testing is one of the most difficult types, because it requires running multiple processes simultaneously and making sure that you get the correct answers every time. This is really difficult when you need more than one connection (or even 10 connections) to execute a statement that takes only a second or two at the same moment.  What I am trying to replicate is the WAITFOR statement with a DELAY, but with multiple connections sharing the same delay.

    As I was writing my sequences presentation for SQL Rally, I wanted to test how values were allocated to each row based on different caching levels. I was looking at the clock, creating WAITFOR TIME statements, and copying the values into multiple windows, as I have done so many times before. I was also thinking that I should start blogging again, and (like my sequences presentation) do something code-oriented rather than design for a bit (I just finished my design book, so I am going to rest on that subject for a while, well, other than the SQL Rally pre-con that I am subliminally trying to get you to sign up for before prices go up again).
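
    For reference, the manual pattern I kept repeating looks something like the following; the time literal and the sequence name are just placeholders for whatever you happen to be testing:

    --Paste into several query windows and run each one; every connection resumes at the
    --hard-coded time (both the time and dbo.DemoSequence are only illustrative placeholders)
    WAITFOR TIME '17:55:00';
    SELECT NEXT VALUE FOR dbo.DemoSequence; --the statement whose concurrent behavior is being observed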

    So I envisioned a kind of multi-user WAITFOR statement that would say: start all processes in 20 seconds, and let multiple connections join in. (As I write this, I am envisioning a kind of semaphore version of this that you could hold until ready to go, but more on that some other day.)  I initially used some 2012 features like FORMAT, but I wanted to make sure this was useful to a wider audience.  The usage of the system is really simple. To have your connection wait for 20 seconds, you execute:

    EXEC Utility.WaitForSync$StartWait @WaitSeconds=20

    To have multiple connections start at the same second, you execute the same statement on a different connection.  The stored procedure calculates the time 20 seconds out, stores that value, and executes a WAITFOR TIME statement on each connection where you have executed this statement.  After the time passes and the connection continues, you have to reset the utility or you will receive the following message (not an error, just a PRINT statement, since you may want the batch to complete):

    WaitForSync: An error occurred. Message:
    Too late to start another WAITFOR session. Last session DELAY time was 2012-03-20 17:55:03. Use Utility.WaitForSync$reset to start new sessions

    To reset the session, as the message says, just execute Utility.WaitForSync$reset. There are also procedures to view the time when the processes will continue. 
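
    Putting the pieces together, a typical demo run looks something like this; the UPDATE is only a stand-in for whatever concurrent work you actually want to test, and dbo.AnyTestTable is a hypothetical table:

    --Connections 1..N: run the same batch in each query window
    EXEC Utility.WaitForSync$StartWait @WaitSeconds = 20;
    UPDATE dbo.AnyTestTable              --hypothetical table; substitute the work under test
    SET    Amount = Amount + 1
    WHERE  TestTableId = 1;

    --Any connection, after the demo completes: clear the stored time so a new session can start
    EXEC Utility.WaitForSync$Reset;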

    If you want to skip an action when the WAITFOR TIME statements have already completed before you start the operation, note that the procedure returns -100 in that case.  So you might use something like the following TRY…CATCH block to control execution:

    BEGIN TRY
    DECLARE @RETVAL int
    EXEC @RETVAL = Utility.WaitForSync$StartWait @WaitSeconds=10
    IF @RETVAL = -100
         RAISERROR ('Time Elapsed',16,1)

    --Other code

    END TRY
    BEGIN CATCH
        PRINT ERROR_MESSAGE()
    END CATCH

    Which would return something like:

    WaitForSync: An error occurred. Message:
    Too late to start another WAITFOR session. Last session DELAY time was 2012-03-20 21:20:06. Use Utility.WaitForSync$reset to start new sessions

    Time Elapsed

    You can download the code from my website’s downloadable code page.

     

    Of course, the most interesting thing about downloading code is learning how to write your own code that uses the same patterns and techniques, so let me cover the highlights.  The table is pretty simple:

    CREATE TABLE Utility.WaitForSync
    (
        WaitforDelayValue nchar(8) NOT NULL,
        StartTime         datetime2(0) NOT NULL, --used to make sure you can do a waitfor that crosses a day boundary
        OnlyOneRow        tinyint PRIMARY KEY DEFAULT (1) CHECK (OnlyOneRow = 1)
    );

    It only allows a single row, using the PK constraint, a default, and a check constraint together. A bit of a trick, but a definitely workable one to ensure a given cardinality in a table.  The two important columns are WaitforDelayValue, which holds the value that will be used in the WAITFOR statement, and StartTime, which is there to let you do your demos right around midnight (and yes, I put this in because when I was first using the procedure, it took a really long time to complete late one night).
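
    As a quick illustration of the cardinality trick (the values below are just placeholders), a second row can never be added because every row gets the default OnlyOneRow value of 1:

    INSERT INTO Utility.WaitForSync (WaitforDelayValue, StartTime)
    VALUES (N'12:00:00', SYSDATETIME());        --succeeds

    INSERT INTO Utility.WaitForSync (WaitforDelayValue, StartTime)
    VALUES (N'12:00:05', SYSDATETIME());        --fails with a PRIMARY KEY violation on OnlyOneRow

    DELETE FROM Utility.WaitForSync;            --clean up after the experiment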

    The idea is that the first user puts a row in here with a given time, and then every caller after that (and before the StartTime) will use the same time.  After the StartTime, the process has already started.

    The simplest procedure is the reset procedure.  It just deletes the row from the table (even if other users are still in progress, but this is just a utility for demos and testing).

    CREATE PROCEDURE Utility.WaitForSync$Reset
    AS
    ----------------------------------------------------------------------------------------------
    -- Louis Davidson        drsql.org
    --
    -- Simple enough, just delete the row from the table
    ----------------------------------------------------------------------------------------------
        DELETE FROM Utility.WaitForSync
    GO

    The StartWait procedure is a bit more complicated than that.  It takes a @WaitSeconds parameter that lets the first caller set the time.  Usually 10 seconds works great for a few connections, but it depends on just how much you need to coordinate.  The code is commented, and is really pretty simple.  The basics are that it checks to see if a row exists and grabs the value; if one doesn’t exist, it creates a new row.

    CREATE PROCEDURE Utility.WaitForSync$StartWait
    (
        @WaitSeconds    int --minimum amount of time to wait. The WAITFOR statement
                            --always starts as the clock second changes.
    )
    AS
    ------------------------------------------------------------------------------------------------
    ---- Louis Davidson        drsql.org
    ----
    ---- Either starts a new wait for session or uses the existing one
    ------------------------------------------------------------------------------------------------
    SET NOCOUNT ON
    BEGIN TRY
        DECLARE @StartTime datetime2(0),
                @WaitforDelayValue NCHAR(8),
                @ProcStartTime datetime2(0) = SYSDATETIME();
        --Get the row from the table where we hold the sync value.
        SELECT  @StartTime = StartTime,
                @WaitforDelayValue = WaitforDelayValue
        FROM   Utility.WaitForSync;

        --if a value was not already stored, we are just starting
        IF @StartTime IS NULL
          BEGIN
                --set the StartTime to the current time + the number of seconds in the parameter
                  SET @StartTime = DATEADD(second,@waitSeconds,SYSDATETIME());

                --then I use a good old style convert with a format to get the time for the delay
                SET @WaitforDelayValue = CONVERT(nchar(8),@starttime, 108);

                --insert the value into the WaitForSync table. StartTime will be used to make sure
                --the time is later, even if it crosses the day boundary
                 INSERT INTO Utility.WaitForSync(WaitforDelayValue, StartTime)
                VALUES (@WaitforDelayValue,@StartTime);
          END
        --if the time has already passed, we raise an error telling the client that you can't piggyback
        --on the existing session; it must be reset first
        ELSE IF @StartTime <= SYSDATETIME()
          BEGIN
            DECLARE @msg nvarchar(1000)
            SET @msg = N'Too late to start another WAITFOR session. Last session DELAY time was '
                        + CAST(@startTime AS NVARCHAR(20)) + '.'
                       + N' Use Utility.WaitForSync$reset to start new sessions';
            RAISERROR (@msg,16,1);
          END
        --finally, we take the delay value we created and use it in an execute statement
        --note that SSMS won't show the output until after the batch resumes.
        DECLARE @queryText NVARCHAR(100) = 'WAITFOR TIME ' + QUOTENAME(@WaitforDelayValue,'''');
       
        EXECUTE (@queryText);
        PRINT 'WaitForSync: Starting at: ' + CAST(@startTime AS NVARCHAR(20)) +
               ' Waited for: ' + CAST(DATEDIFF(SECOND,@procStartTime, SYSDATETIME()) AS VARCHAR(10)) + ' seconds.'
      END TRY
     
      BEGIN CATCH
        --benign output that won't stop anything. Simply keep going
        PRINT 'WaitForSync: An error occurred. Message: ' + CHAR(13) + CHAR(10)
                + ERROR_MESSAGE()
        RETURN -100 --The caller can use the return value to decide if they want to stop what they are doing
      END CATCH
    GO

    Finally, there is a viewing procedure that lets you see the time at which the waiting connections will resume.  Note that it returns 1 if no session has been started, and -100 if an expired session is out there or there is an error.
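
    To give a sense of the shape of it, here is a minimal sketch of what such a viewing procedure might look like, following the return value convention just described. The name Utility.WaitForSync$ViewStartTime and the message text are my assumptions for illustration; the downloadable code has the real implementation:

    CREATE PROCEDURE Utility.WaitForSync$ViewStartTime
    AS
    ----------------------------------------------------------------------------------------------
    -- Hypothetical sketch of the viewing procedure: returns 1 when no session exists,
    -- -100 when the session has expired or an error occurs, and prints the stored start time
    ----------------------------------------------------------------------------------------------
    SET NOCOUNT ON
    BEGIN TRY
        DECLARE @StartTime datetime2(0);

        SELECT @StartTime = StartTime
        FROM   Utility.WaitForSync;

        IF @StartTime IS NULL
          BEGIN
            PRINT 'WaitForSync: No session has been started.';
            RETURN 1;
          END

        IF @StartTime <= SYSDATETIME()
          BEGIN
            PRINT 'WaitForSync: Last session DELAY time was ' + CAST(@StartTime AS nvarchar(20))
                  + '. Use Utility.WaitForSync$Reset to start new sessions.';
            RETURN -100;
          END

        PRINT 'WaitForSync: Sessions will resume at ' + CAST(@StartTime AS nvarchar(20)) + '.';
      END TRY
      BEGIN CATCH
        PRINT 'WaitForSync: An error occurred. Message: ' + CHAR(13) + CHAR(10) + ERROR_MESSAGE();
        RETURN -100;
      END CATCH
    GO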
