THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Louis Davidson

Utility Queries–Structure of Tables with Identity Column

Edit: At the suggestion of a much knowledgable commenter who shall remain named Aaron, I changed from using schema_name() function to using sys.tables. When writing code that is expected to have reuse, it can be safer to use the tables rather than functions because the tables will work in the context of the database that is in the from clause, so if you changed the code to database1.sys.tables because you wanted the tables from database1, and you were executing the code in database2, the columns of the table would give you the answer you expected, but the functions would be context of database2.

I have been doing a presentation on sequences of late (last planned version of that presentation was last week, but should be able to get the gist of things from the slides and the code posted here on my presentation page), and as part of that process, I started writing some queries to interrogate the structure of tables. I started with tables using an identity column for some purpose because they are considerably easier to do than sequences, specifically because the limitations of identity columns make determining how they are used easier.

In the future (which will probably be after PASS, since I have a lot of prep and 3 more presentations to do before PASS), I will start trying to discern the different cases where you might want to use a sequence and writing queries to make sure the table structures are as I desire. The queries presented here are really the first step in this direction, as in most cases I foresee a mixture of identity and sequence based surrogate keys even once people get to SQL Server 2012 as a typical set up. The queries I am presenting here will look for tables that meet certain conditions, including:

  • Tables with no primary key – Very common scenario, no idea about uniqueness, or sometimes that identity property alone makes the table an adequate table.
  • Tables with no identity column – Abolutely nothing wrong with this scenario, as the pattern of using an identity based primary key is just a choice\preference.  However, if you you expect all of your tables to have identity columns, running this query can show you where you are wrong.  I usually use this sort of query as part of a release, making sure that the tables I expected to have a surrogate actually do.
  • Tables with identity column and PK, identity column in AK – This query is interesting for looking at other people’s databases sometimes.  Not everyone uses the identity value as a surrogate primary key, and finding cases where it is in a non-key usage can help you find “interesting” cases.
  • Tables with an identity based column in the primary key along with other columns – In this case, the key columns are illogical. The identity value should always be unique and be a sufficient surrogate key on it's own.  By putting other columns in the key, you end up with a false sense of uniqueness. Ideally, you want your tables to have at least one key where all of the values are created outside of SQL Server. Sometimes people with use this for an invoice line item and make the pk the invoiceId and an identity value like invoiceLineItemId.

    I can’t say that this is “wrong” but if the only key includes a system generated value, it means that you can have duplicated data along with the system generated value. So you need to monitor the data more carefully.
  • Tables with a single column identity based primary key but no alternate key. – This is the classic ‘bad’ use of surrogate key abuse. Just drop a surrogate key on the table and viola!, uniqueness. If you can’t see why this wouldn’t be the desirable case, it is like the previous case, except the only uniqueness criteria is a monotonically increasing value.

You can download the code directly from here  or you can see all my downloadable queries on my downloadable package page: DownloadablePackages.

The queries:

--Tables with no primary key

SELECT  schemas.name + '.' + tables.name AS tableName
FROM    sys.tables
          JOIN sys.schemas
             ON tables.schema_id = schemas.schema_id
WHERE   tables.type_desc = 'USER_TABLE'
        --no PK key constraint exists
    AND NOT EXISTS ( SELECT *
                        FROM   sys.key_constraints
                        WHERE  key_constraints.type = 'PK'
                            AND key_constraints.parent_object_id = tables.object_id )


--Tables with no identity column

SELECT  schemas.name + '.' + tables.name AS tableName
FROM    sys.tables
           JOIN sys.schemas
               ON tables.schema_id = schemas.schema_id
WHERE   tables.type_desc = 'USER_TABLE'
--no column in the table has the identity property
    AND NOT EXISTS ( SELECT *
                     FROM   sys.columns
                     WHERE  tables.object_id = columns.object_id
                       AND is_identity = 1 )

--Tables with identity column and PK, identity column in AK

SELECT schemas.name + '.' + tables.name AS tableName
FROM   sys.tables
        JOIN sys.schemas
            ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = 'USER_TABLE'
        -- table does have identity column 
  AND   EXISTS (    SELECT *
                    FROM   sys.columns
                    WHERE  tables.object_id = columns.object_id
                        AND is_identity = 1 ) 
        -- table does have primary key 
  AND   EXISTS (    SELECT *
                    FROM   sys.key_constraints
                    WHERE  key_constraints.type = 'PK'
                      AND key_constraints.parent_object_id = tables.object_id )
        -- but it is not the PK 
  AND   EXISTS (    SELECT *
                    FROM   sys.key_constraints
                        JOIN sys.index_columns
                            ON index_columns.object_id = key_constraints.parent_object_id
                                AND index_columns.index_id = key_constraints.unique_index_id
                        JOIN sys.columns
                            ON columns.object_id = index_columns.object_id
                                AND columns.column_id = index_columns.column_id
                    WHERE  key_constraints.type = 'UQ'
                        AND key_constraints.parent_object_id = tables.object_id
                        AND columns.is_identity = 1 )

--Tables with an identity based column in the primary key along with other columns

SELECT schemas.name + '.' + tables.name AS tableName
FROM   sys.tables
         JOIN sys.schemas
            ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = 'USER_TABLE'
        -- table does have identity column
  AND   EXISTS ( SELECT *
                 FROM   sys.columns
                 WHERE  tables.object_id = columns.object_id
                   AND is_identity = 1 )
        --any PK has identity column
  AND   EXISTS( SELECT  *
                FROM    sys.key_constraints
                           JOIN sys.index_columns
                                ON index_columns.object_id = key_constraints.parent_object_id
                                   AND index_columns.index_id = key_constraints.unique_index_id
                           JOIN sys.columns
                                ON columns.object_id = index_columns.object_id
                                   AND columns.column_id = index_columns.column_id
                WHERE    key_constraints.type = 'PK'
                  AND    key_constraints.parent_object_id = tables.object_id
                  AND    columns.is_identity = 1 )
    --and there are > 1 columns in the PK constraint
    AND (  SELECT  COUNT(*)
           FROM    sys.key_constraints
                      JOIN sys.index_columns
                          ON index_columns.object_id = key_constraints.parent_object_id
                             AND index_columns.index_id = key_constraints.unique_index_id
            WHERE   key_constraints.type = 'PK'
              AND   key_constraints.parent_object_id = tables.object_id
        ) > 1


--Tables with a single column identity based primary key but no alternate key

SELECT schemas.name + '.' + tables.name AS tableName
FROM sys.tables
         JOIN sys.schemas
             ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = 'USER_TABLE'
        --a PK key constraint exists 
  AND   EXISTS ( SELECT * 
                 FROM   sys.key_constraints 
                 WHERE  key_constraints.type = 'PK' 
                   AND key_constraints.parent_object_id = tables.object_id )
    --any PK only has identity column 
  AND ( SELECT COUNT(*) 
        FROM   sys.key_constraints 
                  JOIN sys.index_columns 
                      ON index_columns.object_id = key_constraints.parent_object_id 
                         AND index_columns.index_id = key_constraints.unique_index_id 
                  JOIN sys.columns 
                      ON columns.object_id = index_columns.object_id 
                         AND columns.column_id = index_columns.column_id 
        WHERE  key_constraints.type = 'PK' 
          AND  key_constraints.parent_object_id = tables.object_id 
          AND columns.is_identity = 0
        ) = 0 --must have > 0 columns in pkey, can only have 1 identity column 

  --but no Unique Constraint Exists 
  AND NOT EXISTS ( SELECT * 
                   FROM   sys.key_constraints 
                   WHERE  key_constraints.type = 'UQ' 
                     AND key_constraints.parent_object_id = tables.object_id )
  

--Test Cases

--The following are some sample tables that can be built to test these queries. If you have other ideas
--for cases (or find errors, email louis@drsql.org)

IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoPrimaryKey'))
        DROP TABLE dbo.NoPrimaryKey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoIdentityColumn'))
        DROP TABLE dbo.NoIdentityColumn;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.IdentityButNotInPkey'))
        DROP TABLE dbo.IdentityButNotInPkey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.TooManyColumnsInPkey'))
        DROP TABLE dbo.TooManyColumnsInPkey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.MultipleColumnsInPkeyOk'))
        DROP TABLE dbo.MultipleColumnsInPkeyOk;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoAlternateKey'))
        DROP TABLE dbo.NoAlternateKey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.IdentityInAlternateKey'))
        DROP TABLE dbo.IdentityInAlternateKey;

--very common scenario, assuming identity makes the table great
CREATE TABLE NoPrimaryKey
(
    NoPrimaryKeyId int not null identity,
    AnotherColumnId int not null
)
go

--absolutely nothing wrong with this scenario, unless you expect all of your
--tables to have identity columns, of course...
CREATE TABLE NoIdentityColumn
(
    NoIdentityColumnId int primary key,
    AnotherColumnId int not null
)
go

--absolutely nothing wrong with this scenario either, as this could be desired.
--usually it is some form of mistake in a database using surrogate keys though
CREATE TABLE IdentityButNotInPkey
(
    IdentityButNotInPkeyId int primary key,
    AnotherColumnId int identity not null
)
go

--absolutely nothing wrong with this scenario either, as this could be desired.
--usually it is some form of mistake in a database using surrogate keys though
CREATE TABLE IdentityInAlternateKey
(
    IdentityInAlternateKeyId int primary key,
    AnotherColumnId int identity not null unique
)
go


--In this case, the key columns are illogical. The identity value should always be unique and
--be a sufficient primary surrogate key. I definitely want to know why this is built this
--way.  Sometimes people with use this for an invoice line item and make the pk the
--invoiceId and an identity value like invoiceLineItemId. I generally prefer the surrogate key
--to stand alone and have the multi-part key to be something that makes sense for the user
CREATE TABLE TooManyColumnsInSurrogatePkey
(
    TooManyColumnsInPkeyId int identity,
    AnotherColumnId int,
    primary key (TooManyColumnsInPkeyId,AnotherColumnId)
)
go

CREATE TABLE MultipleColumnsInPkeyOk
(
    TooManyColumnsInPkeyId int not null,
    AnotherColumnId int not null,
    primary key (TooManyColumnsInPkeyId,AnotherColumnId)
)
go

--this is my pet peeve, and something that should be avoided. You could end up having
--duplicate rows that are not logical.
CREATE TABLE NoAlternateKey
(
    NoAlternateKeyId int not null identity primary key,
    AnotherColumnThatShouldBeUnique int not null
)
go

Published Sunday, September 02, 2012 4:05 PM by drsql

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Steve Morris said:

For these queries are I often find the objectpropertyex function simpler i.e.

--Tables with no primary key

select SCHEMA_NAME(schema_id) + '.' +  name

from sys.objects

where objectpropertyex(object_id, 'TableHasIdentity') = 0

September 4, 2012 6:04 AM
 

drsql said:

True, those objects do exist and I use them often too, but I find that using the tables is more natural.  I really hate using functions that I have to look up all of the time when the columns in the tables are there in a natural form. I can't think of them right off the top of my head, but I think there are a few properties only available in the objectproperty and objectpropertyex type functions.

I also like that subqueries give me real SQL to write in a manner that is consistent with all other uses of SQL. For example, in a "regular" table that has a lot of rows: objectpropertyex(object_id, 'TableHasIdentity') = 0 would be terrible for performance. In the system tables, it won't much matter in either case

September 4, 2012 10:57 AM
 

AaronBertrand said:

It is also easier to run queries against anotherdatabase.sys.columns than to have to get the various functions to run in the context of their own database. So I would even join to sys.schemas instead of using SCHEMA_NAME() in that case.

September 4, 2012 5:11 PM
 

drsql said:

Good point. It is just something that isn't natural because when I am thinking utilities, I am usually thinking manual code that I would execute. But when I publish code, I should go that extra mile. Changed in text and downloads.

September 4, 2012 6:02 PM
 

gbn said:

When checking for alternate keys, should you not check sys.indexes?

A unique index allows INCLUDES which may be useful: unique constraints do not. A unique constraint also exists in sys.indexes

CREATE TABLE dbo.AKTest (

   AKSurrogate int NOT NULL IDENTITY(1,1) PRIMARY KEY,

   AKNaturalKey uniqueidentifier NOT NULL UNIQUE)

GO

CREATE UNIQUE INDEX IXU_Natural ON dbo.AKTest (AKNaturalKey)

GO

SELECT * FROM sys.key_constraints KC WHERE KC.parent_object_id = OBJECT_ID('dbo.AKTest');

GO

SELECT * FROM sys.indexes I WHERE I.object_id = OBJECT_ID('dbo.AKTest');

GO

September 19, 2012 8:23 AM
 

drsql said:

gbn:

The purist in me would say no (which is why I didn't include it here, I was thinking of the practices I try to follow). You "should" do all of your constraining using constraints and not indexes.

At the same time, I do need to add some information along these lines about unique indexes that are not supersets of unique constraints, particularly because of filtered indexes that can be useful in constraining data that isn't 100% unique (like where you want to only check uniqueness amongst non-null values).

Will add this when I get time again to work on things...

Thanks!

September 19, 2012 8:57 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Links to my other sites

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement