THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Louis Davidson

  • Inheritance in Database Design

    As I have been walking around Disney World this week, my mind starts to wander to matters of database design. Sad, perhaps, but I will guess that most people who read this blog do the same much the same thing with whatever technology they are good at when they are relaxing also.  It also may actually have helped me come up with an example for my next book (especially if I can double the size of the book!)

    Examples are the hardest of all parts of writing for me. Thinking of an example that covers a lot of situations as is really hard, but I think that a theme park might actually be the answer (if it is and you are a writer, don’t steal my idea or I will steal it right back :).  It seems like it actually covers most if not all of the situations that are needed:

    • Good entities (People, places, ideas)
    • Rich attributes to describe the entities
    • All of the different parts of normalization, including 4nf (show, cast member, location)
    • Geography data (to locate all of the different locations)
    • Geometry data (finger print biometrics)
    • Image data (pictures of attractions, menus, etc)
    • Subclasses

    And many more. 

    But it is the Subclasses that got me thinking. Initially I was thinking that a Location entity would be the top level entity, then attraction, restaurant, shop, etc as subclasses.  But is this really a subclass? Thinking about the AdventureWorks database the other day I realized the reason I dislike that design is the way it has a BusinessEntity entity as a superclass to almost EVERYTHING. A subclassed entity isn’t like a generic object in an object oriented design, but rather it should be more of a specific entity with even more specific information tacked on for specific types.  The point being that you would figure to use the superclass quite often as itself, with the subclass giving you some additional information sometimes, but not as a rule exclusively using the subclass entities.

    So how alike is a roller coaster, or a restaurant?  Seems that the only thing these share is a position on the globe. So instead of a subclassed entity, why not specify standalone entities for each and create an independent entity for position.  This would be a geographic point, and would serve as the entrance of sorts to the math of determining how far apart the different locations are, as well as (possibly) to predict which place a person would be likely to go next.  This position is not really a parent to the items, but more of an interface (though technically I don’t think an interface doesn’t specify storage…)

    So each table would get a relationship to the position entity that would contain all of the different locations of “stuff” that you have. This is not so different from how address is often implemented, except that an attraction shouldn’t need more than one position. It might have a geometric representation of the area it takes up, but I would think that each location would have one single point on the map that would represent it.  Might this get so detailed as to have multiple items per location?  For example, a very large store having multiple entrance points, each so different as to direct a person in a different way?  Sure, and this might be a matter for the other tables to worry with.  Each position would also need to be able to tell the distance from itself to other positions, with knowledge of the terrain.

    Just having an interface would make things easier to use the subclass technique for the things that are common.. Like a queue.  There are different types of queues all around a them park, and several different types. A roller coaster uses several of them itself (a special ticket to get you through fast, a just stand here and wait line, single rider, etc) as does a restaurant (just stand here and wait, and in some cases, reservations, or priority seating).  Each of these queues shares commonality, but at the same time some differences are obvious.  And an actual queue for one ride may use all of the different types.

    Clearly this is more to flesh out if I use this example…Just food for thought about subclasses.

    As a different example, would it seem natural to have a person entity that was then subclassed into guest and cast member? They do share obvious commonality, and (given the right technology) you would want to track all of their movements. But you probably don’t know that much about guests. Maybe guests aren’t considered people as much as tickets (in the database only) and the ticket is associated with the human characteristics of the guest (height, weight, sex, etc so as to help direct them on their way/predict what they might do? Small kids, fluffy bunny ride?)  Then we might be back to guests and cast members sharing an interface for their location. How alike are location and position?  Probably look alike, but you would want a history of movement if you could track a person’s ever move, right?

    The goal of a database is to store the information in a manner that is useful for analysis both to predict future events, as well as to look at how current processes work each day.  Getting everything just right is the dream of every designer. Of course all of this tells me that I have to be careful with this example or it will grow REALLY huge.  Hopefully I won’t need to finish this too soon, but who knows when the next major version of SQL Server is.  Much to consider, there is.


  • Apologies for comments going unpublished

    I had quite a few comments out there that were unpublished.  One person was particularly slighted and I am sending him a "sorry" gift.  I was stuck in "writer rut" and forgot about the fact that comments have to be allowed when people first come to the site (to eliminate spam!)


  • Commenting your code

    As I am easing back into real life from writing the book, I am in search of easy targets for blogging.  My boss mentioned this blog over on Jeff Atwood's Coding Horror Blog and it got me thinking about commenting.  His advice is to only comment "why" the code works.  I can't quite agree, because the code he claims to be acceptable is:

    private double SquareRootApproximation(n) {
      r = n / 2;
      while ( abs( r - (n/r) ) > t ) {
        r = 0.5 * ( r + (n/r) );
      }
      return r;
    }
    System.out.println( "r = " + SquareRootApproximation(r) );

    I mean, it is better than some code I have seen,  but still, I would like a bit more information about why this works.  Maybe the name of the algorithm used, or at least what to do if this fails to provide the expected results.  Admittedly this is probably something that could be easily found, but most algorithms are not.  Comments in my mind should at least lead you to understand the mindset of the programmer.  What would actually improve this code in my mind is to change the variables to full words (though in this case it might not make sense to do this.)

    On an extremely different side of things is this article from "Edgewood Solutions Engineers" on mssqltips.com. Their answer is to explain what the code is doing in simple terms, making sure to comment almost everything.  They have a very elaborate header devised, with dependencies, both users of the object and objects it used.  Most of what is said seems a bit like overkill, but their point here "Comment all of the major code blocks of the code and the critical minor points that can be easily overlooked such as a obscure WHERE clause." is a good one.  I generally pepper my code with comments where I think it will be hard to debug for myself later, with a consideration for others, particularly when those others will call me to explain the code.

    Which brings me to my commenting philosophy. I personally think you have to comment to the expected lowest common denominator.  Think of the dumbest person who could have the need to read your code who is also qualified to have their job (otherwise you would have to write instructions on every line of code). If the qualified person can figure out what you are doing just by your naming conventions and , then it doesn't need comments. But if that person would look at the code and reasonable figure it out, then there is no need to comment the code.  What this requires is a few things:

    • Naming objects - if your procedures, tables, columns, functions all have meaningful names, you won't have to explain what they mean, saving time
    • Good design - if the relationship between objects and the cardinality of those relationships is clear, then you don't need to explain that what you are doing is hack due to poor thinking...
    • Naming variables - probably the most important thing to avoid the need for comments is naming stuff.  Name variables with words, not single character values (except sometimes i, x, etc will suffice for obvious typical uses)
    • Reasonable code formatting - SQL has no real form, so you could write procedures on a single line.  You could.  You could smash your hand with a hammer too.  Neither action would be very good.  (Consider using Red-Gate's SQL Refactor tool if nothing else.)

    However, the fact is, for SQL code, the real problem comes in when you start coming up with cool relational methods of solving problems that most moderately qualified people wouldn't get. For example the trick of using a sequence table to break apart a comma delimited list. Couple that with a join and you get some amazingly cool code, but how do you comment it?

    For example, say an architect that shouldn't be an architect designs a table with a comma delimited list like this (didn't I mention good design earlier?  I hate having to say this is a hack, but it is an elegant hack...)

    --excerpted from Chapter 7 of Pro SQL Server 2008 Relational Database Design and Implementation
    CREATE TABLE poorDesign
    (
         poorDesignId int,
         badValue varchar(20)
    )

    INSERT INTO poorDesign --using 2008 syntax
    VALUES (1,'1,3,56,7,3,6'),
                (2,'22,3'),
                (3,'1')

    You can "normalize" this set using a table of numbers (in my examples named tools.sequence) and a really cool join:

    SELECT    poorDesign.poorDesignId as betterDesignId,
              SUBSTRING(',' + poorDesign.badValue + ',',i + 1,
                    CHARINDEX(',',',' + poorDesign.badValue + ',',i + 1) - i - 1)
                                  
    as betterScalarValue
    FROM     poorDesign
                 JOIN tools.sequence
                        on i >= 1
                          AND i < LEN(',' + poorDesign.badValue + ',') - 1
                          AND SUBSTRING(',' + + poorDesign.badValue + ',', i, 1) = ','

    But are there enough pixels available on the planet to make that more understandable to most SQL programmers? Even the reasonably qualified?   I mean, I am still kind of amazed at the technique and the fact that it returns the following:

    betterDesignId betterScalarValue
    -------------- -----------------
                 1                 1
                 1                 3
                 1                56
                 1                 7
                 1                 3
                 1                 6
                 2                22
                 2                 3
                 3                 1

    still impresses me.  Frankly I don't know how to comment that code to make it readable.  In a real situation I would settle for a comment before the SELECT that stated:

    --Uses a table of numbers to parse the comma delimited list into a SQL acceptable format.
    --If you don't understand this code, read this article: http://www.sommarskog.se/arrays-in-sql-2005.html#tblnum

    Opinions? What do you use for a comments in your code?  Do you have commenting policies?


  • Triggers...Evil?

    Say it isn't so. "It isn't so."  Glenn Berry thinks so in his post here. When I read his post I thought I was going to really get into it with Conor's post here, called the Trouble with Triggers (a title which I had to post because I like Star Trek too,) but frankly I agree with Conor.  Triggers are great tools, when applied correctly.  When they are used in a bad way (which is quite often, from the code I have seen), well, agree with Glenn I must...

    In my book, I advocate triggers for a few things:

    * Cross-database referential integrity (RI): Just basic RI, but SQL Server doesn’t manage declarative constraints across database boundaries.

    * Intra-table, inter-row constraints: For example, when you need to see that the sum of a column value over multiple rows is less than some value (possibly in another table).

    * Inter-table constraints: For example, if a value in one table relies on the value in another. This might also be written as a functions-based CHECK constraint, but it is often more maintainable to use a trigger.

    * Introducing desired side effects to your queries: For example, cascading inserts, maintaining denormalized data, and so on.

    (and I give examples of each.) but I also am careful to say to be extremely careful with them. When my tech reviewer got to the start of my section on triggers, he quickly said that he didn't like them and they shouldn't be used.  By the end he agreed with my examples.  The problem with triggers is just like the problem with several tools that SQL Server gives you.  For example:

    * Stored procedures: If used to encapsulate set based SQL calls into precompiled batches...good.  Used as a functional language to do work row by row, often with cursors...not so good.  Used to implement lots of business rules...well, that depends on the business rules but often this is where the real trouble comes (that is a future post)

    * Cursors: Used to do some repetitive task, usually for some maintenance use...good.  Used in place of set based operations because the programmer cannot write good SQL...baaaddd (said properly should sound like a bleating sheep.)

    * Clustered indexes: Used to cluster on the right sort of key (like a small monotonically increasing value)...good.  Used to cluster on a random value, like a guid...not so good.  Never used as a search argument of any kind, forcing bookmark lookups constantly....well, yuck.

    So just like you don't hammer in a nail with a wrench, or hammer in a screw with any implement, triggers are not to be used improperly.  The problem is that of education.  There are so many people out there who just do what it is they feel like without regards for what is actually correct.  SQL programmers try to do everything in SQL (I have been guilty of this many times) and functional programmers try to do everything one operation at a time.  Some day, once we all trust the SQLCLR to the extent we ought to, perhaps the exact balance can be achieved.

    And Glenn, I completely agree with your reasons to hate triggers: "I witnessed first hand the consequences of over-use of DML triggers in a large, n-tier client/server application."  I think the important thing in there is the word "over-use."  I mean, a person could probably get fat eating only celery and drinking water if they over did that.


  • See you at Tech Ed? Or other places?

    Next week is Tech Ed Developers, and I will be there working at the OLTP demo station from:

    Tuesday 11:45 – 2:45
    Wednesday 2:30 – 6:00
    Thursday 11:45 – 2:45
    Friday 11:45 – 2:45

    All times Eastern Daylight Saving Time and a little bit approximate at this point.  Please stop by and say howdy!  I don't exactly yet know what I will be doing completely, but it will be fun, never the less.

    After that I have a few other speaking engagements coming up:

    Nashville SQL User's Group - Late Summer - Doing some session in concert with my book release (hopefully doing some special stuff that time too.  Stay tuned for that announcement)
    Devlink (www.devlink.net) - August 22, 23 - Doing a double session on Database Design with Paul Nielsen
    PASS (www.sqlpass.org) - Nov 19-21 - Doing the same session with Paul, just condensed into a single session

    I will be back in the forums soon, and on this blog and my personal website (drsql.org) too as the principal writing for the book will be finished tomorrow, and rewrites in a week or so.  After that, it is a downhill journey to the finish line.

    So come by next week and let me know you are reading the blog, book, or whatever. 

    (cross-posted to drsql.spaces.live.com)


  • Sequence Table Tricks

    Ok, so I am writing about the kinds of things you can do with a sequence table, and I have built the table, I have the typical kinds of things planned (like splitting a string, and for the next section, loading a calendar table) but I wanted to do something interesting.  And frankly, sometimes the path to solving a real problem starts with with solving an abstract problem, then reality often tosses you a problem that is really close to this.  Admittedly this might not be realistic in my case, but it is possible.

    I have just recently watched the Futurama video "Bender's Big Score" and they have a little math lesson section on the video that was very interesting, but the thing that stuck in my mind was this reference back to a previous episode called the "Lesser of Two Evils".  The Bender look-alike named Flexo (they are both Bender units) start talking and have the following exchange:

    Bender: Hey, brobot, what's your serial number?
    Flexo: 3370318.
    Bender: No way! Mine's 2716057!
    Fry: I don't get it.
    Bender: We're both expressible as the sum of two cubes!

    So I figured, the sum of two cubes would be an interesting, and pretty easy abstract utilization of the sequence table.  Then I have found this reference also to "taxicab" numbers, where the goal is to discover the smallest value that can be expressed as the sum of three cubes in N different ways.

    How hard is the query? Turns out, that once you have a sequence table with numbers from 1 to 100000 or so, you can calculate that Taxicab(2) = 1729 very easily (and all of the other numbers that are the sum of two cubes too), and the the sum of two cubes in three different ways also pretty easily (took 3 seconds on my laptop, and that value is 87539319). 

    Here is the code:

    declare @level int

    set @level = 2 --sum of two cubes

    ;with cubes as
    (select POWER(i,3) as i3
    from   tools.sequence
    where  i >= 1 and i < 500) --<<<Vary for performance, and for cheating reasons, max needed value

    select c1.i3 + c2.i3 as [sum of 2 cubes in N Ways]
    from   cubes as c1
             cross join cubes as c2
    where c1.i3 <= c2.i3
    group by (c1.i3 + c2.i3)
    having count(*) = @level
    order by 1

    Ok, breaking this down the cubes CTE is pretty simple:

    (select power(i,3) as i3
    from   tools.sequence  --the table holds 100000 values
    where  i >= 1 and i < 500)

    This transforms our values to a table of cubes, so the values would be 1, 8, 27, 64, etc.  The query is a bit more interesting. We want the lowest value that meets the criteria that present in a second, so top is used.  I sum the two cube values, which I get from cross joining the CTE twice.

    select c1.i3 + c2.i3 as [sum of 2 cubes in N Ways]
    from   cubes as c1
                    cross join cubes as c2
    where c1.i3 <= c2.i3 --this gets rid of the "duplicate" value pairs

    The where condition of c1.i3 <= c2.i3 gets rid of the "duplicate" value pairs since c1 and c2 have the same values, so without this, for 1729 you would get:

    c1.i3                 c2.i3
    -------------------- --------------------
    1                    1728
    729                  1000

    1000                 729
    1728                 1

    These pairs are the same.  I don't eliminate equality to allow for the case where both number are equal, because they won't be doubled up.  With these values:

    c1.i3                 c2.i3
    -------------------- --------------------
    1                    1728
    729                  1000

    You can see that 1729 is the sum of two cubes in two different ways.  So, lastly, the question of performance must come up.  Reading the articles, it is clear that this is not a terribly easy problem to solve.  Values for the sum of three cubes is fairly simple, leaving the sequence values bounded at 500, I get two values in around one second.

    [sum of 2 cubes in N Ways]
    ---------------------------------
    87539319
    119824488

    Four however, was a "bit" more challenging.  Knowing the answer from the article, I knew I could bound my numbers using 20000 and get the answer.  Using this "cheat" on my laptop, I was able to calculate the value of taxicab(4) was 6963472309248 (yea, it only found one) in just 1 hour and 33 minutes, this on a 2.2 Ghz Pentium M laptop with 2 GB of RAM. I tried calculating taxicab(5), but alas, I ran out of space for tempdb (and I had 50 GB available.)  For that you had to go up to i being greater than something like 350000....

    I am thinking that if I turn the CTE into a physical, indexed table I would be able to do this.  And for 6, perhaps using the precision of the numeric datatype (38 places!), but not today, I don't want to melt my computer trying something this abstract.  Well at least not just yet, perhaps when the book is done...


  • 2008: Error List in 2008

    Ok, so I was looking around for a blurb about plan guides for my anti ad hoc SQL section of my book (ok, maybe not completely anti- but that isn't the point,) when I found this blog: http://geekswithblogs.net/Sreeblog/articles/117576.aspx that basically just lists some new features in 2008.  One that I hadn't seen I feel the need to mention. 

    Transact-SQL Error List Window:
    SQL Server Management Studio includes an Error List window that displays the syntax and semantic errors generated from the IntelliSense code in the Transact-SQL Query Editor.

    If you haven't seen this, it is really kind of neat.  Say you type:

    select *
    from sys.object
    whe re name = 'fred'

    You can see the obvious errors, since this is a very small batch, but IntelliSense underlines sys.object and re in the batch.  Go to the View menu and show the Error List, and you will see something like this:

    image 

    Double-click on the error, it takes you to the error in the query window.  Obviously I have just discovered this, so please comment if you know more about this topic (particularly if it has been helpful to you or not) please chime in.  And thanks to Sreenivas Mogullapalli for the cool post, it had a few other things I didn't remember too (just don't make me have to pronounce your name...I am pretty sure I wouldn't get it right.)

  • 2008: Declaring and instantiating a value

    Ok, I admit it.  Sometimes the least important things are the most fun.  As I try to get my blog back up and kicking again after a few months of holiday fun coupled with some dreary personal life things (a death in the family and lots of sickness/busyness, mostly,) I felt the need to write about another little time saving feature that you might not have heard of. Ever if you have it is still cool.

    This topic is declaring and instantiating a value in a single statement.  So what used to be:

    DECLARE @i int
    SET @i = 1

    Can now be:

    DECLARE @i int = 1

    Ho hum, I had thought earlier when I first saw this.  You know, it saves me 3 keystrokes. But today, I was doing some writing in my book and I realized that it isn't just limited to literals (it had just never crossed my mind) so when I was creating my savepoint names for nested savepoints.  You can use literals and functions, whatever you need.

    DECLARE @savepoint nvarchar(128) = cast(object_name(@@procid) AS nvarchar(125)) + cast(@@nestlevel AS nvarchar(3))

    All in one line of code...clean, neat.  And when I was writing a trigger that I could do this:

    DECLARE @rowsAffected int = @@rowcount, --stores the number of rows affected
                   @msg varchar(2000) = '' --used to hold the error message

    And save a couple of lines of code, and now, how much cleaner and effective is this code?  Not only does it save two lines of code to do the SET (or you could use SELECT), but it saves lines of whitespace too. 

    Will this save a lot of time?  Probably not, but it is one of those long desired features that we SQL Server programmers are so glad to be getting.  In my case it is going to cost me time because I have to go back and re-edit places where I forgot about this syntax and didn't use it in my new book, which is one of the worst parts of writing.  When you mess up and forget something that is new and more or less essential, you can have to edit large amounts of code/text.

     The comment from Steve got me thinking if you could use a query to instatiate the value.   Survey says:

     create table fred
    (
        value   char(1)
    )
    go
    insert into fred
    values ('a'),('b'),('c')
    go
       
    declare @value char(1) = (select MAX(value) from fred)

    select @value

     YES! This returns 'c', just as you would expect.  Okay, so now this is even cooler.  It doesn't work with table types though. And since we are feature complete, I assume it probably won't (I didn't expect it too, but it never hurts to try.)


  • 2008: Rebuilding a Heap

    In 2005, rebuilding a table that was a heap (no clustered index) wasn't easy.  You could copy it to a different table, or you could add a clustered index and then drop it.

    In 2008, this is a far easier thing to do.  They have added to the ALTER TABLE command a method to rebuild the table, which is the same as rebuilding the clustered index for a clustered table, but for a HEAP, it is the only way to go. 

    In the following code sample, I create a heap, load it will "little data", then expand every row to much larger values.  The result is a little bit of fragmentation, but more than that, tons of forwarding pointers (when a row won't fit on the same heap page, it gets moved to a different page, but the pointer to the row does not change in the indexes.)  Rebuilding the heap is now really simple:

    create table heapDemo
    (  
        value varchar(1000)
    )
    GO
    set nocount on
    insert into heapDemo
    select 'hi'
    go 10000
    --Expand the values to 500 times the size they were
    update heapDemo
    set value = replicate('hi',500)

    Now, check the stats of the table (using the index stats dmv, no less)

    select index_type_desc, fragment_count, page_count, forwarded_record_count 
    from sys.dm_db_index_physical_stats(db_id(),default,default,default,'DETAILED')
    where object_id = object_id('heapDemo')

    This returns:

    index_type_desc       fragment_count       page_count           forwarded_record_count
    --------------------- -------------------- -------------------- ----------------------
    HEAP                  6                    1443                 9961

    Now, you can rebuild the heap with the command:

    alter table heapDemo rebuild

    Check the values now:

    select index_type_desc, fragment_count, page_count, forwarded_record_count
    from sys.dm_db_index_physical_stats(db_id(),default,default,default,'DETAILED')
    where object_id = object_id('heapDemo')

    This shows no that the forwarded_record_count is 0, which is the idea situation for your heap.

    index_type_desc       fragment_count       page_count           forwarded_record_count
    --------------------- -------------------- -------------------- ----------------------
    HEAP                  3                    1440                 0

    Nice new addition!


  • 2008: Initializing Table Data with Row Constructors

    Well, I am just discovering this feature, mostly because I never saw it demoed at any of the sessions I have attended so far on 2008.  Not that it was kept particularly hidden, I have seen the title before, but I hadn't tried it out, or seen the depth that they have "finally" implemented.

    Tonight, I am working on my chapter where I create some tables as part of a big example, and I had the code from the 2005 version of the book (and I add the primary key to the table later in the book, as well as other constraints, so don't judge me!):

    CREATE TABLE Inventory.MovieRating (
           MovieRatingId             int NOT NULL,
           Code                           varchar(20) NOT NULL,
           Description                 varchar(200) NULL,
           AllowYouthRentalFlag bit NOT NULL
    )
    GO

    INSERT INTO Inventory.MovieRating (MovieRatingId, Code, Description, AllowYouthRentalFlag)
    VALUES (0, 'UR','Unrated',1)
    INSERT INTO Inventory.MovieRating (MovieRatingId, Code, Description, AllowYouthRentalFlag)
    VALUE   (1, 'G','General Audiences',1),
    INSERT INTO Inventory.MovieRating (MovieRatingId, Code, Description, AllowYouthRentalFlag)
    VALUES (2, 'PG','Parental Guidance',1),
    INSERT INTO Inventory.MovieRating(MovieRatingId, Code, Description, AllowYouthRentalFlag)
    VALUES (3, 'PG-13','Parental Guidance for Children Under 13',1),
    INSERT INTO Inventory.MovieRating(MovieRatingId, Code, Description, AllowYouthRentalFlag)
    VALUES (4, 'R','Restricted, No Children Under 17 without Parent',0)

    (Another variety is to use:

    INSERT INTO Inventory.MovieRating (MovieRatingId, Code, Description, AllowYouthRentalFlag)
    SELECT 0, 'UR','Unrated',1
    UNION ALL
    SELECT 1, 'G','General Audiences',1
    UNION ALL
    SELECT 3, 'PG-13','Parental Guidance for Children Under 13',1
    UNION ALL
    SELECT 4, 'R','Restricted, No Children Under 17 without Parent',0

    But that is not that much better (certainly a little better).  I felt for the book that using VALUES was the more "proper" way to do it.  However, now, in the 2008 edition, I obviously have to change all of the code to use the latest and greatest syntax, so I use row constructors, and this turns into:

    INSERT INTO Inventory.MovieRating (MovieRatingId, Code, Description, AllowYouthRentalFlag)
    VALUES (0, 'UR','Unrated',1),
           (1, 'G','General Audiences',1),
           (2, 'PG','Parental Guidance',1),
           (3, 'PG-13','Parental Guidance for Children Under 13',1),
           (4, 'R','Restricted, No Children Under 17 without Parent',0)

    And it just strikes me at how...simple this is, and how readable this is.  I know I have typed the UNION ALL stuff hundreds, possibly thousands (well hundreds more like) of times in the forums over the past year and a half dummying up data for someone who was asking for help but couldn't take the time to do it for us.


  • Changing the owner of a database

    Tonight, as I was creating my sample database for my chapter on implementing the database, I learned something new, that existed in 2005.  I had always used sp_changedbowner to change the owner of a database, but I was reading in another section about ALTER AUTHORIZATION earlier to change the owner of an object in a schema, so I said to my self "Louis" (I call myself that) "Louis, wonder if ALTER AUTHORIZATION works on other stuff?"

    Bizarre and surreal conversations with myself aside, the answer was Yes, you sure could.  The syntax is:

    ALTER AUTHORIZATION ON Database::<databaseName> TO <serverPrincipal>

    To demonstrate, first, I will create a login:

    create login test with password = 'like, password, dude'
    go

    Then a database just taking all of the defaults.  Setting an owner is not a part of the CREATE DATABASE syntax anyhow:

    create database showChangeOwner
    go

    Then, check the owner:

    select SUSER_SNAME(owner_sid)
    from   sys.databases
    where  name = 'showChangeOwner'

    This will return the login that you resolve to when you created the database. 

    MYDOMAIN\LBDAVI

    Next, run the following statement to change the owner:

    ALTER AUTHORIZATION ON Database::showChangeOwner TO test

    Then run the previous statement to see that the database is now owned by login: Test .

    As an aside, if you use EXECUTE AS to change context to a different login, any database you create will be created as the user you are EXECUTING AS, so this could technically be the way to set the owner at creation time.  I will give the test login rights to create a database then execute as it. (My new favorite permission is "create any database".  I looked, but despite the connotation, there is not a "create almost any database", "create just this one database", or even "create database named fred" rights.

    grant create any database to test
    execute as login = 'test'
    create database showChangeOwner2

    Checking the owner again:

    select SUSER_SNAME(owner_sid)
    from   sys.databases
    where  name = 'showChangeOwner2'

    Will show that this is owned by user "test". ALTER AUTHORIZATION will actually let you change the owner of quite a few different things.  For more information, please visit your public library...wait, no this is 2007, not 1970...  Check here in books online.

    Don't forget to clean up!

    drop database showChangeOwner, showChangeOwner2

    (did you know you could do that?  Saved me having to type drop database two times.  Wow, the time I saved by not typing drop database.  Wait, I typed drop database even more times after that.  And yes, this obvious lack of material is why I haven't been blogging of late...)


  • sys.dm_exec_xml_handles

    d Gives you information about any sessions that are using xml documents, including stats about the XML document and usage.  You can also get the statements that were used to create the XML handle using sys.dm_exec_sql_text to get the sql.

    Type: Function
    Parameter: session_id
    Data: Snapshot, values based on current reality
    Columns:

    • session_id - the session_id of the user who is using the sp_xml_preparedocument command to use the XML datatype
    • document_id – handle created for the document
    • namespace_document_id – you can use sp_xml_preparedocument to declare a document that is  namespace, then use it for subsequent calls. This value will have the value of a document_id, or a NULL if there is not one declared
    • sql_handle – relates to sys.dm_exec_sql_text to get the T-SQL of the query
    • statement_start_offset - the starting point in the T-SQL query object that is currently executing (note that these values are double the expected values due to unicode values. You will have to divide by 2 when using them in equations. For more information, see examples)
    • statement_end_offset - the ending point in the T-SQL query object that is currently executing
    • creation_time - the time the handle was created
    • original_document_size_bytes – size of the original text of the XML document
    • original_namespace_document_size_bytes - size of the original text of the XML namespace document
    • num_openxml_calls - Number of times the document has been used in OPENXML calls using this handle
    • row_count - Number of rows that have been returned using this handle in OPENXML calls
    • dormant_duration_ms - the amount of time (in milliseconds) since the last use of the handle in an OPENXML call

    Example:

    DECLARE @idoc int
    DECLARE @doc varchar(1000)

    SET @doc ='
    <root>
    <person firstName="barney" lastName="rubble"/>
    <person firstName="fred" lastName="flintstone"/>
    </root>'

    --Create an internal representation of the XML document.
    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

    -- Execute a SELECT statement that uses the OPENXML rowset provider.
    SELECT    *
    FROM       OPENXML (@idoc, '/root/person',1)
                         WITH (firstName varchar(20),
                                   lastName varchar(20))

    Then you can see the information for this document using:

    select *, text, substring(text,statement_start_offset /2,
                1+ statement_end_offset/2 - statement_start_offset/2),
                statement_start_offset,statement_end_offset
    from   sys.dm_exec_xml_handles(null)
                   cross apply sys.dm_exec_sql_text(sql_handle)

    Note:  This is part of an ongoing project to write a book about all of the dynamic management views for Red-Gate. It will be freely distributable once complete as an ebook. Any examples, suggestions, corrections are desired and will certainly be mentioned in the final book.  Also, I have begun maintaining the following web page once the book is closer to completion: http://drsql.org/dmvbook.aspx, including a list of resources I have used to write the book so far.


  • Normalization's other little side effect...

    It isn't that I don't like a challenge, really it isn't.  I like puzzles, mazes, Suduko, video games with challenging levels where I have to really think about a problem to solve it.  So why don't I like poorly designed databases, where finding a value that you want can be just as challenging, or maybe more?  Glad you asked (or maybe you didn't, what do I care, this is my blog :)   And for any of you who might be asking: "Shouldn't I be able to use SQL Server like I want to?"  It is a good thing you didn't actually ask that out loud, or I would have to ban you :)

    When someone creates a puzzle, maze, or video game, they design in the challenge purposefully.  Itzik Ben-Gan frequently sets up SQL puzzles on SQL Server Magazine that can be kind of fun.  But when someone builds a poorly designed database, then don't leave you breadcrumbs to find your way around.  Quite the opposite.  If they had spent time trying to think about how other people might get around in their database, they would have naturally started normalizing.   No, usually a poorly designed database is an act of "selfishness" or "ignorance" by a person who is not thinking at all.  Or even worse, was.

    This all comes up because I spent a good amount of my week this week digging through a database. First I would identify some data, go back to some other people with some knowledge of the data, have a meeting, then go back and try again.   What should have been a quick, painless task to identify 20 columns from this database took much longer than necessary.  Not 100s of tables worth, no, just a few measly columns.  So what was wrong?

    • Columns that had meaning in one row, but not another
    • Columns named FieldName1, FieldName2...
    • Two related tables that were not related by key, but by a concatenation of columns
    • One database per day of activity, not a key that denotes a different day
    • Data with embedded values, and not always the same format. 
    • Domain value with no apparent meaning

    I could go on, but I would just be making stuff up.  And frankly this is enough mess for a Sunday afternoon.  The problem is, had someone taken the time to design this database using proper normalization techniques, none of these problems would be evident.  So what is the side effect?

    Documentation

    Not that it is impossible to build an incomprehensible normalized database (lots of companies do stuff to make it hard for competitors to understand their data (like having column names be non-sensical without their data dictionary).  But the problem here is that yet another person spent time creating a "general purpose" database.  It never ceases to amaze me the lengths that people will go to never change tables in SQL.  So they do all of this nasty mapping in their code. 

    But you know what.  It turns out that SQL Server has a lot of really cool stuff that lets you customize data storage.  For example:

    ALTER TABLE allows you to add columns

    CREATE TABLE allows you to create tables

    sp_addextendedproperty allows you to add documentation to these properties

    Note that I am not necessarily suggesting that the answer to all "open schema" type problems necessarily should (or even could) be solved by simply adding to the schema.  In many cases you would not want that at all.  But in this case, I am not talking about that situation.  Why?  Because the code had to change...

    If the code that accesses the data needs to change based on the structure, extend the data structures.  It will save you code if your database is cleanly created and normalized.  Use SQL to do the job it was made for.  Then, it will be easy to go to one table that has a name that matches what I think I want, see what it is used for, then go to related tables, and to their related tables and so on.  Legal values for a column will be documented and checked.  Names given to columns will reflect their meaning.  You might even have descriptions stored in extended properties.

    And then a job that took a week could have taken an hour.  (On the other hand, I get another anecdote for my presentation next weekend at devlink (www.devlink.com) and for my next book.  And this blog.  Maybe I do like poorly normalized databases...Nah!)


  • sys.dm_exec_sql_text

    (Edit: Was reading Adam's book tonight and discovered you can pass a plan handle to this object.  Very interesting!)

    This dynamic management object returns the SQL that was saved when a query was executed. This is a very exciting and useful thing to have, as there are dynamic management views that you will be able to use to get statistics for a query, as well as the full SQL for any actively executing query.

    This is a tremendous leap ahead of the functionality that was available via the rather limited view available from DBCC INPUTBUFFER, where we could only get the first 256 characters. Be careful on active servers with very large queries as this can return a lot of data.

    Type: Function

    Parameter:

    • handle (either one of the following types)
      • sql_handle – ( which can be retrieved from sys.dm_exec_query_stats, sys.dm_exec_requests, sys.dm_exec_cursors, sys.dm_exec_xml_handles, sys.dm_exec_query_memory_grants)
      • plan_handle - ( which can be retrieved sys.dm_exec_requests, sys.dm_exec_query_memory_grants, sys.dm_exec_query_stats, sys.dm_exec_cached_plan)

    Data: Snapshot, values based on current reality. Note that rows can be reset by running DBCC FREEPOCCACHE, removing all rows from the query cache.

    Columns:

    • Grouping: Object Reference If the query is executing an object, the following three columns will give you the pointer to the object. They will be NULL when you are executing a batch of SQL Statements.
      • dbid - surrogate key for the database, if applicable. (Note: this is usually database_id in other objects. Look for this to change in a following version)
      • objectid - surrogate key for the object in a database, if applicable. (Note: this is usually object_id in other objects. Look for this to change in a following version)
      • number – for stored procedures, can be the number for grouped procedures. Proc;1, Proc;2. Rarely used.
    • encrypted – 1 if plan is encrypted, which will prevent viewing of the query text. 0 otherwise.
    • text – The text of the query, unless the object is encrypted, in which case it will be NULL

    Example:

    See the query you are executing:

    select dest.*
    from  sys.dm_exec_requests as der
                 cross apply sys.dm_exec_sql_text (der.sql_handle) as dest
    where session_id = @@spid

    This will return:

    dbid   objectid    number encrypted
    ------ ----------- ------ ---------
    NULL   NULL        NULL   0

    text
    -----------------------------------------
    select dest.*
    from   sys.dm_exec_requests as der
              cross apply sys.dm_exec_sql_text (der.sql_handle) as dest
    where session_id = @@spid

    Of course, that is kind of a silly query, but it is a repeatable result that you can execute to see how the function works. The sys.dm_exec_sql_text object will actually be of a lot more use when it is used by other dynamic management objects that have a query handle (sys.dm_exec_query_stats, sys.dm_exec_requests, sys.dm_exec_cursors, sys.dm_exec_xml_handles, sys.dm_exec_query_memory_grants)

     

    Note:  This is part of an ongoing project to write a book about all of the dynamic management views for Red-Gate. It will be freely distributable once complete as an ebook. Any examples, suggestions, corrections are desired and will certainly be mentioned in the final book.  Also, I will begin maintaining the following web page once the book is closer to completion: http://drsql.org/dmvbook.aspx.


  • PASS Followup, Technical Edition

    I have already given a rundown of most everything PASS here on my personal blog, including a series of posts with pictures too over the days of PASS) but since this blog is all about things technical with SQL Server, I wanted to just rundown of some of the cool stuff I learned about SQL Server 2008 that I hadn't heard before...

    • You can index a sql_variant (script in this post)- possibly the least important piece of information is the most interesting.  A person asked Dejan Sarka in his session if you could index them, and no one really expected that you could.  Turns out, you can....
    • Spatial Datatypes are going to rock! - Really cool stuff.  (good high level rundown here on the Virtual Earth team blog).  They will support a flat earth model (like you use on a road map) and a round earth (well, ellipsoidal really) model as well, for more precise modeling of large spaces. Initially I thought this was going to be a fringe datatype and while it still might be one day in the future, I saw a lot of ideas I could use this for, even in an OLTP database. 

      For example, if you have the longitude and latitude of the address of a customer/constituent, it is going to be easy to join that point to the shape that represents a time zone, a country, the zip code, a voting district, etc, etc.  Or you could also do it at a zip code level, and join the shape of a zip code to the the other shapes it intersects with.  Of course, just like XML, these values will not always be exactly relational in nature (though you can make a case these types of datatypes represent a scalar value (a shape) while internally they might be made up of tons of points.
    • Intellisense is a coming - Yes, without an add-in, SQL Server 2008 tools will have intellisense.  Can't wait to see just how this works out for them.  Red Gate's version is ok, but it can be kind of clumsy to work with. Might even still be useful to have both...who knows.  The folks at Red-Gate are smart cookies.
    • Analysis Services 2008 will be evolutionary, not revolutionary - This is a good thing really.  The change from 2000 to 2005 was big, huge, major.  The problem was, it was so huge that it was hard to figure out all of the bits and pieces.  Part of this was the tools were version one, and 2008 will take the tools to the next level.  The biggest neat stuff in there surrounded having the engine able to give warnings about how your cubes are designed.  Blue squiggly lines (Donald Farmer claimed this to be a new technical term: "squiggly" in his session on AS 2008) will show you warnings right in the UI, and red ones show errors.  Warnings can be ignored, errors not so much.  Apparently you can use the new tools to modify 2005 cubes too, though you won't be able to use any new functionality that does get added to 2008.
    • T-SQL is making small amounts of progress forward - As a person who basically makes his living on relational database design, and T-SQL, I am a bit disappointed by how little T-SQL is changing.  I am particularly desirous of lots of usability tweaks being done to T-SQL (like why do something about delayed name resolution on procedures or especially CREATE OR ALTER syntax for creating objects).  To me, the biggest wins for DBA could be done at what should be the least cost to the dev team.  Both of the ideas stated would only matter to the compiler, right? So nothing