THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Louis Davidson

  • Best Thing I Learned at PASS Summit

    I have learned many things at every PASS Summit I have attended. First off, all of the people who are speakers are just human beings like you and me (including everyone's favorite Kalen Delaney). I learned that it is really expensive to put on a conference like PASS, and I have especially learned to enjoy six dollar bagels and four dollar cans of coke, while they still exist. And I have learned that speaking in front of 200-2000 people isn't nearly as scary as it initially seemed. I have even learned many things about being a DBA\Database Architect that I might never have learned when I was just digging around books online.

    However, the "best thing" I have learned is how to be a citizen of the SQL Server community. PASS, being a non-profit organization relies very heavily on people to donate their time. Up until this year (I am teaching a post-con with Paul Nielsen), I have never made one cent at the PASS Summit, and last year I spent over a thousand of my own dollars to attend. Why? Because I believe in the SQL Server community that PASS represents. During the time I have spent in the PASS community I have made lots of great friends, and I wouldn't trade any of them or even one moment I have spent at any Summit for anything.

    I hope to meet you there this year adding to the community!


  • Read/Write Ratio versus Read/Write Ratio?

    So, Jason Massie (http://twitter.com/statisticsio) had a blog post (http://statisticsio.com/Home/tabid/36/articleType/ArticleView/articleId/324/Is-8020-a-90rsquos-Estimate.aspx) that Kendal Van Dyke (http://twitter.com/sqldba) alerted me to at SQL Saturday in Atlanta .  It does this by taking the number times an index was used for a read operation versus update operations using sys.dm_db_index_usage_stats.  Coincidentally, I was working on a chapter in a book about DMVs and working on a section on this very subject.  Initially I did my calculations solely it in a much different manner altogether just thinking about the amount of data moved to and from the physical file using sys.dm_io_virtual_file_stats.

    After reading his post, I added not only a look using the index usage stats DMV, but also using sys.dm_io_virtual_file_stats as well.  My findings on my server was actually pretty interesting.  I found that while in operations my read/write ratio could be 50/50 at times, when looking at magnitude of data moved in and out of the server the ratio could be 95/5 in favor of data read.  This was clearly a clue that something isn’t quite right with our indexing or our data utilization (this is a very active server, mostly with batch/webservice OLTP operations through CRM!)

    In the book, I do more breakdown of drives/objects, etc, but at a high level, I find that this is a pretty interesting comparison to look at how the read/write ratio.  For this blog entry I am just including the the following three queries, which I did go back and change to exclude log files based on some of the very interesting discussion in the comments.

    Note too that sys.dm_io_virtual_file_stats could include things like backups and sys.dm_db_index_usage_stats could include maintenance too so it is highly expected that to get the most useful numbers that you employ some technique to capture numbers periodically and you exclude certain times of day when you might be doing non-standard work.  What this means to you may be different to every company.  For example, you may only want to know the read/write characteristics during steady working hours, and not overnight when nightly processes are being run.  Or you might be tuning the backup window and doing backups are you main concern.  These three queries could open your mind to lots of different ideas of how to use these DMV objects to tune your system.

    --uses a like comparison to only include databases you desire
    DECLARE @databaseName SYSNAME, @excludeLogFilesFlag bit
    SET @databaseName = '%' --'%' gives all databases
    SET @excludeLogFilesFlag = 1 --excludes
     log files by default because they would not be considered in the sys.dm_db_index_usage_stats numbers

    SELECT 'Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats'
    SELECT  CAST(SUM(num_of_bytes_read) AS DECIMAL)
            / ( CAST(SUM(num_of_bytes_written) AS DECIMAL)
                + CAST(SUM(num_of_bytes_read) AS DECIMAL) )
                                       As RatioOfReads,
            CAST(SUM(num_of_bytes_written) AS DECIMAL)
            / ( CAST(SUM(num_of_bytes_written) AS DECIMAL)
                + CAST(SUM(num_of_bytes_read) AS DECIMAL) )
                                       AS RatioOfWrites,
            SUM(num_of_bytes_read) as TotalBytesRead,
             SUM(num_of_bytes_written) as TotalBytesWritten
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs
             join sys.master_files mf
                on mf.database_id = divfs.database_id
                   and mf.file_id = divfs.file_id
    WHERE   DB_NAME(divfs.database_id) LIKE @databaseName
      and   (mf.type_desc <> 'LOG' or @excludeLogFilesFlag = 0)

    SELECT 'Ratio bases on numbers of operations-sys.dm_io_virtual_file_stats'
    SELECT  CAST(SUM(num_of_reads) AS DECIMAL)
            / ( CAST(SUM(num_of_writes) AS DECIMAL)
                + CAST(SUM(num_of_reads) AS DECIMAL) )
                                       As RatioOfReads,

            CAST(SUM(num_of_writes) AS DECIMAL)
            / ( CAST(SUM(num_of_reads) AS DECIMAL)
                + CAST(SUM(num_of_writes) AS DECIMAL) )
                                       AS RatioOfWrites,
            SUM(num_of_reads) as TotalReadOperations,
             SUM(num_of_writes) as TotalWriteOperations
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs
             join sys.master_files mf
                on mf.database_id = divfs.database_id
                   and mf.file_id = divfs.file_id
    WHERE   DB_NAME(divfs.database_id) LIKE @databaseName
      and   (mf.type_desc <> 'LOG' or @excludeLogFilesFlag = 0)

    SELECT 'Ratio bases on numbers of operations - sys.dm_db_index_usage_stats'
    SELECT  case when (SUM(user_updates + user_seeks
                        + user_scans + user_lookups) = 0)
                   then NULL
                 else (
            CAST(SUM(user_seeks + user_scans + user_lookups)
                                                   AS DECIMAL)
            / CAST(SUM(user_updates + user_seeks
                        + user_scans + user_lookups)
                                                  AS DECIMAL)
                      ) end
                                               AS RatioOfReads,

            case when (SUM(user_updates + user_seeks
                        + user_scans + user_lookups) = 0)
                     then NULL
                 else (

            CAST(SUM(user_updates) AS DECIMAL)
            / CAST(SUM(user_updates + user_seeks
                       + user_scans + user_lookups) AS DECIMAL)
                      ) end
                                              AS RatioOfWrites,
           SUM(user_updates + user_seeks
                        + user_scans + user_lookups) as
                                           TotalReadOperations,
           SUM(user_updates) as TotalWriteOperations
    FROM    sys.dm_db_index_usage_stats AS ddius
    WHERE   DB_NAME(database_id) LIKE @databaseName


  • SQL Quiz: Gilligan’s Island

    Tagged, I was by the Rambling SQLSarg, Jonathan Kehayias, to answer the following question:

    So You’re On A Deserted Island With WiFi and you’re still on the clock at work. Okay, so not a very good situational exercise here, but let’s roll with it; we’ll call it a virtual deserted island. Perhaps what I should simply ask is if you had a month without any walk-up work, no projects due, no performance issues that require you to devote time from anything other than a wishlist of items you’ve been wanting to get accomplished at work but keep getting pulled away from I ask this question: what would be the top items that would get your attention?

    The funny part is that I have almost always been isolated like this at any of my jobs, unless a big project was underway and I was the roadblock that other people had to get their part of the project done.  I have at times had teammates that I worked with closely, one of them was an exceptional dba that had only been a dba for a year or two, and the others were more project managery, and once I worked with a UI guy who I worked with to build an excellent UI that was generated from a call to a stored procedure to get the metadata for the object.

    So often I actually do this, planning what I think would be the best way to implement our current systems or future ones and how best they would be could be built using my own personal ways of doing stuff. Ironically, more often than not, it isn’t feasible to do in my own little world, but it becomes good advice for others (hopefully who don’t have large amounts of legacy code piled up to deal with.  In my real world, reality becomes that we have meetings and meetings to discuss and committee things until we eventually fix the legacy stuff enough to go for a good long time.  Sadly reality has a lot to do with money, available resources (human and computer), etc whereas in the mind of an architect, the only barrier during the pure design part of a project is physics. 

    For example, take the following two designs of a house that an house architect might think up (perhaps after taking some oxycodone for a back (or hip) problem:

    image

    Clearly the idea in B would be nixed because even an architect knows enough about physics to know that this would tip over, where as A would be much safer.  (These images should also explain why I did not finish my degree in Civil Engineering.)

    So, assuming that I am on this deserted island alone, with what would I research? Well, let’s just use a Genie clause and ignore the fact that just like Genie wouldn’t allow Aladin to wish for more wishes, I won’t state the obvious that I would study up on boat making and praying for rescue. And even assuming that I knew that someone was coming back for me (hopefully not the crazy Dr. Boris Balinkoff coming to change my brain with a duck), I would spend a good amount of my off time looking for something to surf with/something to avoid sunburn, and probably something good for lunch…

    1. A really good explanation for Fifth Normal Form that didn’t include mind numbing technical language that makes you brain hurt.

    I get the basic idea behind the fifth normal form, but unlike even the fourth normal form, I can’t come up with an explanation that isn’t basically the same as how your mom and dad explained why not to mess with the little red haired girl with the bad reputation.  “Don’t violate this rule, or else bad things will occur” 

    “Bad things?”, you ask.

    “Like babies, and shotguns” but at this point, you aren’t really sure how pulling the hair of the this little girl will really hurt things. And every explanation of the entire problem you can find on the internet (with safe search turned on!) is like a college course in anatomy. You of course simply know you will never do that so you ignore it.

    All explanations for 5th Normal Form are either of the “don’t do it”, “doesn’t matter”, “or 30 pages of terse text with lots of formulas that look nothing like the CREATE TABLE statement you have grown to love over the years. 

    2. Work on/Design/Create a data access layer tool that allows you to set up the objects that you want and generates procedures and functional code

    Look, I understand your pain programmers.  You like things to be easy.  You don’t like to write a bunch of “code” to solve problems.  You like it when some tool dictates to you how things are going to work internal to your programs and you want to just let it happen.  We relational types are not really like that.  Sure, most compiled code nowadays will either run in 10 or 100 ns depending on how you write it, and no one really cares about 90 ns unless they are building a website along the scale of ESPN or MySpace. But in the relational world, we are dealing with fetching data from physical storage which is inherently s l o w. It gets better and better every year, but most companies react to this increase in performance by increasing their desire to store data at a pace that is slightly faster than the improvement in technology.

    However, there are still a majority of companies out there that could still be using SQL Server 1.0 on 16 bit OS/2 (okay, I am exaggerating, maybe SQL 6.0) that don’t care so much about performance and just want to get programming done fast.  Tools often cater to these entities and not to to the larger systems needs.  As relational types, a majority of us constantly bellow “stored procedures…stored procedures” on the corner like the town crier.  The problem is, tools generally want to ignore this layer, primarily for time and programmer experience.  Why write SQL code when you can let the tool do it?

    But if a tool can generate SQL, why can’t an object layer be built that automatically creates procedures? And like any generated code, if you don’t monkey with it, the generator can create a new version as you need it.  If you do have to make changes, likely to use some query technique that is not easily automated, the code generator just throws up it’s hands and says “I am trusting you brother” (or sister.)

    Maybe then the topic of stored procs could finally be retired and we could just leave it alone.  This tool, the best on the market, gives you both in a manner that is EASY to do, FAST to build, and POWERFUL enough to do extremely complex configurations…

    3.  Getting better at data warehousing.

    I have designed and implemented a data warehouse, including ETL (both with the assistance of my friend and fellow Data Architect, Frank Castora). It was quite the mind bending experience, as normalization becomes a “bad thing” and denormalizaiton is a “good thing”.  From 15 years of OLTP development and writing, this felt like a sin that was going to end up with me getting a pitchfork to the tush for eternity.  But as the project went on, it started to feel natural and the dimensionally modeled tables stored in the SQL database started to feel semi-natural and really started to work for me.  But there were quite a few things that we did “wrong” as well as many that we did “not really wrong but not exactly correct either” that had to do with different scenarios that didn’t fit the examples that we could find on the web. 

    Add to that that Analysis Services and the language MDX both were quite foreign to me that the project didn’t exactly “fail”, but it certainly didn’t make the “succeed” column either.  The project is more or less ongoing yet in a pattern of stasis that Dave Lister (not the guy I work with but the guy from Red Dwarf who was frozen for a million years only to wake up with everybody gone except for electronic holograms and computers) would appreciate. If I knew more about all of the processes/tools, I could go far towards making things better and getting things straight.

     

    So, whom would I like to see deserted on an island? Um, I mean, whom would I like to see how they would respond to this challenge… hmm  Paul Nielsen doesn’t have anything to do these days (ha!), and Allen White is a very interesting guy who I would like to see his feelings too.


  • What is a physical database?

    A bit of terminology that gets beaten to death is that of the “physical” database.  I would think most every DBA uses this term (I do), but…to mean what?  I think there are two common utilizations:

    1. The layer of tables, constraints, indexes, etc used to store data
    2. The actual on-disk structures.

    Frankly, until 3 years ago, I used the first interpretation.  However, I was beaten up pretty badly by a few people whom I don’t really remember (I think Anith Sen was one of them.)  The problem is, I was scolded, “physical” already had a meaning, given it by the “founder” himself, EF Codd.

    So, checking his 12 Rules, Codd stated the following two things:

    Rule 8: Physical data independence:

    Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

    Rule 9: Logical data independence:

    Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

    And actually, the implementation layer really is the logical model if you follow his terminology since his rules were pertaining to the relational model and not the entire design process.  This article says it better than I can in a long blog, but I am not sure about that URL (mac.com?):

    http://homepage.mac.com/s_lott/iblog/architecture/C465799452/E20080301143528/

    The physical layer of a relational database occurs down at the file system level.  Codd's "Rule 8" (Physical Data Independence) says that the things we're designing in ERwin (and similar tools) are the things our application depends on.  These are not physical in nature, but are the relational implementation. 

    So the thing I am trying to say is that physical means that a little 5 volt charge is sitting there representing a bit of data in the physical world.  I like the term logical to mean implementation platform non-specific.. The thing in the middle is the SQL Server/Relational  implementation specific model.  It may take liberties to optimize for SQL Server, but it is not physical. That is were partitioning. indexing, filegroups, etc come in. Changes to this layer ought never be noticable by the application. 

    I guess in the comments, I ought to expect a good number of replies that might start to answer the question.  Does it matter? Is it only semantics? Hey if you don’t think semantics matter, I hope that when you find yourself drowning that the person who has the choice of tossing you a life preserver or a sack of door knobs interprets the meaning of your cry for help in the way you intended. You would hate to find yourself at the bottom of a lake thinking “hmm, I wonder why they did that? Did they hate me, of just mis-interpret the meaning of my sentence?"


  • disallow results from triggers

    A setting that I noticed a while back when looking at sys.configurations was disallow results from triggers.  Triggers are one of my favorite subjects, and you will find a lot of good uses of them in my book (triggers are also well named, as poor usage of them will allow you to shoot your foot off in a heartbeat as I discussed in my earlier post: http://sqlblog.com/blogs/louis_davidson/archive/2008/07/13/triggers-evil.aspx).

    One of the main ways you can mess up your system is to have triggers that return results, often by accident, that the client doesn’t know about.  There are three kinds of return values that are interesting:

    • Result sets
    • Raiserror messages
    • Rowcount messages

    Also note that this is an overall system configuration. So it will disallow results from triggers in all of your databases.  It is never a good idea to return data from a trigger, but it is actually pretty useful for testing.  In the following code, I will test this setting and show how it will affect returning data to the client from the aforementioned methods. 

    What will the setting do? It will raise an error if you try to do a result set.  It will automatically suppress rowcount messages (without the need for SET NOCOUNT ON), but will allow error messages (I won’t cover it here, but error messages can cause issues with transactions in triggers, particularly when coupled with TRY…CATCH blocks).

    To demonstrate the setting, I will use tempdb.  Checking the system setting for your server, use:

    SELECT value
    FROM    sys.configurations
    WHERE name = 'disallow results from triggers'

    This will probably return, unless you have discovered the setting before I did, in which case you probably aren’t reading any longer, so there.

    ---------
    0

    To show you the effect of this setting , let's build the following scenario. The trigger will return 3 types of values

    • the contents of the table named triggerResult
    • the rowcount of rows affected from an insert statement like you would normally have in a database
    • the rowcount of the rows from the insert statement into a temp table

    To start, we create 2 tables, one as the “main” table, and another that will hold the results of a side effect causing query:

    --primary test table
    create table triggerResult
    (
        triggerResultId int primary key
    )   
    --holds our side effect to prove the trigger executed
    create table triggerResultSideEffect
    (
        triggerResultId int
    )

    Then we will create the trigger that gives us several different types of output

    create trigger triggerResult$insertTrigger
    on triggerResult
    after insert
    as
    begin
        --returns a result set
        select triggerResultId
        from   triggerResult

        --side effect like you might expect in a trigger
        insert into triggerResultSideEffect (triggerResultId)
        select triggerResultId
        from   inserted

        --just to get a fixed rowcount output
        declare @test table (value char(1))
        insert into @test
        values (1)

        --and a couple of errors
        raiserror ('Low',10,1)
        raiserror ('Normal',16,1)

    end
    Now we will test out the trigger by inserting one row into the triggerResult table:

    insert into triggerResult
    values (1)

    This returns (the final rows affected is from the original statement):

    triggerResultId
    ---------------
    1

    (1 row(s) affected)

    (1 row(s) affected)

    (1 row(s) affected)

    Low
    Msg 50000, Level 16, State 1, Procedure triggerResult$insertTrigger, Line 22
    Normal

    (1 row(s) affected)

    Now try a multi-row operation, to see the difference

    insert into triggerResult
    values (2),(3),(4)

    This returns the following (which you can clearly now see the different number of rows that were affected by each statement, 4 from the one that returns all of the rows in the table, 3 from the side effect query that inserts the rows from the inserted table.

    triggerResultId
    ---------------
    1
    2
    3
    4

    (4 row(s) affected)

    (3 row(s) affected)

    (1 row(s) affected)

    Low
    Msg 50000, Level 16, State 1, Procedure triggerResult$insertTrigger, Line 22
    Normal

    (3 row(s) affected)

    Now, change the setting to disallow trigger results (you may need to do allow advanced options)

    exec sp_configure 'show advanced options',1
    RECONFIGURE
    exec sp_configure 'disallow results from triggers',1
    RECONFIGURE

    Now, just to make sure of where our data is, we know that there are 4 rows in the triggerResult table. Let’s check the triggerResultSideEffect table contents:

    SELECT *
    FROM   triggerResultSideEffect
    ORDER  BY triggerResultId

    This returns, showing all of the values we have inserted:

    triggerResultId
    ---------------
    1
    2
    3
    4

    Now, trying to run the statement with the same trigger:

    insert into triggerResult
    values (5)

    This will cause the following error message:

    Msg 524, Level 16, State 1, Procedure triggerResult$insertTrigger, Line 6
    A trigger returned a resultset and the server option 'disallow results from triggers' is true.

    Altering the trigger, just remove the statement that returns a result set:

    alter trigger triggerResult$insertTrigger
    on triggerResult
    after insert
    as
    begin
        ----returns a result set
        --select triggerResultId
        --from   triggerResult
        --side effect like you might expect in a trigger
     

        insert into triggerResultSideEffect (triggerResultId)
        select triggerResultId
        from   inserted

        --just to get a fixed rowcount output
        declare @test table (value char(1))
        insert into @test
        values (1)

        --and a couple of errors
        raiserror ('Low',10,1)
        raiserror ('Normal',16,1)

    end

    Now, re-executing the statement with no results being returned:

    insert into triggerResult
    values (5)

    This simply returns the error message that are returned, and the rows affected message from the insert statement:

    Low
    Msg 50000, Level 16, State 1, Procedure triggerResult$insertTrigger, Line 22
    Normal

    (1 row(s) affected)

    Just to make sure that the rows were created:

    select *
    from   triggerResult

    select *
    from   triggerResultSideEffect

    This returns:

    triggerResultId
    ---------------
    1
    2
    3
    4
    5

    (5 row(s) affected)

    triggerResultId
    ---------------
    1
    4
    3
    2
    5

    (5 row(s) affected)

    Which shows that the data was inserted..

    Note that books online states: This feature will be removed in the next version of Microsoft SQL Server. Do not use this feature in new development work, and modify applications that currently use this feature as soon as possible. We recommend that you set this value to 1.

    This means that the ability to return result sets from a trigger will go away in the next version of SQL Server.  Returning results from a trigger is certainly not something that has ever been a good practice, so if this does in fact occur, it is a good setting to set in your development servers and begin to use.


  • Checkpoint – Four pillars down, Three to Go

    With the previous post on the fourth pillar, I have reached the “end” of the design posts.  To review, these were:

    • Coherent – cohesive, comprehendible, standards based, names/datatypes all make sense, needs little documentation
    • Normal – normalized as much as possible without harming usability/performance (based on testing)
    • Fundamentally Sound – fundamental rules enforced such that you don’t have to check datatypes, base domains, relationships, etc
    • Documented – Anything that cannot be gather from the previous four is written down and/or diagrammed for others

    When you are looking at a database that you have NO idea what it does, or why it exists, these things I have so far outlined will be the sorts of things you notice first. If NOT(normalized AND understandable AND data protected AND documented) = TRUE, then you will feel it and really hope that you are an hours based contractor with your highest rate and a set time to pack up and head elsewhere richer than a .  If the opposite is true, being an employee for that company is where you want to be.

    Looking back, one thing stands out to me as a possible (reasonable) concern. I didn’t include anything about “complete”.  By the time my next edition of “Pro SQL Server 2008 Relational Database Design and Implementation” rolls around, it might.  However, my initial thinking was that these were characteristics beyond the basic “it must at least work” benchmark that should be obvious to any semi-logical human being (which I suppose limits the gene pool, but at least most readers of my book who don’t misunderstand the term data model as a social activity will meet that requirement). I sort of hope it goes without saying to those folks that a database cannot be a good database without it serving the needs of the users. The only thing I am trying to point out in these pillars is to ascertain how cleanly the database does that task.

    I also think that I might do some rearranging to make Normal and Sound Theory as the foundation and just have six pillars.  Either way, things will change.

    For example, if you were modeling a time entry system, no matter how normalized, understandable, data protected and documented your system was, if you failed to capture who was entering time and the employees never got paid, your database would suck.  In fact, I would go so far as to warn you to expect a large group of people at your door with pitchforks and torches ready to give you an alternate meaning to the term “perforated colon” or at least a crash course in database design/implementation.

    The thing is that anyone building a system to capture employee’s time and getting them paid would do a certain set of things, even if the entire system was done on paper. The fact is, rare is the case that a system is created that won’t minimally do the task that was originally asked for in some reasonably decent manner. In fact, this can, as a person who wants to get it done right, be your biggest enemy.  Getting it done well can easily take a back seat to just getting it done.  Worse still, there is some truth to the logic to this statement.  As the old saying goes:

    Better is the enemy of good enough
    Admiral of the Fleet of the Soviet Union Sergey Georgiyevich Gorshkov

    But the problem with that saying is in the interpretation. What is “good enough”? To me, this lack of a definition of good enough is what derails a lot of projects where this is the sentiment. If you don’t define “good enough” as being a solid platform for growth and usability, this is pretty much a half truth, and a half truth is far more difficult to deal with than a complete lie. 

    And then she understood the devilish cunning of the enemies’ plan. By mixing a little truth with it they had made their lie far stronger.
    C. S. Lewis, The Last Battle

    As the architect of a system, just getting it done is NOT the goal. It is a very important step along the way to the actual goal, but it is not the goal. Malleable software that can grow and expand to meet the needs of the users now and in the future is the goal that serves your customers best. The cost of creating and managing software over months and years is often a lot more than is initally thought. The cost of having a human do the work on paper and filing cabinets can be more efficient and cheaper than poorly written, hard to manage software.  Too often computers just turn a bunch of paper forms into bits and bytes rather than streamlining a process to make the computer do the work for you.  Sometimes it is just protecting the jobs of others that causes this, but the fact is, eliminating a job of a person who pushes a button over and over every day will allow someone to be hired to do actual work (even the button pusher!) that can add to the bottom line, not simply waste resources.

    The pillars were conceived a “catch phrase” to help think about how we can ease the process of getting the system built, created, and eventually maintained in a manner that helps the eventual product be useful for years to come, and in the process help you to become a respected member of your team, not the one who is talked about as a “good riddance” once you have finally left the company.


  • Do you care where an idea comes from?

    Ok, so you are the lead ______________ for your company.  The big cheese, the super duper guy who knows everything…right?  So then some newbie comes up with an idea that might solve a problem you have been dealing with for a few days/weeks/months/whatever.  What do you do?

    A. Ignore his idea and use your own, genetically superior idea

    B. Ignore where the idea came from, pretend it is your own.

    C. Accept the answer, give the newbie credit, put it in your toolbox so next time you will know it too

    If you didn’t answer C, I am glad we aren’t coworkers.

    Frankly any other answer blows my mind. Clearly, the people who hire a person on the top of their field will know more than most people. And it should be expected that a person 15 years of experience with some topic/product will know more than people with 1 or 2. Everyone has access to the same search engines, so they can find new ideas and act upon them. Ideally there is a review process where people that have the life experience do much of the reviewing, but still that usually goes both ways.  Reviewing ideas before executing them should be done even if it is the companies chief smarty pants, if for no other reason that when people of lesser experience try to validate an “experts” ideas (or hopefully disprove the expert and win some cool points,) the worst that can happen is that someone learns something new.  The best thing that can happen is a disaster is averted. And that can’t be a bad thing, right?

    But going back to the “expert” role.  Too often being saddled with the moniker “expert” gets mistaken for “perfect.”  Unless I suddenly become perfect, and man, that ain’t happening any time soon, everyone make mistakes. In fact, I know that one of the things I am most proud of is my complete lack of competitive pride. (I am the best, baby!)  To me, the key is how the person who is saddled with the title expert handles it. All to often, the gut reaction is A. “Well, you can’t be right. My idea is clearly better as it was formed from my superior intellect.” This might not be that horrible, like if the decision was where to eat lunch, but if you were the engineer who designed the Titanic, well, not good. The answer B is just evil.

    Who cares if there are 2 experts at your company? Or on the planet? Competition should drive you to work harder to be better, with the final goal being everyone being better…not to harm others…


  • Headed..South..To Atlanta for SQL Saturday

    Heading down to SQL Saturday (www.sqlsaturday.com) in Atlanta to give a 9:15 AM session on Database Design, with a greater than normal dose of normal thrown in. I hope there are more than 7 people to show up, because I would hate if everyone got some swag (I usually like tossing out stuff when people make cool and interesting statements, and NOT when they prove me wrong…you know who (all) you are :). Apparently they were at their capacity, so maybe not.  If you need to know where it is, check here: http://www.sqlsaturday.com/eventhome.aspx?eventid=17, and come on out.  The more smart alecky know-it-alls I get who want to prove me wrong the better…well, as long as you sit there quietly and learn something…

    (Hmm… I hasten to send you that link (www.sqlsaturday.com), because frankly you will have to see that someone has a bad where clause on a query somewhere. I would offer up prizes for the best guess, but every time I do contests no one ever gets it or even tries.)


  • Book Reviews – Again

    I have gotten a few more reviews in, and interestingly I appreciate the negative ones almost as much as the positive ones. I prefer the negative ones that have decent star ratings better… but what are you going to do.

    The most recent review was critical of the book for not having mentioned testing. I actually think that this was really good criticism and have already started my planning for how to rectify this.  The only thing I wish this reviewer had done was mention the rest of the book.  This person has three reviews on Amazon and some cool stuff on his wish list (if we ever meet, I will be happy to buy you an expresso/cup of coffee and discuss the rest of the book, which pretty much goes for anyone, if you want. I will also buy you lunch at my favorite restaurant: Prince’s Hot Chicken Shack. Only rule is that we have to talk about the book at least a little).

    Please, if you have read the book, oh please (am I begging), I beg you (yes, I am begging) to please email me your feelings on the book or post reviews. I would love to know what you thought of it. I just want to make the book better and who knows, I usually give out a few copies of the next book (no guarantees) if the advice is really constructive. 

    The real problem here is that writing is a VERY slow process. If my book was electronic, I could start writing and shoehorn in the new material and be done with it.  But a book is not like a website.  I wrote the book as a cohesive 650+ pages that are supposed to work together as a unit. Unlike a set of web pages, my hope is that you will skim 1/2 of the book and read at least half (which half depends on you, but I like both halves.) And in each edition, I try to give more and more information as I find it, learn it, and on a few shining occasions, make something up.

    In the first edition, the process was simply that I wrote what I thought I wanted to say, and editor(s) hacked that to bits.  So about 10 people were involved in the creation.  Now, working on the fifth edition I have had hundreds of people give me feedback, and a fairly small percentage tell me stuff that was missing that ruined the experience for them. I take these comments VERY seriously, especially if I agree with them.  If you compared version one to the fourth version, a lot of the stuff I was really enthused about didn’t make too many people all that excited, so I cut it.  In this last version, I have more examples, more code, and more technique, because it was asked for.

    Now if I just knew if this reviewer liked anything else about, or if there was more that was disliked I could possibly make the next book even better.  So if you have any ideas/feelings/criticism/etc email them to louis@drsql.org.  Thanks!


  • Allen White to speak at our Nashville SQL Server User’s group April 24

    I am scheduled to be the opening act, with a short 5-10 minute tip/trick session entitled “Tips – Taking Care of ‘Future You’”, and then the illustrious Allen White will take over and give a presentation on the new administration features of SQL Server 2008.  Frankly I am glad that he is going after me, because I know he is going to be quite a bit better than me :) (This being based on what I have heard and the fact the he is a professional trainer and I am just a rank amateur!) 

    If you don’t know Allen, his bio is:

    Allen is a SQL Server Trainer for Scalability Experts, a leading provider of scalable solutions, training and services based on Microsoft SQL Server. He's spent over 30 years in IT, including operations, development, telecommunications, network admin and database design and administration. He's been using SQL Server since 1992 and is certified MCITP:Database Administrator, MCITP: Database Developer and MCT.  Allen has been awarded Microsoft’s MVP Award for his work in the SQL Server community for the last two years. He's active in the Ohio North SQL Server User's Group and contributes in the MSDN Forums, answering questions about SMO and PowerShell, and maintains a blog at http://sqlblog.com/blogs/allen_white/default.aspx.

    He is a great guy, and I am looking forward to getting an opportunity to hear him speak.  He is coming to see us because he happened to be in town for the weekend to run the Country Music Marathon. I was going to run it, but then I realized I was going to be in Atlanta for SQL Saturday. (Okay, to be honest if you know me you would be fairly certain that I won’t even be walking 26 miles anytime soon.)

    If you want to come out and see Allen too, we will be at Nova Copy, 15 Lindsley Ave. Nashville 37210. RSVP to Shelton Dickson (shelton@dicksonresources.com) as soon as you can if you are coming so we will know how much pizza to order.


  • The fourth pillar – Documented

    This blog probably won’t stir up a hornet’s nest or anything, but I would also expect that it would be the least popular in practice. The person who feels they can disagree with the need for a reasonable amount of documentation is probably nuts. In the first post, I defined documented as “Anything that cannot be gathered from the previous four is written down and/or diagrammed for others”.  And yes, I know documentation is more boring than a whole oilfield of derricks.  So let me give you a life lesson. Documentation is fun when it is done BEFORE you build.

    Yes, you heard me. If you have chosen names for things that seem right, the definitions you make may in fact just fall out of the name. So it isn’t that much work. The payoff is probably a lot better than you might imagine. And be careful though, TOO much documentation can easily be worse than no documentation. Sometimes systems have so much documentation that it is impossible to know how to even find information, or there are conflicting bits of documentation.  Keep it sparse, just enough information that you communicate what was on your mind when you were designing your objects. Some example benefits that I find:

    • Names can only be so descriptive – You only get 128 characters for a name, and if you get close to that your users are going to beat you with whatever they can find handy.  Name should be succinct versions of the definitions. 
    • Names should not usually include purpose - Purpose can be tricky. Who wanted this, why they wanted it, and how it is to be used can only be documented (or mined from the code later, I suppose, if you are REALLY patient and your programmers use really good comments.)
    • While documenting your objects, you will be forced to (shudder) think one more time about what you are documenting -Writing down your meaning of what you have just designed after you are feeling good about it gives you a chance to give it one more look over, and this time in context of other stuff you have done. I often find one or two things per hundred (at least) that I go…maybe I was wrong. (Of course when I say it the sound in my head is more like Fonzie admitting he was wr..wr..wr..wroo…wwwr..Hey a guy has to have some ego.)
    • Documentation should be written in a language that all users can understand – Sometimes in our names we use clever jargon to make the names more concise. Inheritance is a good example.  Ask the normal user what inheritance is any they will start fantasizing that their parents just passed away (in the best possible manner, of course) and left them a billion dollars. Clearly this is unlikely to be the meaning of this word unless you are creating a system to create a last will and testament.

    The fourth is the magical moment (wow, too much Disney ).  You have your design you want to implement.  Your analysts and users are checking out the design to make sure they “approve.”  They see a lot of names, fine.  They see some diagrams, decent. They read your definitions and wait.  What do you mean by “this attribute is here for X”? Did we agree to that?

    Well, yeah, I thought you did, you will indignantly think, but sometimes, no matter how sure you are that you are right (and really no matter how right you are that this was in fact the design you had agree on) the realization that a communication gap the size of you and your parents was somehow opened up and things went sideways.

    I feel like I am repeating myself, but for you Agile guys, I don’t mean documenting things to death, or even spending all that much time.  If you are creating 30 tables with 10 columns on average, and 20 relationships, you only need to document 320 independent items. Add a bit more to the documentation for constraints, and then for processes, and you might have two hours of work. And if you DO find this taking too long, either you have messed up the first Pillar: Coherent or your team has real communication problems that can’t easily be solved by reading this blog.  And the more Agile you are, the more important coherent communication (which includes documentation) is. What happens if you get it wrong and cannot ship after a sprint because you didn’t get clear direction and your team didn’t realize it until they started seeing the “results”?  Pretty sure that can’t be good for the manifesto.

    So yeah, all I am trying to say is that documentation is communication. Not just to you, not just to other programmers, but to future programmers, users, and even a future version of yourself that you would treat better if you could see the pain in his (or hers) eyes. 


  • Requirements vs Architecture

    Okay, so on the first look this sounds like the most boring Japanese action movie ever. Requirements is tearing through the village, and Architecture is in the city.  Developers by the horde are trying to code both of these into oblivion…Maybe not.  Clearly I am talking about something a little more exciting…the battle between the forces of business and computing, or more precisely, the forces trying to keep them apart. Back in 2000, C J Date produced a book (which I clearly haven’t read) called “What, Not How”.  I haven’t read it because as a person who does implementation, my primary care is about the opposite. “How, Not What”  I would like to think that if the “what” was perfectly defined, all we as implementers would have to do would be to translate from business language to machine language, but all to often I find myself involved in the business analysis, either because it is thought that I would be, or because the requirements come to me in terms of how the software will be created (which isn’t usually the way I want it done, which usually does matter, since it is my responsibility to actually design and often do the implementation.)

    The thing I tend to find is that one of the most difficult parts of the software implementation process is getting people in the proper roles. As an architect, I find that a lot of times it is expected that I know a lot about the subject matter of the system I have created.  Even worse, when it comes to the creating a new system, it is supposed that I would be able to pick a lot about the system that is being designed (often without it being formally communicated.) There is some truth there, but let’s be honest.  I got a degree in Computer Science, not chemical processes, accounting, or food service/logistics. Every new system that I might work on is a crash course in learning a new industry.  And to be honest, not always one that I really want to fill my brain with permanently.

    However, the person whose role is entitled Business Analyst does in fact have that role to learn enough about a problem in an industry and turn it into a specification. This document is what will be used to create a future piece of software and/or manual processes (that is for us to decide once the scope of the system has been decided upon.  Their job is to, well, this might sound silly if you think about it, but to analyze the business and “figure it out”.  This role is one that I feel is essential to the health of a project because they are cleanly divorced from the implementation whereas I as an architect are not.

    Put it this way, consider the role of the architect for a building.  Does the architect determine what the house is for? No, for if they did every house would be 100 stories with artsy windows in every room.  I saw a good example of this on the TV Show “How I Met Your Mother” the other night. Ted (the architect) was told to design a bleak room for the clients new “firing room” (this was to be a copy of the existing one, just on a different floor of the building).  He wanted a horrible place to can people because he was completely sadistic.  Ted wanted to put his touch, and designed a whole system for nicely letting someone go, including a sympathy room with councilors, and such.  Needless to say he was fired from his job.

    As an architect, it is my role to take the business requirements and negotiate a solution that closely resembles what is desired, bending mostly for cases where the desired system, if built in the exact manner of the requirements, would extinguish life on Earth or cost too much (sometimes just the latter.)  Keeping out the influence of implementation during the requirements gathering phase of a project eliminates the “this is how you do it in this technology” creep. I am a relational designer, and as such I feel I can do anything in SQL (I want someday implement a compiler, much like I used VB to implement one in a college class once. Not my best grade, but it did work.)  I constantly find myself steering requirements toward what *I* know how to do, and as such find that I tend to miss a few points because I have started designing/coding mentally even before all of the requirements have been gathered. Of course, in the implementation it is okay to steer towards what you know, as this is how things get done.  Having multiple strong minded people on a team, however, is how great things get done.

    A classic example of this for me was when I was architecting the db for a chemical plant product testing system. Materials had to be within a given range or they were considered not shippable. Cool, makes sense, keep getting requirements…must…get…coding. Well, a person who had really thought about their business would have probably realized that “not shippable” might not always mean “not shippable” when you are talking about materials that are not medical nor edible. So in the first week of production they go, “okay, so how do I override these requirements to ship this lot of product to client X?” Oops. Maybe if I had spent more time considering the entire process rather than trying to finish up my diagram to generate code…

    What do you think?  Can you turn off the “Get ‘er done” part of your mind when you are doing requirements, or do you find that you really really want to start building software well before you really understand the entire problem?

     

    Now, before you get all “but in Agile we don’t gather ALL requirements…” understand that I don’t mean ALL requirement for an entire project.  What I do mean is as many requirements as possible about the area you plan to implement, and I still mean that if you are the coder that steering requirements toward your expertise is not a great service to anyone but you…well, unless you are really good, then maybe it is okay.  In my example, if we had said “We aren’t going to implement overriding in this sprint,” the system would have still been unusable.


  • The third pillar – Fundamentally sound

    This one should be simple to anyone who sees it (once I decode what I mean by fundamentally… and sound…by then for sure!) In the initial post I defined this as – fundamental rules enforced such that you don’t have to check datatypes, base domains, relationships, etc.  The gist here is that you at a minimum don’t have to spend all of your time validating what has been saved.

    As an analogy, consider the first car your mom and dad (or mom or dad or grandparent or even yourself, so as not to be offensive to anyone!) purchased. If it was like my first car, my father made sure that it was fundamentally a safe vehicle, but that was pretty much all you could say about it.  It didn’t have all of the bells and whistles (unlike my car now that has lots of whistles) but it had an engine that was solid and managed, a steering wheel that turned, and brakes that stopped the car.  Near the end of my ownership of the car, it had a set of pliers for a gear shift, but at that point the car was losing the fundamentally sound “zone”.

    So why does this matter? There is a large number of non-db programmer types who would prefer it if the database was treated as nothing more than a data storage device with some search capabilities built in.  SQL Server is happy to serve in this capacity, but what a waste of resources.  Why?

    • Data has to be validated to be used– ETL type processes are all to often ETCCCCCL processes, where instead of Extract, Transform, and Load, there is a bunch of Clean, Clean, Clean processes tossed in.  The problem is that no matter how good you are, without using at least foreign key constraints and transactions, you will end up with less than pleasant data once users and the reality of concurrent data access gets a hold of your data.
    • Some forms of validation are very difficult to do outside of the data layer – starting with foreign key constraints, this list is actually more large than you might imagine.  For example, any validation where another table or multiple rows is involved, like a check constraint that check the cardinality of a relationship.  Another example is uniqueness. Uniqueness enforcement should always be done in a UNIQUE constraint. Worst case you are going to end up with a unique index that is very helpful to the query processor.
    • Loading data is far safer – If you rely completely on a layer to manage the integrity of the data, what happens when the ETL layer is pointed AT your data?  Do you have to recode all of the rules?  Of do you just hope that the rules are all followed? Unfortunately hope springs eternal in a lot of situations where it was not such a dandy idea, and the UI starts failing because of unknown data rules.
    • Data in a nullable column ought to allow nulls – Nothing quite cheeses my cracker more than trying to set up a scenario and I leave all data nullable and it crashes the app.  So you start filling in data as you figure it out (hopefully the app tells you what column it is stumbling on) until you finally get there. If the app requires data, then the database should too.

    Look, In the early days of client-server programming, we tried using the data layer as the be-all and end-all layer of data programming, and it stank. Our UIs were as bad as a lot of web pages are now. No validation, just let a different layer handle it and report errors after the user had entered lots of data. I am not suggesting that.  And be clear about something else too.  This is NOT about stored procedures.  Stored procedures are part of the data access layer, not the data layer.  Whether or not you favor procs as a data access layer is immaterial to this discussion. I am talking about the fundamentals that can be done with or without a TSQL implemented data access layer.

    For example:

    • Theoretical data ranges – Salary greater than 0, Temperature for the weather forecast within a range that could actually occur, etc.
    • Data types – don’t just use sql_variant for every column, leaving the user to deal with it at runtime. Pick the type that actually represents what you are modeling.
    • Data Length – And in the same vein as data type, choose a length that is reasonable. Too many databases have varchar(50) for all string types, unless it is text.  Come on, think about it and set the max limit if there is one.  Don’t use an “unlimited” type and only give the user 50 characters.  And don’t do the opposite either.  Varchar(50) with 100 characters on the UI.  Unless your testers are excellent (or psychotic) you won’t find that out until much later
    • Uniqueness – Already mentioned.  Don’t leave that to the UI, you are going to need an index in any case to do this well.
    • Relationships – Foreign key constraints rule and take no thought.

    And implement as much of this on the UI as needed also. Best case you generate the code from both from a rule repository. Don’t force the poor user to type data into 100 columns and then find out that the need another piece of information in a proper format, only have the form timed out when they come back to it.  Unless you hate your users, in which case you are excused as long as one of my friends isn’t stuck using your evil UI…then it just isn’t cool.


  • Heading to SQL Connections!

    You know how when you go to a session and there is always this person who is heckling the presenter?  Well, it isn’t that often that this person is authorized to be annoying people/answering questions at the request of the main presenter.  Well, that is about to change.  By a nice coincidence, I was going on vacation to Disney World next week, and Paul Nielsen (the primary author of the upcoming SQL Server 2008 Bible, as well as earlier versions in the series) was speaking there on Wednesday. Well I never miss an opportunity to heckle Paul get a bit more SQL Server knowledge and he asked so I will be there.

    The two sessions he will be presenting (with a little help from his friends) are:

    SSW09: Smart Database Design
    Data architecture can be a vague subject. This session provides clear strategic thinking on data architecture using six measurable attributes of a database that can be used to evaluate designs and best practices.

    and

    SSW08: Nordic – New Object Relational Database Design
    Objects and database don’t typically mix well. Nordic is Paul’s open source T-SQL code-gen project that fully emulates object orientation within SQL Server, supporting class/attribute inheritance, workflow state inheritance, and associations with workflow state dependencies and inheritance. Associations with inheritance is very significant feature–it takes the 2D relational model and makes it 3D. Adding workflow state to associations is like adding business logic to foreign keys without any code. Using a generic .NET UI to display the data, Nordic makes it easy to model and build an object/relational database with a clean database abstraction layer. And it’s fast.

    So come by the sessions in the Mediterranean Salon 8 from 1:30 – 3:45 to close out the SQL Connections in Orlando.


  • Granting rights to all objects in a database

    File this under the “I can’t believe there is still stuff that I keep learning about SQL Server 2005!” though thankfully most things I find I learn are things I wouldn’t be all that likely to use.

    I was asked today how I felt about using the syntax:

    GRANT EXECUTE TO [username]

    to give users rights to all procedures in the database (and you could use it with SELECT, INSERT,…and other rights too).  Well, first off, I had to admit that I didn’t know you could do this. I knew you could grant execute rights to a user on a schema, but not the entire database. Jasper Smith had an article back in 2004 on sqldbatips that covered it (http://www.sqldbatips.com/showarticle.asp?ID=8), so it isn’t some big secret.

    I often use the technique to grant a user all rights to a given schema:

    GRANT EXECUTE on SCHEMA::schemaName TO [username]

    I often use the this technique to apply execute rights to an application login/users to a given schema. I don’t like it as much when using the dbo schema, since it commonly contains other object that I don’t want to just give blanket rights to, but when using named schemas to segregate objects into functional groups (partially for security, and partly for logical separation) I feel it is a good idea.

    Even using the dbo schema isn’t horrible, as long as you understand what you are doing and are careful to separate out other procedures. I am a big believer in having the database be as self contained as possible, so I try to put maintainence objects and such in the database, typically in a schema named utility. Often this might have a procedure to drop all foreign keys, or indexes, etc.  Whatever I might need during a data load operation, or even just general maintenance. So granting rights to the entire db seems a bit too lenient as I don’t want ANYONE who isn’t the dba running these procedures.

    I guess the fact is that I think that the database security should be a bit more stringent than a simple GRANT all rights to EVERYTHING, as you have to be cognizant that sometimes there will be objects in the database that just shouldn’t be opened up for the programmer to accidentally use, thinking that an object does something different than it actually does.  When you carefully lay out schemas, odds are that you are considering the purpose/meaning of the schemas and have  a plan for the schemas which SHOULD consider security/usage as well. 


More Posts Next page »
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement