THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Aaron Bertrand

  • Bad habits to kick : relying on undocumented behavior

    In my last post in this series, I talked about the common habit of creating an IDENTITY column on every single table.  Today I want to talk about a more broad concept: relying on undocumented (and therefore probably undefined, and certainly far from guaranteed) behavior and objects.

     

    ORDER BY in a view

    This is probably the most infamous of all SQL Server undocumented behaviors.  In SQL Server 2000, users learned to create views with a built-in ordering, such as follows:

    CREATE VIEW dbo.MyView
    AS
        SELECT TOP 
    100 PERCENT a,b,c
            
    FROM dbo.MyTable
            
    ORDER BY c;

    The problem is, they have assumed that if they then issue...

    SELECT FROM dbo.MyView;

    ...the results are guaranteed to come back ordered by c.  In reality, the order of the results of such a query is arbitrary, since the behavior is undefined.  Without an ORDER BY on the outer query, SQL Server is free to return the results in any order it chooses (more about this later).  To dissect the syntax in the view a bit, I want to explain that this circumstance is a symptom of a poor design choice in the T-SQL dialect -- the way TOP was implemented, the ORDER BY can now serve two functions: to determine the rows included in the result, and to indicate the order of the result.  Which is why you sometimes see the following logic to get the top 10 finishers of a race, with 10th place listed first (well, let's ignore ranking functions and the OVER() clause for now):

    SELECT RunnerID, [time_in_seconds] FROM
    (
        
    SELECT TOP (10) RunnerID, [time_in_seconds]
            
    FROM dbo.RaceResults
            
    WHERE RaceID 15
            
    ORDER BY [time_in_seconds]
    AS x
    ORDER BY [time_in_seconds] DESC;

    Going back to the view, the TOP 100 PERCENT is required in this case because the ORDER BY is only supposed to be used to determine the rows included in TOP, not to guarantee the order of the results.  The syntax for a view allows you to include an ORDER BY clause *only* if you also include a TOP clause.  In the 2000 version of the optimizer, it just so happened that you could depend on the fact that in this case, if you issue a SELECT against the view without an ORDER BY, you would in fact get the results in the order defined by the ORDER BY within the view.  In SQL Server 2005, changes to the optimizer broke this behavior, and this was considered a bug by a large segment of the SQL Server community -- even though the behavior that people were relying on in previous versions was not guaranteed; it was more or less a coincidence.  One rather large customer of Microsoft, with plenty of leverage against the SQL Server team, made such a fuss about this change in behavior that a trace flag was introduced to instruct the optimizer to obey the ORDER BY in the view when returning rows.  (You can get a lot more information about this trace flag in KB #926292.)  Should you use this trace flag?  In my opinion, no -- I think the squeaky wheel was just too lazy to fix their application code so that it properly dictated the order in which it wanted its results.  Should you be aware of what can happen when you rely on undocumented behavior, and that behavior changes?  Absolutely.

     

    Ordering without an ORDER BY 

    Aside from the view case above, there is a more general myth out there that SQL Server will always return rows in a specific order - the myth varies from person to person, but it is either the order of the clustered index, or the order of the IDENTITY column, or the chronological order of insertion.  (Similarly, the assumption is that using TOP without an ORDER BY should have a predictable meaning; as with ordering results, it does not.)  Often people come to this conclusion because in a limited sample size, that's the behavior they "always" see.  I want to make it quite clear: ordering is arbitrary unless you use an ORDER BY clause.  You should never, ever, ever rely on the ordering you observe in a query without an ORDER BY -- and you should only issue a query without an ORDER BY clause if you truly do not care what order the results come back.  In such a case, you may as well imagine that the rows are going to come back in a different, random order each time, even though that is not truly the case (random has a meaning completely separate from arbitrary, but like I said, just imagine). 

    As I mentioned earlier, in the case where you haven't told SQL Server how you want the results ordered, the optimizer is free to find the most efficient way to return the results to you, which can mean any ordering whatsoever.  Typically you will see a predictable ordering, but it might not be the one you expect (for example, it may happen to use a non-clustered index to sort).  And even if it is currently the one you expect, the next time you run the query, the choice the optimizer has made can change due to a variety of events, including but not limited to: statistics updates, data changes, statement or module recompiles, query/table hints, forceplan, plans dropping out of the plan cache, service packs, cumulative updates, engine upgrades, and even detach/attach or backup/restore to a different server at the same build level.  So quite truthfully, you could run such a query right now, and 5 minutes later run it again, and get the results in a completely different order.  This is the quintessential definition of relying on undocumented and undefined behavior, and I strongly recommend you stay away from it.  If you want some proof, go and check out Alexander Kuznetsov's post in his Defensive Programming series, entitled, "Without ORDER BY, there is no default sort order."

     

    Short-circuiting

    I see a lot of cases where people expect SQL Server to short-circuit their clauses -- basically, read the conditions from left to right (or top to bottom), and when any of the criteria returns false, don't bother evaluating any subsequent conditions.  An example is where you have chosen to store DATETIME data in a VARCHAR column, then try to evaluate some DATETIME aspect -- but only after determining which values really are dates (this pattern of poor data type choices is another really common bad habit, by the way).  So you see queries like this:

    CREATE TABLE dbo.foo(bar VARCHAR(20));

    INSERT dbo.foo(barSELECT '20090201'
     
    UNION ALL SELECT 'last week'
     
    UNION ALL SELECT 'next Tuesday';

    SELECT bar
    FROM dbo.foo
    WHERE ISDATE(bar) = 1
    AND DATEPART(MONTHbar) = 2;

    DROP TABLE dbo.foo

    The result:

    Msg 241, Level 16, State 1, Line 6
    Conversion failed when converting date and/or time from character string.
    The truth is, SQL Server is free to evaluate the conditions in any order, so it doesn't necessarily have to first check the result of ISDATE() before trying to treat the data like DATETIME values.  And like with the ORDER BY cases above, just because you observe a specific behavior today, does not mean you will see the same behavior tomorrow -- many things can change the way SQL Server processes a query.  So how do you get around it?  Well, there are a few ways.  The most obvious one in this specific case is, using a DATETIME column to store DATETIME data.  But this isn't always possible, and it is not always the issue in this scenario anyway -- it was just the handiest example I could come up with.  The next thing people try, is to use a sub-query to first return only the rows that have DATETIME values:
    SELECT bar
    FROM
    (
        
    SELECT bar
            
    FROM dbo.foo
            
    WHERE ISDATE(bar1
    AS x
    WHERE DATEPART(MONTHbar) = 2;

    The result: same error (and same happens with a CTE).  It's like the optimizer is reading your mind and knows you're trying to trick it; it still sees that there are rows in the table that would not pass the DATEPART check, but is evaluating things in the wrong order.

    So how do we get this "short circuiting" to really work?  The CASE expression is your friend (and thanks Alex for the improved syntax suggestion):

    SELECT bar
       
    FROM dbo.foo
       
    WHERE CASE WHEN ISDATE(bar) = 1
            THEN DATEPART(MONTHbar)
        END = 2;

    I'm sure I've seen this solved in other clever ways before, and that someone will remind me how to really make this short circuit, but on a Monday night, this is the only workaround I know is guaranteed to work.  In some cases, SQL Server *will* short-circuit (see the comments from Nigel Ellis about 2/3 through this TechNet chat transcript), and you can see some working examples from Mark Cohen.  My point, though, remains: feel free to take advantage of these optimizations, but do not rely on them.



    sp_who2

    You're probably thinking, wow, that's a strange one to mention.  sp_who2 is undocumented and unsupported?  Absolutely.  Try to find sp_who2 in any current version of Books Online.  I reminded them that there was a reference to sp_who2 in the 2005 version of Books Online in Connect #207997 back in September of 2006.  Shortly thereafter, the last remaining trace of this procedure was removed.  I also asked for a supported and documented version in March of 2007, in Connect #264681.  Don't hold your breath, because all sp_* procedure development has been abandoned (which is why a lot of the sp_help style procedures haven't been updated to reflect new 2008 features).  I am also not trying to scare you into believing that sp_who2 will suddenly stop working in the next release or service pack - I think it is pretty much here to stay.  But with the advancements in performance tuning and troubleshooting, I question if it is of any value to use these days at all.  Instead, there are many alternatives.  Adam Machanic has written a great replacement with more flexibility and much more useful information than the new stuff (sp_WhoIsActive).  I also documented a few approaches to querying the DMVs for certain information in Chapter 29 of SQL Server MVP Deep Dives, entitled, "My Favorite DMVs and Why."


    sp_MSforeachdb

    This system procedure creates a cursor, loops through all of your databases, and performs some action against each of them (for some background, you can see this Database Journal article).  In one of my systems (2005 SP3), I was able to reproduce many times a scenario where a random number of rows would come back from sp_MSforeachdb.  It seemed to occur more frequently during our peak load times, but nonetheless, that got me spooked.  I looked at what it was doing and, not surprisingly, it is actually pretty simple if you discard all of the security stuff (you should only be running this kind of code for admin / maintenance functionality anyway, IMHO).  I was surprised even in 2008 R2 to still see this code in the procedure:

    EXEC(N'declare hCForEachDatabase cursor global for select name from master.dbo.sysdatabases ...'

    ...the use of dbo.sysdatabases, as opposed to sys.databases, serves as further proof that there is no investment whatsoever in updating sp_* procedures to fit the new features and catalog views as SQL Server matures.  Anyway, I like to write my own versions of these procedures.  Not only does this enable me to avoid a flaky global cursor that is quite expensive, and future-proofs me against changes in their behavior (or deprecation that technically doesn't need to be announced), but it also allows me to add flexibility to the procedure, such as only executing code against a database with a certain naming pattern or in a certain recovery model, and only attempting to run on databases that are online and aren't read only.  For example:

    CREATE PROCEDURE dbo.ForEachDB_MyWay
       
    @cmd NVARCHAR(MAX),
       
    @name_pattern NVARCHAR(255) = '%',
       
    @recovery_model NVARCHAR(60) = NULL
    AS
    BEGIN
        SET NOCOUNT
    ON;

        
    DECLARE
            
    @sql NVARCHAR(MAX),
            
    @db NVARCHAR(257);

        
    DECLARE @c CURSOR
    LOCAL FORWARD_ONLY STATIC READ_ONLY
            FOR
                SELECT 
    QUOTENAME([name])
                    
    FROM sys.databases
                    
    WHER(@recovery_model IS NULL OR (recovery_model_desc @recovery_model))
                    AND 
    [name] LIKE @name_pattern
                    
    AND [state] = 0
                    
    AND [is_read_only] = 0
    ORDER BY [name];

        
    OPEN @c;
       
        
    FETCH NEXT FROM @c INTO @db;

        
    WHILE @@FETCH_STATUS != -1
        
    BEGIN
            SET 
    @sql = REPLACE(@cmd'?'@db);
            
    BEGIN TRY
                
    EXEC(@sql);
            
    END TRY
            
    BEGIN CATCH
                
    -- I'll leave more advanced error handling as an exercise:
                
    PRINT ERROR_MESSAGE();
            
    END CATCH

            
    FETCH NEXT FROM @c INTO @db;
        
    END

        DEALLOCATE
    @c;
    END
    GO


    -- simple example: print names of all DBs in FULL recovery:

    EXEC dbo.ForEachDB_MyWay
       
    @cmd 'PRINT ''?'';',
       
    @recovery_model 'FULL';

    You could also do other interesting things with this construct, such as filtering on databases where log_reuse_wait = 0, or is_broker_enabled = 1, or where it is not a system database (database_id > 4).  You could also dictate an ORDER BY so that the databases you deem most important can appear at the top or the bottom of the queue, so to speak.  But most importantly, you'll be able to keep augmenting your own version of the procedure, as SQL Server adds more features and as you find new ways to want to work on sets of databases.

    I asked for a supported and documented version of this stored procedure in Connect #264677, so that hundreds of DBAs out there don't have to develop their own procedure as above.  But it was promptly closed as "Won't Fix."

    (Of course the same kind of logic can hold true for a similar undocumented stored procedure, sp_MSforeachtable.)

     

    xp_fileexist / xp_getfiledetails

    The writing is on the wall: the days of extended procedures are numbered, especially those that are undocumented.  Writing code today that relies on these modules being available in future versions of SQL Server is a little like Russian roulette.  The last thing you want to do during a SQL Server upgrade is worry about breaking code that accesses the file system.  Thankfully, there are many tutorials out there that will help you bring CLR to the rescue, in the event that you really need to perform file handling from within SQL Server.  Greg Larsen does a pretty good job of priming you to write your own xp_getfiledetails.

     

    Summary

    Sadly, I am just skimming the surface on features and behaviors that are undocumented and/or undefined.  There are literally dozens and dozens of undocumented stored procedures and extended procedures, and many engine behaviors that we all take for granted.  There are probably several behaviors I rely on every day that I don't realize are undocumented... it is definitely a difficult habit to avoid altogether.  But the realization that you *might* be using undocumented methods, or making assumptions based on observed and not guaranteed behavior, is often a good chunk of the battle.


  • Bad habits to kick : putting an IDENTITY column on every table

    Back in October, I started a series of blog posts called "Bad Habits to Kick," and thought I would revive the theme.

    I've worked with developers that dabble in SQL, and they tend to have a few common traits.  I'm not sure where they come from, but one that I find rather distracting is the tendency to place an IDENTITY column on every single table.  Usually this is done because this is an "easy" way to add a column to the table that allows you to identify a single row.

    Now don't get me wrong, I am a firm believer that IDENTITY columns have their place, and I use them quite liberally.  When you are storing information about customers, products or companies, where there are going to be multiple references to those entities in related tables, it makes little sense to copy all of the "real" attributes that identify that entity into all of the related tables.  In some cases this would make your storage requirements surge dramatically, never mind that the performance of JOINs will suffer.  It also can make for a real mess when you are repeating information that is likely to change - such as surname, e-mail address, phone number, ISBN.  Both of these issues are technical and not logical limitations - SQL Server doesn't support a "behind-the-scenes" foreign key construct that would make repeating the information unnecessary, and there are cases where cascading updates do not exactly work as advertised.

    A simple example is where you have an Orders table and an OrderDetails table.  (And as a heads up, whiteboarding your perception of Barnes & Noble's database schema is a typical part of my interviewing process - how detailed we get is up to you :-).)

    CREATE TABLE dbo.Orders
    (
      
    OrderID INT IDENTITY(1,1PRIMARY KEY,
      
    ...
    );

    CREATE TABLE dbo.OrderDetails
    (
     
    OrderDetailID INT IDENTITY(1,1PRIMARY KEY,
      
    OrderID INT NOT NULL FOREIGN KEY REFERENCES dbo.Orders(OrderID),
      ProductID INT NOT NULL FOREIGN KEY REFERENCES dbo.Products(ProductID),
    Quantity INT NOT NULL,
      
    ...
    );

    Now tell me, what business purpose does the IDENTITY column on the OrderDetails table satisfy?  When are you ever going to need to reference rows in this table by the OrderDetailID, and not by the OrderID and/or ProductID?  And why is *that* the column that should uniquely identify the row?  This seems to imply that you would allow multiple rows for the same order and product.  Even if the PK is elsewhere and these questions disappear, the value of this column is questionable at best - for no discernible gain whatsoever, you seem to be taking up an extra 4 bytes (plus space in any indexes that include the column, and then the additional I/O involved in maintaining those indexes).

    Another example is a logging table.  Let's say you have a table that logs exceptions, or API calls, or other events over time.  You insert rows monotonically either in real time or from log files.  I often see this designed as:

    CREATE TABLE dbo.ActivityLog
    (
     
    LogID INT IDENTITY(1,1PRIMARY KEY,
      
    EventDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
      
    ...
    );

    I think part of the knee-jerk reaction to add an IDENTITY column and make it the PK is that events can happen at the same time, so it makes it hard to define a real natural key.  I'll concede this point, as I have several log tables that do not have a PK or a unique constraint - but they certainly have a clustered index, and they do not have an IDENTITY column.  The purpose of this type of table is usually to maintain a history of activity and not to do a lot of searching for individual rows (and when you are searching for individual rows, you are not looking for a specific LogID, you are looking for details in other columns - say, part of an error message).

    An interesting thing that came up in one of these cases was the need to purge older data.  So, the developer intended to have a 30-day running window, and once a day a job would wake up and would delete any rows from the log that were more than 30 days old.  Since the clustered index was on the IDENTITY column, their code basically did this:

      - determined the highest IDENTITY value more than 30 days old
      - assigned that to a variable
      - deleted from the table where IDENTITY < variable

    With the clustered index moved to the DATETIME column, and the IDENTITY column disposed of, his code became a lot easier (simply a DELETE with a WHERE clause), and more efficient too, since it didn't have to perform a lookup first.

    Now let's talk about a very similar case of a logging table where I think an IDENTITY column *is* useful, such as a logging table that keeps track of backup or index activity.

    CREATE TABLE dbo.ActivityLog
    (
      
    LogID INT IDENTITY(1,1UNIQUE NONCLUSTERED,
      
    EventType TINYINT-- assume some kind of FK
      
    StartDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
      
    EndDate DATETIME NULL,
      
    ...
    );

    Now why is an IDENTITY column useful here?  Well, rather than logging the start and end events in different rows, keeping the data on a single row makes it much easier to consolidate the start and end of a specific event.  So the code would do something like:

    DECLARE @LogID INT;

    INSERT dbo.ActivityLog(EventTypeSELECT 1;

    SELECT @LogID SCOPE_IDENTITY();

    -- do other work

    UPDATE dbo.ActivityLog
    SET EndDate CURRENT_TIMESTAMP
    WHERE LogID @LogID;

    The unique constraint makes it slightly easier for the engine to locate the row to be updated, but the clustered index could still be placed on the StartDate and/or EndDate columns to provide useful searches on date ranges (with the same monotonically increasing property as the IDENTITY column), and a primary key could still be defined on other columns.  I am sure Celko is rolling his eyes right now, though he has probably not even read this far.*

    So back to the basics, am I telling you to not use IDENTITY columns?  Absolutely (and hopefully obviously) not.  Just showing that before blindly applying an IDENTITY column to a table, think about what it means and why you are using it.

    * P.S. I don't want to debate the merits of using an IDENTITY column or otherwise surrogate key (such as NEWID() or NEWSEQUENTIALID()), vs. using a natural primary key.  This comes up with Celko a lot - he is completely against any kind of system-generated identifier that can't be physically verified.  When asked a pointed question such as, "what do you think Barnes & Noble uses as the primary key to identify a customer or an order?" he disappears, never to appear in the thread again.  So while he is passionate about his convictions, he has no desire to defend them.

  • Injection is not always about SQL

    I think anybody even remotely involved with databases these days has seen the xkcd comic about Bobby Tables : http://xkcd.com/327/ 

    Basically, the comic warns against SQL injection, and reminds you to sanitize your database inputs rather than blindly append incoming data to queries.  There are more elaborate discussions about this concept in these articles on MSDN, and of course your favorite search engine will have many results as well:

    Books Online : SQL Injection

    MSDN Mag : Stop SQL Injection Attacks Before They Stop You

    It is important to remember, though, that not all injection attacks are intended to screw up your database or gain unauthorized access to your data.  Some vectors are a little more subtle, and while the damage isn't always as severe, it is still something you need to be aware of to prevent fraudulent activity or even just to keep egg off your face.  I'll illustrate with two examples:
     

    Letting customers dictate your pricing

    I remember several years ago, I demonstrated to an online shop why they shouldn't add items to a shopping cart by passing (among other things) price information in the query string.  I showed them how I was able to buy a bunch of T-shirts at $0.01 each, and encouraged them to verify whether my cart would have gone through like that.  As it turned out, it would have.  For pointing out this fundamental flaw, they added a free t-shirt to my order, and eliminated the shipping charge.  I am curious how quickly they would have caught this - or if some folks even got away with it unnoticed - had I not pointed it out.  I think it would be really hard to trace back and prove the buyer to be at fault.
     

    Letting customers change your messaging

    This morning, fellow MVP Simon Sabin (@simon_sabin) pointed out that Toyota Ireland had a similar flaw in a site used for their recent recall issues.  Basically you could change their message to you .  Now arguably this doesn't really harm anyone (because you are only able to change your own messaging), except it was fun to get a screen shot of something like this before they fixed the exploit (remarkably quickly, I might add):

     



    click to embiggen


    Summary

    So just as a reminder, when you're accepting input from users, validate and sanitize!  And when you're accepting input from your own site, make sure it really comes from your site - there is no reason to expose contextual information in the query string, as this just invites users to tinker.  Unless you have a really good reason to use QueryString parameters, protect your data and variables by passing them via post or session.


  • Connect Digest : 2010-02-06

    Upgrading a database with read-only filegroups

    Earlier today, I complained that I should be able to upgrade a database with read-only filegroups.  In this case an upgrade from, say, 2000 to 2005 is blocked because the engine can't update the system schema / structure.  Since I am trying to protect the data, not the system schema, I think it should be okay that the engine ignores the read-only flag during the upgrade process.  The workaround is rather cumbersome : get exclusive access to the original database, mark the filegroup as read/write, backup, restore, then mark the filegroup as read-only.  Blecch.

    #531630 : Should be able to upgrade a database with read-only filegroups

    In testing this scenario against different versions of the engine, I also came across a couple of bugs in SSMS 2008 SP1 dialogs that deal with filegroups and the read-only flag:

    #531643 : SSMS : issues scripting the disabling of read-only flag

    #531645 : SSMS : cannot create new database with read-only filegroup

     


    Updating statistics during RESTORE

    Steve Jones wants to see updating statistics to be an optional part of a RESTORE operation:

    #529760 : Update Statistics as Part of a Restore

     


    FILEGROWTH settings

    NULLgarity wants filegrowth settings to be inherited from model instead of defaulting to 1MB when not specified.

    #531593 : Change default behavior for FILEGROWTH argument in CREATE DATABASE statement

     


    Sorting

    There is a lot to be said for sorting.  The other day I requested that the "Object Explorer Details" column header choices be ordered alphabetically instead of mosaically:

    #531267 : SSMS : order Object Explorer Details column header context menu alphabetically

    And a long time ago, NULLgarity also asked for the ability to sort grid results by clicking on column headers.  Ideally we would just get Excel integration into the results pane (I asked for that in #524769), but this would be a good consolation prize:

    #127045 : "Results to Grid" Grid should be sortable by clicking column header

     


    Bad error messages

    And the other day, I highlighted several Connect items when I discussed some of the poor and/or vague error messages that come out of SQL Server:

    When bad error messages happen to good people

     


    Connect URLs

    As for Connect itself, you may have noticed the shiny new URL format for Connect items, e.g.:

    https://connect.microsoft.com/Connect/feedback/details/531385

    This overcomes the severely limited URL format that existed previously:

    https://connect.microsoft.com/Connect/feedback/ViewFeedback.aspx?FeedbackID=531385

    This format was not only far too long to be practical for services like twitter, but it also drove people nuts in newsgroups or e-mail, because it would often wrap and break.  Go vote for that item (either link will work!), as it is encouraging them to come up with an even shorter and more intuitive URL format that will do away with the need to use bit.ly and the like to shorten Connect item URLs.
     


  • When bad error messages happen to good people

    Over the years I have largely been amused by the variance in error messages that come out of SQL Server.  Some are very verbose and some even border on provide too much information, but the ones that irk me are the ones that leave you scratching your head.  I wanted to point out a few of these, and ask if you have any misleading or unhelpful error messages that you see a lot?
     



    Msg 8152, Level 16, State 14, Line 5
    String or binary data would be truncated.

    What string or binary data? Could you be a bit more specific?  This error message could be a lot more helpful.  For example, it could tell me the column name that is rejecting a string value, the value that exceeded the length, and the data type of the column.  Since the message could be bubbled up from a trigger or otherwise not related to the table in the statement I am executing, the schema name and table name would also be useful.  Something like:

    String or binary data would be truncated.  The value "wow that was a bad one" exceeds the size of the column foo in table dbo.bar (nvarchar(15)).

    This issue keeps getting deferred.  In Connect #125347 (also see #339410), we were told "too late for 2005, we'll fix it in 2008!" and then, "too late for 2008, we'll fix in 2008 R2!"  Now the CTP of R2 is in our hands and it remains unfixed.
     



    Msg 10735, Level 15, State 1, Line 3
    Incorrect WHERE clause for filtered index 'c' on table 'dbo.splunge'.

    What is incorrect about it?  There are several restrictions with the use of filtered indexes, but I spent a bit of time trying to figure out which rule I had violated.  This specific message comes from the following code:

    CREATE TABLE dbo.splunge
    (
        foo INT,
        bar INT
    );
    GO
    CREATE INDEX
    c
        ON dbo.splunge(foo, bar)
        WHERE foo + bar > 1;
    GO

    I have learned elsewhere that you apparently can't reference more than one column in the WHERE clause of a filtered index -- but I certainly did not get this from the documentation.  *Something* has to help me out here, no?  This is documented in Connect #341891, but it is closed as "by design."
     



    Msg 16943, Level 16, State 4, Line 59
    Could not complete cursor operation because the table schema changed after the cursor was declared.

    What table schema changed?  This occurs on the line:

    FETCH NEXT FROM @cursor INTO @variable;

    So it could be any of the table(s) mentioned in the DECLARE statement for the cursor, or any table referenced within work performed by the cursor.  How do you start debugging this one?  It turns out that the developer was lazy with the cursor declaration.  Typically we use:

    DECLARE @cursor CURSOR 
      LOCAL FORWARD_ONLY STATIC READ_ONLY 
      FOR ...

    But in this case, the developer simply used:

    DECLARE @cursor CURSOR
    FORWARD_ONLY
      FOR ...

    Changing it to our usual implementation made the error go away (though maybe it was just coincidence, since the error was intermittent).  The only mention of this one I could find on Connect was in the comments for #278763.
     



    Any error message that indicates a name without schema

    This involves both the message itself, and the error header which sometimes indicates which module raised the error.  I assume the change would be simple - just pass schema_name.object_name into the token, as opposed to just object_name.  Some examples include:

    Msg 16915, Level 16, State 1, Procedure y, Line 14
    A cursor with the name 'c' already exists.

    Msg 16905, Level 16, State 1, Procedure y, Line 19
    The cursor is already open.

    Msg 16916, Level 16, State 1, Procedure y, Line 21
    A cursor with the name 'c' does not exist.

    Msg 2714, Level 16, State 3, Procedure y, Line 3
    There is already an object named 'y' in the database.

    Now picture a case where I have tables dbo.y and foo.y, and stored procedures bar.y and splunge.y.  While those aren't exactly optimal architecture choices, the system is making it even harder for me to correct them.  I complained about this problem in Connect #525308.
     


     
    System.Data.SqlClient

    SqlException: An error has occurred while establishing a connection
    to the server.  When connecting to SQL Server 2005, this failure may
    be caused by the fact that under the default settings SQL Server does
    not allow remote connections.
    I saw this error last week from a web app that was connecting to a SQL Server 2008 instance.  The mention of "SQL Server 2005" in the error message had people scrambling, trying to figure out if someone "downgraded" their instance to 2005 without telling anyone, and when they were assured that this did not happen, looking through their config files to see if any of their connection strings had changed to point to an old server.  It turned out to be a simple network error, but I think the error message wording could be improved to be version-agnostic (since it can't connect, it really has no knowledge of what version is on the other end).

    While Buck told us via Connect that this was fixed in April of 2008 (see Connect #296637), I am still seeing it in the latest version of SQLNCLI.
     

     
    Msg 0, Level 20, State 0, Line 0
    A severe error occurred on the current command.
    The results, if any, should be discarded.

    There are dozens of Connect items reporting this nondescript error, and they are all raised by various circumstances (my favorites are #411154, #530712, #406332, and #480713).  In all honesty, I envision the code in the engine goes something like this:

    try { }

    catch(e)
    {
      if (e.number IN (x,y,z))
    {
        // raise meaningful error
      }
      else
    {
        // it must have been severe! Say something scary and generic!
      }
    }


    Now in reality some of these errors are not as easy to fix as I'm making them out to be, and in some cases they aren't even in the engine but rather in external components. Still, I think they are some good examples of error messages that Microsoft as a whole could work on improving.
     

  • Your laptop may be ready for SSDs, but are your SQL Servers?

    Brent Ozar (blog | twitter) recently made some comments about the FusionIO SSD drives.  Basically, he was able to break three drives in a row - simply by doing load testing against them (using SQLIO).  The symptom is simple: the drives go offline and disappear from the O/S, and need to be physically pulled from the machines.  Kind of scary.

    I am not trying to be Debbie Downer here ... SSDs sound great, and my next MacBook will definitely come with one.  I am sure they are not far from production excellence, and the I/O performance they offer -- especially in shops like ours, where we are I/O-bound -- will be world-shattering.  But right now, if you are looking at expanding or upgrading your I/O under SQL Server, I'd give the vendors some time to shake off these early jitters.  Hopefully they will come back with a quick fix or even an explanation that there was a misconfiguration of some kind ... since I know others are already using these drives in production, and haven't come across similar issues.


  • Management Studio Tips & Tricks, version 1.0

    Today I presented "Management Studio Tips & Tricks" at SQL Saturday #34 in Waltham, MA.  I had an audience of close to 100 (my biggest talk yet!), and unless they were stroking my ego, I achieved my primary goal for the session: to make sure that every single person in the room learned at least one new thing about SSMS.

    I learned some things today too:

    1. I am still horrible at repeating audience questions before answering them.  I know before and after delivering a presentation that it's exactly what you should do, and understand that there are several reasons to do so (not all of which benefit the audience).  But in the heat of the moment, I'm usually spending too much time making sure I answer the question right, when I should be ensuring that the others have heard it.
       
    2. No matter how much you prepare a presentation (and especially the first time you offer the topic), there will always be things you forgot to include.  Today I talked about templates as an easy way to help encourage specific coding conventions and naming standards, but Tim Ford (@sqlagentman) reminded me afterward of one of the more direct features that you can utilize via templates: CTRL+SHIFT+M.  If you haven't used it, this pulls up a dialog allowing you to substitute parameters in your script, without having to do it by hand.  The parameter placeholders in your script have to match a specific format (some info about this here).
       
    3. I am a part of a fantastic community.  At events like this there is always great interaction, both with other MVPs and with the audience in general.  Unfortunately I could not commit to the entire day, but aside from my session I attended both Tim Ford's talk on DMVs and Mike Walsh's session on tuning T-SQL.  While I was more of a heckler than a contributor, I kind of felt at home in every session, and am looking forward to the next one.

    I'll be giving the talk at SNESSUG in March, so I'll have plenty of time to work on my slide deck, and will hopefully take what I learned today into that presentation.  In the meantime, you are welcome to download the current version of the deck (I exported to PDF).  Some of the bullet points only make sense in the context of the talk, but several of the slides have speaker notes that might help elaborate.  Please let me know if you have any comments, questions or suggestions!

    UPDATE: I just signed up on SpeakerRate.  If you attended this presentation, please feel free to leave me comments, either negative or positive!  Thanks!

    http://speakerrate.com/talks/2075-management-studio-tips-tricks

    UPDATE 2: A screen shot demonstrating the multi-object action capabilities of Object Explorer Details.  Click to embiggen.

     


  • Sometimes it's the small things : match column names in subqueries

    The behavior of column matching in subqueries is a little peculiar, to say the least.  If you've been bitten by this behavior once, you're unlikely to have been bitten a second time, but for some of us it just takes a while to sink in.

    This morning I wasted a good five minutes "troubleshooting" a query that wasn't working out the way I thought it should.  I have a core table called Users (simplified for brevity):

    CREATE TABLE dbo.Users
    (
      
    UserID INT PRIMARY KEY,
      
    -- ... other columns ...
    );

    Against this table, I was writing a custom report for a specific set of users, so I wrote something like this (again, simplified for brevity):

    DECLARE @u TABLE (id INT PRIMARY KEY);

    INSERT @u(id) SELECT UNION ALL SELECT 2;

    SELECT UserID FROM dbo.Users
      
    WHERE UserID IN (SELECT UserID FROM @u);

    I ran the query, and of course got all of the users in the original Users table.  Why?  Because in the table variable definition, I got really lazy about the name of the ID column, and then forgot that I had been lazy when I referenced it later.  In most cases this won't have any effect, but in a subquery, if the column is not found in the objects referenced there, the parser jumps scope and tries to bind to a column in the parent query.  If it finds none, you get an error (invalid column name); if it finds more than one, you get a different error (ambiguous column name).  But when it finds exactly one, then I may as well have written:

    SELECT UserID FROM dbo.Users
      
    WHERE UserID IN (SELECT UserID FROM dbo.Users);

    -- which of course becomes:

    SELECT UserID FROM dbo.Users;

    Note that if @u had no rows, then the outer query would have also returned no rows.  But all it needs is one row to satisfy the condition for every row in the outer table... and in some cases, this can be more difficult to troubleshoot than my simple example here.

    It is clear I'm not the first person to get tripped up by this behavior, but they aren't going to fix it:

    #126785 : Subquery error not causing main query error

    #124775 : Column confusion with IN

    #499463 : wrong results from Temporary table in a subquery

    According to Microsoft, this behavior is "correct."  If it were me, I would "fix" the language rules so that a subquery that is not explicitly correlated to an outer query, should be able to be parsed and executed on its own without error.  If it can't, then it shouldn't parse as part of a larger query, either.

    But that's just my opinion.  While I understand that the current behavior aligns with the T-SQL rules, that doesn't mean that the T-SQL rules make a whole lot of sense in this case.

    In the meantime, if you have a situation where a subquery with IN() is not acting like the filter you expected, check your column names before you start re-writing your query and pulling your hair out.


  • SQLFool has updated her index maintenance scripts and needs testers!

    Michelle Ufford (@SQLFool) has just announced some shiny new updates to her popular index maintenance scripts.  Read about them and contact her if you're interested in beta testing:

    http://sqlfool.com/2010/01/index-defrag-script-updates-beta-testers-needed/


  • Decrypting : A question of morals, ethics, or both?

    In essence, encryption of code within the engine is more like obfuscation than anything else.  Unlike passwords and other sensitive data, the engine has to be able to read the code, so there has to be a way to reverse the so-called "encryption."  While the engine inherently has the ability to do this internally, there is no direct way for us to do it ourselves out of the box (so if you play with the WITH ENCRYPTION option, be warned; you need to ensure you keep a copy of your module code in source control).  Obviously decryption methods exist: some of which are free, and some you have to pay for - such as SQL Prompt Pro from Red-Gate:

     

    I won't point you to the free ones, but will simply state that anyone with a search engine can find them.

    Personally, I haven't used them, but I haven't been put in a situation where I felt I needed to.  Others have.  Last week on twitter there was a brief conversation on whether or not it was "okay" to look at a vendor's encrypted stored procedure; the reason: poor performance.  With the assumption that the vendor was either non-responsive or did not have a good answer, I suggested that this would be okay in order to learn for yourself the potential cause of the performance problem - not necessarily so you can fix it yourself, and not necessarily to say to the vendor, "I told you so," but it might actually yield information about a problem in your own part of the system (e.g. a missing index or a poorly networked linked server).

    One of the comments @Dave_Levy made was:

    "I guess this is one of those areas where people willing to cut corners can get ahead of the rest of us."

    I wasn't sure exactly how this constituted "cutting corners."  If the vendor is not owning up to the issue and taking responsibility for their code, and as long as you're not learning from the code in such a way to compete with them (or to replace their solution with your own), then personally I think it is within reason to use the tools at your disposal to get to the bottom of the issue.

    Later in the discussion it turned out that this peeking would be a direct violation of the licensing agreement with the vendor (which is probably the case in most situations).  The software vendors who sell products that use encrypted stored procedures know that it is not protection of their intellectual property, but rather a deterrent for the casual observer.  So encryption is just a minor roadblock slowing down a determined user; the way they really enforce the protection of their IP is through their EULA.

    So it is quite a fence to be sitting on : suffer the performance problem, or try to get to the bottom of it? wait for the vendor to fix it (or to refuse to), or arm yourself with enough information to argue with them? stand by your morals? switch vendors?  If what you learn from inspecting the code is that the vendor should have no business putting code anywhere near your systems, then that's something, right?

    I realize we all want to be loyal to our vendors, but if they aren't willing to work with us, our only other alternative seems to be to either violate the EULA or uninstall the software; both options have their pros and cons from both sides.  I'm curious what others' opinions on this are?


  • Speaking at SQLSaturday #34 in Boston, 2010-01-30

    Just a quick reminder that I will be presenting a Management Studio Tips & Tricks talk at SQLSaturday #34, this Saturday, January 30th at the Microsoft offices in Waltham, Massachusetts.  I go on at 9:30 AM.  There are plenty of other great speakers, lunch will be provided, and best of all, the event is free.  There are still openings so if you are going to be in the area and haven't registered, now is the time!

    You can find out more about the event, review the schedule, and register at the SQL Saturday site:

    http://sqlsaturday.com/34/eventhome.aspx

    I will try to get slide decks posted here by Friday afternoon, but I make no promises.


  • Connect Digest : 2010-01-22

    Give us easier to read execution plans

    Michelle Ufford (@SQLFool) recently asked for help pinpointing the most expensive node(s) in a complicated execution plan.  Mladen Prajdic (@MladenPrajdic) has a useful workaround; he coded up a quick query to parse the showplan XML and order results by cost descending.  The Connect item that would make this workaround unnecessary was filed by "Ewan1":

    #477390 : Rank cost of graphical execution plan components in SSMS


    Give us more SARG intelligence in the optimizer

    As Adam Machanic (@AdamMachanic) lays out in a recent blog post, the optimizer is getting better at this with each new version, but in this Connect item, Rob Farley (@Rob_Farley) correctly points out that there are a lot of places where non-sargable arguments could obviously (and automatically) be made sargable.

    #526431 : Make more functions SARGable


    Give us more Agent smarts : system procedures to avoid ad hoc queries

    Dave Ballantyne (@DaveBally) blogged that SQL Agent is still very stubborn about submitting ad hoc queries all day long when, in reality, there should be a whole bunch of procedures in msdb to satisfy these queries (and prevent unnecessary ad hoc cache bloat)  He also filed this Connect item:

    #526485 : dm_exec_cached_plans Bloat

     
    Give us more power to work with data in the results pane

    After some back-and-forth with Buck Woody (@BuckWoody), I filed this suggestion, which asks for the ability to embed Excel in the results pane of Management Studio.  This would allow us much more immediate analysis of results, with fewer steps.

    #524769 : SSMS : Ability to embed Excel in results pane

     
    Give us better error messages

    Since I work with multiple schemas, I am coming across more and more error messages from the engine that were obviously written at a time when everything was owned by dbo.  As the product gets more complex and the schema model takes hold, this will need to be corrected.

    #525308 : All error messages that include object name need to also include schema name


    Give us more predictable function performance

    Simon Sabin (@Simon_Sabin) and Andrew Novick had similar suggestions: to finally fix the abysmal performance of UDFs that has been present since they were first introduced to the product.

    #524983 : User defined function performance is unacceptable

    #273443 : The Scalar Expression function would speed performance while keeping the benefits of functions


    Give us more complete metadata about stored procedures

    Greg Low (@GregLow) came up with an interesting suggestion about providing better metadata about stored procedures (a "contract").  I don't agree with all of Greg's ideas (see recent threads here and here), but in general I agree that the metadata for stored procedure interfaces could be exposed better - providing benefits all around, but especially in environments where a consistent standard is enforced (e.g. resultset shape is not only deterministic but also remains consistent even when inputs do change).

    #525653 : Stored procedures should expose detailed contracts

  • Mis-steps in the publication of Cumulative Updates

    It used to be very difficult to obtain hotfixes for SQL Server (sometimes even to learn about their existence), and they were often unsupported.  They have made extremely great strides in this area, and in general, I find the new procedure much more convenient : just go to the KB article, select the fix(es) you want, and you get an almost immediate e-mail with download links and brief instructions.  No longer do I have to raise an issue with customer support and prove to them that I am both affected by the issue and responsible enough to handle the patch correctly.  But there are still some holes in the process.

    1. Trying to be "smart" about platform

    The hotfix download page determines your platform and language and offers you the patch files for your current local environment.  While this might be a good idea in some cases (such as client patches), for SQL Server this makes no sense at all.  This methodology assumes that I am downloading patches only for my immediate machine, when in reality I am usually downloading for a variety of instances, many of which are not even on the same subnet as my workstation.  Add to this the fact that in my experience the code just doesn't work - this is a pure x64 machine, and yet because I am using Firefox, the download page for every cumulative update thus far has limited my view to the x86 files.  It is only a minor piece of extra work to get access to the x64 files I'm really after (simply click on the "Show hotfixes for all platforms and languages" link), but ideally I'd like to see a setting where (via a cookie, or LiveID, or something) I could say "always show all platforms" or "always show both x64 and x86 files."  Or take away the "smarts" that hide the other platforms from me in the first place - presumably this is just a way to avoid actually documenting the files in sections, so silly people don't download files for the wrong platform.

    2. Publication of correct files and filenames

    While the screen shot will only highlight the repeating spelling error ("cumlative" vs. "cumulative") that has afflicted the last few cumulative updates, there was a pretty serious mistake in this most recent CU where the SP1 download actually contained the patch for RTM.  Now a lot of people are extracting the file, trying to run it against their SP1 instances, and it finds there is nothing to update.  Rather than assume a simple case of mistaken identity, many will assume that either the patch or their environment is broken.  Right now you'll notice in the screenshot below that the core file has been pulled from the download page, so they are fixing the problem - in fact it has probably been corrected by the time you are reading this.  But why was it wrong in the first place?  Are the download packages not tested even once before they are published?

    3. Being vague about the included files

    There are always several files in each CU download, and they are not explained anywhere either on the download page or the original KB page.  What is a "SapBi" or "RB2ClickOn" file?  Do I need it?  Who knows? I really believe that for each CU they need to have some section on the download page describing the use case where you would want to download any of the smaller files individually.

    The proof

    The proof is, as they say, in the pudding.  You can see the highlighted areas in the screenshot from the points above (click to embiggen):


    What to do?

    Without direct interaction with those involved, I am not sure what else we can do to make corrections to these processes.  The feeling of a CU is already inherently "rushed"; then when the files are published clearly without testing or attention to detail, that rushed feeling is exaggerated.  I have commented about this problem on Connect and in a previous bog post.  I've also stressed the need to fully document CUs before releasing them (but somehow left out the "testing" part).  After today's incident, it is obvious there are still some items that have room for improvement.

    At least from the outside, they look like relatively simple things to fix.  Fixing them would lead to a lot less confusion among the customer base, and a lot more confidence in the general approach and attitude toward releasing updates.  A lot of that is just about better communication.  For example, while they were fixing this problem, they did not make any announcements whatsoever; they simply pulled the core files from the download page.


  • "What three events brought you here?"

    I was tagged by Tim Mitchell (twitter) in a recent meme launched by Paul Randal (twitter), entitled, "What three events brought you here?"

    Well, I've lied told stories about this in the past.  If you've read my previous posts about my career (in particular, this one), you'll know that I'm Canadian, gave up NHL dreams early, took Economics in College, started earning a living with computers in desktop publishing and later web design, and soon after graduating moved to Rhode Island (where I still live to this day).  It will be tough to pick out three solitary events along that timeline that dictate who I am and what I'm doing now, but I'll try.


    No, this is not a young me, but did I fool you for a second?

    My first real computer

    In college, I overpaid (and I mean really overpaid) for a used 486 from a "friend" who I thought was giving me a good deal.  Mind you this was before Dell and Gateway, and before online stores, when you could only buy computers directly from the local computer nerds or Radio Shack.  It was a pretty crappy computer, even at the time.  Still, it really sparked my enthusiasm for using a computer for good, and not for evil; to stop doodling silly Don Martin faces on paper, and start doing it "for real" (albeit digitally) in Photoshop.  The turning point in my career was when I realized I could get ahead in school not by getting smarter, but by making things look better: in a lot of cases, it's all about presentation.  The computer seemed to me to be an easy way to bypass all the hard work, and get by on (mostly) looks alone.  Now in my current position, my peers and managers can see through all that kind of fluff, so it doesn't fly anymore... they want substance, and can tell when they aren't getting it.  But as a kick-starter to how I became marginally tech-savvy, it certainly got me on my way.

    The dawn of the Internet

    Do you remember Geocities?  I do!  I built my first homepage there.  It was predictably ugly with a grey background, blue underlined links and a literal puke pile of animated Simpsons logos - I believe there was background sound as well.  (I know this paraphrase came way later, but imagine a web page saying to you, "Hey Marge, did you know they have the Internet on computers now?")  This in itself wasn't the event... but my amazement with the ability to instantly communicate with people all over the world, even if I was bragging about something as meaningless as my latest Tetris scores, was the start of something huge.  I jumped from normal desktop publishing to web design and eventually, as my previous recounts detail, this landed me a job here in Rhode Island.

    My first professional database

    One of the first projects I worked on when I moved down here was an e-commerce site for a condom company (Global Protection Corp. who, it looks like, hasn't updated their web site since 2003).  We did all of the scans and photographs for the product gallery, so we received boxes and boxes of this stuff.  (Sadly, much of the free product went unused, but it was fun nonetheless.)  Anyway, free lubes and condoms didn't have any direct impact on my career, but that first taste of data modeling certainly did - I still remember thinking I had "graduated from spreadsheets," and can still feel the sense of accomplishment when the web site was taking orders successfully and not falling over in the process.

    I'm going to be nice, and not tag anybody. This meme has already made the rounds and I think I'd have a hard time finding anyone who hasn't already been tagged anyway.

  • New Cumulative Updates for SQL Server 2008 RTM & SP1

    Late last night, the SQL Server Release Services team announced new cumulative updates for both SQL Server 2008 RTM and SQL Server 2008 SP1.

    There are too many fixes to mention, but I once again notice the trend that the "newer" branch (SP1) has more fixes than the RTM branch.  I suppose they are serious about their commitment to end support for SQL Server 2008 RTM on April 13th of this year.  If you haven't already planned upgrading your instances to SP1, it is definitely time to start thinking about it.

    Cumulative Update 6 for SQL Server 2008 SP1 (KB #977443) brings you to build 10.00.2757.

    Cumulative Update 9 for SQL Server 2008 RTM (KB #977444) brings you to build  10.00.1828.

    As a note, these CUs once again contain a wide variety of files.  The point of these files is to allow you to download and apply only the SNAC or Report Builder updates, which are smaller files.  If you are not sure that you only need an individual update, then your best bet is to always download the file "SQL_Server_2008_<SP1>_Cumulative_Update_<#>" for your platform - the installer will figure out which components are eligible for the update.  I wonder when the process that creates the files is going to be corrected so it spells "cumulative" correctly?

    You can see the official announcements here and here.


More Posts Next page »

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement