THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Aaron Bertrand

Aaron is a Product Evangelist for SentryOne, makers of Plan Explorer and a performance monitoring and event management platform for the Microsoft Data Platform and VMware. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features, and also blogs at blogs.sentryone.com and SQLPerformance.com; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at major conferences, user group meetings, and SQL Saturday events worldwide.

Bad habits to kick : relying on undocumented behavior

In my last post in this series, I talked about the common habit of creating an IDENTITY column on every single table.  Today I want to talk about a more broad concept: relying on undocumented (and therefore probably undefined, and certainly far from guaranteed) behavior and objects.

 

ORDER BY in a view

This is probably the most infamous of all SQL Server undocumented behaviors.  In SQL Server 2000, users learned to create views with a built-in ordering, such as follows:

CREATE VIEW dbo.MyView
AS
    SELECT TOP 
100 PERCENT a,b,c
        
FROM dbo.MyTable
        
ORDER BY c;

The problem is, they have assumed that if they then issue...

SELECT FROM dbo.MyView;

...the results are guaranteed to come back ordered by c.  In reality, the order of the results of such a query is arbitrary, since the behavior is undefined.  Without an ORDER BY on the outer query, SQL Server is free to return the results in any order it chooses (more about this later).  To dissect the syntax in the view a bit, I want to explain that this circumstance is a symptom of a poor design choice in the T-SQL dialect -- the way TOP was implemented, the ORDER BY can now serve two functions: to determine the rows included in the result, and to indicate the order of the result.  Which is why you sometimes see the following logic to get the top 10 finishers of a race, with 10th place listed first (well, let's ignore ranking functions and the OVER() clause for now):

SELECT RunnerID, [time_in_seconds] FROM
(
    
SELECT TOP (10) RunnerID, [time_in_seconds]
        
FROM dbo.RaceResults
        
WHERE RaceID 15
        
ORDER BY [time_in_seconds]
AS x
ORDER BY [time_in_seconds] DESC;

Going back to the view, the TOP 100 PERCENT is required in this case because the ORDER BY is only supposed to be used to determine the rows included in TOP, not to guarantee the order of the results.  The syntax for a view allows you to include an ORDER BY clause *only* if you also include a TOP clause.  In the 2000 version of the optimizer, it just so happened that you could depend on the fact that in this case, if you issue a SELECT against the view without an ORDER BY, you would in fact get the results in the order defined by the ORDER BY within the view.  In SQL Server 2005, changes to the optimizer broke this behavior, and this was considered a bug by a large segment of the SQL Server community -- even though the behavior that people were relying on in previous versions was not guaranteed; it was more or less a coincidence.  One rather large customer of Microsoft, with plenty of leverage against the SQL Server team, made such a fuss about this change in behavior that a trace flag was introduced to instruct the optimizer to obey the ORDER BY in the view when returning rows.  (You can get a lot more information about this trace flag in KB #926292.)  Should you use this trace flag?  In my opinion, no -- I think the squeaky wheel was just too lazy to fix their application code so that it properly dictated the order in which it wanted its results.  Should you be aware of what can happen when you rely on undocumented behavior, and that behavior changes?  Absolutely.

 

Ordering without an ORDER BY 

Aside from the view case above, there is a more general myth out there that SQL Server will always return rows in a specific order - the myth varies from person to person, but it is either the order of the clustered index, or the order of the IDENTITY column, or the chronological order of insertion.  (Similarly, the assumption is that using TOP without an ORDER BY should have a predictable meaning; as with ordering results, it does not.)  Often people come to this conclusion because in a limited sample size, that's the behavior they "always" see.  I want to make it quite clear: ordering is arbitrary unless you use an ORDER BY clause.  You should never, ever, ever rely on the ordering you observe in a query without an ORDER BY -- and you should only issue a query without an ORDER BY clause if you truly do not care what order the results come back.  In such a case, you may as well imagine that the rows are going to come back in a different, random order each time, even though that is not truly the case (random has a meaning completely separate from arbitrary, but like I said, just imagine). 

As I mentioned earlier, in the case where you haven't told SQL Server how you want the results ordered, the optimizer is free to find the most efficient way to return the results to you, which can mean any ordering whatsoever.  Typically you will see a predictable ordering, but it might not be the one you expect (for example, it may happen to use a non-clustered index to sort).  And even if it is currently the one you expect, the next time you run the query, the choice the optimizer has made can change due to a variety of events, including but not limited to: statistics updates, data changes, statement or module recompiles, query/table hints, forceplan, plans dropping out of the plan cache, service packs, cumulative updates, engine upgrades, and even detach/attach or backup/restore to a different server at the same build level.  So quite truthfully, you could run such a query right now, and 5 minutes later run it again, and get the results in a completely different order.  This is the quintessential definition of relying on undocumented and undefined behavior, and I strongly recommend you stay away from it.  If you want some proof, go and check out Alexander Kuznetsov's post in his Defensive Programming series, entitled, "Without ORDER BY, there is no default sort order."

 

Short-circuiting

I see a lot of cases where people expect SQL Server to short-circuit their clauses -- basically, read the conditions from left to right (or top to bottom), and when any of the criteria returns false, don't bother evaluating any subsequent conditions.  An example is where you have chosen to store DATETIME data in a VARCHAR column, then try to evaluate some DATETIME aspect -- but only after determining which values really are dates (this pattern of poor data type choices is another really common bad habit, by the way).  So you see queries like this:

CREATE TABLE dbo.foo(bar VARCHAR(20));

INSERT dbo.foo(barSELECT '20090201'
 
UNION ALL SELECT 'last week'
 
UNION ALL SELECT 'next Tuesday';

SELECT bar
FROM dbo.foo
WHERE ISDATE(bar) = 1
AND DATEPART(MONTHbar) = 2;

DROP TABLE dbo.foo

The result:

Msg 241, Level 16, State 1, Line 6
Conversion failed when converting date and/or time from character string.
The truth is, SQL Server is free to evaluate the conditions in any order, so it doesn't necessarily have to first check the result of ISDATE() before trying to treat the data like DATETIME values.  And like with the ORDER BY cases above, just because you observe a specific behavior today, does not mean you will see the same behavior tomorrow -- many things can change the way SQL Server processes a query.  So how do you get around it?  Well, there are a few ways.  The most obvious one in this specific case is, using a DATETIME column to store DATETIME data.  But this isn't always possible, and it is not always the issue in this scenario anyway -- it was just the handiest example I could come up with.  The next thing people try, is to use a sub-query to first return only the rows that have DATETIME values:
SELECT bar
FROM
(
    
SELECT bar
        
FROM dbo.foo
        
WHERE ISDATE(bar= 1
AS x
WHERE DATEPART(MONTHbar) = 2;

The result: same error (and same happens with a CTE).  It's like the optimizer is reading your mind and knows you're trying to trick it; it still sees that there are rows in the table that would not pass the DATEPART check, but is evaluating things in the wrong order.

So how do we get this "short circuiting" to really work?  The CASE expression is your friend (and thanks Alex for the improved syntax suggestion):

SELECT bar
   
FROM dbo.foo
   
WHERE CASE WHEN ISDATE(bar) = 1
        THEN DATEPART(MONTHbar)
    END = 2;

I'm sure I've seen this solved in other clever ways before, and that someone will remind me how to really make this short circuit, but on a Monday night, this is the only workaround I know is guaranteed to work.  In some cases, SQL Server *will* short-circuit (see the comments from Nigel Ellis about 2/3 through this TechNet chat transcript), and you can see some working examples from Mark Cohen.  My point, though, remains: feel free to take advantage of these optimizations, but do not rely on them.



sp_who2

You're probably thinking, wow, that's a strange one to mention.  sp_who2 is undocumented and unsupported?  Absolutely.  Try to find sp_who2 in any current version of Books Online.  I reminded them that there was a reference to sp_who2 in the 2005 version of Books Online in Connect #207997 back in September of 2006.  Shortly thereafter, the last remaining trace of this procedure was removed.  I also asked for a supported and documented version in March of 2007, in Connect #264681.  Don't hold your breath, because all sp_* procedure development has been abandoned (which is why a lot of the sp_help style procedures haven't been updated to reflect new 2008 features).  I am also not trying to scare you into believing that sp_who2 will suddenly stop working in the next release or service pack - I think it is pretty much here to stay.  But with the advancements in performance tuning and troubleshooting, I question if it is of any value to use these days at all.  Instead, there are many alternatives.  Adam Machanic has written a great replacement with more flexibility and much more useful information than the new stuff (sp_WhoIsActive).  I also documented a few approaches to querying the DMVs for certain information in Chapter 29 of SQL Server MVP Deep Dives, entitled, "My Favorite DMVs and Why."


sp_MSforeachdb

This system procedure creates a cursor, loops through all of your databases, and performs some action against each of them (for some background, you can see this Database Journal article).  In one of my systems (2005 SP3), I was able to reproduce many times a scenario where a random number of rows would come back from sp_MSforeachdb.  It seemed to occur more frequently during our peak load times, but nonetheless, that got me spooked.  I looked at what it was doing and, not surprisingly, it is actually pretty simple if you discard all of the security stuff (you should only be running this kind of code for admin / maintenance functionality anyway, IMHO).  I was surprised even in 2008 R2 to still see this code in the procedure:

EXEC(N'declare hCForEachDatabase cursor global for select name from master.dbo.sysdatabases ...'

...the use of dbo.sysdatabases, as opposed to sys.databases, serves as further proof that there is no investment whatsoever in updating sp_* procedures to fit the new features and catalog views as SQL Server matures.  Anyway, I like to write my own versions of these procedures.  Not only does this enable me to avoid a flaky global cursor that is quite expensive, and future-proofs me against changes in their behavior (or deprecation that technically doesn't need to be announced), but it also allows me to add flexibility to the procedure, such as only executing code against a database with a certain naming pattern or in a certain recovery model, and only attempting to run on databases that are online and aren't read only.  For example:

CREATE PROCEDURE dbo.ForEachDB_MyWay
   
@cmd  NVARCHAR(MAX),
   
@name_pattern  NVARCHAR(257) = '%',
   
@recovery_model NVARCHAR(60) = NULL
AS
BEGIN
    SET NOCOUNT
ON;

    
DECLARE
        
@sql NVARCHAR(MAX),
        
@db  NVARCHAR(257);

    
DECLARE @c CURSOR
LOCAL FORWARD_ONLY STATIC READ_ONLY
        FOR
            SELECT 
QUOTENAME([name])
                
FROM sys.databases
                
WHER(@recovery_model IS NULL OR (recovery_model_desc = @recovery_model))
                AND 
[name] LIKE @name_pattern
                
AND [state] = 0
                
AND [is_read_only] = 0
ORDER BY [name];

    
OPEN @c;
   
    
FETCH NEXT FROM @c INTO @db;

    
WHILE @@FETCH_STATUS <> -1
    
BEGIN
        SET 
@sql = REPLACE(@cmd'?'@db);
        
BEGIN TRY
            
EXEC(@sql);
        
END TRY
        
BEGIN CATCH
            
-- I'll leave more advanced error handling as an exercise:
            
PRINT ERROR_MESSAGE();
        
END CATCH

        
FETCH NEXT FROM @c INTO @db;
    
END

    CLOSE
@c;
DEALLOCATE @c;
END
GO


-- simple example: print names of all DBs in FULL recovery:

EXEC dbo.ForEachDB_MyWay
   
@cmd  = N'PRINT ''?'';',
   
@recovery_model = N'FULL';

You could also do other interesting things with this construct, such as filtering on databases where log_reuse_wait = 0, or is_broker_enabled = 1, or where it is not a system database (database_id > 4).  You could also dictate an ORDER BY so that the databases you deem most important can appear at the top or the bottom of the queue, so to speak.  But most importantly, you'll be able to keep augmenting your own version of the procedure, as SQL Server adds more features and as you find new ways to want to work on sets of databases.

I asked for a supported and documented version of this stored procedure in Connect #264677, so that hundreds of DBAs out there don't have to develop their own procedure as above.  But it was promptly closed as "Won't Fix."

(Of course the same kind of logic can hold true for a similar undocumented stored procedure, sp_MSforeachtable.)

 

xp_fileexist / xp_getfiledetails

The writing is on the wall: the days of extended procedures are numbered, especially those that are undocumented.  Writing code today that relies on these modules being available in future versions of SQL Server is a little like Russian roulette.  The last thing you want to do during a SQL Server upgrade is worry about breaking code that accesses the file system.  Thankfully, there are many tutorials out there that will help you bring CLR to the rescue, in the event that you really need to perform file handling from within SQL Server.  Greg Larsen does a pretty good job of priming you to write your own xp_getfiledetails.

 

Summary

Sadly, I am just skimming the surface on features and behaviors that are undocumented and/or undefined.  There are literally dozens and dozens of undocumented stored procedures and extended procedures, and many engine behaviors that we all take for granted.  There are probably several behaviors I rely on every day that I don't realize are undocumented... it is definitely a difficult habit to avoid altogether.  But the realization that you *might* be using undocumented methods, or making assumptions based on observed and not guaranteed behavior, is often a good chunk of the battle.

Published Monday, February 08, 2010 11:01 PM by AaronBertrand

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Peso said:

Relying on documented (and well-known) functions is not that safe either.

The change for REPLACE function broke our application when the behaviour for REPLACE changed in SQL Server 2008. We had the assumption the functionality was guaranteed, but it wasn't.

See REPLACE here

http://msdn.microsoft.com/en-us/library/ms143359.aspx

February 9, 2010 8:23 AM
 

AaronBertrand said:

Peso, I'm not sure I would classify that as "breaking" behavior, especially since you used the word "assumption" I think you at least somewhat feel the same... it sounds to me like they fixed a bug, and announced the change in behavior well in advance.  It's a little different with undocumented features, because they don't need to announce when those things are going to change; they can just rip them out from under you.

February 9, 2010 8:42 AM
 

Alexander Kuznetsov said:

Hi Aaron,

I think that this:

WHERE ISDATE(bar) = 1

   AND 2 = CASE ISDATE(bar)

       WHEN 1 THEN DATEPART(MONTH, bar)

       WHEN 0 THEN -1

   END

is logically equivalent to this simpler one:

WHERE CASE

 WHEN ISDATE(bar) = 1 THEN DATEPART(MONTH, bar)

END = 2

Am I missing something?

February 9, 2010 9:31 AM
 

AaronBertrand said:

Yes Alex, that's definitely prettier.  I should try blogging in the morning, and not at night.  At least my production code is mostly written while the sun is out.  :-)

February 9, 2010 9:39 AM
 

Barry S said:

Aaron,

I have also noticed some funkiness - skipped databases - while using sp_MSforeachdb and thought I was going crazy.

"...I was able to reproduce many times a scenario where a random number of rows would come back from sp_MSforeachdb.  It seemed to occur more frequently during our peak load times, but nonetheless, that got me spooked..."

I have not been able to reproduce it manually though as it seems to be random - Can you provide me a way to reproduce it...?

I am trying to convince management to allow me to re-write the code using this.

Thanks!

February 9, 2010 10:35 AM
 

AaronBertrand said:

Sorry Barry, what I meant was that I saw it quite often, enough to force me to write my own.  I don't have a reliable way to reproduce it at will.  Maybe pointing management at this blog post might help convince them?  I searched online for other people experiencing the problem as well (both recently and when I first encountered it), but could not come across anything.  I'm glad I'm not the only one who has observed the issue.

February 9, 2010 10:40 AM
 

Barry S said:

I spent much time searching for others with the same issue as well and this is the first place I saw the same issue mentioned.  I almost leaped out of my chair in excitement.  

I wonder if others can come out of the woodwork and comment on the same issue....

February 9, 2010 11:06 AM
 

AaronBertrand said:

I almost expected to see a Connect item on it, but then again it would be for nothing because it's undocumented, unsupporte and therefore ineligible for bug filing.  :-)  I've made a plea to the twitterverse and hope to rustle up someone else who has seen it...

February 9, 2010 11:22 AM
 

sqlbelle said:

I haven't noticed the skipped databases, but probably because I haven't used it as much .. Great post!

February 9, 2010 2:20 PM
 

Brad Schulz said:

Excellent excellent post.

The order of evaluation of WHERE conditions is probably the most difficult thing to accept (or explain to others), especially when they've come from another programming language (C#, VB, etc).

February 9, 2010 3:21 PM
 

Jason Strate said:

Nice items.  I've run into the "Ordering without an ORDER BY" item many times before.  Usually discovering that it was relied upon after tuning some procedures and releaseing the changes.

February 9, 2010 7:28 PM
 

Aaron Bertrand said:

In my last post in this series , I talked about some problems associated with relying on undocumented

February 10, 2010 5:49 PM
 

Aaron Bertrand said:

In my last post in this series , I talked about some problems associated with relying on undocumented

February 13, 2010 11:07 AM
 

RickHalliday said:

Good post Aaron!

The ordering issue is very important. As Brad stated it is difficult for application developers to grasp. However, as a database developer the one area where I have occassionally assumed ordering is with the quirky update. There are some caveats but as yet I have not been able to break it.

February 18, 2010 5:44 AM
 

IL said:

Unexpectedly run into not sufficient rights issue using xp_fileexist.

We have WSUS database on SQLExpress 2005 SP3 CU7 instance and it runs on behalf of NT AUTHORITY\Network service account. We use DatabaseBackup.sql script from http://ola.hallengren.com scheduled to backup WSUS database to D:\Backup\WSUSDatabase.

There was the error on xp_fileexists:

Msg 50000, Level 16, State 1, Server PDC-SERVER\SQLEXPRESS, Procedure DatabaseBackup, Line 198

The directory does not exist.

because Network service can't read any folders beyond root by default. It seems xp_fileexists stopped looking into directories beyond the root after installing CU7. Or may be at some point of time I've changed directory rights and kicked off Network service.

March 10, 2010 4:08 AM
 

Aaron Bertrand said:

Microsoft is often considered a leader, an innovator, a trend-setter. The same could be said for Apple,

December 14, 2010 8:15 AM
 

Aaron Bertrand said:

&lt;&lt;&lt;&lt; UPDATE TIP LINK &gt;&gt;&gt;&gt;I've complained about sp_MSforeachdb before. In part

December 29, 2010 11:02 AM
 

Brian Klein said:

I have been seeing this skipped databases issue using sp_MSforeachdb.

It happens very often now for us as the server has scaled up to well over 500 databases.  2005 SP3.

Thank you very much for this informative post!

December 16, 2011 6:05 PM
 

Aaron Bertrand said:

I read a statement on a forum recently that said something like: IDENTITY columns are the primary key,

February 27, 2012 1:52 PM
 

Aaron Bertrand said:

I've filed a few Connect items recently that I think are important. In #752210 , I complain that the

July 6, 2012 10:51 AM
 

Ian Yates said:

For the short circuiting situation, it seems the note titled "Expressions in Queries" on the SQL 2005 "behaviour changes to database engine features" page at http://msdn.microsoft.com/en-us/library/ms143359(v=sql.90).aspx, explains why short circuiting is tricky.  My reading of it is that because the engine tries to identify common expressions in the query it won't necessarily respect conditions such as CASE WHEN ISDATE( [CharDateCol] ) = 1 THEN cast( [CharDateCol] as datetime ) ELSE NULL END.

July 8, 2012 6:46 AM
 

Nancy Folsom said:

Regarding Order By: Unfortunately, SQL Server 2012 looks in OpenQuery statements against linked servers. In my case, the query is against a PostgreSQL database that requires Order By in conjunction with its "Distinct On" construction. It's very annoying to get the warning dialog whenever I have to work on views using this technique. PostgreSQL uses Limit instead of top.

I would like to either be able to suppress the warning, or, preferably, have SQL Server ignore the syntax in the OpenQuery. Since it ignores everything else in the query, so far as I can tell, there is no reason it shouldn't ignore this.

April 10, 2013 12:07 PM
 

crokusek said:

IMO, this issue should have been improved years ago.  A better implementation could allow a order by (with implied top 100%) to be used as a default whenever it is the last stage of the overall pipeline and ignored otherwise.  

Is there really any downside besides "a view isn't supposed to do that according to the standard" argument?

July 24, 2014 4:48 PM
 

Chris said:

This is great - it's inspired me to build my own proc for executing against each db

https://github.com/dontlookaway/DBA-FunctionsAndTools/blob/master/Stored%20Procedures/Process.ExecForEachDB.sql

January 22, 2016 11:56 AM
 

Yves Forget said:

With no more than 5 user databases, sp_MsforEachDB called once a night will skip around one instance per week.

June 1, 2016 2:28 PM

Leave a Comment

(required) 
(required) 
Submit

About AaronBertrand

...about me...

This Blog

Syndication

Archives

Privacy Statement