THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Aaron Bertrand

Aaron is a Product Evangelist for SentryOne, makers of Plan Explorer and a performance monitoring and event management platform for the Microsoft Data Platform and VMware. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features, and also blogs at blogs.sentryone.com and SQLPerformance.com; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at major conferences, user group meetings, and SQL Saturday events worldwide.

Bad habits to kick : ORDER BY ordinal

A few weeks ago, I wrote a post about forming a new habit: always terminate statements with semi-colons.  Today I thought I would start a series on kicking bad habits that many of us have developed over time.  I provided a little more background on this series over on my company's site.  Today's topic: using ordinal numbers in our ORDER BY clauses. 

At least once a week, I catch myself using ordinal position to define order.  For example, I might have this: 

SELECT foobar
FROM dbo.splunge
ORDER BY 1DESC;

This is lazy shorthand, and is bound to get screwed up at some point unless I have encrypted the stored procedure and threw away the key.  Invariably, someone at some point is going to come in and add a column, remove a column, or change the order of the columns.

Remember that a lot of us produce stored procedures to be consumed by essentially "dumb" clients - Windows forms or web applications that lack the ability to perform any ordering on their own.  So often the ORDER BY clause of a resultset coming back from a stored procedure is pretty important.  When editing a stored procedure that has an error-prone ORDER BY clause like the above, it is very easy to make changes to the SELECT list without even looking at the ORDER BY clause, or the reverse.  Imagine the web developer using the above procedure asks to change the SELECT list.  Now he wants mort, foo, bar.  (Note that order of the columns themselves should be irrelevant to the application, since it should be using column name and not ordinal references, but in some cases we like certain columns to appear in a certain order when we are debugging in SSMS and the like.)  So assume you do exactly as the developer asks, and edit the SELECT clause and alter the procedure:

SELECT mort, foobar
FROM dbo.splunge
ORDER BY 1DESC;

Oops!  Now the developer is complaining that his UI is broken because the order is all messed up.  This would not have happened if you had used the following syntax in the first place:

SELECT mort, foobar
FROM dbo.splunge
ORDER BY foobar DESC;

I know it is hard to spell out those column names when you're doing ad hoc stuff, but getting in the habit will potentially save your bacon someday.  And you can't use the excuse that you have very complicated expressions in your SELECT list, because ORDER BY is the only place in the entire query where you can reference an alias you provided.  So the following is completely legal:

SELECT foo RTRIM(CONVERT(INTABS(foo) % 2))
  
FROM dbo.splunge
  
ORDER BY foo;

There is nothing lost by being explicit, except for a few keystrokes.  But there is plenty to lose by not doing so - including some patience for the developer running to your cube like their head is on fire.  Ideally, when you are making changes to existing code, you examine the entire statement to ensure consistency.  But try to be proactive and protect yourself against the inevitable rushed and possibly botched emergency fix. 

I am working on a series of "Bad habits to kick" articles, in an effort to motivate people to drop some of the things that I hate to see when I inherit code.  Up next: using loops to populate large tables.

Published Tuesday, October 06, 2009 11:49 AM by AaronBertrand

Comments

 

Armando Prato said:

My other peeves are with using SELECT * and with not naming the columns involved in an INSERT...SELECT

ie.  

INSERT INTO some_table

SELECT foo, bar FROM some_other_table

Add a column to some_table and you're potentially breaking your code.  To me, it's lazy not to be explicit with either ORDER BY or with INSERT...SELECT

October 6, 2009 12:01 PM
 

Nigel Ainscoe said:

Hear Hear say I. And in SQL 2008, intellisense will in many cases complete the column names for you so you have even less excuse to use ordinals. Even when just hacking a query for a quick report, it still makes sense to avoid ordinals as by using them you are introducing an easy place to put a hard to spot bug.

October 6, 2009 12:45 PM
 

Jack Corbett said:

I always find the use of ordinals in the order by interesting because I have never used them and didn't even know you could until I saw it in some vendor application.

I'd never do it.  I have to admit to not using a column list for inserts though.  Bad habit, I know and I will work on it.  Of course if you use SQLPrompt it does it for you, so I do it less now.

October 6, 2009 2:05 PM
 

AaronBertrand said:

Even without 3rd party tools, a column list is far too easy to get.  In Object Explorer, expand your table, and drag the "columns" node onto the query window.  They're probably not formatted the way you want them, but reformatting is less work than typing.

October 6, 2009 2:14 PM
 

Adam Machanic said:

Aaron has recently been doing an absolutely fantastic series of posts detailing various "bad habits"

October 8, 2009 10:57 PM
 

dd said:

Why have a separate order by clause at all (unless the ordering columns does not appear in the select list)?

Why not:

SELECT foo SORT, bar SORT DESC

 FROM dbo.splunge;

October 12, 2009 8:51 AM
 

AaronBertrand said:

dd, that seems ambiguous to me.  Do you have to order the columns in the select list in the same way that you want the order of the rows?  E.g. should SQL Server sort that statement by foo first, then bar desc, just because that is the order in the select list?  What if I want to return foo, bar in that order in the select list, but ordering of the rows should be first by bar and then by foo?

Also, it seems cumbersome to have to add SORT 15 times.  It also precludes the use of conditional ordering.

October 12, 2009 8:55 AM
 

Brian Tkatch said:

Other than in ad-hoc queries (and UNION ALLs) i do not use ordinals in an ORDER BY. Why do i use ordinals at all? Because of annoying naming conventions. FROM Customer ORDER BY Id, is easy to remember (and type!). FROM Customer ORDER BY CustomerId is redundant (annoying and more typo prone) and hard to remember (underscores this time? How about vowels?). FROM MCustomer_TBL ORDER BY X234_CustomerID_Q is hard to remember what it is called at all, redundant, and hard to remeber formatting.

So, i use ordinals. When it's real code, i open up notepad with the TABLE definition and copy&paste (faster than using the mouse withe the editor.)

October 12, 2009 10:45 AM
 

AaronBertrand said:

Brian, even faster than notepad would be to drag the column name from Object Explorer.  Just expand the Columns node for the table and click and drag into the query editor window.  I mentioned a similar tip for grabbing *all* the columns from the top node in my post about SELECT *:

http://sqlblog.com/blogs/aaron_bertrand/archive/2009/10/10/bad-habits-to-kick-using-select-omitting-the-column-list.aspx

October 12, 2009 10:48 AM
 

Brian Tkatch said:

Aaron, you are right. The drag is faster.

Though, i tend to be a keyboard user. So, i find alt-tab, ctrl-f, arrows, shift-end, crtl-c (ctrl-insert), alt-tab, ctrl-v (shift-insert) to be much faster and more convenient than f8, reach for mouse, move mouse over section, find scroll bar location with mouse and move mouse there, click, drag, let go, find table, expand table, expand columns, drag column into editor, alt-w h.

But it is a matter of conveniece, and each user's experience should be quite different.

October 12, 2009 12:16 PM
 

Doug said:

Brian, in your previous post you said, "Why do i use ordinals at all? Because of annoying naming conventions. FROM Customer ORDER BY Id, is easy to remember (and type!). FROM Customer ORDER BY CustomerId is redundant (annoying and more typo prone) and hard to remember (underscores this time? How about vowels?). FROM MCustomer_TBL ORDER BY X234_CustomerID_Q is hard to remember what it is called at all, redundant, and hard to remeber formatting."

Formatting aside, what happens if you have more than one column that is an id (typically a foreign key)..GUIDs?  I deal with tables that do not include the actual text in the tables themselves, rather a GUID reference to a master values table.  I would never shorthand my column names because the next guy that comes along will have to interpret what is written.  I agree with Aaron, always best to type them out or use the ObjectBrowser..or Intellisense. One thing I've tried to remind myself of is that the time I may save by taking shortcuts may cost the next guy more time the next pass through. So, its a wash, if not more costly.

October 14, 2009 10:47 AM
 

Brian Tkatch said:

Doug

I don't have GUIDs where one TABLE is the master. The idea itself raises a "red flag" for me. Of course, it could be justified, i just personally have never seen the case in the applications i have worked on.

When not constrained by company standards, i use the TABLE name as the FK COLUMN Name. For example:

Application: Id, Name, Description, Vendor, Licenses

Application_Version: Application, Version

In this case, Application_Version.Application is an FK to Application.Id. It is obvious, as Id is the PK.

If the same COLUMN is pulled twice, i add underscore and decription. For example:

Application: Id, Name, Description, Vendor, Licenses

Application_Version: Application, Version, Application_Replaced

Here, both Application and Application_Replaced FK to the same COLUMN, but the second which is an attribute and not a child gets an underscore and a description which makes it obvious to anyone who knows the model. (To get the know the model there is documentation.)

The point is, this is obvious to anyone using it, and thus becomes easy to types as well.

October 14, 2009 11:30 AM
 

AaronBertrand said:

Brian, the reason I prefer ApplicationID over Id is so that every column in the entire data dictionary that represents that column is named the exact same way.  This way, among other things, I never have multiple columns named "Id" coming back in a join, and have to figure out which is which.  And I can immediately understand this without first having to "know the model."

But again, these are preferences.  If Id makes more sense for your applications, by all means, go ahead.  Maybe I should start another series focused on "why these preferences are better than those preferences"?  :-)

October 14, 2009 11:43 AM
 

Brian Tkatch said:

@Aaron

Yeah, these are preferences.

To the point:

I ...

1) see no value in COLUMNs having the same name.

2) treat TABLEs as objects, and prefer referring to the object rather than an attribute

3) qualify all COLUMN names in a query that has multiple TABLEs, so there is never any confusion.

4) don't want somebody who "quickly understands" the model anywhere near it. That approach is error prone, and usually quite inefficient.

October 14, 2009 3:01 PM
 

AaronBertrand said:

Brian, I'm not sure why you see no value in columns having the same name throughout the data model.  Not only do you always know you'll be calling something a CustomerID, it also makes it very easy to find every reference to it in the metadata.  Maybe that's not valuable *TO YOU* but that does not mean it has no value in general.

Qualifying the table names is part of it, but when you say:

SELECT o.ID, c.ID FROM Orders AS o INNER JOIN Customers AS c ...

Your resultset still has two columns, both named "ID".  This can cause problems if you need to then put the result inside a CTE, or select the results into a table, or need distinct column names in the consumer (reporting services, home-grown applications, etc).  You can alias, sure, but then what have you gained?  Now instead of typing:

o.OrderID

You are typing:

o.ID AS OrderID

or

OrderID = o.ID

To me this is just another case where being explicit outweighs the cost of typing those extra characters.

As for "understanding the model," we don't always have that luxury.  Ever outsource?  Do you think they will ever have a full understanding of the model before they are allowed to produce any work?  Fat chance; the bean counters need results too.

October 14, 2009 4:02 PM
 

Brian Tkatch said:

@Aaron

Yes, i meant no value *to me*. Good point.

If the other TABLE will always use Customer, the benefit is the same. The only difference is that in Customer the COLUMN is called Id, everywhere else it is called Customer. Having the same name is only different in that the parent will have the same name the child uses.

As for qualifying, if the query writer chooses a bad alias that's his problem. I don't believe is changing the COLUMN name so the user cannot shoot himself in the foot. That is his prerogative.

October 14, 2009 4:39 PM
 

AaronBertrand said:

Brian, I used to do it exactly as you do, for the same reason.  I believe I even once said to a co-worker, "You don't need to say CustomerID because it's so obvious it's a Customer we're talking about; it's in the Customers table! Duh!"  Later on as I was writing and maintaining more complex queries involving these objects, I found that when the column was explicitly named in all cases, it just led to more self-documenting code, and much less scrolling to figure out which ID that one is.  Adding the fact that I had to change c.ID to CustomerID to make the resultset make sense just added to the pain.

I am all for changing my own conventions when it prevents *me* from shooting myself in the foot.  None of us is always at 100%.  Anyway, as for the "value," I see no value in it for me, to stop using Customers.CustomerID and switch to a less explicit Customers.ID.  So don't expect to see it in any of my code examples in the future.  :-)

October 14, 2009 4:59 PM
 

AaronBertrand said:

I guess my point is more along these lines:

In order to get someone to change their ways, there has to be something in it *for them*.  I think I've demonstrated cases where having the same column name in the same database to mean different things could cause problems.  What problem is caused by always calling a Customer's ID CustomerID?  Or on the flip side, what problem is solved by calling it a more generic "ID"?  I will certainly consider the value of your convention if you can tell me what I gain by using "ID" instead of "CustomerID" in that case.  And no, avoiding the strenuous work of typing 8 characters is not of value to me, in case you're wondering.

October 14, 2009 5:09 PM
 

Brian Tkatch said:

@Aaron

"avoiding the strenuous work of typing 8 characters is not of value to me"

Arg! There goes my good argument! :)

Redundancy annoys me greatly. It's not the eight characters, it's eight unneeded characters. The pain grows geometrically which each succeeding characters.

But, enough of that, the benefits as i see them, are two-fold.

1) To me, TABLEs are objects (or classes). A record in a TABLE is an instantiation of that object. Within that paradigm, when FKing to a TABLE, i want to object mentioned, not the technical detail of which COLUMN it is.

I find this paradigm helps people understand relationships in between TABLEs, put COLUMNs (attributes) in the correct TABLE, and write more intelligent queries.

2) Having Id, Name, Description as standard names for every object (where applicable) makes it very easy to remember what i need. The id is always "id", the name is always "name", the description is always "description".

Ad-hoc gets really easy:

SELECT * FROM Customer WHERE Id = 42;

SELECT * FROM People WHERE Name = 'Aaron Bertrand';

SELECT * FROM Product WHERE Description IS NULL;

If the Id COLUMN is always <table-name>ID, that would also work.

Though, in two ad-hoc cases even this is slightly inconvenient.

1) Many times i call TABLEs by their SYNONYMs because of long names of the TABLE. For a quick example, i may alias Customer to Cust.

If i use the SYNONYM, i have to remember that my <table-name>ID convention won't work on a SYNONYM: SELECT * FROM Cust WHERE CustID = 42; fails. However, SELECT * FROM Cust WHERE Id = 42; will always work.

2) During development and debugging, often i have to search multiple TABLEs. It becomes very convenient of all lookup TABLEs are only different to query by TABLE name and not also COLUMN name.

SELECT * FROM Customer WHERE Id = 42;

SELECT * FROM Product WHERE Id = 17;

SELECT * FROM People WHERE Id = 14;

It's just so much easier.

October 15, 2009 9:50 AM
 

AaronBertrand said:

Sorry Brian, I understand your reasoning to some extent, but I'm not sold.  it still seems to me that the primary benefit is avoiding typing 8 characters.  I value consistency over typing efficiency and total number of characters in a module.  <shrug>

October 15, 2009 9:59 AM
 

Brian Tkatch said:

@Aaron

To each their own. IMO. your method mostly works, but mine is better. :) Ultimately, it's a matter of preference, convenience, and ease of use. All of which are subjective.

October 15, 2009 11:06 AM
New Comments to this post are disabled

About AaronBertrand

...about me...

This Blog

Syndication

Archives

Privacy Statement