THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Aaron Bertrand

Bad habits to kick : using old-style JOINs

In my last post in this series, I talked about using simple loops to populate large tables.  This time I'd like to focus on getting rid of old, ANSI-89 joins.

I am sure most veterans know better than to use old ANSI-89 JOIN syntax, such as:

SELECT o.OrderIDod.ProductID
FROM dbo.Orders AS odbo.OrderDetails AS od
WHERE o.OrderDate >= '20091001'
AND o.OrderID od.ProductID;

One reason to avoid this syntax is that the query is often less readable than when you separate the joining criteria from the filter criteria.  For example, the above query re-written to use a "proper" join becomes:

SELECT o.OrderIDod.ProductID
FROM dbo.Orders AS o
INNER JOIN dbo.OrderDetails AS od
ON o.OrderID od.ProductID
WHERE o.OrderDate >= '20091001';

This is subjective, of course, but seems to be a consensus.  Now, bring in four or five more tables to the query, and then try to debug when you are not getting the expected number of rows.  When the joined-in tables are grouped together with their joining criteria, commenting them out or changing the criteria becomes a much simpler task.

The OUTER JOIN version of this syntax (which uses *=, =*) is already on the deprecation list, so there is another reason to avoid the syntax in general.  Many people have memorized which is which, but deprecation aside, isn't it better to explicitly state what you are doing?  For example:

SELECT p.ProductNamep.ProductID
FROM dbo.Products AS p
LEFT OUTER JOIN dbo.OrderDetails AS od
ON p.ProductID od.ProductID
WHERE od.ProductID IS NULL;

There are also documented cases where the old-style outer joins produce incorrect results depending on the ordering of evaluation of the "filter" criteria (which happened to include the joining criteria).  Largely a moot point since *= / =* is deprecated, but I wanted to point it out all the same.  After all, since I know many of us will be on 2000, 2005 or 2008 for some time, we *could* continue using this syntax if we wanted to.  But I don't think we should.

Are you sold yet?  No?  Well, there is an even bigger potential problem in using the implicit INNER JOIN syntax.  What if you forget the JOIN criteria in the WHERE clause?  You've turned it into a CROSS JOIN (otherwise known as the Cartesian Product).  Take the first query again, and leave out the last line:

SELECT o.OrderIDod.ProductID
FROM dbo.Orders AS odbo.OrderDetails AS od
WHERE o.OrderDate >= '20091001';

For every single order after October 1st, you're going to get a copy of every single row from OrderDetails *for all of time* since you've lost the joining criteria *and* the filtering criteria for that table.  This could be a disastrous mistake on a production system, and while it can be caught relatively quickly after deployment and testing, it can be prevented entirely at compile time, since an error will be thrown if you leave out the ON clause for an INNER JOIN.  Sure, you can still get your join criteria wrong and push the change without noticing (for example, ON o.OrderID = o.OrderID), but with the newer syntax, there's one less thing that is likely to go wrong.

My biggest beef about trying to help people kick these habits is that Microsoft themselves keep publishing sample code that promotes the syntax.  Again, this isn't so much a problem for the veterans (except those with insurmountable intertia), who fully understand how JOINs work, and know the history of ANSI-89 and ANSI-92.  But for someone brand new to SQL Server, they will pick up a current copy of Books Online, and see sample queries that enforce these coding standards.  The other day, I complained about this on Connect, when I came across yet another example of old-style joins in the SQL Server 2008 version of Books Online.  They quickly agreed that, when they review the documentation for 2008 R2, they will update any code samples they deem "suspect."

I am working on a series of "Bad habits to kick" articles, in an effort to motivate people to drop some of the things that I hate to see when I inherit code.  Up next: using table aliases like (a, b, c) or (t1, t2, t3).


Published Thursday, October 08, 2009 12:26 PM by AaronBertrand

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Glenn Berry said:

I completely agree with you about this. I see a lot of people with Oracle experience in classes that I teach using ANSI-89 JOINs, but I quickly break them of that habit (at least while the class lasts).

It is much harder to look at a query that uses ANSI-89 JOINs, and quickly pick out the actual part WHERE clause that matters the most for query optimization.

October 8, 2009 1:00 PM
 

mjswart said:

Man, ANSI-89. The name itself isn't even Y2K compliant.

But the old syntax makes me nostalgic. Do you remember? It was used at a time when people's computer choices were between *real* PCs and clones (non-IBM).

October 8, 2009 2:19 PM
 

mjswart said:

Here's a question though. What if you actually *want* a cartesian product.

For example, say you want the numbers 1-1,000,000 but only have a numbers table that stores 1000 numbers.

Then the choice is between:

select a.number + b.number * 1000

from numbers a, numbers b

and:

select a.number + b.number * 1000

from numbers a

cross join numbers b

In this case, the old syntax seems a bit less wrong than other scenarios you mentioned. But I guess the second example is saying in effect "I'm cross joining and I know it".

October 8, 2009 3:37 PM
 

AaronBertrand said:

Yes, I would still use the latter.  Then I *know* by looking at the code that my intention was to cross join.  Remember that if you wrote that code, you might not be the next person looking at it, scratching their head wondering about your intention.  When you state your intention, there is no mystery.

Arguably, you could state your intention by placing a comment at the beginning of the code block.  But that comment can still be ignored or get lost when copying the code to another place.  And you'd still have to form the habit of commenting in the first place.

October 8, 2009 3:44 PM
 

Adam Machanic said:

Aaron has recently been doing an absolutely fantastic series of posts detailing various "bad habits"

October 8, 2009 10:57 PM
 

Tony Sawyer said:

@mjswart

How about:

select a.number + b.number * 1000

from numbers a

join numbers b

on 1 = 1

?

October 9, 2009 4:15 AM
 

Sheila Filteau said:

What is your opinion on including the filter criteria with the join clause, when the filter is based on the value of a specific column? In your first example the WHERE would be an AND.

SELECT o.OrderID, od.ProductID  

FROM dbo.Orders AS o  

INNER JOIN dbo.OrderDetails AS d  

ON o.OrderID = od.ProductID  

AND o.OrderDate >= '20091001';

October 9, 2009 11:11 AM
 

AaronBertrand said:

I guess it's semantics, Sheila, and you can decide for yourself whether the syntax you propose is more meaningful for you.  I prefer putting filter criteria in the WHERE clause, and join criteria in the ON clause.  The OrderDate has nothing to do with the join (in other words, the join will work the same way if you comment that line out, it will just return more rows), so I don't think it belongs there.

October 9, 2009 11:14 AM
 

RussellH said:

@Glenn Berry

I agree, and Oracle people shouldn't have an excuse because Oracle has supported the modern syntax since version 9i.

October 9, 2009 12:18 PM
 

SqlServerKudos said:

Kudos for a great Sql Server article - Trackback from SqlServerKudos

October 9, 2009 2:00 PM
 

YoungerGuy said:

I'm the younger guy coming to my company and with less experience but a

lot fresher education and there in my company they do this kind of join statements all the time. It was a lot of "WTF" in the beginning, I could hardly believe my eyes. But the atmosphere got me and I'm guilty of writing this kind of statements. And another sin they do is to use cursors excessively and then they sometimes complain of a slow application? Why can it be so? He he!!

October 10, 2009 4:38 AM
 

Helping people kick bad SQL Server habits « OTO One to One Interactive said:

October 10, 2009 2:28 PM
 

Wild Rumpus said:

I'm a software engineer, but my current focus is writing application code.  I totally understand where you're coming from, but my development team doesn't have a DBA, and a lot of us "coders" are from back-in-the-day.  Are there any FREE, AUTOMATED tools that we can run our scripts through to highlight where we could be switching to new, good habits?

October 10, 2009 3:00 PM
 

AaronBertrand said:

Wild Rumpus, a lot of the "good" vs. "bad" habits are quite subjective, so I think it would be a stretch to expect a tool (particularly a free one) that will identify these things for you, never mind convert them.

That said, there are some tools that Microsoft ships that will help identify some things, such as old-style outer joins and usage of other deprecated features.

SQL Server 2008 Upgrade Advisor:

http://microsoft.com/downloads/details.aspx?FamilyID=F5A6C5E9-4CD9-4E42-A21C-7291E7F0F852

SQL Server 2005 Upgrade Advisor:

http://microsoft.com/downloads/details.aspx?FamilyID=1470e86b-7e05-4322-a677-95ab44f12d75

SQL Server 2005 Best Practices Analyzer:

http://microsoft.com/downloads/details.aspx?FamilyID=DA0531E4-E94C-4991-82FA-F0E3FBD05E63

October 10, 2009 4:39 PM
 

Jogo said:

One feature of oracle join syntax that i really like is the "USING" clause.  If you want to join two tables on a set of columns with the same name in both tables, you just say:

select * from table1 t1 join table2 t2

USING (SharedIdColumn1)

just so much clearer to read and less fiddly to write than

select * from table1 join table2

ON t1.SharedIdColum1 = t2.SharedIdColumn2

apparently this is actually part of the ansi specification.  I wish microsoft would just go ahead and implement it.  Or maybe not part of ansi spec in which case I wish MS would just go ahead and "borrow" it from Oracle :)

October 11, 2009 2:31 PM
 

AaronBertrand said:

October 11, 2009 2:40 PM
 

ogrig said:

Sorry to disagree, "old style ANSI-89 joins" are NOT less readable, quite the opposite.

They are non-standard, on the deprecation list (for a good reason), ... and have other issues you would be more aware of than I am, but not less readable.

It is you that got used to the new standard (not so new anymore:-) and are not confortable anymore with the old one.

My "problem" with the new standard is proper formatting of the code.

Let's take 3 ways of writing the same simple 3 tables query:

-- style 1

SELECT  dt.someColumn

FROM      tblInvoiceRun    ir

    JOIN tblInvoiceHeader hd ON ir.invoiceRunNo = hd.invoiceRunNo

    JOIN tblInvoiceDetail dt ON hd.invoiceNo    = dt.invoiceNo

WHERE ir.invoiceRunNo = 666

-- style 2

SELECT  dt.someColumn

FROM      tblInvoiceRun    ir

    JOIN tblInvoiceHeader hd

    ON ir.invoiceRunNo = hd.invoiceRunNo

    JOIN tblInvoiceDetail dt

    ON hd.invoiceNo    = dt.invoiceNo

WHERE ir.invoiceRunNo = 666

-- style 3

SELECT  dt.someColumn

FROM tblInvoiceRun    ir

    tblInvoiceHeader hd

    tblInvoiceDetail dt

WHERE ir.invoiceRunNo = 666

AND   ir.invoiceRunNo = hd.invoiceRunNo

AND   hd.invoiceNo    = dt.invoiceNo

style 1: clear separation of participants (tables) and rules (ON clause), but a bit of a pain to align properly and maintain (just think about adding a LEFT JOIN or multiple conditions in one of the ON clauses)

style 2: I know this is your recommended style, but for me it is just messy. The participants and rules are interspersed, which means you cannot use a proper table-like alignament. Just unnecessarily hard to follow.

style 3: very clear separation of participants (FROM clause) and rules (WHERE clause), very easy to write and align properly.

Just to be clear: I am not saying that ANSI-89 is the style to use, quite the contrary. But there are enough valid reasons to change. The fact that you, me, or someone else's cat do not find it easy to the eye is NOT one of them.

October 12, 2009 1:46 AM
 

Aaron Bertrand said:

For the most part, by readable, I meant that it is much more visually obvious where the join criteria lives and where the filter criteria lives.  Obviously if you've used the old style your entire career then your point of view is different from someone who hasn't.  To me the new style is more readable, but only part of that is for the subjective reason.

October 12, 2009 7:40 AM
 

Brian Tkatch said:

I agree with ogrig . ANSI-89 syntax is much easier to read. I despise the later syntax, and use it only if i have too.

Besides the lack of readability, there is another issue with the later new-fangled syntax. It destroys clause separation. Traditionally, the FROM clause includes records, the WHERE clause excludes records. A join clause usually excludes records and never includes records. Hence, it belongs in the WHERE clause.

October 12, 2009 10:56 AM
 

RBrown said:

Just a quick comment about your first "proper" join.  The aliasing seems to be inconsistent with the comparison in the ON clause...

SELECT o.OrderID, od.ProductID  FROM dbo.Orders AS o  INNER JOIN dbo.OrderDetails AS d  ON o.OrderID = od.ProductID  WHERE o.OrderDate >= '20091001';

should be:

SELECT o.OrderID, od.ProductID  FROM dbo.Orders AS o  INNER JOIN dbo.OrderDetails AS od  ON o.OrderID = od.ProductID  WHERE o.OrderDate >= '20091001';

October 12, 2009 2:52 PM
 

quizno said:

so i guess i'm confused.... i, too, think the "x INNER JOIN y ON z" syntax is LESS readable. INNER joins are never the problem, it's the left and right joins that seem to frustrate. What is the "correct" syntax to use when wanting (x LEFT JOIN y) RIGHT JOIN z is wanted (e.g. the RIGHT JOIN is between table z and the result of the LEFT JOIN of tables x and y)?

October 20, 2009 5:33 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About AaronBertrand

...about me...

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement