THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Aaron Bertrand

Aaron is a Product Evangelist for SentryOne, makers of Plan Explorer and a performance monitoring and event management platform for the Microsoft Data Platform and VMware. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features, and also blogs at blogs.sentryone.com and SQLPerformance.com; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at major conferences, user group meetings, and SQL Saturday events worldwide.

Bad habits to kick : using old-style JOINs

In my last post in this series, I talked about using simple loops to populate large tables.  This time I'd like to focus on getting rid of old, ANSI-89 joins.

I am sure most veterans know better than to use old ANSI-89 JOIN syntax, such as:

SELECT o.OrderIDod.ProductID
FROM dbo.Orders AS odbo.OrderDetails AS od
WHERE o.OrderDate >= '20091001'
AND o.OrderID od.ProductID;

One reason to avoid this syntax is that the query is often less readable than when you separate the joining criteria from the filter criteria.  For example, the above query re-written to use a "proper" join becomes:

SELECT o.OrderIDod.ProductID
FROM dbo.Orders AS o
INNER JOIN dbo.OrderDetails AS od
ON o.OrderID od.ProductID
WHERE o.OrderDate >= '20091001';

This is subjective, of course, but seems to be a consensus.  Now, bring in four or five more tables to the query, and then try to debug when you are not getting the expected number of rows.  When the joined-in tables are grouped together with their joining criteria, commenting them out or changing the criteria becomes a much simpler task.

The OUTER JOIN version of this syntax (which uses *=, =*) is discontinued in SQL Server 2012, and has been deprecated since SQL Server 2008, so there is another reason to avoid the syntax in general.  Many people have memorized which is which, but deprecation aside, isn't it better to explicitly state what you are doing?  For example:

SELECT p.ProductNamep.ProductID
FROM dbo.Products AS p
LEFT OUTER JOIN dbo.OrderDetails AS od
ON p.ProductID od.ProductID
WHERE od.ProductID IS NULL;

There are also documented cases where the old-style outer joins produce incorrect results depending on the ordering of evaluation of the "filter" criteria (which happened to include the joining criteria).  Largely a moot point since *= / =* is deprecated, but I wanted to point it out all the same.  After all, since I know many of us will be on 2000, 2005 or 2008 for some time, we *could* continue using this syntax if we wanted to.  But I don't think we should.

Are you sold yet?  No?  Well, there is an even bigger potential problem in using the implicit INNER JOIN syntax.  What if you forget the JOIN criteria in the WHERE clause?  You've turned it into a CROSS JOIN (otherwise known as the Cartesian Product).  Take the first query again, and leave out the last line:

SELECT o.OrderIDod.ProductID
FROM dbo.Orders AS odbo.OrderDetails AS od
WHERE o.OrderDate >= '20091001';

For every single order after October 1st, you're going to get a copy of every single row from OrderDetails *for all of time* since you've lost the joining criteria *and* the filtering criteria for that table.  This could be a disastrous mistake on a production system, and while it can be caught relatively quickly after deployment and testing, it can be prevented entirely at compile time, since an error will be thrown if you leave out the ON clause for an INNER JOIN.  Sure, you can still get your join criteria wrong and push the change without noticing (for example, ON o.OrderID = o.OrderID), but with the newer syntax, there's one less thing that is likely to go wrong.

My biggest beef about trying to help people kick these habits is that Microsoft themselves keep publishing sample code that promotes the syntax.  Again, this isn't so much a problem for the veterans (except those with insurmountable inertia), who fully understand how JOINs work, and know the history of ANSI-89 and ANSI-92.  But for someone brand new to SQL Server, they will pick up a current copy of Books Online, and see sample queries that enforce these coding standards.  The other day, I complained about this on Connect, when I came across yet another example of old-style joins in the SQL Server 2008 version of Books Online.  They quickly agreed that, when they review the documentation for 2008 R2, they will update any code samples they deem "suspect."

I am working on a series of "Bad habits to kick" articles, in an effort to motivate people to drop some of the things that I hate to see when I inherit code.  Up next: using table aliases like (a, b, c) or (t1, t2, t3).

Published Thursday, October 08, 2009 12:26 PM by AaronBertrand

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Glenn Berry said:

I completely agree with you about this. I see a lot of people with Oracle experience in classes that I teach using ANSI-89 JOINs, but I quickly break them of that habit (at least while the class lasts).

It is much harder to look at a query that uses ANSI-89 JOINs, and quickly pick out the actual part WHERE clause that matters the most for query optimization.

October 8, 2009 1:00 PM
 

mjswart said:

Man, ANSI-89. The name itself isn't even Y2K compliant.

But the old syntax makes me nostalgic. Do you remember? It was used at a time when people's computer choices were between *real* PCs and clones (non-IBM).

October 8, 2009 2:19 PM
 

mjswart said:

Here's a question though. What if you actually *want* a cartesian product.

For example, say you want the numbers 1-1,000,000 but only have a numbers table that stores 1000 numbers.

Then the choice is between:

select a.number + b.number * 1000

from numbers a, numbers b

and:

select a.number + b.number * 1000

from numbers a

cross join numbers b

In this case, the old syntax seems a bit less wrong than other scenarios you mentioned. But I guess the second example is saying in effect "I'm cross joining and I know it".

October 8, 2009 3:37 PM
 

AaronBertrand said:

Yes, I would still use the latter.  Then I *know* by looking at the code that my intention was to cross join.  Remember that if you wrote that code, you might not be the next person looking at it, scratching their head wondering about your intention.  When you state your intention, there is no mystery.

Arguably, you could state your intention by placing a comment at the beginning of the code block.  But that comment can still be ignored or get lost when copying the code to another place.  And you'd still have to form the habit of commenting in the first place.

October 8, 2009 3:44 PM
 

Adam Machanic said:

Aaron has recently been doing an absolutely fantastic series of posts detailing various "bad habits"

October 8, 2009 10:57 PM
 

Tony Sawyer said:

@mjswart

How about:

select a.number + b.number * 1000

from numbers a

join numbers b

on 1 = 1

?

October 9, 2009 4:15 AM
 

Sheila Filteau said:

What is your opinion on including the filter criteria with the join clause, when the filter is based on the value of a specific column? In your first example the WHERE would be an AND.

SELECT o.OrderID, od.ProductID  

FROM dbo.Orders AS o  

INNER JOIN dbo.OrderDetails AS d  

ON o.OrderID = od.ProductID  

AND o.OrderDate >= '20091001';

October 9, 2009 11:11 AM
 

AaronBertrand said:

I guess it's semantics, Sheila, and you can decide for yourself whether the syntax you propose is more meaningful for you.  I prefer putting filter criteria in the WHERE clause, and join criteria in the ON clause.  The OrderDate has nothing to do with the join (in other words, the join will work the same way if you comment that line out, it will just return more rows), so I don't think it belongs there.

October 9, 2009 11:14 AM
 

RussellH said:

@Glenn Berry

I agree, and Oracle people shouldn't have an excuse because Oracle has supported the modern syntax since version 9i.

October 9, 2009 12:18 PM
 

YoungerGuy said:

I'm the younger guy coming to my company and with less experience but a

lot fresher education and there in my company they do this kind of join statements all the time. It was a lot of "WTF" in the beginning, I could hardly believe my eyes. But the atmosphere got me and I'm guilty of writing this kind of statements. And another sin they do is to use cursors excessively and then they sometimes complain of a slow application? Why can it be so? He he!!

October 10, 2009 4:38 AM
 

Wild Rumpus said:

I'm a software engineer, but my current focus is writing application code.  I totally understand where you're coming from, but my development team doesn't have a DBA, and a lot of us "coders" are from back-in-the-day.  Are there any FREE, AUTOMATED tools that we can run our scripts through to highlight where we could be switching to new, good habits?

October 10, 2009 3:00 PM
 

AaronBertrand said:

Wild Rumpus, a lot of the "good" vs. "bad" habits are quite subjective, so I think it would be a stretch to expect a tool (particularly a free one) that will identify these things for you, never mind convert them.

That said, there are some tools that Microsoft ships that will help identify some things, such as old-style outer joins and usage of other deprecated features.

SQL Server 2008 Upgrade Advisor:

http://microsoft.com/downloads/details.aspx?FamilyID=F5A6C5E9-4CD9-4E42-A21C-7291E7F0F852

SQL Server 2005 Upgrade Advisor:

http://microsoft.com/downloads/details.aspx?FamilyID=1470e86b-7e05-4322-a677-95ab44f12d75

SQL Server 2005 Best Practices Analyzer:

http://microsoft.com/downloads/details.aspx?FamilyID=DA0531E4-E94C-4991-82FA-F0E3FBD05E63

October 10, 2009 4:39 PM
 

Jogo said:

One feature of oracle join syntax that i really like is the "USING" clause.  If you want to join two tables on a set of columns with the same name in both tables, you just say:

select * from table1 t1 join table2 t2

USING (SharedIdColumn1)

just so much clearer to read and less fiddly to write than

select * from table1 join table2

ON t1.SharedIdColum1 = t2.SharedIdColumn2

apparently this is actually part of the ansi specification.  I wish microsoft would just go ahead and implement it.  Or maybe not part of ansi spec in which case I wish MS would just go ahead and "borrow" it from Oracle :)

October 11, 2009 2:31 PM
 

AaronBertrand said:

October 11, 2009 2:40 PM
 

ogrig said:

Sorry to disagree, "old style ANSI-89 joins" are NOT less readable, quite the opposite.

They are non-standard, on the deprecation list (for a good reason), ... and have other issues you would be more aware of than I am, but not less readable.

It is you that got used to the new standard (not so new anymore:-) and are not confortable anymore with the old one.

My "problem" with the new standard is proper formatting of the code.

Let's take 3 ways of writing the same simple 3 tables query:

-- style 1

SELECT  dt.someColumn

FROM      tblInvoiceRun    ir

    JOIN tblInvoiceHeader hd ON ir.invoiceRunNo = hd.invoiceRunNo

    JOIN tblInvoiceDetail dt ON hd.invoiceNo    = dt.invoiceNo

WHERE ir.invoiceRunNo = 666

-- style 2

SELECT  dt.someColumn

FROM      tblInvoiceRun    ir

    JOIN tblInvoiceHeader hd

    ON ir.invoiceRunNo = hd.invoiceRunNo

    JOIN tblInvoiceDetail dt

    ON hd.invoiceNo    = dt.invoiceNo

WHERE ir.invoiceRunNo = 666

-- style 3

SELECT  dt.someColumn

FROM tblInvoiceRun    ir

    tblInvoiceHeader hd

    tblInvoiceDetail dt

WHERE ir.invoiceRunNo = 666

AND   ir.invoiceRunNo = hd.invoiceRunNo

AND   hd.invoiceNo    = dt.invoiceNo

style 1: clear separation of participants (tables) and rules (ON clause), but a bit of a pain to align properly and maintain (just think about adding a LEFT JOIN or multiple conditions in one of the ON clauses)

style 2: I know this is your recommended style, but for me it is just messy. The participants and rules are interspersed, which means you cannot use a proper table-like alignament. Just unnecessarily hard to follow.

style 3: very clear separation of participants (FROM clause) and rules (WHERE clause), very easy to write and align properly.

Just to be clear: I am not saying that ANSI-89 is the style to use, quite the contrary. But there are enough valid reasons to change. The fact that you, me, or someone else's cat do not find it easy to the eye is NOT one of them.

October 12, 2009 1:46 AM
 

Aaron Bertrand said:

For the most part, by readable, I meant that it is much more visually obvious where the join criteria lives and where the filter criteria lives.  Obviously if you've used the old style your entire career then your point of view is different from someone who hasn't.  To me the new style is more readable, but only part of that is for the subjective reason.

October 12, 2009 7:40 AM
 

Brian Tkatch said:

I agree with ogrig . ANSI-89 syntax is much easier to read. I despise the later syntax, and use it only if i have too.

Besides the lack of readability, there is another issue with the later new-fangled syntax. It destroys clause separation. Traditionally, the FROM clause includes records, the WHERE clause excludes records. A join clause usually excludes records and never includes records. Hence, it belongs in the WHERE clause.

October 12, 2009 10:56 AM
 

RBrown said:

Just a quick comment about your first "proper" join.  The aliasing seems to be inconsistent with the comparison in the ON clause...

SELECT o.OrderID, od.ProductID  FROM dbo.Orders AS o  INNER JOIN dbo.OrderDetails AS d  ON o.OrderID = od.ProductID  WHERE o.OrderDate >= '20091001';

should be:

SELECT o.OrderID, od.ProductID  FROM dbo.Orders AS o  INNER JOIN dbo.OrderDetails AS od  ON o.OrderID = od.ProductID  WHERE o.OrderDate >= '20091001';

October 12, 2009 2:52 PM
 

quizno said:

so i guess i'm confused.... i, too, think the "x INNER JOIN y ON z" syntax is LESS readable. INNER joins are never the problem, it's the left and right joins that seem to frustrate. What is the "correct" syntax to use when wanting (x LEFT JOIN y) RIGHT JOIN z is wanted (e.g. the RIGHT JOIN is between table z and the result of the LEFT JOIN of tables x and y)?

October 20, 2009 5:33 PM
 

Code Guy said:

I asked a member of the ANSI SQL board why they went with this more complicated join clause.  He, (Joe Celko) said that the original syntax is in fact the correct method.  The Join syntax was for systems that didn't have the optimizers such as SQL CE and SQL Lite and was never intended to be used in the way we have stared using it. Microsoft got it confused and made it policy.  

The Cartesian join issue is not much of a problem if you test your sql.

He also said that they met for 2 days and decided that the language is produced SQL not Sequel.  I was greatful to have an answer to both of those questions.

March 30, 2010 2:11 PM
 

Edward said:

I have to agree with Aaron and disagree with ogrig.  Had a query I inherited once with Joins to 6 tables and a total of ten criteria in the Where clause.  The query kept blowing up because one of the tables had changed structure.  Trying to trouble shoot the joins on that was a godawful nightmare.  When I finally finished re-writing it, it was very straight forward with my Five JOIN lines and their criteria separated from the 4 Where criteria.  Because of that I was able to then simplify the where criteria down to two because of the equi-join required that criteria equal to each other didn't have to both be listed in the Where clause as equalling the same value.

May 4, 2012 4:52 PM
 

Syed said:

Is there any difference in the performance when using either of them.

April 24, 2013 10:44 AM
 

AaronBertrand said:

Syed no, there isn't, but I think you may have missed the point of the article...

April 24, 2013 10:55 AM
 

Stephen said:

I have to agree with Aaron on this as well.  Trying to decipher 5+ table joins in the old comma syntax is a hell I'd wish on no man.  

We've had accidental Cartesian products in our application a few times due to the comma syntax which have resulted in 60,000+ row tables being Cartesian joined instead of Inner joined.  I've never had this problem as my intentions are clear when I use the newer syntax, even though I'm guilty of shorthanding JOIN and LEFT JOIN. :P

May 7, 2013 6:29 PM
 

Simon said:

FYI the link to the deprecation of old-style joins is now outdated (it now points at features which will be cut from vNext after 2012)

Correct link is now: http://msdn.microsoft.com/en-us/library/ms144262.aspx (Discontinued Database Engine Functionality in SQL Server 2012)

May 17, 2013 7:10 PM
 

John said:

So true about the outdated join syntax, expecially the deprecated outer joins.

We have a couple of commercial products (same vendor) that persist in using the deprecated outer join syntax.  We brought this to the attention of the vendor over 5 years (yes, years) ago and they have yet to correct their source code.

Thus, we are "locked in" to using SQL 2008 R2 because the only way we can get the application to run correctly is to set the database compatability level to "80" (SQL 2000) for these databases.  This database compatability level no longer exists in SQL 2012 so we cannot upgrade our SQL Servers until the vendor fixes their code. Oh well.

June 13, 2014 4:22 PM
 

Nate Brunner said:

I agree COMPLETELY with ogrig and Brian Tkatch. The "old school" is much cleaner and more readable.

July 2, 2014 10:46 AM
 

AaronBertrand said:

@Nate ok, readability is quite subjective, and I am never going to convince everyone about any aspect of what makes code more or less readable - way too many preferences, styles, habits, etc.

But what about the other points that even ogrig conceded (and Brian ignored)? Are you really going to continue using old-style joins just for readability, even after they're deprecated? Are you really going to insist that it makes sense to use old-style inner joins even though you can no longer use old-style outer joins? Are "readable" inner joins - prone to losing important filter criteria - really worth having forced inconsistency?

I'm trying to provide advice that is most logical, especially for newer users who might be "learning" from legacy code or veterans. You run ignore that and run your shop however you like, of course. :-)

July 2, 2014 11:47 AM
 

MatthewMcK said:

I agree with Aaron,  but I find that given a choice I prefer to hire people that know what they are doing rather than a WTF? person. (their Uni. has failed them).  Not everything is development, and in some shops there are a lot of these queries. Maintenance matters.

In any case after a few years with any model a boilerplate query will already exist for most scenarios. I have no preference, and we must know both, but ANSI style join does allow bad behaviour such as not having the selection criteria in the same order as the tables are named.

Personally I like A,B,C,D,E etc. This should also be the way the DB designer thinks when modeling.  

It disturbs me if I see someone drafting a query without drawing a picture first (not using an ERD).

(SQLServer from 1993, DB2 from 1992, dabbler in Mapreduce)

November 11, 2014 11:26 PM
 

Gordon Ellis said:

I fully agree with ogrig, Brian, and Nate about the readability and clause separation of the old-style syntax. After all, logically, joins (inner joins anyway) ARE simply filtered cross joins, and as Sheila's post illustrates, there really is NO DIFFERENCE between a join condition and a filter condition. But I also agree with Aaron and others about the dangers of the old-style syntax.

Personally, I hate the "new" style so much that I avoid it where possible, and clearly separate my joins and conditions in the where clause, as so:

select

 a.SomeColumn,

 b.SomeOtherColumn

from

 tableA a,

 tableB b,

where

-- joins

 a.ID = b.a_ID and

-- conditions

 a.SomeValue > 10 and

 b.SomeString = 'abc'

A much better solution in my opinion would be for a new syntax where the type of join is specified in the FROM clause, and the join conditions are expressed separately in a new JOIN clause, followed by filter conditions in the WHERE clause. Something like:

select

 (whatever)

from --these are your tables and how they're joined

 tableA a left

 tableB b inner

 tableC c cross

 tableD

join --these are your join conditions (one condition or set of conditions for each join mentioned above)

 a.Field1 = b.Field1 and a.Field2 = b.Field2,

 a.Field3 > c.Field4

where --these are your filter conditions

 a.Field5 = 0 and

 d.Field6 <> 'abc' and

 etc.

The parser could then check to see that you provided join conditions that make sense for your join type, and throw an error if there is a mismatch (e.g. trying to do an inner or outer join without any conditions or with conditions that don't compare fields from both sides of the join, or trying to put conditions on a cross join, etc.)

January 4, 2016 10:26 AM
 

William said:

I used to like the old style, but now I am a fan of the new style. I have been bitten with the accidental Cartesian join by the old style, and, with lots of tables in your query, the old style is very hard to trace.

I find the formatting style below to be quite readable... (using tab chars that are 8 spaces wide):

select a.col1,

      b.col2,

      c.col3

from   table1 a

join   table2 b    on a.col2 = b.col2

left outer

join   (

      select col3

      from   table3

      ) c         on b.col3 = c.col3

where  a.col2 = '2'

and    a.col3 = '3'

order by a.col1, b.col2, c.col3

January 12, 2016 5:17 PM
 

philip said:

The *=/=* syntax was never ANSI; it was from Oracle, it never had a clearly defined meaning when more than two tables were joined, and arose from a misapplication of a notation in a Chris Date article about outer joins.

Something like ON was needed for proper outer join syntax. Coincidentally it made sense to use the same syntax for inner join, but since nested selects were added at the same time, ON was not needed for mixing outer and inner joins.

Comma has never been deprecated. The other joins are still defined in the SQL standard in terms of it.

Comma is just a different spelling for cross join with lower precedence. Arguments that comma is problematic are specious. (Especially considering how ridiculously & unnecessarily arcane and redundant the rest of select statement syntax and semantics is.)

May 31, 2016 10:15 PM

Leave a Comment

(required) 
(required) 
Submit

About AaronBertrand

...about me...

This Blog

Syndication

Archives

Privacy Statement