THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Rob Farley

- Owner/Principal with LobsterPot Solutions (a MS Gold Partner consulting firm), Microsoft Certified Master, Microsoft MVP (SQL Server), APS/PDW trainer and leader of the SQL User Group in Adelaide, Australia. Rob is a former director of PASS, and runs training courses around the world in SQL Server and BI topics.

Joins without JOIN

I’m now doing two sessions at the SQL Saturday event in Portland. I had been scheduled to do a single session (on indexes), but got an email yesterday asking if I could do another one as well. So now I’m going to do a session earlier in the day about Joins.

Yes, JOINs. Nice co-incidence to find that this month’s T-SQL Tuesday, hosted by Stuart Ainsworth and a week early because of the PASS Summit, is on that same topic. Plus the fact that I gave a presentation on it to the Adelaide SQL User Group last week.

So let’s jump in…

There are three types of joins that we write with the JOIN keyword – INNER, OUTER and CROSS. I’m imagine if you’ve read this far through this post, you’re very much aware of that. But there are other types of joins as well, that don’t use the JOIN keyword (and I’m not counting the comma short-hand for CROSS JOIN that we all used back in the 90s).

A join describes the way that two sets are related to each other within a query (and for the purposes of this post, I’m not going to count joins that concatenate two sets, such as using the UNION keyword). If you’re using the JOIN keyword, the relationship between the tables is shown in the ON clause – although if you’re using CROSS JOIN, the relationship is that every row in one set is related to every row in the other set.

If we don’t use the JOIN keyword, then where do we see joins? Well, in correlated sub-queries. I showed this when demonstrating that the APPLY keyword applies a join between one set an another in another recent blog post.

But every correlated sub-query, whether using APPLY or not must perform a join.

Consider the following:

SELECT *
    , (SELECT COUNT(*) FROM Production.Product AS p
       WHERE p.ProductSubcategoryID = s.ProductSubcategoryID) AS ProdCount
FROM Production.ProductSubcategory AS s;

There’s a join here, between Production.ProductSubcategory and Production.Product. The ON clause equivalent is the WHERE clause of the correlated sub-query. But what kind of join is it?

Let’s think... There’s clearly a relationship between the two tables, so it’s not a CROSS JOIN. Also, there’s nothing that would stop a row from the ProductSubcategory table to be returned, so it’s not an INNER JOIN (which only returns rows that match). It’s an OUTER JOIN. And we can prove this by looking at the execution plan.

The plan says “Right Outer Join”. Don’t be upset by the fact that it’s a Right Join rather than a Left Join – it’s just that Product table is coming from the top row and ProductSubcategory is from the bottom row. The ‘side’ of an Outer Join is just a matter of perspective. A Right Join is no different to a Left Join, it’s simply a question of which side of the desk you’re on.

Another type of join with JOIN uses EXISTS or IN. These two act very similarly, and I’m going to use EXISTS in my examples.

Consider the following query:

SELECT *
FROM Production.ProductSubcategory AS s
WHERE EXISTS
    (SELECT * FROM Production.Product AS p
     WHERE p.ProductSubcategoryID = s.ProductSubcategoryID);

and it’s converse:

SELECT *
FROM Production.ProductSubcategory AS s
WHERE NOT EXISTS
    (SELECT * FROM Production.Product AS p
     WHERE p.ProductSubcategoryID = s.ProductSubcategoryID);

Just as before, this performs a join between the two tables. But it’s not actually an Outer Join we see going on here. After all, we see that rows can be filtered out of the ProductSubcategory table. Furthermore, we don’t have access to any information in the Product table, not even the count of rows. But it does Filter. This is neither an Inner Join or an Outer Join. The EXISTS form is a Semi Join, and the NOT EXISTS form is an Anti Semi Join. These operations simply filter one set based on whether a match is present or not. We see this operation in the two plans involved.

The fact that the Query Optimizer chose to implement one with a Hash Match and one with a Nested Loop is irrelevant. We’re looking at the Logical aspects of the plan, not the Physical.

And so we see there are plenty of times that a Join can appear without the JOIN keyword.

But let me show you something interesting about the Anti Semi Join for a moment...

Many people don’t like NOT EXISTS, and would rather write my earlier query like this:

SELECT *
FROM Production.ProductSubcategory AS s
LEFT OUTER JOIN Production.Product AS p
ON p.ProductSubcategoryID = s.ProductSubcategoryID
WHERE p.ProductID IS NULL;

It’s relatively easy to show that these two queries should be identical. Both of them find rows where a match doesn’t exist. Both perform a logical Anti Semi Join. And yet many people will tell you that doing an Outer Join plus a WHERE clause is better – from a performance perspective. They will generally tell you that NOT EXISTS is logically equivalent.

We’ve seen plenty of times when the Query Optimizer will realise that two queries are identical and produce the same plan, but this doesn’t seem to be the case here.

This plan shows very much what we asked for. An Outer Join plus a Filter, as opposed to the Anti Semi Join.

And this has very little reason to be any faster than an Anti Semi Join operation. In fact, the interesting thing here is that the Outer Join completes. It tracks every successful match and passes that up to the Filter. Our Anti Semi Join operation doesn’t do that.

I see no reason to use an Outer Join plus Filter over NOT EXISTS. It may have been quicker at some point if the Anti Semi Join operation was done poorly in earlier versions, but I’d really like to hear if anyone can show me the Outer Join plus Filter method being quicker.

PS: This Saturday (Oct 8th) I’ll be presenting this kind of stuff in my “Deeper JOINs” session at SQL Saturday #92 in Portland, and in my pre-conference seminar a couple of days later (Monday 10th) in Seattle at the PASS Summit. You can still register for these events, so if you find this kind of thing interesting, get yourself over to the Pacific North West and I’ll see you there!

@rob_farley

Published Tuesday, October 04, 2011 2:43 PM by Rob Farley

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

vijay kashyap said:

outer apply can be used for this below code

SELECT *

   , (SELECT COUNT(*) FROM Production.Product AS p

      WHERE p.ProductSubcategoryID = s.ProductSubcategoryID) AS ProdCount

FROM Production.ProductSubcategory AS s;

October 4, 2011 6:53 AM
 

Rob Farley said:

Indeed it can. I figured it wasn't necessary to show it though, as I discussed APPLY plenty in a previous post.

October 4, 2011 7:24 AM
 

Todd Everett said:

Thanks for this post Rob.  I've always wondered what those anti joins were.  Coming from a background of always using correlated EXISTS and NOT EXISTS I had only recently realized EXISTS is just INNER JOIN.  I never made the leap that NOT EXISTS is of course outer join with a WHERE filter so thank you for showing that.  Where I work folks like EXISTS and NOT EXISTS because it makes it clear what you are doing logically where writing as a join makes that a little more ambiguous.  I was just recently tending to use the inner join instead of EXISTS as I thought it performed better.  Now I see that may not be the case.  Thanks for rounding out my understanding of this topic!

October 4, 2011 8:26 AM
 

Brad Schulz said:

I love that we both touched on the OUTER + IS NULL inefficiency. :)

http://bradsruminations.blogspot.com/2011/10/t-sql-tuesday-023-flip-side-of-join.html

--Brad

October 4, 2011 9:10 PM
 

Rob Farley said:

Hi Todd,

EXISTS isn't actually an Inner Join, it's a Semi Join. Slight difference... an Inner Join will project rows onto each match. A Semi Join simply filters based on existence.

Hope this helps...

Rob

October 5, 2011 1:23 AM
 

Kawahee said:

What is the number of rebinds on the operators at the bottom of the  NOT EXISTS plan? If there is rebinding then it will be slower than the LEFT JOIN, which should bind to the table once.

Correlated subqueries are bad for performance because of rebinds, no?

October 9, 2011 12:54 AM
 

Kawahee said:

Just an update, I looked at the query plans myself and they seem to be equivalent in performance - both had the same number of (re)binds. I added a few more predicates to the WHERE and ON clauses to try and get the query optimizer to revert to rebinding for each record but both plans had the same number of rebinds.

The LEFT JOIN was actually slower, by a small margin, once I'd added the extra predicates.

I also changed the last query from SELECT * to SELECT s.*, since it's only fair we ask for the same data each time.

I'm going to keep looking at this.

October 9, 2011 3:06 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement