THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

Please visit the same blog at - comments are now disabled here but you can comment on the new blog (all blog posts available here are available there, too).
You can follow me on Twitter: @marcorus

To join or not to join: that is the question (in LINQ)

A comment received by one reader of Programming LINQ suggested me to underline a concept that is not so intuitive using LINQ, especially if you come from years of SQL coding.

The idea is very simple. Two entities in LINQ might be related in the model. Whenever this happen, usually it is better to leverage on this existing relationship and not to write the join syntax in an explicit way. If you are using LINQ to SQL, the generated SQL code might be more performant or at least correspondant to the one generated by writing an explicit join in your LINQ query. The less constraints in your query, the better.

Let's look at an example on the Northwind database. Imagine you want to see a list of all categories with a flag set for the one which a particular product belongs to. This is a SQL query we could write:

    CASE WHEN p.ProductID IS NULL 
        THEN 0
        ELSE 1
    END AS Selected
FROM Categories c
LEFT JOIN Products p
    ON p.CategoryID = c.CategoryID
    AND p.ProductID = 10
ORDER BY CategoryName

Ok, we can write the same query in many other ways, but there are several more complex situations where a LEFT JOIN is used to test the presence of an element in a related table. A correspondant LINQ query might be the following one:

from c in dc.Categories
orderby c.CategoryName
join p in dc.Products.Where(p => p.ProductID == 10)
    on c.CategoryID equals p.CategoryID 
    into pj
from x in pj.DefaultIfEmpty()
select new {
    Selected = x != null

The LINQ query above will generate a SQL query containing a LEFT JOIN statement. However, a relationship exists between Categories and Customer, and you can leverage on this relationship in the point where you really need to traverse the relationship (in the projection statement). The following one is a better way to get the same result:

from c in dc.Categories
orderby c.CategoryName
select new {
    Selected = c.Products.Any( p => p.ProductID == 10 ) ? true : false

This new version has two advantages. First, it is shorter and express its intent more explicitly.  Second, it generates a SQL query with an EXISTS statement, similar to the following one.

SELECT CategoryID, CategoryName,
        WHEN EXISTS(
            SELECT NULL AS [EMPTY]
            FROM Products AS p
            WHERE (p.ProductID = 10) AND (p.CategoryID = c.CategoryID)
            ) THEN 1
        ELSE 0
    END) AS Selected
FROM Categories AS c
ORDER BY CategoryName

The execution plan used by SQL Server might be similar if not equal. However, using the implicit relationship between Categories and Products in the LINQ query is usually better, because it gives more freedom to the LINQ provider to generate a more efficient SQL code.

Published Wednesday, June 18, 2008 12:53 AM by Marco Russo (SQLBI)
Filed under:


No Comments
New Comments to this post are disabled

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog



Privacy Statement