Re: SQL to Linq - Left, Right and Inner Joins
- From: "Frans Bouma [C# MVP]" <perseus.usenetNOSPAM@xxxxxxxxx>
- Date: Mon, 07 Jul 2008 10:40:36 +0200
Jon Skeet [C# MVP] wrote:
Frans Bouma [C# MVP] <perseus.usenetNOSPAM@xxxxxxxxx> wrote:In simple queries like:
select distinct c.* from customers c cross join orders o
where c.CustomerId = o.Customerid
vs.
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
the execution plan is the same. It could be that in database ABC this isn't the case however.
I'd be very surprised to see such a database, personally, as least outside the embedded or "hobby" market.
In multi-branch joins with expressions other than simple field compares, it can become very awkward to optimize the query, especially if one side (or both!) is for example a derived table (subquery). You'll then see that it's not as optimal as you'd have hoped it to be.
The main thing is: if you want the RDBMS to perform an equ-join or a left join, specify it as such and it will do so. Specifying a query with different constructs and then relying on the optimizer to make the best of it isn't optimal and will likely fail in some (and least acceptable) situations.
Sqlserver's optimizer is more advanced than the rest out there, although Oracle is catching up. On Oracle and especially DB2 you'll still see many different execution plans with the different join types especially with multi-branch joins and the more advanced where clauses.
Though there are still differences in performance. Take these two:
select c.*
from customers c cross join orders o
cross join employees e cross join [order details] od
where
c.CustomerId = o.CustomerId
and
e.EmployeeId = o.EmployeeId
and o.OrderId = od.OrderId
vs.
select c.*
from customers c inner join orders o
on c.CustomerId = o.CustomerId
inner join employees e on e.EmployeeId = o.EmployeeId
inner join [order details] od on o.OrderId = od.OrderId
THe server trace shows that the duration of the query first query is much longer than the second query.
It also can be that with more explicit joins and explicit where clauses, the optimizer gives up and the execution plans will differ. For example, in some queries, moving the where clause to the ON clause of a left join can increase performance. For example:
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
inner join [order details] od on
o.OrderId = od.Orderid
where c.Country='USA'
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
and c.Country='USA'
inner join [order details] od on
o.OrderId = od.Orderid
The second query has less reads than the first query.
Again, I'd expect a decent optimiser to be able to work out the equivalent between them. That's the nice thing about set operations - there's so much that an optimiser can do.
till the optimizer can't decide what to do because the # of options grows too large.
The main trick is that with more complicated joins, every next table
to join with the set is joined more efficiently if the set it has to
be joined with is smaller. So if you can already limit that set with
the filter you'll apply anyway, it will bring down the overall query
burden. With simple queries this might not be noticable, but these
are fast anyway. It's the large queries which are the point of
interest as these will also take the majority of the performance
eaten by the application.
Well you can easily add where clauses to each effective side of a join in LINQ:
from c in customers
where c.Country='USA'
inner join (from o in OrderDetails where o.TotalCost > 100m)
on c.CustomerId = o.CustomerId
'inner join' ? ;)
I was more thinking about 'on c.CustomerId = o.CustomerId && o.OrderDate > myOrderDate'
in left/right join situations (which are utterly stupidly formulated in linq) it's sometimes essential that the predicate is formulated with the join statement, to prevent row exclusion in some edge queries (combinations of inner and left operations in a chain)
It's not like LINQ passes the join in one SQL statement and then applies the where clause separately: the query optimiser has the full query to work with.That's not the point.
I thought it was worth pointing out as your earlier post could easily give people the wrong impression:
<quote>
The cross-join execution path is often less ideal as the cross-join
action by itself is first handled, and THEN the where clause is handled
to filter out rows you don't want.
</quote>
That made it sound (to me at least) like the query engine *couldn't* take any notice of the where clause until after the cross join had been handled, due to the way LINQ was doing things - which just isn't true. As it is, it's a matter of how good the query optimiser is.
The way how Linq is designed forces more cross-joins upon the database than with equivalent more flexible query languages, as the developer using Linq often has no other option but to use nested from clauses and a subsequential where clause. This then results in suboptimal queries in some cases in the DATABASE. I was talking about that database level, not in linq providers
However, in a linq provider, it's undoable to optimize away a cross join with a where into inner joins, so the developer has to hope the optimizer produces a solid optimized query, which isn't something the developer should hope for, if the alternative, inner/left joins, is already available though not through linq.
The point is that Linq contains a set-theory operator 'join' but has
just 1 argument for it and has no freedom how to formulate the
predicate to use in the join of the two sets. I know linq works with
sequences and not sets, but it would have been way better if 'join'
in Linq would have had normal predicate support and a join argument,
it wouldn't have hurt Linq in any way shape or form.
We've had this debate elsewhere before, and I think we'll have to agree to differ.
A sequence is a set with an order. As a sequence is a set, and the different join operations are operations for sets, it's logical to assume that a more specialized form of a set inherits these operations, otherwise a sequence wouldn't be a set. Limiting operations on the sequence therefore means limiting the freedom a developer has, which is IMHO unnecessary and solely done to make things less complicated for linq to objects.
Btw, The cross joins in MS code often rely on a T-SQL specific statement 'CROSS APPLY', which isn't a known statement in many other databases.
I don't see that as a problem if it's generated by a provider which knows it's talking to a SQL Server database. I'd expect any good provider to use any useful vendor-specific extensions when it can.
It shows you never wrote a linq provider otherwise you'd know what you wrote isn't doable in the case of cross apply: there's no alternative for databases which don't support cross apply. So if you have a linq query with two nested froms and a where with a count filter on one side where the count is on the other side for example, cross apply is necessary otherwise the query can't be formulated unless it's rewritten. So if the user has to run the query on oracle, it can't do anything else than simply give up.
FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
.
- Follow-Ups:
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Michel Walsh
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Jon Skeet [C# MVP]
- Re: SQL to Linq - Left, Right and Inner Joins
- References:
- SQL to Linq - Left, Right and Inner Joins
- From: shapper
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Michel Walsh
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Frans Bouma [C# MVP]
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Michel Walsh
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Frans Bouma [C# MVP]
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Jon Skeet [C# MVP]
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Frans Bouma [C# MVP]
- Re: SQL to Linq - Left, Right and Inner Joins
- From: Jon Skeet [C# MVP]
- SQL to Linq - Left, Right and Inner Joins
- Prev by Date: Re: Excape sequence question
- Next by Date: Re: SQL to Linq - Left, Right and Inner Joins
- Previous by thread: Re: SQL to Linq - Left, Right and Inner Joins
- Next by thread: Re: SQL to Linq - Left, Right and Inner Joins
- Index(es):
Relevant Pages
|