Re: SQL to Linq - Left, Right and Inner Joins



Jon Skeet [C# MVP] wrote:
Frans Bouma [C# MVP] <perseus.usenetNOSPAM@xxxxxxxxx> wrote:
In simple queries like:
select distinct c.* from customers c cross join orders o
where c.CustomerId = o.Customerid

vs.
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId

the execution plan is the same. It could be that in database ABC this isn't the case however.

I'd be very surprised to see such a database, personally, as least outside the embedded or "hobby" market.

In multi-branch joins with expressions other than simple field compares, it can become very awkward to optimize the query, especially if one side (or both!) is for example a derived table (subquery). You'll then see that it's not as optimal as you'd have hoped it to be.

The main thing is: if you want the RDBMS to perform an equ-join or a left join, specify it as such and it will do so. Specifying a query with different constructs and then relying on the optimizer to make the best of it isn't optimal and will likely fail in some (and least acceptable) situations.

Sqlserver's optimizer is more advanced than the rest out there, although Oracle is catching up. On Oracle and especially DB2 you'll still see many different execution plans with the different join types especially with multi-branch joins and the more advanced where clauses.

Though there are still differences in performance. Take these two:

select c.*
from customers c cross join orders o
cross join employees e cross join [order details] od
where
c.CustomerId = o.CustomerId
and
e.EmployeeId = o.EmployeeId
and o.OrderId = od.OrderId

vs.
select c.*
from customers c inner join orders o
on c.CustomerId = o.CustomerId
inner join employees e on e.EmployeeId = o.EmployeeId
inner join [order details] od on o.OrderId = od.OrderId

THe server trace shows that the duration of the query first query is much longer than the second query.

It also can be that with more explicit joins and explicit where clauses, the optimizer gives up and the execution plans will differ. For example, in some queries, moving the where clause to the ON clause of a left join can increase performance. For example:
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
inner join [order details] od on
o.OrderId = od.Orderid
where c.Country='USA'

select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
and c.Country='USA'
inner join [order details] od on
o.OrderId = od.Orderid

The second query has less reads than the first query.

Again, I'd expect a decent optimiser to be able to work out the equivalent between them. That's the nice thing about set operations - there's so much that an optimiser can do.

till the optimizer can't decide what to do because the # of options grows too large.

The main trick is that with more complicated joins, every next table
to join with the set is joined more efficiently if the set it has to
be joined with is smaller. So if you can already limit that set with
the filter you'll apply anyway, it will bring down the overall query
burden. With simple queries this might not be noticable, but these
are fast anyway. It's the large queries which are the point of
interest as these will also take the majority of the performance
eaten by the application.

Well you can easily add where clauses to each effective side of a join in LINQ:

from c in customers
where c.Country='USA'
inner join (from o in OrderDetails where o.TotalCost > 100m)
on c.CustomerId = o.CustomerId

'inner join' ? ;)

I was more thinking about 'on c.CustomerId = o.CustomerId && o.OrderDate > myOrderDate'

in left/right join situations (which are utterly stupidly formulated in linq) it's sometimes essential that the predicate is formulated with the join statement, to prevent row exclusion in some edge queries (combinations of inner and left operations in a chain)

It's not like LINQ passes the join in one SQL statement and then applies the where clause separately: the query optimiser has the full query to work with.
That's not the point.

I thought it was worth pointing out as your earlier post could easily give people the wrong impression:

<quote>
The cross-join execution path is often less ideal as the cross-join
action by itself is first handled, and THEN the where clause is handled
to filter out rows you don't want.
</quote>

That made it sound (to me at least) like the query engine *couldn't* take any notice of the where clause until after the cross join had been handled, due to the way LINQ was doing things - which just isn't true. As it is, it's a matter of how good the query optimiser is.

The way how Linq is designed forces more cross-joins upon the database than with equivalent more flexible query languages, as the developer using Linq often has no other option but to use nested from clauses and a subsequential where clause. This then results in suboptimal queries in some cases in the DATABASE. I was talking about that database level, not in linq providers

However, in a linq provider, it's undoable to optimize away a cross join with a where into inner joins, so the developer has to hope the optimizer produces a solid optimized query, which isn't something the developer should hope for, if the alternative, inner/left joins, is already available though not through linq.

The point is that Linq contains a set-theory operator 'join' but has
just 1 argument for it and has no freedom how to formulate the
predicate to use in the join of the two sets. I know linq works with
sequences and not sets, but it would have been way better if 'join'
in Linq would have had normal predicate support and a join argument,
it wouldn't have hurt Linq in any way shape or form.

We've had this debate elsewhere before, and I think we'll have to agree to differ.

A sequence is a set with an order. As a sequence is a set, and the different join operations are operations for sets, it's logical to assume that a more specialized form of a set inherits these operations, otherwise a sequence wouldn't be a set. Limiting operations on the sequence therefore means limiting the freedom a developer has, which is IMHO unnecessary and solely done to make things less complicated for linq to objects.

Btw, The cross joins in MS code often rely on a T-SQL specific statement 'CROSS APPLY', which isn't a known statement in many other databases.

I don't see that as a problem if it's generated by a provider which knows it's talking to a SQL Server database. I'd expect any good provider to use any useful vendor-specific extensions when it can.

It shows you never wrote a linq provider otherwise you'd know what you wrote isn't doable in the case of cross apply: there's no alternative for databases which don't support cross apply. So if you have a linq query with two nested froms and a where with a count filter on one side where the count is on the other side for example, cross apply is necessary otherwise the query can't be formulated unless it's rewritten. So if the user has to run the query on oracle, it can't do anything else than simply give up.

FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
.



Relevant Pages

  • Re: SQL to Linq - Left, Right and Inner Joins
    ... from customers c cross join orders o ... cross join employees e cross join [order details] od ... THe server trace shows that the duration of the query first query is much ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Query training -- Complex queries
    ... Second, let's take a little example, you could run in you Query Analyzer: ... from authors inner join authorsx ... A cross join that does not have a WHERE clause produces the Cartesian ... The size of a Cartesian product ...
    (microsoft.public.sqlserver.server)
  • Re: Help writing a query
    ... >> in the query designer. ... The key to it is that it's a CROSS JOIN ... > between groups and scenarios yielding the 15 rows in my example above. ... > ending up with syntax errors. ...
    (microsoft.public.access.queries)
  • Re: Please need help with Cross Join query
    ... It sounds to me like you want a cross join, ... union all ... >The problem with this query: ... >from SIZES s full join ...
    (microsoft.public.sqlserver.programming)
  • Re: Help with query from FoxPro
    ... "DAVID" wrote: ... Then join to the query instead of joining to tk_ctrl. ... you want to avoid having a cross join ... When you design your own tables, put a dummy join field ...
    (microsoft.public.access.queries)