Re: Finally which ORM tool?

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance



(I hope I've actually finished all the sentences in here. I've been
called away *loads* of times when writing this.)

Frans Bouma [C# MVP] <perseus.usenetNOSPAM@xxxxxxxxx> wrote:
However it's not consistent: a variable passed to an extension
method used INSIDE the query is passed as the value immediately,
but the same variable passed to another extension method in a
lambda is passed as a memberaccess expression and not passed as its
value.

Firstly, it's not to do with extension methods at all. It's to do
with whether parameter is a lambda expression or not, and that's all
it has to do with.

sure, but you have to realize that. It's not always obvious. One could
argue that you have to know what everything ends up in but I find that
an excuse: the expression trees created are rather big sometimes and
sometimes different than you'd expect. A developer of code is living on
the level of C#, not on the level of expression trees and lambdas if
s/he uses native C# code. It's the same as knowing which IL is being
produced or which x86 code is being produced by the jit. I don't care,
and I also don't want to NEED to care, because if I need to care, the
abstraction level I'm living at is a facade, and there's no difference
with C++ with inline asm.

I don't think the developer should need to know the details of which
expression trees are generated, but I *do* think they need to know that
query expressions implicitly use lambda expressions. I also think they
need to know that lambda expressions are converted to either delegate
instances or expression trees depending on the situation. In other
words, I'd expect someone to know that:

from data in SomeSource
where data.SomeCondition
select data.SomeProjection

is translated into

SomeSource.Where(data => data.SomeCondition)
.Select(data => data.SomeProject)

If they know that much, then they know where the lambda expressions
are, and therefore which variables will be captured.

If they don't know that much, they shouldn't be using LINQ except to
find out what it's all about - in other words, to learn the above.

Now, as for consistency - it's consistent once you understand which
parts of a query expression are actually a shorthand for lambda
expressions. Is someone who doesn't want to learn the basics of query
expressions going to find that confusing? Yes. Should someone who
doesn't want to learn the basics of query expressions be using them
in production code? Absolutely not.

I don't see why one has to understand which parts are lambda's in the
expression trees (!) and which aren't if NO statements written use real
lambda's, all code is written using C# code, no lambdas in sight.

I didn't say which parts are lambdas in the expression trees. I said
which parts are lambdas in the query expressions. There's a *huge*
difference between query expressions and expression trees.

And lambdas *are* part of C# code, as are query expressions - so
someone using C# 3 and LINQ ought to know about both.

Also, my example of using the same variable in the where and in the
Take method doesn't show me why one is updated at execution time and
the other one isn't updated: why aren't BOTH updated? Because one is
translated into a lamdba expression in an expression tree under the
hood? Why do I even need to know that it is translated into an
expression tree? If I have to, it's a leaky abstraction.

It's not translated into a lambda expression in an expression tree -
it's translated into a lambda expression which is then converted into
an expression tree. It's very important to understand that the
translation is going on, because otherwise you won't understand
*anything* about what's going on.

Now, notice that the conditions/projections etc themselves aren't being
evaluated at the point of the query declaration - so why should the
values of variables within those conditions/projections be evaluated?
*That's* what I would find inconsistent.

There are all kinds of areas where if you have no idea what you're
doing, you can go wrong - there's nothing new in that. Lambda
expressions and query expressions aren't that hard, and education is
the key IMO.

query expressions aren't hard in general, but the details will kill a
lot of dreams, but that's MS' problem.

The thing is though that you can teach a C# developer how queries
work, and you don't have to educate them with how expression trees
work. That's also info not NEEDED to write correct queries on the
abstraction level of C#.

Which is why I didn't claim that they should know all about expression
trees. They need to know the broad concept, but not the details. They
*do* need to know about query expressions though, and the translations
which are performed.

Apply your consistency test to a mutable struct vs a mutable class,
with a value being passed to a method and then changed - you'll see
exactly the same "inconsistency". Does that mean we shouldn't have
the distinction between value types and reference types? No - it just
means that people need to know about the difference between them.

I used the SAME variable in two different places in the query. One
gets updated, the other one isn't when the variable changes value.
Sorry, but that's inconsistent behavior, and the reason is actually
irrelevant, because at the level of abstraction where the code is
written, there ARE NO expression trees and the query didn't contain any
lambdas, these are only created at runtime when the query is executed
and an expression tree is created.

Again I think you're confusing lamdba expressions with expression
trees. The expression trees are created at runtime, but the lambda
expressions are there at compile time, after the compiler has
translated the query expression as shown earlier.

That tree is often different than
what you've written in code. Relevant info? Why? Why does someone
writing linq queries have to think about expression trees? If that's
required info, why is this abstraction leaking into the level of C#'s
abstraction?

They don't need to think too deeply about the details of the expression
tree - but they *do* need to know that they're basically taking a
shortcut to writing lambda expressions.

so you're comfortable with creating a query q in method a, pass it
to method b and therefore requiring a session in method a, which is
for example not possible. Say I want to formulate what I want in
method A, but as I'm not allowed to directly use database access
code, I have to pass the specification to a layer where it IS
possible to use data access code. I now can't formulate the query
in method A, I have to pass what I want in a DIFFERENT
specification method. Also, where I specify it, I have to decide
which DB to use if I have a multi-db design.

You can use CompiledQuery for that sort of thing.

I looked up the (almost non-existend) docs about CompiledQuery but it
didn't tell me a lot of info. For example: is this compiled query
always usable, no matter what the provider is? No idea.

It's a LINQ-to-SQL specific feature, as I'd expect it to be. The
details of when a session is used, when the query is translated into
SQL etc is, and should be IMO, provider-specific.

It's not as if the enumerator is represented by a life cursor on a
resultset in the db.

Is that definitely true in all cases? I can see situations where it
would be very handy to effectively get a DataReader back turning
things into anonymous types on the fly. (Using full entities would
require remembering them all for uniqueness purposes, which would
negate a lot of the point of it, of course.)

If you want to kill your DB's performance, you should do that. :)
Keeping open a cursor means you keep open a resultset on the server.
That takes resources. If your resultset is pretty big, it can eat more
resources than you want to give up for a longer period of time.

I don't think the database should have to keep the whole evaluated
result set - or even match all the rows - unless you've asked it for
the ordering. After all, that's why you can't ask a result set for its
count without either explicitly using COUNT or basically fetching it
all.

However, I'm happy to accept your word for it being a bad idea.

<snip>

Sure, they can all use the simple syntaxis of selecting a set of
entities from a set, using a simple filter, but it quickly gets out
of hand. Take for example a silly method like .Distinct(), which
fails on linq to sql when a distinct voilating type is detected.

If you're asking for distinct values and the type violates
distinctness, why shouldn't it fail? Perhaps an example would help.

Because entity identity is verifyable when the data is read from the
db. (PK). So you can limit on teh client if you have to, by reading
enough rows till you're done. It's slightly slower but it's a way to
solve it.

Northwind: employee 1:n order. If you want all employees who have an
order filed for customers from the UK, you could do: (I use '*' for
simplicity here)

select e.*
from employees e inner join orders o on
e.EmployeeID = o.EmployeeID
inner join customers c
on o.CustomerID = c.CustomerID
where c.Country = 'UK'

Though, you'll get a lot of duplicates. So you apply distinct. But
that's not possible. So you have to filter on the client. You can,
because the PK identity of the entity instance (== the data!) is
available. No O/R mapper should give up with such a silly query.

Here's my LINQ query:

var query = from employee in context.Employees
join order in context.Orders
on employee equals order.Employee
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select employee;

query = query.Distinct();


and here's the generated SQL:

SELECT DISTINCT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate], [t0].
[HireDate], [t0].[Address], [t0].[City], [t0].[Region], [t0].
[PostalCode], [t0].[Country], [t0].[HomePhone], [t0].[Extension],
CONVERT(VarBinary(MAX),[t0].[Photo]) AS [Photo], CONVERT(NVarChar(MAX),
[t0].[Notes]) AS [Notes], [t0].[ReportsTo], [t0].[PhotoPath]
FROM [dbo].[Employees] AS [t0]
INNER JOIN [dbo].[Orders] AS [t1]
ON [t0].[EmployeeID] = [t1].[EmployeeID]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID] WHERE [t2].[Country] = @p0

So LINQ to SQL didn't give up on it at all.

It's doing the DISTINCT on the database, and that wasn't even *trying*
to tweak it. Now, I wouldn't like to swear that SQL server will realise
that EmployeeID is distinct for distinct employees, so it only needs to
look at that part, but I'd certainly hope so.

What's better though is that you can also use subqueries to avoid the
duplicates:

select *
from employees where employeeID IN
(
select employeeID from
orders where CustomerID in
(
select customerID from customers
where country = 'UK'
)
)

No duplicates, no distinct needed. If you look closely, the execution
plan is the same on most db's.

The estimated execution plan from join version is very different, but I
don't know which would actually take longer.

However, let's see how close we can get to your query. This has the
same kind of feeling:

var query = from employee in context.Employees
where (from order in context.Orders
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select order.Employee).Contains(employee)
select employee;

which generates this:

SELECT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate],
[t0].[HireDate], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[HomePhone], [t0].[Extension],
[t0].[Photo], [t0].[Notes], [t0].[ReportsTo], [t0].[PhotoPath]
FROM [dbo].[Employees] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[Orders] AS [t1]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID]
LEFT OUTER JOIN [dbo].[Employees] AS [t3]
ON [t3].[EmployeeID] = [t1].[EmployeeID]
WHERE ([t3].[EmployeeID] = [t0].[EmployeeID])
AND ([t2].[Country] = @p0)
)

Alternatively, we can explicitly use the employee ID:

var query = from employee in context.Employees
where (from order in context.Orders
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select order.EmployeeID)
.Contains(employee.EmployeeID)
select employee;

which generates this:

SELECT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate],
[t0].[HireDate], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[HomePhone], [t0].[Extension],
[t0].[Photo], [t0].[Notes], [t0].[ReportsTo], [t0].[PhotoPath]
FROM [dbo].[Employees] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[Orders] AS [t1]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID]
WHERE ([t1].[EmployeeID] = ([t0].[EmployeeID]))
AND ([t2].[Country] = @p0)
)

That latter one looks pretty reasonable to me.

This is what I meant with tweakability below. In Linq I can't specify
to use the subquery train, I have to use joins, or rely on the provider
to be gentle for me. However, that last bit isn't possible: the o/r
mapper then has to know the db statistics about the sizes of the data
in the DB.

In what way have I not tweaked the query to use the subquery way of
doing things with the LINQ queries above?

A good query system allows this kind of simple tweakability.

Which I've just proved LINQ to SQL has. I should point out that those
were just the first three ways of doing it that I thought of. I haven't
actually tried to optimise it particularly.

Many, many queries are simple ones in my experience. It's nice to
have the ability to use the full power of the specific database or
LINQ provider when you need to, but it's also nice to have
consistency of querying when that's feasible.

Still, I'd have liked if they would have spend more time on this. They
could have added more standard elements but decided not to.

Well, I've shown that they've got DISTINCT in there. I know about the
LEFT OUTER JOIN issue, and I agree it's a shame. What else do you miss?

It gets really different when things like tweakability are added to
the equation. With linq, people have less control over how the SQL
looks like. This is actually pretty bad in the long run as the SQL
might for example use a subquery where it should have used a join
and vice versa. This can be solved with extension methods, but this
ties the query to the provider used.

Yes, if you absolutely have to tweak things, then that's fine - and I
fully believe that you ought to closely examine the SQL generated by
your LINQ provider - but there are many simple queries which don't
need tweaking.

You tell that to that team of DBAs which refuse to run your software
on their many TB big databases because they queries are too slow.

I have no problem with tweaking queries which need tweaking. If they
start claiming that *every* query, even the simplest "SELECT (fields)
FROM (table) WHERE (some simple condition)" needs extra tweaking, I'd
certainly need some evidence.

If you want to use an O/R mapper as a developer, and you find a team
of DBAs on the other side of the table and they refuse to accept the
fact that the queries are now generated by a program, you really have
to have your act together and proof that your queries are fast and
flexible to the schema and size of the table data, otherwise you're off
to hammering out stored proc call code all day.

Indeed. Good job that LINQ to SQL is reasonably tweakable then, isn't
it? And also that it lets you execute custom SQL when you need to.

This isn't something I cooked up just to be cocky. Many times we've
received emails from developers who wanted to use an o/r mapper and
they had to convince their boss and DBAs that the SQL produced is fast,
that the queries are tunable/tweakable so the DBAs will be happy.

Yes, I've no problem with that.

If the O/R mapper doesn't allow flexibility in that area, the o/r
mapper isn't going to be used a lot in the enterprise area where tables
can have millions of rows and each join has to be done with care.

Yup, and I reckon LINQ to SQL is *reasonably* flexible.

So, how is Linq supporting tweakability here? Does it for example
offer a simple IN subquery element, so the DEVELOPER can tweak the
code, based on the DBA's advice? Not really.

Well, the "Contains" is the broad equivalent of "IN" here. I don't know
how much difference there is between "WHERE EXISTS" and "IN",
admittedly - but I'd hope not a lot.

So the DEVELOPER, when
asked by the DBA if a costly join operation can be changed to a
subquery using query or vice versa (as they have both sweetspots), can
only answer: No, I can't do that. 10 to 1 the DBA will then say a
stored proc will be used instead.

Except that I've shown that the developer can offer a range of options
- and that was without trying hard.

That leaves the smaller simple stuff for the o/r mapper, and keeps in
place the myth that stored procs are the way to go when it comes to
serious data-access. While it's unnecessary, a query system should be
flexible enough to offer these kind of tweaks.

Absolutely.

I'm not saying this is a bad thing per se, Linq offers extension
methods, which are ideal for solving these problems, however it has
a price, and that price is giving up provider-independency.

Yup - so you pay that price when you need to, and when you don't need
to you've still got independence.

if you have to branch out to custom code, independence is gone for
100%.

Not at all. If I can do almost all my work using the same query
expressions for everything, but need just a few places where I tweak
the SQL or the query expression to satisfy a particular database, the
independence of the bulk of the work is still very useful. Likewise if
I can apply the same query expressions in LINQ to SQL as in LINQ to
Objects for a large proportion of my work, but tweak some of the LINQ
to SQL where necessary - I'm still gaining broad readability and
consistency, IMO.

That said, I don't think it's possible to create a 100% independent
query system. It just has to clear that for the people who think that
Linq WILL bring you that independent system, it's a facade, there is no
such thing as an independent system: you ALWAYS will have o/r mapper
specific code in your application, unless you abstract away everything,
which also has a (sometimes big) pricetag.

Yup, totally agreed. The abstraction will *always* be leaky - otherwise
we wouldn't need to look at the SQL, after all.

That doesn't mean there isn't a big benefit, of course.

[ADO.NET Entities]

I will be very disappointed if they don't go for a multi-db
design.

One reason I think they'll move it towards an approach which might
offer multi-db design but that's totally in the hands of 3rd
parties is that their original design, where the ado.net provider
had little to do to get things done has been changed to make it a
lot of work to get things done for the ado.net provider, which
means that the 3rd party ado.net provider has to implement a lot of
code to work with the EDM.

That's fairly reasonable - it's good to let the third parties make
their own providers work as well as possible.

though if it takes a lot of work, it will take a long time before open
source databases for example have implemented a provider. These things
aren't simple.

Possibly - I guess it depends if they see a big benefit. Mono had a
working implementation of C# 2 features long before .NET 2.0 was
actually released :) I know that's a simpler scenario, but it shows
that open source folks can certainly work quickly when they see the
benefit.

If they had understood it, they wouldn't have made IProvider
internal for linq to sql, so linq to sql (which had a multi-db
design at first) could be used on multiple db's as well.

Well, don't forget that LINQ to SQL is (as I understand it) a very
different team to the ADO.NET side of things.

Though it wouldn't have taken any more effort. Now they apparently
didn't design it in, (otherwise the design would be open and anyone
would be able to write a provider) so the design is targeted towards 1
db, which is IMHO odd as it doesn't take that much effort in a system
which is already largely designed around providers anyway.

It would have taken more effort, in the same way that there are all the
different dialects in NHibernate. You've got to work out how to do
paging for each database, "LIKE" queries for each database (Hibernate 3
used to get this wrong, btw - it didn't escape '%' when it was part of
a LIKE query; I don't know if this has been fixed) etc.

It would certainly have been nice though.

EDM is a core part of sqlserver 2008. Any db vendor not having an
EDM provider undermines the success of EDM: if only MS releases a
provider, which only works for sqlserver 2008, will it succeed?
Unlikely, because it's a separate download for developers, it's not
part of the .net framework.

So is the Oracle data provider, but that's pretty well used. Ditto
NUnit :)

That's different. If a .NET developer wants to use Oracle, chances are
s/he won't use the MS oracle provider, simply because of it
limitations. So there's a necessity. The EDM additional download isn't
a core citizen inside vs.net 2008, nor in .NET 3.5. Therefore you won't
build momentum around it as it would have had when it was released WITH
vs.net 2008 and .net 3.5.

It won't have as much momentum, no - but it could still have easily
enough to make it viable and compelling for third parties.

--
Jon Skeet - <skeet@xxxxxxxxx>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
.


Quantcast