Re: Finally which ORM tool?
- From: "Frans Bouma [C# MVP]" <perseus.usenetNOSPAM@xxxxxxxxx>
- Date: Thu, 18 Oct 2007 04:45:52 -0700
(snipped a lot away, as a lot has been said already in this thread :))
Jon Skeet [C# MVP] wrote:
Frans Bouma [C# MVP] <perseus.usenetNOSPAM@xxxxxxxxx> wrote:
Well, I think query expressions are clearer than text - as well
as being more easily verifiable by the compiler, of course
(within the bounds of being valid expressions - the compiler
can't determine whether or not there will be a valid SQL
translation).
As for the captured variable aspect of it - I still believe it's
a matter of education and becoming used to them, as well as not
abusing them. You can certainly get yourself into trouble pretty
easily, but then again you can also avoid getting yourself into
trouble fairly easily.
However it's not consistent: a variable passed to an extension
method used INSIDE the query is passed as the value immediately,
but the same variable passed to another extension method in a
lambda is passed as a memberaccess expression and not passed as its
value.
Firstly, it's not to do with extension methods at all. It's to do
with whether parameter is a lambda expression or not, and that's all
it has to do with.
Sure, but you have to realize that, and it's not always obvious. One could
argue that you have to know what everything ends up as, but I find that
an excuse: the expression trees created are sometimes rather big and
sometimes different from what you'd expect. A developer lives at
the level of C#, not at the level of expression trees and lambdas, if
s/he uses native C# code. It's the same as knowing which IL is being
produced or which x86 code is being produced by the JIT. I don't care,
and I also don't want to NEED to care, because if I need to care, the
abstraction level I'm living at is a facade, and there's no difference
from C++ with inline asm.
Now, as for consistency - it's consistent once you understand which
parts of a query expression are actually a shorthand for lambda
expressions. Is someone who doesn't want to learn the basics of query
expressions going to find that confusing? Yes. Should someone who
doesn't want to learn the basics of query expressions be using them
in production code? Absolutely not.
I don't see why one has to understand which parts are lambdas in the
expression trees (!) and which aren't if NO statements written use real
lambdas: all code is written using C# code, no lambdas in sight.
Also, my example of using the same variable in the where and in the
Take method doesn't show me why one is updated at execution time and
the other one isn't: why aren't BOTH updated? Because one is
translated into a lambda expression in an expression tree under the
hood? Why do I even need to know that it is translated into an
expression tree? If I have to, it's a leaky abstraction.
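For readers who didn't see the earlier example, here is a minimal sketch of the behavior in question. It is a reconstruction using LINQ to Objects over an in-memory array, not the original code from earlier in the thread; the names numbers and n are made up for illustration. The variable n is used both in the where clause and as the argument to Take, and is changed before the query is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CapturedVariableDemo
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        int n = 2;

        // The where clause is compiled into a lambda, so 'n' is captured and
        // only read when the query is enumerated. Take(n) is an ordinary
        // method call, so the current value of 'n' (2) is copied right here.
        IEnumerable<int> query = (from x in numbers
                                  where x > n
                                  select x).Take(n);

        n = 5; // changed after the query is defined, before it runs

        // The filter sees n == 5, but Take still uses 2.
        Console.WriteLine(string.Join(", ", query)); // prints: 6, 7
    }
}
```

If both uses were evaluated immediately the output would be 3, 4. If both were deferred it would be 6, 7, 8, 9, 10. The mixed result is the inconsistency being argued about.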
There are all kinds of areas where if you have no idea what you're
doing, you can go wrong - there's nothing new in that. Lambda
expressions and query expressions aren't that hard, and education is
the key IMO.
Query expressions aren't hard in general, but the details will kill a
lot of dreams. That's MS' problem, though.
The thing is that you can teach a C# developer how queries
work without having to educate them about how expression trees
work. That's also info not NEEDED to write correct queries at the
abstraction level of C#.
Apply your consistency test to a mutable struct vs a mutable class,
with a value being passed to a method and then changed - you'll see
exactly the same "inconsistency". Does that mean we shouldn't have
the distinction between value types and reference types? No - it just
means that people need to know about the difference between them.
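A minimal sketch of the struct/class comparison described above, assuming a hypothetical Holder<T> type that simply remembers whatever is passed to it. None of these type names come from the thread.

```csharp
using System;

struct MutableStruct { public int Value; }
class MutableClass { public int Value; }

// Hypothetical helper that keeps whatever it is given.
class Holder<T>
{
    public T Stored;
    public void Remember(T item) { Stored = item; }
}

class ValueVsReferenceDemo
{
    static void Main()
    {
        var s = new MutableStruct { Value = 1 };
        var c = new MutableClass { Value = 1 };

        var structHolder = new Holder<MutableStruct>();
        var classHolder = new Holder<MutableClass>();

        structHolder.Remember(s); // stores a copy of the struct
        classHolder.Remember(c);  // stores a reference to the same object

        // Change both after they have been passed to the method.
        s.Value = 2;
        c.Value = 2;

        Console.WriteLine(structHolder.Stored.Value); // prints: 1
        Console.WriteLine(classHolder.Stored.Value);  // prints: 2
    }
}
```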
I used the SAME variable in two different places in the query. One
gets updated, the other one isn't when the variable changes value.
Sorry, but that's inconsistent behavior, and the reason is actually
irrelevant, because at the level of abstraction where the code is
written, there ARE NO expression trees and the query didn't contain any
lambdas, these are only created at runtime when the query is executed
and an expression tree is created. That tree is often different than
what you've written in code. Relevant info? Why? Why does someone
writing linq queries have to think about expression trees? If that's
required info, why is this abstraction leaking into the level of C#'s
abstraction?
Having said that, I also don't have much problem with the way it's
been done.
so you're comfortable with creating a query q in method a, pass it
to method b and therefore requiring a session in method a, which is
for example not possible. Say I want to formulate what I want in
method A, but as I'm not allowed to directly use database access
code, I have to pass the specification to a layer where it IS
possible to use data access code. I now can't formulate the query
in method A, I have to pass what I want in a DIFFERENT
specification method. Also, where I specify it, I have to decide
which DB to use if I have a multi-db design.
You can use CompiledQuery for that sort of thing.
I looked up the (almost nonexistent) docs about CompiledQuery but they
didn't tell me much. For example: is this compiled query
always usable, no matter what the provider is? No idea.
Something which is enumerable, isn't that semantically a
sequence, a set for you?
Sequence and set certainly aren't the same thing, but I'd say
that something which is enumerable is either a sequence in itself
or is a way of getting at a sequence. Think of it in terms of the
method: GetEnumerator returns an enumerator for the data.
There's nothing inconsistent in that being applied to a data
source rather than something which already contains the data.
Sequence and set aren't equal, true, but in this case, where linq
queries are executed on the db, the difference IS a bit artificial,
as the query IS fetched first, so the query object contains the
whole resultset requested, onto which the enumerator is created.
It's not as if the enumerator is represented by a live cursor on a
resultset in the db.
Is that definitely true in all cases? I can see situations where it
would be very handy to effectively get a DataReader back turning
things into anonymous types on the fly. (Using full entities would
require remembering them all for uniqueness purposes, which would
negate a lot of the point of it, of course.)
If you want to kill your DB's performance, you should do that. :)
Keeping a cursor open means you keep a resultset open on the server.
That takes resources. If your resultset is pretty big, it can eat more
resources than you want to give up for a longer period of time.
That's also why processing on the client is better off using batch
processing, i.e. paging through a resultset.
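As a rough sketch of what paging through a resultset looks like at the C# level: the FetchPage helper and the in-memory rows sequence below are stand-ins I made up, not part of any provider API. Against a real database the source would be an ordered query, and it is up to the provider to turn Skip/Take into a paging query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PagingDemo
{
    // Hypothetical helper: materializes one page and returns it, so nothing
    // stays open between pages.
    static List<int> FetchPage(IEnumerable<int> source, int pageIndex, int pageSize)
    {
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }

    static void Main()
    {
        IEnumerable<int> rows = Enumerable.Range(1, 10); // stand-in for a resultset
        const int pageSize = 4;

        for (int page = 0; ; page++)
        {
            List<int> batch = FetchPage(rows, page, pageSize);
            if (batch.Count == 0) break; // no more rows

            // Slow per-row processing happens here, on data already in memory.
            Console.WriteLine("page " + page + ": " + string.Join(", ", batch));
        }
        // prints:
        // page 0: 1, 2, 3, 4
        // page 1: 5, 6, 7, 8
        // page 2: 9, 10
    }
}
```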
Even if it's not true for LINQ to SQL, it could be true for other
LINQ providers in the future.
I doubt it.
A forward only cursor on the resultset which is already read in
full, isn't that helpful in a lot of cases either, you want the set
to work with.
Sometimes you do, sometimes you just want to process a record at a
time. Even if you're batching things, you may well not want to read
the whole batch in one go.
Then you page through the batch. What if processing a row costs 2
seconds and you have 1000 rows? That's 2000 seconds before the
resultset is closed.
Sure, they can all use the simple syntax of selecting a set of
entities from a set, using a simple filter, but it quickly gets out
of hand. Take for example a silly method like .Distinct(), which
fails on linq to sql when a distinct-violating type is detected.
If you're asking for distinct values and the type violates
distinctness, why shouldn't it fail? Perhaps an example would help.
Because entity identity is verifiable when the data is read from the
db (via the PK). So you can limit on the client if you have to, by reading
enough rows till you're done. It's slightly slower but it's a way to
solve it.
Northwind: employee 1:n order. If you want all employees who have an
order filed for customers from the UK, you could do: (I use '*' for
simplicity here)
    select e.*
    from employees e
        inner join orders o
            on e.EmployeeID = o.EmployeeID
        inner join customers c
            on o.CustomerID = c.CustomerID
    where c.Country = 'UK'
Though, you'll get a lot of duplicates. So you apply distinct. But
that's not possible. So you have to filter on the client. You can,
because the PK identity of the entity instance (== the data!) is
available. No O/R mapper should give up with such a silly query.
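To make the duplicate problem and the client-side filter concrete, here is a small sketch with LINQ to Objects. The classes and the sample rows are invented for illustration (they only mimic the shape of Northwind), and the HashSet-based filter is just one way to de-duplicate on the PK, not how any particular O/R mapper does it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee { public int EmployeeID; public string Name; }
class Order { public int OrderID; public int EmployeeID; public string CustomerID; }
class Customer { public string CustomerID; public string Country; }

class ClientSideDistinctDemo
{
    static void Main()
    {
        // Invented sample data in the shape of Northwind.
        var employees = new[] {
            new Employee { EmployeeID = 1, Name = "Davolio" },
            new Employee { EmployeeID = 2, Name = "Fuller" } };
        var customers = new[] {
            new Customer { CustomerID = "AROUT", Country = "UK" },
            new Customer { CustomerID = "ALFKI", Country = "Germany" } };
        var orders = new[] {
            new Order { OrderID = 10, EmployeeID = 1, CustomerID = "AROUT" },
            new Order { OrderID = 11, EmployeeID = 1, CustomerID = "AROUT" },
            new Order { OrderID = 12, EmployeeID = 2, CustomerID = "ALFKI" } };

        // Same shape as the SQL join: one row per matching order.
        var joined = (from e in employees
                      join o in orders on e.EmployeeID equals o.EmployeeID
                      join c in customers on o.CustomerID equals c.CustomerID
                      where c.Country == "UK"
                      select e).ToList();
        Console.WriteLine(joined.Count); // prints: 2 (employee 1 appears twice)

        // Client-side filter on the PK: keep the first row seen for each EmployeeID.
        var seen = new HashSet<int>();
        var unique = joined.Where(e => seen.Add(e.EmployeeID)).ToList();
        Console.WriteLine(unique.Count); // prints: 1
    }
}
```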
What's better though is that you can also use subqueries to avoid the
duplicates:
    select *
    from employees
    where employeeID in
    (
        select employeeID
        from orders
        where CustomerID in
        (
            select customerID
            from customers
            where country = 'UK'
        )
    )
No duplicates, no distinct needed. If you look closely, the execution
plan is the same on most db's.
This is what I meant by tweakability below. In Linq I can't specify
that the subquery train should be used: I have to use joins, or rely on the
provider to be gentle with me. However, that last bit isn't possible: the o/r
mapper would then have to know the db statistics about the sizes of the data
in the DB.
A good query system allows this kind of simple tweakability.
As Linq relies on extension methods, it simply depends on what
kind of extension methods are implemented for the provider used.
For example, DefaultIfEmpty(), the stupid method which signals
left/right join. Is it possible to rely on this method to get a
left/right join, something FUNDAMENTAL to SQL? No.
So, it looks good on paper, but the common denominator is pretty
small in this case.
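For reference, this is the usual way the DefaultIfEmpty() pattern is written to get left-join behavior. The sketch runs on LINQ to Objects with invented sample data. Whether a given database provider translates the same expression into a LEFT JOIN is exactly the open question raised above, and this example says nothing about that.

```csharp
using System;
using System.Linq;

class Employee { public int EmployeeID; public string Name; }
class Order { public int OrderID; public int EmployeeID; }

class LeftJoinDemo
{
    static void Main()
    {
        // Invented sample data: the second employee has no orders.
        var employees = new[] {
            new Employee { EmployeeID = 1, Name = "Davolio" },
            new Employee { EmployeeID = 2, Name = "Fuller" } };
        var orders = new[] {
            new Order { OrderID = 10, EmployeeID = 1 } };

        // Group join, then flatten with DefaultIfEmpty() so employees
        // without orders still produce one row (with a null order).
        var rows = from e in employees
                   join o in orders on e.EmployeeID equals o.EmployeeID into employeeOrders
                   from order in employeeOrders.DefaultIfEmpty()
                   select new
                   {
                       e.Name,
                       OrderID = order == null ? (int?)null : order.OrderID
                   };

        foreach (var row in rows)
        {
            string orderText = row.OrderID.HasValue ? row.OrderID.Value.ToString() : "(no order)";
            Console.WriteLine(row.Name + ": " + orderText);
        }
        // prints:
        // Davolio: 10
        // Fuller: (no order)
    }
}
```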
Many, many queries are simple ones in my experience. It's nice to
have the ability to use the full power of the specific database or
LINQ provider when you need to, but it's also nice to have
consistency of querying when that's feasible.
Still, I'd have liked it if they had spent more time on this. They
could have added more standard elements but decided not to.
It gets really different when things like tweakability are added to
the equation. With linq, people have less control over what the SQL
looks like. This is actually pretty bad in the long run as the SQL
might for example use a subquery where it should have used a join
and vice versa. This can be solved with extension methods, but this
ties the query to the provider used.
Yes, if you absolutely have to tweak things, then that's fine - and I
fully believe that you ought to closely examine the SQL generated by
your LINQ provider - but there are many simple queries which don't
need tweaking.
You tell that to the team of DBAs who refuse to run your software
on their multi-terabyte databases because the queries are too slow.
If you want to use an O/R mapper as a developer, and you find a team
of DBAs on the other side of the table and they refuse to accept the
fact that the queries are now generated by a program, you really have
to have your act together and prove that your queries are fast and
adaptable to the schema and size of the table data, otherwise you're off
to hammering out stored proc call code all day.
This isn't something I cooked up just to be cocky. Many times we've
received emails from developers who wanted to use an o/r mapper and
they had to convince their boss and DBAs that the SQL produced is fast,
that the queries are tunable/tweakable so the DBAs will be happy.
If the O/R mapper doesn't allow flexibility in that area, the o/r
mapper isn't going to be used a lot in the enterprise area where tables
can have millions of rows and each join has to be done with care.
So, how is Linq supporting tweakability here? Does it for example
offer a simple IN subquery element, so the DEVELOPER can tweak the
code, based on the DBA's advice? Not really. So the DEVELOPER, when
asked by the DBA if a costly join operation can be changed to a
subquery-based query or vice versa (as both have their sweet spots), can
only answer: No, I can't do that. 10 to 1 the DBA will then say a
stored proc will be used instead.
That leaves the smaller simple stuff for the o/r mapper, and keeps in
place the myth that stored procs are the way to go when it comes to
serious data-access. That's unnecessary: a query system should be
flexible enough to offer this kind of tweak.
I'm not saying this is a bad thing per se, Linq offers extension
methods, which are ideal for solving these problems, however it has
a price, and that price is giving up provider-independency.
Yup - so you pay that price when you need to, and when you don't need
to you've still got independence.
If you have to branch out to custom code, independence is 100% gone.
That said, I don't think it's possible to create a 100% independent
query system. It just has to be clear to the people who think that
Linq WILL bring you that independent system that it's a facade. There is no
such thing as an independent system: you will ALWAYS have o/r mapper
specific code in your application, unless you abstract away everything,
which also has a (sometimes big) price tag.
[ADO.NET Entities]
I will be very disappointed if they don't go for a multi-db
design.
One reason I think they'll move it towards an approach which might
offer multi-db design but that's totally in the hands of 3rd
parties is that their original design, where the ado.net provider
had little to do to get things done has been changed to make it a
lot of work to get things done for the ado.net provider, which
means that the 3rd party ado.net provider has to implement a lot of
code to work with the EDM.
That's fairly reasonable - it's good to let the third parties make
their own providers work as well as possible.
Though if it takes a lot of work, it will take a long time before open
source databases, for example, have implemented a provider. These things
aren't simple.
It's not going to stop other projects from going multi-db (such as
LINQ to NHibernate) - it would just limit the usefulness of
ADO.NET Entities. Put it this way: people aren't likely to change
their database choice based on what ADO.NET 3 supports, but they
may well change their framework based on their choice of
database. MS would be foolish not to understand that.
If they had understood it, they wouldn't have made IProvider
internal for linq to sql, so linq to sql (which had a multi-db
design at first) could be used on multiple db's as well.
Well, don't forget that LINQ to SQL is (as I understand it) a very
different team to the ADO.NET side of things.
Though it wouldn't have taken any more effort. They apparently
didn't design it in (otherwise the design would be open and anyone
would be able to write a provider), so the design is targeted towards 1
db, which is IMHO odd as it doesn't take that much effort in a system
which is already largely designed around providers anyway.
EDM is a core part of sqlserver 2008. Any db vendor not having an
EDM provider undermines the success of EDM: if only MS releases a
provider, which only works for sqlserver 2008, will it succeed?
Unlikely, because it's a separate download for developers, it's not
part of the .net framework.
So is the Oracle data provider, but that's pretty well used. Ditto
NUnit :)
That's different. If a .NET developer wants to use Oracle, chances are
s/he won't use the MS Oracle provider, simply because of its
limitations. So there's a necessity. The EDM additional download isn't
a core citizen inside vs.net 2008, nor in .NET 3.5. Therefore you won't
build the momentum around it that it would have had if it had been released WITH
vs.net 2008 and .net 3.5.
FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------