Re: Finally which ORM tool?



Jon Skeet [C# MVP] wrote:
Frans Bouma [C# MVP] <perseus.usenetNOSPAM@xxxxxxxxx> wrote:
Isn't that going into the red zone of 'magic programming' ? I
mean, you have a local variable, even if it's a value typed
variable, and you have a linq query and by changing the
variable's value, you manipulate the linq query IF you're
executing it at that moment.

It's probably slightly "magic" at the moment. I don't think it
will be for long. Sooner or later, devs are going to have to
understand what closures really are, and how with captured
variables it really is the variable which is captured, not its
value.

'Closures', now there's a word that's overloaded too many times :).
Are we talking about sets or graph paths? :)

Um, closures the fairly standard computer science term:
http://en.wikipedia.org/wiki/Closure_(computer_science)

Yet, I find it a confusing term, as the closures in logic and the
closures in math are often used in our field as well, take for example
the mathematics specific closure definition, especially in the context
of a query ;), hence my question.

True. I saw your blogpost today with the query which looked a bit
obscure to me. Sure, it's understandable if you look closely, but
that's not the point. The point is that one easily overlooks the
real specifics of the query: a human in general has a hard time
grasping any form of computer language, simply because the human
has to interpret the code before understanding what it REALLY does.

That was deliberately nasty code. It would be fairly hard to
accidentally do that sort of thing, and I would certainly discourage
its use in real code.

I don't know if your particular example isn't going to pop up
regularly, but I do know that most developers think that a query is an
imperative piece of code, and often fall into a trap where they want to
re-use elements of the result of the query as the input of the query,
which is what you illustrated.

And it is a declaration, but one which captures the variables. I
really don't think it's that bad, once you get used to it - it's
very similar to the whole "passing a reference type argument by
value doesn't mean all the data is copied, just the reference".
Once you accept where changes will be reflected, it's quite easy
to work with that.

Though it's a way to make things really complicated without a lot
of effort, or better: you somewhat have to PUT IN effort to make it
very clear. Perhaps not for the person who wrote the code and who
has spent weeks designing and writing it, but for the poor sods who
have to read the code after the genius left for another job.

Well, I think query expressions are clearer than text - as well as
being more easily verifiable by the compiler, of course (within the
bounds of being valid expressions - the compiler can't determine
whether or not there will be a valid SQL translation).

As for the captured variable aspect of it - I still believe it's a
matter of education and becoming used to them, as well as not abusing
them. You can certainly get yourself into trouble pretty easily, but
then again you can also avoid getting yourself into trouble fairly
easily.

However it's not consistent: a variable passed to an extension method
used INSIDE the query is passed as the value immediately, but the same
variable passed to another extension method in a lambda is passed as a
memberaccess expression and not passed as its value.

I find that inconsistent behavior, because the elements in the query
are all elements of that query, so why is one more special than the
other?

var q = (from c in customers
         where c.Orders.Count == amount
         select c).Take(amount);

amount is a variable. The first amount is passed as a memberaccess
expression because it's resulting in a lambda (that isn't shown here,
mind you, all I see is a boolean expression in THIS code), and the
second amount is passed as a value.

When I now change amount, before execution of q, it will change the
where lambda, but not the Take method.

Now, how is this consistent? Sure, there's always an explanation,
however calling THIS consistent makes a lot of other stuff look
consistent all of a sudden as well...
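
A minimal sketch of the difference described above, using LINQ to Objects rather than a database provider. The list `orderCounts` is a hypothetical stand-in for `customers` (each int plays the role of `c.Orders.Count`). With an IQueryable provider the same split appears in the expression tree: the Where lambda holds a member access on the captured variable, while Queryable.Take receives the value as a constant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: each int stands in for one customer's order count.
var orderCounts = new List<int> { 1, 2, 2, 3, 3, 3 };

int amount = 2;

// Where: 'amount' is captured as a variable and read when the query runs.
// Take:  'amount' is an ordinary method argument, so its value (2) is used now.
var q = orderCounts.Where(c => c == amount).Take(amount);

amount = 3; // changed after the query is built, before it is enumerated

var result = q.ToList();
Console.WriteLine(string.Join(",", result)); // prints "3,3"
```

The filter sees the new value (3) while Take still uses the old one (2), so the result is the first two elements equal to 3.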

It's an IEnumerable<T>, and therefore a resultset. If it were
a definition alone, the queries of CHOPS would have been
fetched, simply because the declaration construction was with
'CHOPS'.

I don't see the logic in your first definition. Suppose it didn't
implement IEnumerable<T> but had an Execute() method which gave
back an IEnumerator<T> instead - in other words, we just changed
a method name and in doing so removed an interface
implementation. How can that change whether the object is itself
a resultset or not? Or would it still be a resultset in your view?

Because it implements IEnumerable, it by itself is an enumerable
resource. This IMHO implies that it's a set.

To me it implies it's the source of a sequence.

List<T> also implements IEnumerable. Or most other collections for
that matter. Does IEnumerable on this object imply an outside resource? Or
does it imply you enumerate over the data INSIDE the object?

To me it implies the latter, and I fail to see why it's all of a
sudden completely different with a Queryable object.

If it had an Execute method (some o/r mappers add that method), the
thing still is that it's not a declaration alone. You can't execute
a declaration without the executor who interprets the query so it
gets executed.

Well, it's a few things:
1) It relates to a schema - crucial for keeping type safety etc

No, it relates to a model. The relational schema is totally not
relevant here. Example: I can create a linq query using entity meta
data and execute it on oracle or on sqlserver, with different mappings.
Does it matter? No. Not for the query. For example, say I have entity
Customer mapped onto a view in oracle and on a table in sqlserver...

2) It knows about its current session

But why?

3) It's the query itself
4) It's the means of executing the query

Same as with 2): why is this? What big problem does this solve? IMHO
it only creates problems.

I would have no objection to the idea of it not implementing
IEnumerable<T> directly but instead having an Execute method taking
the session. It would make a few bits of code a bit more long-winded,
but that's all.

That would already be better.

Having said that, I also don't have much problem with the way it's
been done.

So you're comfortable with creating a query q in method A and passing it
to method B, which means requiring a session in method A, something that
is, for example, not possible? Say I want to formulate what I want in method A,
but as I'm not allowed to directly use database access code, I have to
pass the specification to a layer where it IS possible to use data
access code. I now can't formulate the query in method A, I have to
pass what I want in a DIFFERENT specification method. Also, where I
specify it, I have to decide which DB to use if I have a multi-db
design.

For Linq to sql that's not their concern, so you also don't see a
solution for that in their design, but it IS a problem.

With Linq they placed the executor inside the declaration. Why, I
have no idea, the only guess I have is that it's 'easier' for some
people who have no clue what they're doing anyway, but then again,
these people will have a hard time with some areas of linq anyway
so why bother introducing this 'feature' for them.

The analogy is a simple SQL string: you can't execute it without a
SQL engine. To use it as a metaphor here, this is the same as linq:
string query = "SELECT * FROM Customers WHERE Country='USA'";

foreach(var customer in query)
{
// do something with customer
}

Everyone will say: "you can't execute a string". No of course you
can't, as it contains the declaration of a query. You need an
execution engine to execute the query. That would have been better
IMHO, because it separates declaration from execution, which are
combined in a linq query.
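
A minimal sketch of what separating declaration from execution could look like, under loud assumptions: it uses in-memory lists and plain delegates, not a real LINQ provider, and the names `usaCustomers`, `Execute`, `sqlServerRows` and `oracleRows` are hypothetical. The point is only that the specification is written without any session, and the executing layer decides which source it runs against.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// "Method A": states WHAT is wanted. No session, context or database is known here.
Func<IEnumerable<(string Name, string Country)>, IEnumerable<(string Name, string Country)>> usaCustomers =
    customers => customers.Where(c => c.Country == "USA");

// "Layer B": the only place that knows about data sources. The two lists are
// hypothetical stand-ins for two databases holding the same entity.
var sqlServerRows = new List<(string Name, string Country)> { ("Alice", "USA"), ("Bob", "NL") };
var oracleRows = new List<(string Name, string Country)> { ("Carol", "USA"), ("Dave", "USA") };

// Hypothetical executor: applies a specification to whichever source the caller picks.
List<T> Execute<T>(Func<IEnumerable<T>, IEnumerable<T>> spec, IEnumerable<T> source)
    => spec(source).ToList();

var fromSqlServer = Execute(usaCustomers, sqlServerRows);
var fromOracle = Execute(usaCustomers, oracleRows);

Console.WriteLine(fromSqlServer.Count); // 1
Console.WriteLine(fromOracle.Count);    // 2
```

The same specification runs against either source, and the choice of source is made where execution happens rather than where the query is formulated.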

Well, you need context - and that's what the DataContext (and the
tables off it) really provides.

I really don't think it's nearly as ugly as you seem to be making it
out to be though.

If you think I'm alone in this, you're mistaken ;). I just find it
rather odd, that a lot of people spend hours and hours a day to
separate concerns in their design, yet this in-your-face combination of
concerns is apparently acceptable.

The query itself doesn't contain the data, it simply contains all
the information required to fetch the data. That doesn't make it
a resultset in my view. It's all a matter of definition though.

Something which is enumerable, isn't that semantically a sequence,
a set for you?

Sequence and set certainly aren't the same thing, but I'd say that
something which is enumerable is either a sequence in itself or is a
way of getting at a sequence. Think of it in terms of the method:
GetEnumerator returns an enumerator for the data. There's nothing
inconsistent in that being applied to a data source rather than
something which already contains the data.
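
A minimal sketch of an enumerable that is a way of getting at a sequence rather than a container of data, assuming a C# iterator as the stand-in for a data source. `Source` and the `fetches` counter are hypothetical; the counter plays the role of "going to fetch the data".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int fetches = 0;

// Hypothetical data source: the body runs only when someone enumerates it.
IEnumerable<int> Source()
{
    fetches++;          // stands in for "go and fetch the data"
    yield return 1;
    yield return 2;
}

var seq = Source();
int beforeEnumeration = fetches;   // still 0: creating the sequence fetched nothing

var first = seq.ToList();          // first enumeration runs the body
var second = seq.ToList();         // second enumeration runs it again

Console.WriteLine($"{beforeEnumeration} {fetches}"); // prints "0 2"
```

Holding `seq` holds no data; each call to GetEnumerator (here via ToList) goes back to the source.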

Sequence and set aren't equal, true, but in this case, where linq
queries are executed on the db, the difference IS a bit artificial, as
the query IS fetched first, so the query object contains the whole
resultset requested, onto which the enumerator is created.

It's not as if the enumerator is represented by a live cursor on a
resultset in the db.

A forward-only cursor on a resultset which has already been read in full
isn't that helpful in a lot of cases either: you want the set to work
with. ToList() creates a copy, as the data is already in a set. Little
things which show that:

IList result = q.Execute<IList>();

would have been better (not ideal, the query still executes itself).

I mean: IF the user is interested in a forward only cursor on the
resultset, give the user a forward only cursor on the resultset.
However the set is already fetched, in full, so in that case, simply
return the set and be done with it, as the user wants the set, not a
cursor, as the set is what the query defines and what the user wanted.

Because the linq query isn't a declaration alone, you can't do
things with it like pass it to an execution engine of choice, you
have to execute it with the engine inside the query, or you have to
be lucky to be using a linq provider which gives you this
flexibility ;).

Sure - but the beautiful thing about LINQ (instead of LINQ to SQL) is
that different providers can choose their own way to go on this and
people can still use broadly the same query syntax.

That's only true on paper. Every Linq provider will implement
extension methods, which are specific for that linq provider. For
example we have extension methods for paging (as skip/take isn't going
to cut it, in this case) and adding prefetch paths to the query, and
likely more when we're completely done programming. The thing is that
others have different extension methods, and it's precisely THOSE
extension methods which make things interesting.

Sure, they can all use the simple syntax of selecting a set of
entities from a set, using a simple filter, but it quickly gets out of
hand. Take for example a silly method like .Distinct(), which fails on
linq to sql when a distinct-violating type is detected.

As Linq relies on extension methods, it simply depends on what kind of
extension methods are implemented for the provider used. For example,
DefaultIfEmpty(), the stupid method which signals left/right join. Is
it possible to rely on this method to get a left/right join, something
FUNDAMENTAL to SQL? No.

So, it looks good on paper, but the common denominator is pretty small in
this case.

If you choose to implement LINQ in a way which requires a context to
be provided to it, that's fine - and the query will still be easily
recognisable to someone who has used LINQ to NHibernate, or LINQ to SQL.

To some extent.

It's always easy to explain things using boring simple queries. Those
aren't the problem. The problems arise when things get more complicated
than for example a single value list from a single entity set: at that
moment, the core C#/VB.NET syntax for queries isn't enough or will
differ a lot from the expression tree generated so the o/r mapper will
likely require the use of extension methods on one hand and will have
to make decisions what caused this particular subtree to be there on
the other hand (as that's not always obvious, like with DefaultIfEmpty,
or multiple from clauses which result in nested SelectMany calls)

It gets really different when things like tweakability are added to
the equation. With linq, people have less control over what the SQL
looks like. This is actually pretty bad in the long run as the SQL
might for example use a subquery where it should have used a join and
vice versa. This can be solved with extension methods, but this ties
the query to the provider used.

I'm not saying this is a bad thing per se, Linq offers extension
methods, which are ideal for solving these problems, however it has a
price, and that price is giving up provider independence.

The thing is though that I find it very important that people
understand that the place where the executor is located in a linq
query is an important issue, and not something you can simply wave
away as 'not that important'. The thing is that it severely limits
the user of the framework in
how s/he wants to use the query, while it doesn't give the user any
real benefits, except perhaps the 1 line of code they don't have to
type now.

How does it "severely limit" the user of the framework? If you don't
want to change the parameters, don't change the values of the
variables - I don't think it's something that people are likely to do
accidentally anyway, to be honest.

Not only the parameter stuff, also the ability to specify on which
context/session/adapter they want to execute the query is a thing which
is hard/not possible to do. It also implies that when creating the
query you NEED a session/context/adapter, which is in a lot of cases
not possible, simply because the context/session isn't known at that
point, or not available because at that spot it's not allowed to cut
corners and access the db for example.

Now, you also bring up another topic: separating the query from
its data connection. You can't easily separate it from its whole
data context in terms of the types involved, because at that
point you lose a lot of the benefits of LINQ - but I could
certainly envisage separating it from a "live" context. I don't
know what LINQ to Entities will have, but I wouldn't be at all
surprised to see that in there.

Depends on if they still move ahead towards a multi-db design. I
have the feeling they won't, as it could give their competitors in
the DB market the same advantage SqlServer will have now (as IBM
and Oracle have already made their DB engines capable of running
this kind of engine in-process, so it won't be hard for them to
move an EDM layer into their databases, IMHO).

I will be very disappointed if they don't go for a multi-db design.

One reason I think they'll move it towards an approach which might
offer multi-db design but that's totally in the hands of 3rd parties is
that their original design, where the ado.net provider had little to do
to get things done has been changed to make it a lot of work to get
things done for the ado.net provider, which means that the 3rd party
ado.net provider has to implement a lot of code to work with the EDM.

It's not going to stop other projects from going multi-db (such as
LINQ to NHibernate) - it would just limit the usefulness of ADO.NET
Entities. Put it this way: people aren't likely to change their
database choice based on what ADO.NET 3 supports, but they may well
change their framework based on their choice of database. MS would be
foolish not to understand that.

If they had understood it, they wouldn't have made IProvider internal
for linq to sql, so linq to sql (which had a multi-db design at first)
could be used on multiple db's as well.

EDM is a core part of sqlserver 2008. Any db vendor not having an EDM
provider undermines the success of EDM: if only MS releases a provider,
which only works for sqlserver 2008, will it succeed? Unlikely, because
it's a separate download for developers, it's not part of the .net
framework. If I were Oracle or IBM, I would create the provider, but not
release it until I really would have to (read: when EDM turns out to be
a big data-access success so developers in general start to look for
databases which support it). Looking at the reluctance of Oracle to
release 11g for windows, I wouldn't be surprised if they're not that
enthusiastic about releasing a provider for EDM.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------