Re: Finally which ORM tool?



Frans Bouma [C# MVP] <perseus.usenetNOSPAM@xxxxxxxxx> wrote:
'Closures', now there's a word that's overloaded too many times :).
Are we talking about sets or graph paths? :)

Um, closures the fairly standard computer science term:
http://en.wikipedia.org/wiki/Closure_(computer_science)

Yet, I find it a confusing term, as the closures in logic and the
closures in math are often used in our field as well, take for example
the mathematics specific closure definition, especially in the context
of a query ;), hence my question.

I thought it was reasonably unambiguous in this context - given that we
were talking about lambda expressions, the computer science use seemed
fairly obvious to me. Not to worry.

That was deliberately nasty code. It would be fairly hard to
accidentally do that sort of thing, and I would certainly discourage
its use in real code.

I don't know if your particular example isn't going to pop up
regularly, but I do know that most developers think that a query is an
imperative piece of code, and often fall into a trap where they want to
re-use elements of the result of the query as the input of the query,
which is what you illustrated.

Well, only time will tell - but I'd be surprised to see code like that
written with any significant expectations of "simple" behaviour.

Well, I think query expressions are clearer than text - as well as
being more easily verifiable by the compiler, of course (within the
bounds of being valid expressions - the compiler can't determine
whether or not there will be a valid SQL translation).

As for the captured variable aspect of it - I still believe it's a
matter of education and becoming used to them, as well as not abusing
them. You can certainly get yourself into trouble pretty easily, but
then again you can also avoid getting yourself into trouble fairly
easily.
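
A minimal sketch of the captured-variable behaviour being discussed, using LINQ to Objects rather than a database provider. The names (CapturedVariableDemo, numbers, limit) are made up for illustration and are not taken from the earlier posts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CapturedVariableDemo
{
    // Defines a query that captures the local 'limit', then changes 'limit'
    // before the query is enumerated.
    public static List<int> Run()
    {
        int[] numbers = { 1, 5, 10, 15, 20 };
        int limit = 10;

        // The where clause is shorthand for the lambda n => n > limit.
        // The lambda captures the variable itself, not its current value.
        IEnumerable<int> query = from n in numbers
                                 where n > limit
                                 select n;

        limit = 1; // changed after the query is defined, before it runs

        // Enumeration happens here, so the filter sees limit == 1.
        return query.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "5, 10, 15, 20"
    }
}
```

Leaving the variable alone after defining the query, or copying it into a fresh local first, avoids the surprise.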

However it's not consistent: a variable passed to an extension method
used INSIDE the query is passed as the value immediately, but the same
variable passed to another extension method in a lambda is passed as a
memberaccess expression and not passed as its value.

Firstly, it's not to do with extension methods at all. It's to do with
whether the parameter is a lambda expression or not, and that's *all* it
has to do with.
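
To make the distinction concrete, here is a small sketch, assuming nothing beyond System.Linq.Expressions. PassedByValue is a hypothetical helper standing in for any method with an ordinary (non-lambda) parameter. The same variable is copied when passed as a plain argument, but shows up as a member access node when it is used inside a lambda converted to an expression tree.

```csharp
using System;
using System.Linq.Expressions;

static class ArgumentVersusLambdaDemo
{
    // Hypothetical helper: an ordinary parameter receives a copy of the value.
    static int PassedByValue(int value) => value;

    public static (int argument, ExpressionType nodeType, int evaluatedLater) Run()
    {
        int limit = 10;

        // Ordinary argument: evaluated immediately, so 10 is what gets passed.
        int argument = PassedByValue(limit);

        // Lambda converted to an expression tree: 'limit' is captured.
        Expression<Func<int, bool>> predicate = n => n > limit;

        limit = 1;

        // The right-hand side of "n > limit" is a member access on the
        // compiler-generated closure object, not a constant 10.
        var comparison = (BinaryExpression)predicate.Body;
        Expression captured = comparison.Right;

        // Evaluating that node now reads the current value of the variable.
        int evaluatedLater = Expression.Lambda<Func<int>>(captured).Compile()();

        return (argument, captured.NodeType, evaluatedLater);
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine(r.argument);       // 10
        Console.WriteLine(r.nodeType);       // MemberAccess
        Console.WriteLine(r.evaluatedLater); // 1
    }
}
```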

Now, as for consistency - it's consistent once you understand which
parts of a query expression are actually a shorthand for lambda
expressions. Is someone who doesn't want to learn the basics of query
expressions going to find that confusing? Yes. Should someone who
doesn't want to learn the basics of query expressions be using them in
production code? Absolutely not.

There are all kinds of areas where if you have no idea what you're
doing, you can go wrong - there's nothing new in that. Lambda
expressions and query expressions aren't that hard, and education is
the key IMO.

Apply your consistency test to a mutable struct vs a mutable class,
with a value being passed to a method and then changed - you'll see
exactly the same "inconsistency". Does that mean we shouldn't have the
distinction between value types and reference types? No - it just means
that people need to know about the difference between them.
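
The struct/class comparison can be sketched like this. PointStruct, PointClass and Change are illustrative names only.

```csharp
using System;

struct PointStruct { public int X; }
class PointClass { public int X; }

static class ValueVersusReferenceDemo
{
    // Receives a copy of the struct, so the caller's value is untouched.
    static void Change(PointStruct p) { p.X = 99; }

    // Receives a copy of the reference, so the caller's object is modified.
    static void Change(PointClass p) { p.X = 99; }

    public static (int structX, int classX) Run()
    {
        var s = new PointStruct { X = 1 };
        var c = new PointClass { X = 1 };

        Change(s);
        Change(c);

        return (s.X, c.X);
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine(r.structX); // 1
        Console.WriteLine(r.classX);  // 99
    }
}
```

Same call syntax, different outcome, and nobody argues that the language is broken because of it.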

Because it implements IEnumerable, it by itself is an enumerable
resource. This IMHO implies that it's a set.

To me it implies it's the source of a sequence.

List<T> also implements IEnumerable, as do most other collections for
that matter. Does IEnumerable on this object imply an outside resource?
Or does it imply you enumerate over the data INSIDE the object?

To me it implies the latter, and I fail to see why it's all of a
sudden completely different with a Queryable object.

There's nothing to say it *has* to be an outside resource, but there's
nothing inherently *preventing* it from being an outside resource. You
are, after all, calling GetEnumerator() - doesn't that suggest that it
could be taking some action?
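
As a sketch of GetEnumerator() taking some action: RemoteSource below is a hypothetical type, with a counter standing in for "go and fetch the data from somewhere". Nothing about IEnumerable<T> stops an implementation from doing this.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a query object backed by an outside resource.
sealed class RemoteSource : IEnumerable<int>
{
    // Counts how many times the "outside resource" has been consulted.
    public int FetchCount { get; private set; }

    public IEnumerator<int> GetEnumerator()
    {
        FetchCount++; // a real provider might execute a query here
        return new List<int> { 1, 2, 3 }.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class EnumeratorActionDemo
{
    public static int Run()
    {
        var source = new RemoteSource();

        // Each enumeration calls GetEnumerator() again.
        source.Sum();
        source.Sum();

        return source.FetchCount;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 2
    }
}
```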

If it had an Execute method (some o/r mappers add that method), the
thing still is that it's not a declaration alone. You can't execute
a declaration without the executor who interprets the query so it
gets executed.

Well, it's a few things:
1) It relates to a schema - crucial for keeping type safety etc

no, it relates to a model. The relational schema is totally not
relevant here.

Well, I view the model as a schema - just not the *relational* schema.
However, I'm very happy to use the term "model" here instead. Either
way, it's important for the LINQ query.

2) It knows about its current session

but why?

Yes, I know you dislike it - I was merely clarifying what the query
knows about.

3) It's the query itself
4) It's the means of executing the query

Same as with 2): why is this? What big problem does this solve? IMHO
it only creates problems.

Ditto.

I would have no objection to the idea of it not implementing
IEnumerable<T> directly but instead having an Execute method taking
the session. It would make a few bits of code a bit more long-winded,
but that's all.
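
Roughly what that alternative could look like, purely as a sketch: Session and DetachedQuery<T> are invented here for illustration and do not correspond to any real LINQ provider's API. The query is only a declaration, and the session is supplied at execution time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical session type: just a named in-memory data source.
sealed class Session
{
    public Session(string name, IEnumerable<int> rows) { Name = name; Rows = rows; }
    public string Name { get; }
    public IEnumerable<int> Rows { get; }
}

// Hypothetical query type: a declaration with no session attached.
// It deliberately does not implement IEnumerable<T>.
sealed class DetachedQuery<T>
{
    private readonly Func<Session, IEnumerable<T>> definition;

    public DetachedQuery(Func<Session, IEnumerable<T>> definition)
    {
        this.definition = definition;
    }

    // The caller decides which session the query runs against.
    public List<T> Execute(Session session) => definition(session).ToList();
}

static class DetachedQueryDemo
{
    // Can be called from code that has no access to any session.
    public static DetachedQuery<int> BuildQuery() =>
        new DetachedQuery<int>(s => s.Rows.Where(r => r % 2 == 0));

    static void Main()
    {
        var query = BuildQuery();
        var a = new Session("A", new[] { 1, 2, 3, 4 });
        var b = new Session("B", new[] { 10, 11, 12 });

        Console.WriteLine(string.Join(", ", query.Execute(a))); // 2, 4
        Console.WriteLine(string.Join(", ", query.Execute(b))); // 10, 12
    }
}
```

The extra Execute call is the "bit more long-winded" part. In exchange, the same declaration can run against either session.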

that would already be better.

I still say it's less convenient in many situations, but I wouldn't
mind much either way.

Having said that, I also don't have much problem with the way it's
been done.

so you're comfortable with creating a query q in method a and passing it
to method b, which requires a session in method a, where that is, for
example, not possible? Say I want to formulate what I want in method A,
but as I'm not allowed to directly use database access code, I have to
pass the specification to a layer where it IS possible to use data
access code. I now can't formulate the query in method A, I have to
pass what I want in a DIFFERENT specification method. Also, where I
specify it, I have to decide which DB to use if I have a multi-db
design.

You can use CompiledQuery for that sort of thing.

For Linq to sql that's not their concern, so you also don't see a
solution for that in their design, but it IS a problem.

But it's not a problem with LINQ itself. Providers can choose to
implement things that way if they want to.

Everyone will say: "you can't execute a string". No of course you
can't, as it contains the declaration of a query. You need an
execution engine to execute the query. That would have been better
IMHO, because it separates declaration from execution, which are
combined in a linq query.

Well, you need context - and that's what the DataContext (and the
tables off it) really provides.

I really don't think it's nearly as ugly as you seem to be making it
out to be though.

If you think I'm alone in this, you're mistaken. ;). I just find it
rather odd, that a lot of people spend hours and hours a day to
separate concerns in their design, yet this in-your-face combination of
concerns is apparently acceptable.

Well, you can work round it with CompiledQuery when you want to, and it
makes things slightly simpler when you *do* just want to execute a
query in a particular session.

As I've said, it's the same with Hibernate: you normally create a
Criteria against a Session. You *can* use a DetachedCriteria, but the
more normal operation is to create a Criteria when you need to, against
a session. I've used that model with no problems, and the sky didn't
fall down.

Something which is enumerable, isn't that semantically a sequence,
a set for you?

Sequence and set certainly aren't the same thing, but I'd say that
something which is enumerable is either a sequence in itself or is a
way of getting at a sequence. Think of it in terms of the method:
GetEnumerator returns an enumerator for the data. There's nothing
inconsistent in that being applied to a data source rather than
something which already contains the data.

Sequence and set aren't equal, true, but in this case, where linq
queries are executed on the db, the difference IS a bit artificial, as
the query IS fetched first, so the query object contains the whole
resultset requested, onto which the enumerator is created.

It's not as if the enumerator is represented by a live cursor on a
resultset in the db.

Is that definitely true in all cases? I can see situations where it
would be very handy to effectively get a DataReader back turning things
into anonymous types on the fly. (Using full entities would require
remembering them all for uniqueness purposes, which would negate a lot
of the point of it, of course.)

Even if it's not true for LINQ to SQL, it could be true for other LINQ
providers in the future.

A forward only cursor on the resultset which is already read in full,
isn't that helpful in a lot of cases either, you want the set to work
with.

Sometimes you do, sometimes you just want to process a record at a
time. Even if you're batching things, you may well not want to read the
whole batch in one go.

ToList() creates a copy, as it's already in a set. Little things
which show that:

IList result = q.Execute<IList>();

would have been better (not ideal, the query still executes itself).

I mean: IF the user is interested in a forward only cursor on the
resultset, give the user a forward only cursor on the resultset.
However the set is already fetched, in full, so in that case, simply
return the set and be done with it, as the user wants the set, not a
cursor, as the set is what the query defines and what the user wanted.

Sure - it's trivial to turn a cursor into a fully loaded result set.
The reverse isn't feasible, of course.
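
A sketch of that asymmetry using an iterator block as a stand-in for a forward-only reader. ReadRows and the log are hypothetical, and no real DataReader is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CursorVersusSetDemo
{
    // Stand-in for a forward-only cursor: each row is "read" only when the
    // consumer asks for it, and every read is recorded in the log.
    static IEnumerable<int> ReadRows(List<string> log)
    {
        for (int i = 1; i <= 3; i++)
        {
            log.Add("read " + i);
            yield return i;
        }
    }

    public static (int readsForFirst, int readsForList, int listCount) Run()
    {
        var log = new List<string>();

        // Streaming consumer: stops after one row, so only one row is read.
        ReadRows(log).First();
        int readsForFirst = log.Count;

        // Turning the cursor into a fully loaded set is one call.
        log.Clear();
        List<int> all = ReadRows(log).ToList();

        return (readsForFirst, log.Count, all.Count);
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine(r.readsForFirst); // 1
        Console.WriteLine(r.readsForList);  // 3
    }
}
```

Once everything has been loaded into the list, there is no way to get back the work that a streaming consumer would have skipped.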

Because the linq query isn't a declaration alone, you can't do
things with it like pass it to an execution engine of choice, you
have to execute it with the engine inside the query, or you have to
be lucky to be using a linq provider which gives you this
flexibility ;).

Sure - but the beautiful thing about LINQ (instead of LINQ to SQL) is
that different providers can choose their own way to go on this and
people can still use broadly the same query syntax.

That's only true on paper. Every Linq provider will implement
extension methods, which are specific for that linq provider. For
example we have extension methods for paging (as skip/take isn't going
to cut it, in this case) and adding prefetch paths to the query, and
likely more when we're completely done programming. The thing is that
others have different extension methods, and it's precisely THOSE
extension methods which make things interesting.

In *some* queries it is - but in many cases a simple "from x where y
select z" will be perfectly fine.

Sure, they can all use the simple syntax of selecting a set of
entities from a set, using a simple filter, but it quickly gets out of
hand. Take for example a silly method like .Distinct(), which fails on
linq to sql when a distinct-violating type is detected.

If you're asking for distinct values and the type violates
distinctness, why shouldn't it fail? Perhaps an example would help.

As Linq relies on extension methods, it simply depends on what kind of
extension methods are implemented for the provider used. For example,
DefaultIfEmpty(), the stupid method which signals left/right join. Is
it possible to rely on this method to get a left/right join, something
FUNDAMENTAL to SQL? No.

So, it looks good on paper, but the common denominator is pretty small
in this case.

Many, many queries are simple ones in my experience. It's nice to have
the ability to use the full power of the specific database or LINQ
provider when you need to, but it's also nice to have consistency of
querying when that's feasible.

If you choose to
implement LINQ in a way which requires a context to be provided to
it, that's fine - and the query will still be easily recognisable to
someone who has used LINQ to NHibernate, or LINQ to SQL.

To some extent.

It's always easy to explain things using boring simple queries. Those
aren't the problem. The problems arise when things get more complicated
than for example a single value list from a single entity set: at that
moment, the core C#/VB.NET syntax for queries isn't enough or will
differ a lot from the expression tree generated so the o/r mapper will
likely require the use of extension methods on one hand and will have
to make decisions what caused this particular subtree to be there on
the other hand (as that's not always obvious, like with DefaultIfEmpty,
or multiple from clauses which result in nested SelectMany calls)
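
For readers who have not seen the translation being referred to, here is a LINQ to Objects sketch of a left outer join written both ways. The data and names are made up. The method-call form is roughly what the compiler produces from the query expression, not a literal dump of its output. A provider sees only the method-call shape and has to infer "left join" from the GroupJoin/SelectMany/DefaultIfEmpty combination.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LeftJoinDemo
{
    static readonly string[] Customers = { "Ann", "Bob" };
    static readonly (string Customer, string Item)[] Orders =
    {
        (Customer: "Ann", Item: "Book")
    };

    // Query expression form: "join ... into" plus a second from clause
    // over DefaultIfEmpty() expresses a left outer join.
    public static List<string> QuerySyntax() =>
        (from c in Customers
         join o in Orders on c equals o.Customer into matches
         from m in matches.DefaultIfEmpty()
         select c + ":" + (m.Item ?? "none")).ToList();

    // Approximately the method calls the query expression turns into.
    public static List<string> MethodSyntax() =>
        Customers
            .GroupJoin(Orders, c => c, o => o.Customer,
                       (c, matches) => new { c, matches })
            .SelectMany(x => x.matches.DefaultIfEmpty(),
                        (x, m) => x.c + ":" + (m.Item ?? "none"))
            .ToList();

    static void Main()
    {
        Console.WriteLine(string.Join(", ", QuerySyntax()));  // Ann:Book, Bob:none
        Console.WriteLine(string.Join(", ", MethodSyntax())); // Ann:Book, Bob:none
    }
}
```

For Bob there is no order, so DefaultIfEmpty() yields a default tuple whose Item is null, which is why the null check is needed.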

If the provider has been well designed, the query should still be
readable, even if parts of it would not be available in other
providers.

It gets really different when things like tweakability are added to
the equation. With linq, people have less control over what the SQL
looks like. This is actually pretty bad in the long run as the SQL
might for example use a subquery where it should have used a join and
vice versa. This can be solved with extension methods, but this ties
the query to the provider used.

Yes, if you absolutely have to tweak things, then that's fine - and I
fully believe that you ought to closely examine the SQL generated by
your LINQ provider - but there are many simple queries which don't need
tweaking.

I'm not saying this is a bad thing per se, Linq offers extension
methods, which are ideal for solving these problems, however it has a
price, and that price is giving up provider independence.

Yup - so you pay that price when you need to, and when you don't need
to you've still got independence.

How does it "severely limit" the user of the framework? If you don't
want to change the parameters, don't change the values of the
variables - I don't think it's something that people are likely to do
accidentally anyway, to be honest.

Not only the parameter stuff, also the ability to specify on which
context/session/adapter they want to execute the query is a thing which
is hard/not possible to do.

CompiledQuery makes it pretty easy to execute a query against an
arbitrary context.

It also implies that when creating the
query you NEED a session/context/adapter, which is in a lot of cases
not possible, simply because the context/session isn't known at that
point, or not available because at that spot it's not allowed to cut
corners and access the db for example.

Again, CompiledQuery doesn't require this.

[ADO.NET Entities]

I will be very disappointed if they don't go for a multi-db design.

One reason I think they'll move towards an approach which might offer
multi-db design, but one that's totally in the hands of 3rd parties, is
that their original design, where the ado.net provider had little to do
to get things done, has been changed so that the ado.net provider now
has a lot of work to do, which means that the 3rd party ado.net
provider has to implement a lot of code to work with the EDM.

That's fairly reasonable - it's good to let the third parties make
their own providers work as well as possible.

It's not going to stop other projects from going multi-db (such as
LINQ to NHibernate) - it would just limit the usefulness of ADO.NET
Entities. Put it this way: people aren't likely to change their
database choice based on what ADO.NET 3 supports, but they may well
change their framework based on their choice of database. MS would be
foolish not to understand that.

If they had understood it, they wouldn't have made IProvider internal
for linq to sql, so linq to sql (which had a multi-db design at first)
could be used on multiple db's as well.

Well, don't forget that LINQ to SQL is (as I understand it) a very
different team to the ADO.NET side of things.

EDM is a core part of sqlserver 2008. Any db vendor not having an EDM
provider undermines the success of EDM: if only MS releases a provider,
which only works for sqlserver 2008, will it succeed? Unlikely, because
it's a separate download for developers, it's not part of the .net
framework.

So is the Oracle data provider, but that's pretty well used. Ditto
NUnit :)

If I was oracle or IBM, I would create the provider, but not
release it until I really would have to (read: when EDM turns out to be
a big data-access success so developers in general start to look for
databases which support it). Looking at the reluctance of Oracle to
release 11g for windows, I wouldn't be surprised if they're not that
enthusiastic about releasing a provider for EDM.

Hmm... I guess we'll have to wait and see. Still, there'll always be
other providers around. If MS really wants to lose ORM customers to
NHibernate etc, they will...

--
Jon Skeet - <skeet@xxxxxxxxx>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too