Re: Are Linq-SQL methods commutative

Tech-Archive recommends: Fix windows errors by optimizing your registry



Nicholas Paldino [.NET/C# MVP] wrote:

Frans,

See inline:

Andrus,

Absolutely different results.

Now that's unexpected. In database land, you can't get 2 different
results.

Not really, because to translate what is being done in LINQ to
database land, you would actually perform a query on the result set
of the previous query. The operations are not performed at the same
time.

That's not defined. There's no ruleset how to translate a Linq query
to SQL, on the contrary, it's often a struggle how to completely grasp
the intention of the linq query with SQL.

So, in the case of take->skip, true, you need multiple projections and
in the case of skip -> take, you need less, perhaps none. However, in
the case of a select statement which specifies BOTH at once, you can't
get 2 different resultsets by specifying them in a different order, see
Firebird's and PostgreSql's SELECT syntax for details.
Firebird:
SELECT FIRST m SKIP n * FROM ...

PostgreSQL:
SELECT * FROM ... LIMIT m OFFSET n

MySql actually does have an order, but it has just 1 clear intention:
you skip n rows and take m. You can swap them, but that doesn't matter
for the statement, you can't specify take m, skip n from that m

MySQL:
SELECT * FROM ... LIMIT n, m


Oh yes, I know... this is SQL and not linq. Let's talk about that for
a second. A developer writing a linq query which is executed on the
database has to understand the query WILL become SQL. The translation
is far from trivial, so the intention of the linq query has to be
represented by the SQL query as much as possible.

The developer isn't helped by fuzzy linq syntaxis, as predicting what
SQL is going to be generated is then even more difficult than it is
now. Take is used for TOP if Skip isn't there.

Is take used for TOP IF skip is there?

No, it's not. Well, not always. If Skip is there and Take is there, it
should be translated into a paging query (paging is skipping rows and
taking the top n rows from what's left). If paging is performed 'TOP'
makes no sense.

So if Take is specified first, and then Skip, the query has to use a
temp table to store the taken rows and then skip the first n rows from
that. Otherwise it can't always be done: TOP can't always be added to a
subquery: if you sort on a column not in the subquery, you can run into
errors if TOP / DISTINCT is used in the subquery.

Assuming the ordering is the same on each of them (because Skip
and Take make no sense without ordering, LINQ to SQL will create
an order for you, which irritates me to no end, but that's a
separate thread),

Why? SELECT * FROM Table without ordering has no defined ordering,
so using a limit + skip operator on that set results in a set of
undefined rows, you'll never know what rows will be returned.

Well, that's not completely true. If you are using a clustered index
in SQL Server, then the ordering of the result set will be on that
index, however, without that index, the order is undefined
(basically, however it is stored in the file).

That's just luck, if I recall correctly, but I have to check BOL for
semantics on this. The SQL standard (not obeyed by anyone in full,
admitted) describes this explicitly. I don't like to hammer on standard
specs that much, but this one thing is essential to understand so
people don't run into problems with unexpected results.

The RDBMS can for example decide to return a part of the rowset before
another part because it has that one already in memory.

But I agree with you, in database land, it makes no sense, because
you have to define an ordering. LINQ to SQL gets around this by
ordering by all the columns in the table if another order is not
found (I believe, I have to test that part, but without a specific
ordering, it orders on every column in the table).

I think it orders on every column in the projection. If it would order
on every column in the source of the projection, it can run into issues
with DISTINCT where not every column of the source is in the projection.

they will produce different results.

Say your query will produce the ordered set {1, 2, 3}. Let n = 1,
m = 2.

The first query:

var query = query.Skip(n).Take(m);

Will return the ordered set {2, 3}, while the second query:

var query = query.Take(m).Skip(n);

Will return the ordered set {2}.

The reason for this is that in the first query, the Skip method
skips one element, then takes the remaining two, while in the
second query, the first two elements are taken, and then the
first one is skipped.

I tried it out to be sure, and indeed Linq to Sql generates two
different queries (which really hurt my eyes but that's another
story). The thing is though that on databases, people will use the
take/skip combination to page through a bigger resultset. However
consulting the manual for both shows no word whatsoever about the
order in which these two statements have to be specified.

Well, there is no order that they HAVE to be specified in. Yes, they
are usually used in conjunction, but it is not a requirement that
they be used in conjunction. They can be specified in any order,
which is what the OP's question was arising from. He wanted to know
if the order in which the operations were applied would affect the
outcome, which it definitely will.

Though the intention is to page, i.e. first skip, then take. When
would you first take and then skip? With Skip(n) and Take(m), isn't:
set.Skip(n).Take(m) the same as set.Take(m+n).Skip(n) ?

the net result is that you get a set of rows which is limited to a
size x from a potentially bigger set with size y. In other words:
specifying Take first and then Skip has no real value over specifying
Skip first and then Take. Mind you: the intention is to get a resultset
which has a fixed size and is a subset from a bigger set.

As some databases offer you this in the SELECT statement, one could
argue (and there are no rules for this, so why not!) that Skip and
Take, IF specified both in the same projection scope, are translated
into the parameters for the LIMIT/OFFSET etc. keywords of the SELECT
statement of the projection scope they're specified on.

I do understand the order, but it's a bit strange as well. For
example, Linq apparently has no problem with an Order by placed in
front of a where, however there IS an order in take/skip which are
used combined as a paging mechanism. Of course, this follows from
the specs of both, but semantically, the intention of what people
want to do, e.g. to page, shouldn't require an order in the
statements for paging, if other elements in the query also don't
really require an order (they do, but that's whiped under the rug)

Well, it depends here on what operations we are talking about. You
are making the error of assuming that Take and Skip are both sub
parts of a larger page operation, and assuming that because other
elements of the operation that can be applied (i.e. where) don't
require order, these don't.

Why else would you want to use take and skip in the same projection
scope? Their sole purpose is to limit the resultset to a fixed set and
to specify where to start in the overall resultset for the subset to
return.

But that's wrong. Like you said, in db-land, not applying an order
results in an undefined order in the result set. If you are going to
page off that, then the DB is completely within its rights to return
a differently ordered result set between subsequent queries, and your
paging results will simply be wrong. You would be depending on an
implementation detail that the DB would consistently return the
undefined order to you.

'Order' in my remark above was about skip first/take after that, not
about an ORDER BY added to the query by the linq provider. Sorry about
that confusion.

The reason Take and Skip (should) require an order is because of this
fact. Think of it. The definition of Take is "take the FIRST n
elements". Skip is defined as "skip the NEXT n elements". Because of
FIRST and NEXT, you require order. Without knowing the order, the
concepts of FIRST and NEXT do not exist.

Though I've shown above that you can use Take and Skip in any order to
obtain the same rows. The INTENTION of the developer is to get row n
till n+m from a set of rows of size m+k, agreed? Why else would Take
and Skip show up in the same projection scope.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
.



Relevant Pages

  • Re: Are Linq-SQL methods commutative
    ... No - in LINQ each step effectively works off the result of the ... linq query which has to meet a SQL query they have in mind. ... freedom a SQL statement gives them: the SQL statement is the one ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Building expression trees to filter derived entity classes
    ... up the inheritance correctly in your LINQ to SQL mapping? ... you project the your LINQ to SQL entity objects onto some other POCO ... LINQ to SQL is still trying to process the entire query, ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Are Linq-SQL methods commutative
    ... because to translate what is being done in LINQ to database ... you would actually perform a query on the result set of the previous ... SQL Server, then the ordering of the result set will be on that index, ... But I agree with you, in database land, it makes no sense, because you ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: LINQ to SQL vs. Stored Procedures (Performance)
    ... If it is an oft used LINQ ... query, you will find that SQL Server creates stats and the query will run ... Stored Procedure and a LINQ to SQL command, ...
    (microsoft.public.dotnet.framework)
  • Re: Pull Specific Data by Previous Months Date (Date Entered via prom
    ... I would copy & paste the SQL above into notepad, do a find & replace, find ... save this query and then create a new ... from the Prev Month BV by specifying the date and will show you the Prev ...
    (microsoft.public.access.queries)