Re: ORDER BY and IDENTITY

From: Itzik Ben-Gan (itzik_at_REMOVETHIS.SolidQualityLearning.com)
Date: 01/06/05


Date: Thu, 6 Jan 2005 20:40:03 +0200

Just a follow-up to let you know that I haven't forgotten this...

I got a definitive answer that techniques 1 and 3 are not guaranteed.
Technique 2 is still under investigation.
I'll be back with more info once I get it...

-- 
BG, SQL Server MVP
www.SolidQualityLearning.com
"Itzik Ben-Gan" <itzik@REMOVETHIS.SolidQualityLearning.com> wrote in message 
news:OQCNhO15EHA.3416@TK2MSFTNGP09.phx.gbl...
> Hi,
>
> I guess I'm the MVP you're referring to here, and that your question 
> relates to my comments on the subject in the following article:
> http://www.windowsitpro.com/SQLServer/Article/ArticleID/44138/SQLServer_44138.html
>
> I see that you already got the pointers to the following articles 
> discussing the SELECT INTO part of the problem:
> http://support.microsoft.com/default.aspx?scid=kb;en-us;273586
> http://www.windowsitpro.com/SQLServer/Article/ArticleID/43553/43553.html
>
> The situation is somewhat similar to mathematics, where someone comes up 
> with an assumption/conjecture based on observations, e.g.,  Fermat's last 
> theorem which was proven right, and Goldbach's conjecture which is still 
> unsolved. In Mathematics, suffice to come up with one example that 
> contradicts the assumption to prove it wrong. With a T-SQL behavior, I'd 
> say that it's nice to have an example, but suffice that Microsoft says 
> that something works in a certain way to prove it so. I understand that 
> you suspect that the SELECT INTO + IDENTITY + ORDER BY (call it Technique 
> 1) variation might produce identity values that do not reflect the sort, 
> but think it's wrong of INSERT INTO + ORDER BY into table with IDENTITY 
> (call it Technique 2) to produce identity values that don't reflect the 
> sort.
> While we're at it, many programmers also believe that SELECT INTO + 
> IDENTITY FROM (SELECT + TOP 100 PERCENT + ORDER BY) (call it Technique 3) 
> generates identity values that reflect the desired sort.
>
> I just generated a contradicting (on my machine) example for the SELECT 
> INTO statement:
>
> set nocount on
> use tempdb
> go
> drop table ta, tb, #t
> go
> create table ta(a int not null)
> insert into ta values(3)
> insert into ta values(1)
> insert into ta values(2)
>
> create table tb(b int not null primary key)
> insert into tb values(4)
> insert into tb values(1)
> insert into tb values(3)
> insert into tb values(2)
>
> select b, identity(int, 1, 1) as rn
> into #t
> from ta join tb on a = b
> order by b
>
> select * from #t order by b
>
> -- Output:
> b           rn
> ----------- ----------- 
> 1           2
> 2           3
> 3           1
>
> -- Plan:
> |--Table Insert(OBJECT:([#t]), SET:([#t].[rn]=[Expr1005], 
> [#t].[b]=[tb].[b]))
> |--Top(ROWCOUNT est 0)
>  |--Compute Scalar(DEFINE:([Expr1005]=setidentity([Expr1004], -7, 0, 
> '#t')))
>   |--Sort(ORDER BY:([tb].[b] ASC))
>    |--Compute Scalar(DEFINE:([Expr1004]=getidentity(-7, 0, '#t')))
>     |--Nested Loops(Inner Join, OUTER REFERENCES:([ta].[a]))
>      |--Table Scan(OBJECT:([tempdb].[dbo].[ta]))
>      |--Clustered Index 
> Seek(OBJECT:([tempdb].[dbo].[tb].[PK__tb__522A84E8]), 
> SEEK:([tb].[b]=[ta].[a]) ORDERED FORWARD)
>
> You can see in the plan that the getidentity() function is invoked prior 
> to the sort. That's the case on my machine (SQL2K Dev / SP3, single CPU) 
> with my data. I don't know how this code would behave on other machines 
> and with other data. But that was exactly my point.
>
> But you mentioned that you can accept that Technique 1 is not guaranteed, 
> but not Technique 2. Why? Because you assume that the SELECT must be first 
> fully processed and then loaded into the table? There's nothing in the 
> relational model or ANSI SQL to support this because such an INSERT is 
> non-relational and non-standard. So you expect an undocumented proprietary 
> feature to behave in a certain way because that's what seems right to you. 
> You can clearly see in the plan that the Compute Scalar operator with the 
> getidentity() function is invoked before the data is actually loaded, so 
> why doesn't it make sense that it might be invoked before the sort?
> The way I see it, if Technique 1 doesn't guarantee anything, I'd be very 
> reluctant to assume Technique 2 does without Microsoft explicitly 
> guaranteeing it.
>
> I'm afraid I currently don't have contradicting examples for Techniques 2 
> and 3, and since I'm at home, I don't have access to machines with 
> multiple CPUs.
> I'll keep looking for ones, and if someone has contradicting examples, I'd 
> be glad to hear about those.
>
> I posted a request in the private MVP forum to get Microsoft's official 
> stand on the subject for all variations of the techniques. I'll post any 
> info that can be shared publicly.
>
> Cheers,
> -- 
> BG, SQL Server MVP
> www.SolidQualityLearning.com
>
>
> "Questar" <Questar@discussions.microsoft.com> wrote in message 
> news:7A556581-9D66-4E2E-8634-CA4E1342AE44@microsoft.com...
>> It has come to our attention that the expected (and observed) behavior of 
>> a
>> specific T-SQL statement is not guaranteed.  You have probably 
>> encountered
>> the issue before, but I will provide two examples here.
>>
>> Syntax Example 1:
>>
>> CREATE TABLE #T1 (T1 int, Z1 varchar(20))
>>
>> INSERT #T1 VALUES (1,'X')
>> INSERT #T1 VALUES (2,'Y')
>> INSERT #T1 VALUES (3,'Z')
>>
>> SELECT T1 AS T2, Z1 AS Z2, IDENTITY(int,1,1) AS I2 INTO #T2 FROM #T1 
>> ORDER
>> BY T1 DESC
>>
>> DROP TABLE #T1
>> DROP TABLE #T2
>>
>> Syntax Example 2:
>>
>> CREATE TABLE #T1 (T1 int, Z1 varchar(20))
>> CREATE TABLE #T2 (T2 int, Z2 varchar(20), I2 int IDENTITY(1,1))
>>
>> INSERT #T1 VALUES (1,'X')
>> INSERT #T1 VALUES (2,'Y')
>> INSERT #T1 VALUES (3,'Z')
>>
>> INSERT #T2 (T2,Z2) SELECT T1,Z1 FROM #T1 ORDER BY T1 DESC
>>
>> DROP TABLE #T1
>> DROP TABLE #T2
>>
>> According to a SQL Server MVP, the SQL Server 2000 execution plans for 
>> these
>> two examples are not guaranteed to have the sort occur before the 
>> assignment
>> of IDENTITY values.  The MVP said the expected behavior was guaranteed in 
>> SQL
>> Server 2005.  These two examples produce very similar, but not identical,
>> execution plans.  Looking at the statements logically I could understand 
>> if
>> the syntax of Example 1 resulted in the Compute Scalar (IDENTITY) 
>> operation
>> happening before the Sort operation, but that behavior would seem very 
>> wrong
>> for Example 2.  I believe there's a significant amount of T-SQL code 
>> being
>> used (including some of ours) that depends upon the syntax of Example 2
>> resulting in the Sort operation happening before the Compute Scalar
>> (IDENTITY) operation.
>>
>> Does anybody have any additional thoughts or information on this matter?
>>
>
> 


Relevant Pages

  • Re: ORDER BY and IDENTITY
    ... I guess I'm the MVP you're referring to here, ... Technique 2) to produce identity values that don't reflect the sort. ... The way I see it, if Technique 1 doesn't guarantee anything, I'd be very ...
    (microsoft.public.sqlserver.programming)
  • Re: ORDER BY and IDENTITY
    ... >> Technique 2) to produce identity values that don't reflect the sort. ... >> The way I see it, if Technique 1 doesn't guarantee anything, I'd be very ... >> BG, SQL Server MVP ...
    (microsoft.public.sqlserver.programming)
  • Re: ORDER BY and IDENTITY
    ... "Gert-Jan Strik" wrote in message ... >> I got a reply from Microsoft which says that technique 2 (INSERT SELECT ... >> BG, SQL Server MVP ... >>>>> sort, ...
    (microsoft.public.sqlserver.programming)
  • Re: ORDER BY and IDENTITY
    ... > Technique 2 is still under investigation. ... >> generates identity values that reflect the desired sort. ... >> feature to behave in a certain way because that's what seems right to you. ... >> BG, SQL Server MVP ...
    (microsoft.public.sqlserver.programming)
  • Re: ORDER BY and IDENTITY
    ... > Technique 2) to produce identity values that don't reflect the sort. ... > The way I see it, if Technique 1 doesn't guarantee anything, I'd be very ... >> resulting in the Sort operation happening before the Compute Scalar ...
    (microsoft.public.sqlserver.programming)

Loading