Re: Can you write code directly in CIL ???




"Peter Olcott" <olcott@xxxxxxx> wrote in message
news:BbAsf.38051$QW2.20783@xxxxxxxxxxxxx
>
> "Willy Denoyette [MVP]" <willy.denoyette@xxxxxxxxxx> wrote in message
> news:OysdFI9CGHA.2436@xxxxxxxxxxxxxxxxxxxxxxx
>>
>> "Peter Olcott" <olcott@xxxxxxx> wrote in message
>> news:R5zsf.38046$QW2.31800@xxxxxxxxxxxxx
>>>
>>> "Willy Denoyette [MVP]" <willy.denoyette@xxxxxxxxxx> wrote in message
>>> news:O%23d6lF5CGHA.1032@xxxxxxxxxxxxxxxxxxxxxxx
>>>>
>>>> "Peter Olcott" <olcott@xxxxxxx> wrote in message
>>>> news:egksf.38007$QW2.25703@xxxxxxxxxxxxx
>>>>>
>>>>> "Jon Skeet [C# MVP]" <skeet@xxxxxxxxx> wrote in message
>>>>> news:MPG.1e1b84c73d78be9098cbe4@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>> Nicholas Paldino [.NET/C# MVP] <mvp@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
>>>>>> wrote:
>>>>>>> I will second that the C++ compiler is better at optimizing IL
>>>>>>> output
>>>>>>> than the C# compiler. However, as Willy stated, it will not always
>>>>>>> produce
>>>>>>> verifiable code... I believe the article you were looking for is in
>>>>>>> MSDN
>>>>>>> magazine.
>>>>>>
>>>>>> No, the article was definitely someone posting in this group saying,
>>>>>> "I
>>>>>> want to be able to embed IL in my C# code, here's why." He then
>>>>>> produced some better IL (which I suspect *was* verifiable) which the
>>>>>> C#
>>>>>> compiler "could" have produced from the source C# (i.e. the behaviour
>>>>>> was identical).
>>>>>>
>>>>>> I'm sure this will improve over time, but to be honest it's usually
>>>>>> the
>>>>>> JIT that has more to do with optimisation IMO.
>>>>>
>>>>> I wouldn't think that this would be the case for two reasons:
>>>>> (1) CIL (for the most part) forms a one-to-one mapping with assembly
>>>>> language
>>>>
>>>> Not true. IL is a fairly high-level language compared to x86 assembly:
>>>> a single IL instruction translates to x assembly-level instructions,
>>>> where x is certainly not 1.
>>> Many of the instructions (all the ones in my critical 100 line function)
>>> would map one-to-one with assembly language. All of the code in this
>>> critical 100 line function is comparisons, branches, and the data
>>> movement of single integers.
>>>
>>
>> No, they would not. IL targets a purely stack-based virtual machine: it
>> has no such thing as registers, no notion of a real memory location, and
>> no access to the runtime stack.
>>
>> Just to give you an idea of what I'm trying to explain, consider the
>> following C# method and its compiler-generated IL.
>>
>> [C#]
>> static void Foo()
>> {
>>     int v = 0;
>>     int[] ar = new int[5] {0,1,2,3,4};
>>     for (int i = 0; i != 5; i++)
>>     {
>>         v += ar[i];
>>     }
>> }
>> //
>>
>> [compiler generated IL]
>> .method private hidebysig static void Foo() cil managed
>> {
>> // Code size 39 (0x27)
>> .maxstack 3
>> .locals init (int32 V_0,
>> int32[] V_1,
>> int32 V_2)
>> IL_0000: ldc.i4.0
>> IL_0001: stloc.0
>> IL_0002: ldc.i4.5
>> IL_0003: newarr [mscorlib]System.Int32
>> IL_0008: dup
>> IL_0009: ldtoken field valuetype '<PrivateImplementationDetails>{E21D91A1-F27C-4190-94E3-4FB17E12D29A}'/'__StaticArrayInitTypeSize=20' '<PrivateImplementationDetails>{E21D91A1-F27C-4190-94E3-4FB17E12D29A}'::'$$method0x6000002-1'
>> IL_000e: call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
>> IL_0013: stloc.1
>> IL_0014: ldc.i4.0
>> IL_0015: stloc.2
>> IL_0016: br.s IL_0022
>>
>> IL_0018: ldloc.0
>> IL_0019: ldloc.1
>> IL_001a: ldloc.2
>> IL_001b: ldelem.i4
>> IL_001c: add
>> IL_001d: stloc.0
>> IL_001e: ldloc.2
>> IL_001f: ldc.i4.1
>> IL_0020: add
>> IL_0021: stloc.2
>> IL_0022: ldloc.2
>> IL_0023: ldc.i4.5
>> IL_0024: bne.un.s IL_0018
>>
>> IL_0026: ret
>> } // end of method Tester::Foo
>>
>> and here is what the JIT compiler actually generated from this (!! CPU
>> specific !!)
>>
>> 00cb0098 57 push edi
>> 00cb0099 56 push esi
>> 00cb009a ba05000000 mov edx,0x5
>> 00cb009f b92a981579 mov ecx,0x7915982a
>> 00cb00a4 e86b21c5ff call 00902214
>> 00cb00a9 8d7808 lea edi,[eax+0x8]
>> 00cb00ac be68204000 mov esi,0x402068
>> 00cb00b1 f30f7e06 movq xmm0,qword ptr [esi]
>> 00cb00b5 660fd607 movq qword ptr [edi],xmm0
>> 00cb00b9 f30f7e4608 movq xmm0,qword ptr [esi+0x8]
>> 00cb00be 660fd64708 movq qword ptr [edi+0x8],xmm0
>> 00cb00c3 83c610 add esi,0x10
>> 00cb00c6 83c710 add edi,0x10
>> 00cb00c9 a5 movsd
>> 00cb00ca 33d2 xor edx,edx
>> 00cb00cc 8b4804 mov ecx,[eax+0x4]
>> 00cb00cf 3bd1 cmp edx,ecx
>> 00cb00d1 730b jnb 00cb00de
>> 00cb00d3 83c201 add edx,0x1
>> 00cb00d6 83fa05 cmp edx,0x5
>> 00cb00d9 75f4 jnz 00cb00cf
>> 00cb00db 5e pop esi
>> 00cb00dc 5f pop edi
>> 00cb00dd c3 ret
>> 00cb00de e8fe453e79 call mscorwks!JIT_RngChkFail (7a0946e1)
>> 00cb00e3 cc int 3
>>
>> Now try for yourself to build an IL module from the assembly code, and
>> please make sure it compiles, is verifiable and runs as fast as the C#
>> generated IL above. Or try to tweak the IL so it translates into better
>> (faster) X86 code.
>
> Show me the source code.
>

What else do you want? I gave you the C# source code (the Foo method), its
corresponding IL, and the x86 code produced by the JIT.
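
To make the stack-machine point concrete, here is a minimal C++ sketch of a toy interpreter for the loop portion of Foo's IL. It is an illustration only: the CLR JIT-compiles IL rather than interpreting it, and the names Op, Instr, RunResult and run_foo are hypothetical. Array creation (newarr / InitializeArray) is skipped, the array is passed in by the caller, and the array reference in local 1 is modelled as a dummy handle.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical opcode subset: just enough for the loop in Foo().
enum class Op { LdcI4, LdLoc, StLoc, LdElemI4, Add, Br, BneUn, Ret };

struct Instr {
    Op op;
    int arg;  // constant, local index, or branch target (instruction index)
};

struct RunResult {
    int v;                  // final value of local 0 (v in Foo)
    std::size_t max_depth;  // deepest evaluation stack observed
};

// Interprets the loop part of Foo's IL over the caller-supplied array.
// Throws std::out_of_range if the array has fewer than 5 elements.
RunResult run_foo(const std::vector<int>& ar) {
    static const std::vector<Instr> code = {
        {Op::LdcI4, 0}, {Op::StLoc, 0},                       // v = 0
        {Op::LdcI4, 0}, {Op::StLoc, 2},                       // i = 0
        {Op::Br, 15},                                         // jump to loop test
        {Op::LdLoc, 0}, {Op::LdLoc, 1}, {Op::LdLoc, 2},       // push v, ar, i
        {Op::LdElemI4, 0}, {Op::Add, 0}, {Op::StLoc, 0},      // v = v + ar[i]
        {Op::LdLoc, 2}, {Op::LdcI4, 1}, {Op::Add, 0}, {Op::StLoc, 2},  // i = i + 1
        {Op::LdLoc, 2}, {Op::LdcI4, 5}, {Op::BneUn, 5},       // if (i != 5) loop
        {Op::Ret, 0},
    };
    int locals[3] = {0, 0, 0};  // V_0 = v, V_1 = dummy array handle, V_2 = i
    std::vector<int> stack;     // the only place operands can live
    std::size_t max_depth = 0;
    auto pop = [&stack]() {
        assert(!stack.empty());
        int x = stack.back();
        stack.pop_back();
        return x;
    };

    for (std::size_t pc = 0;;) {
        const Instr in = code[pc++];
        switch (in.op) {
        case Op::LdcI4: stack.push_back(in.arg); break;
        case Op::LdLoc: stack.push_back(locals[in.arg]); break;
        case Op::StLoc: locals[in.arg] = pop(); break;
        case Op::LdElemI4: {
            int index = pop();
            pop();  // discard the dummy array handle
            stack.push_back(ar.at(static_cast<std::size_t>(index)));  // range-checked
            break;
        }
        case Op::Add: { int b = pop(); int a = pop(); stack.push_back(a + b); break; }
        case Op::Br: pc = static_cast<std::size_t>(in.arg); break;
        case Op::BneUn: {
            int b = pop();
            int a = pop();
            if (a != b) pc = static_cast<std::size_t>(in.arg);
            break;
        }
        case Op::Ret: return {locals[0], max_depth};
        }
        if (stack.size() > max_depth) max_depth = stack.size();
    }
}
```

Called with the array {0,1,2,3,4}, the sketch leaves v = 10 and never holds more than three values on the evaluation stack, consistent with .maxstack 3 in the listing. Every operand travels through that stack; which values end up in edx or ecx is decided entirely by the JIT.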


>
> http://www.tommti-systems.de/go.html?http://www.tommti-systems.de/main-Dateien/reviews/languages/benchmarks.html
> The above link is much more telling. There is a 450% difference in
> performance between C++ and C# for something as simple as nested loops.
> Also the difference between optimized code and code compiled with
> optimization disabled can be at least an order of magnitude. If there is a
> 450% difference in the performance on something as simple as a nested
> loop, this shows that there is significant room for improvement.
>
This very specific (but broken [1] and clueless) benchmark (the loop) is a
case where the native C++ compiler's optimizer does a better job than the JIT
compiler/optimizer, but this has nothing to do with the IL code.

[1] This is the corrected C# code, which is still ~60% slower than the
(corrected) C++ code ( __int64 x=0; ); see the 10015 ms vs. 6171 ms timings
below.

int a = 0, b = 0, c = 0, d = 0, e = 0, f = 0;
long x = 0;
startTime = DateTime.Now;
for (a = 0; a != n; a++)
    for (b = 0; b != n; b++)
        for (c = 0; c != n; c++)
            for (d = 0; d != n; d++)
                for (e = 0; e != n; e++)
                    for (f = 0; f != n; f++)
                        x += a + b + c + d + e + f;

C# with /o+
Nested Loop elapsed time: 10015 ms - 479232000000

C++ with /O2 /EHsc
Nested Loop elapsed time: 6171 ms 479232000000
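
For comparison, a minimal C++ sketch of the same corrected loop follows. It is not the benchmark's original source: the function names nested_loop_sum, expected_sum and time_nested_loop are hypothetical, std::int64_t stands in for __int64, and std::chrono is used for timing as an assumption. The closed form 3 * n^6 * (n - 1) is included only as a cross-check on the checksum.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>

// Six nested loops over [0, n), accumulating into a 64-bit sum.
std::int64_t nested_loop_sum(int n) {
    assert(n >= 0);      // with `!=` as the loop test, a negative n would not terminate
    std::int64_t x = 0;  // 64-bit accumulator, like `long` in C#
    for (int a = 0; a != n; a++)
        for (int b = 0; b != n; b++)
            for (int c = 0; c != n; c++)
                for (int d = 0; d != n; d++)
                    for (int e = 0; e != n; e++)
                        for (int f = 0; f != n; f++)
                            x += a + b + c + d + e + f;
    return x;
}

// Closed form of the same sum: each of the six indices contributes
// n^5 * (0 + 1 + ... + (n - 1)) = n^5 * n * (n - 1) / 2.
std::int64_t expected_sum(int n) {
    const std::int64_t m = n;
    return 3 * m * m * m * m * m * m * (m - 1);
}

// Times one run and prints it in the same shape as the results above.
void time_nested_loop(int n) {
    const auto start = std::chrono::steady_clock::now();
    const std::int64_t x = nested_loop_sum(n);
    const auto stop = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::cout << "Nested Loop elapsed time: " << ms << " ms - " << x << "\n";
}
```

The post does not state n, but n = 40 reproduces the checksum 479232000000 (3 * 40^6 * 39). That value is far beyond the 32-bit limit of 2147483647, which is presumably why the accumulator had to be widened to 64 bits in both languages. Timings from time_nested_loop(40) will of course vary with compiler, flags and CPU.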

Willy.

