Re: Can you write code directly in CIL ???




"Willy Denoyette [MVP]" <willy.denoyette@xxxxxxxxxx> wrote in message
news:OysdFI9CGHA.2436@xxxxxxxxxxxxxxxxxxxxxxx
>
> "Peter Olcott" <olcott@xxxxxxx> wrote in message
> news:R5zsf.38046$QW2.31800@xxxxxxxxxxxxx
>>
>> "Willy Denoyette [MVP]" <willy.denoyette@xxxxxxxxxx> wrote in message
>> news:O%23d6lF5CGHA.1032@xxxxxxxxxxxxxxxxxxxxxxx
>>>
>>> "Peter Olcott" <olcott@xxxxxxx> wrote in message
>>> news:egksf.38007$QW2.25703@xxxxxxxxxxxxx
>>>>
>>>> "Jon Skeet [C# MVP]" <skeet@xxxxxxxxx> wrote in message
>>>> news:MPG.1e1b84c73d78be9098cbe4@xxxxxxxxxxxxxxxxxxxxxxx
>>>>> Nicholas Paldino [.NET/C# MVP] <mvp@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>> I will second that the C++ compiler is better at optimizing IL output
>>>>>> than the C# compiler. However, as Willy stated, it will not always
>>>>>> produce verifiable code... I believe the article you were looking for
>>>>>> is in MSDN Magazine.
>>>>>
>>>>> No, the article was definitely someone posting in this group saying, "I
>>>>> want to be able to embed IL in my C# code, here's why." He then
>>>>> produced some better IL (which I suspect *was* verifiable) which the C#
>>>>> compiler "could" have produced from the source C# (i.e. the behaviour
>>>>> was identical).
>>>>>
>>>>> I'm sure this will improve over time, but to be honest it's usually the
>>>>> JIT that has more to do with optimisation IMO.
>>>>
>>>> I wouldn't think that this would be the case for two reasons:
>>>> (1) CIL (for the most part) forms a one-to-one mapping with assembly language
>>>
>>> Not true. IL is a fairly high-level language compared to x86 assembly; a
>>> single IL instruction translates into x assembly-level instructions, where
>>> x is generally not 1.
>> Many of the instructions (all the ones in my critical 100-line function)
>> would map one-to-one onto assembly language. All of the code in this
>> critical 100-line function consists of comparisons, branches, and the data
>> movement of single integers.
>>
>
> No, they are not. IL is based on a purely stack-based virtual machine
> execution environment: it has no such thing as registers, no notion of a
> real memory location, and no access to the runtime stack.
>
> To give you an idea of what I'm trying to explain, consider the following
> C# method and its compiler-generated IL.
>
> [C#]
> static void Foo()
> {
>     int v = 0;
>     int[] ar = new int[5] {0,1,2,3,4};
>     for (int i = 0; i != 5; i++)
>     {
>         v += ar[i];
>     }
> }
> //
>
> [compiler generated IL]
> .method private hidebysig static void Foo() cil managed
> {
> // Code size 39 (0x27)
> .maxstack 3
> .locals init (int32 V_0,
> int32[] V_1,
> int32 V_2)
> IL_0000: ldc.i4.0
> IL_0001: stloc.0
> IL_0002: ldc.i4.5
> IL_0003: newarr [mscorlib]System.Int32
> IL_0008: dup
> IL_0009: ldtoken field valuetype '<PrivateImplementationDetails>{E21D91A1-F27C-4190-94E3-4FB17E12D29A}'/'__StaticArrayInitTypeSize=20'
>          '<PrivateImplementationDetails>{E21D91A1-F27C-4190-94E3-4FB17E12D29A}'::'$$method0x6000002-1'
> IL_000e: call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array,
>          valuetype [mscorlib]System.RuntimeFieldHandle)
> IL_0013: stloc.1
> IL_0014: ldc.i4.0
> IL_0015: stloc.2
> IL_0016: br.s IL_0022
>
> IL_0018: ldloc.0
> IL_0019: ldloc.1
> IL_001a: ldloc.2
> IL_001b: ldelem.i4
> IL_001c: add
> IL_001d: stloc.0
> IL_001e: ldloc.2
> IL_001f: ldc.i4.1
> IL_0020: add
> IL_0021: stloc.2
> IL_0022: ldloc.2
> IL_0023: ldc.i4.5
> IL_0024: bne.un.s IL_0018
>
> IL_0026: ret
> } // end of method Tester::Foo
>
> And here is what the JIT compiler actually generated from this (!! CPU-specific !!):
>
> 00cb0098 57 push edi
> 00cb0099 56 push esi
> 00cb009a ba05000000 mov edx,0x5
> 00cb009f b92a981579 mov ecx,0x7915982a
> 00cb00a4 e86b21c5ff call 00902214
> 00cb00a9 8d7808 lea edi,[eax+0x8]
> 00cb00ac be68204000 mov esi,0x402068
> 00cb00b1 f30f7e06 movq xmm0,qword ptr [esi]
> 00cb00b5 660fd607 movq qword ptr [edi],xmm0
> 00cb00b9 f30f7e4608 movq xmm0,qword ptr [esi+0x8]
> 00cb00be 660fd64708 movq qword ptr [edi+0x8],xmm0
> 00cb00c3 83c610 add esi,0x10
> 00cb00c6 83c710 add edi,0x10
> 00cb00c9 a5 movsd
> 00cb00ca 33d2 xor edx,edx
> 00cb00cc 8b4804 mov ecx,[eax+0x4]
> 00cb00cf 3bd1 cmp edx,ecx
> 00cb00d1 730b jnb 00cb00de
> 00cb00d3 83c201 add edx,0x1
> 00cb00d6 83fa05 cmp edx,0x5
> 00cb00d9 75f4 jnz 00cb00cf
> 00cb00db 5e pop esi
> 00cb00dc 5f pop edi
> 00cb00dd c3 ret
> 00cb00de e8fe453e79 call mscorwks!JIT_RngChkFail (7a0946e1)
> 00cb00e3 cc int 3
>
> Now try for yourself to build an IL module from the assembly code, and please
> make sure it compiles, is verifiable, and runs as fast as the C#-generated IL
> above. Or try to tweak the IL so that it translates into better (faster) x86
> code.

Show me the source code.

>
>>>
>>>> (2) End users are waiting for the JIT to complete, so there is no time to
>>>> waste on optimizations that could have been done before the software
>>>> shipped.
>>>>
>>>
>>> Wrong again. IL is not optimized that much; THE optimizer is the JIT. It's
>> The JIT probably does all the processor-specific optimizations. These don't
>> affect performance nearly as much as the ones that are not processor
>> specific.
>>
>
> Apart from the processor-specific optimizations (which are significant), it
> performs most of the optimizations performed by a C/C++ compiler back-end
> optimizer (both the C++ back-end optimizer and the JIT optimizer have been
> written by the same team). The only difference is that it happens at
> run-time, so it is somewhat constrained by time, but this is largely
> compensated for by the processor/memory-specific optimizations.
> Check this link and see how managed code compares to unmanaged code at the
> performance level.
> http://www.grimes.demon.co.uk/dotnet/man_unman.htm
>
>
> Willy.
>
>

http://www.tommti-systems.de/go.html?http://www.tommti-systems.de/main-Dateien/reviews/languages/benchmarks.html
The above link is much more telling. There is a 450% difference in performance
between C++ and C# for something as simple as nested loops. Also, the difference
between optimized code and code compiled with optimization disabled can be an
order of magnitude or more. If there is a 450% difference in performance on
something as simple as a nested loop, this shows that there is significant room
for improvement.
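Willy's point that IL targets an evaluation stack rather than registers can be made concrete with a toy interpreter. The sketch below is hypothetical: the `Op`, `Instr`, and `MiniIL` names are invented for illustration and are not a .NET API. It models only the opcodes used in the loop of `Foo` (skipping the `newarr`/`InitializeArray` prologue by passing the array in as local 1), and it is not how the CLR runs IL; the JIT compiles IL to native code, as the x86 listing shows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names (Op, Instr, MiniIL): an illustration, not a .NET API.
public enum Op { LdcI4, Ldloc, Stloc, LdelemI4, Add, BneUn, Br, Ret }

public readonly struct Instr
{
    public readonly Op Op;
    public readonly int Arg; // constant, local slot, or branch-target index
    public Instr(Op op, int arg = 0) { Op = op; Arg = arg; }
}

public static class MiniIL
{
    // Interprets 'code' against the given local slots. Every operand travels
    // through an explicit evaluation stack; there are no registers.
    public static object[] Run(Instr[] code, object[] locals)
    {
        var stack = new Stack<object>();
        int pc = 0;
        while (true)
        {
            Instr ins = code[pc++];
            switch (ins.Op)
            {
                case Op.LdcI4: stack.Push(ins.Arg); break;
                case Op.Ldloc: stack.Push(locals[ins.Arg]); break;
                case Op.Stloc: locals[ins.Arg] = stack.Pop(); break;
                case Op.LdelemI4:
                {
                    int index = (int)stack.Pop();     // pushed last
                    int[] array = (int[])stack.Pop(); // pushed first
                    stack.Push(array[index]);         // bad index throws IndexOutOfRangeException
                    break;
                }
                case Op.Add:
                {
                    int b = (int)stack.Pop();
                    int a = (int)stack.Pop();
                    stack.Push(a + b);
                    break;
                }
                case Op.BneUn:
                {
                    int b = (int)stack.Pop();
                    int a = (int)stack.Pop();
                    if (a != b) pc = ins.Arg; // branch when the two values differ
                    break;
                }
                case Op.Br: pc = ins.Arg; break;
                case Op.Ret: return locals;
                default: throw new NotSupportedException(ins.Op.ToString());
            }
        }
    }

    // "v = 0" (IL_0000..IL_0001) plus the body of Foo from IL_0014 onward.
    // Local 0 = v, local 1 = ar, local 2 = i. Branch targets are array indices.
    public static Instr[] FooLoop() => new[]
    {
        new Instr(Op.LdcI4, 0), new Instr(Op.Stloc, 0), // v = 0
        new Instr(Op.LdcI4, 0), new Instr(Op.Stloc, 2), // i = 0
        new Instr(Op.Br, 15),                           // br.s IL_0022
        new Instr(Op.Ldloc, 0),                         // index 5 = IL_0018
        new Instr(Op.Ldloc, 1),
        new Instr(Op.Ldloc, 2),
        new Instr(Op.LdelemI4),
        new Instr(Op.Add),
        new Instr(Op.Stloc, 0),                         // v += ar[i]
        new Instr(Op.Ldloc, 2),
        new Instr(Op.LdcI4, 1),
        new Instr(Op.Add),
        new Instr(Op.Stloc, 2),                         // i++
        new Instr(Op.Ldloc, 2),                         // index 15 = IL_0022
        new Instr(Op.LdcI4, 5),
        new Instr(Op.BneUn, 5),                         // bne.un.s IL_0018
        new Instr(Op.Ret),
    };
}
```

Calling `MiniIL.Run(MiniIL.FooLoop(), new object[] { 0, new[] { 0, 1, 2, 3, 4 }, 0 })` returns the locals with `v` (slot 0) equal to 10 and `i` (slot 2) equal to 5. Even `v += ar[i]` takes six stack operations here, whereas the x86 listing appears to contain no addition into `v` at all: since `v` is never read, the JIT kept only the range check and the loop counter.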

