Re: Why is C# 450% slower than C++ on nested loops ??

"Stefan Simek" <simek.nospam@xxxxxxxxxxxxxxxxx> wrote in message
news:O0xafYfDGHA.344@xxxxxxxxxxxxxxxxxxxxxxx
> Peter Olcott wrote:
>> "Stefan Simek" <simek.nospam@xxxxxxxxxxxxxxxxx> wrote in message
>> news:e2rqYVXDGHA.2912@xxxxxxxxxxxxxxxxxxxxxxx
>>
>>>Peter Olcott wrote:
>>>
>>>>"Anders Borum" <anders@xxxxxxxxxxxxxx> wrote in message
>>>>news:edHCjaSDGHA.3528@xxxxxxxxxxxxxxxxxxxxxxx
>>>>
>>>>
>>>>>Hello!
>>>>>
>>>>>
>>>>>
>>>>>>It must have been after the JIT, because C# beat C++ on some things, just
>>>>>>not the nested loops. The author of the link would know relatively obvious
>>>>>>things such as this.
>>>>>
>>>>>Unless somebody documents what methods were used to measure the
>>>>>benchmarks, I am discarding the data altogether. Benchmarks should be
>>>>>verifiable ..
>>>>
>>>>
>>>>I searched high and low and finally found some stuff on unverified code. The
>>>>MSDN documentation mentions it yet does not discuss what it means. It could
>>>>be anything from security checks to tests for memory leaks. It doesn't say
>>>>what its purpose is.
>>>>
>>>>
>>>>>--
>>>>>With regards
>>>>>Anders Borum / SphereWorks
>>>>>Microsoft Certified Professional (.NET MCP)
>>>>>
>>>>
>>>>
>>>>
>>>I guess you should google for 'define:verify'. Verifiable benchmarks have
>>>nothing to do with verifiable code. Anders just wanted the benchmarks to be
>>>verifiably *valid*, which they aren't by any means. A nested loop written the
>>>way it is in the benchmark is measuring nothing but a compiler's ability to
>>>optimize nested loops that do more or less nothing. Verifiable code means (in
>>>the .NET vocabulary) that the CLR can statically verify the code to ensure it
>>>will do nothing it is not expected to do. But enough of that, let's see what
>>>native assembly the
>>
>>
>> Yet I just found a Microsoft article that says that this sort of technology
>> is between 20 and 50 years away. Code is not expected to be incorrect.
>> Proving that code is not incorrect is technology that does not yet exist. I
>> don't know what it is verifying, but correct code is not it. The posted
>> benchmark was crucial to my needs because my critical 100-line function is
>> mostly loops.
>
> That's why the code must meet certain criteria in order to be verifiable
> today. The C# compiler generates such code, and the C++/CLI compiler is able
> to do so as well.
>
>>
>>
>>>compilers generate for the loop and get over with it.
>>>
>>><...>
>>>
>>>ha! I won't post the complete disassemblies (the C++ one is terribly cryptic
>>>so it would be of no help), but I've found the following. The C++ compiler
>>>(8.0 in my case, but I suspect 6.0 does the same) manages to pre-compute the
>>>additions in the outer loops, which the C# compiler doesn't. Thus, in the C#
>>>code all the additions that happen at
>>>
>>>x+=a+b+c+d+e+f;
>>>
>>>are evaluated for every single innermost loop, while c++ does something like
>>>the following:
>>>
>>>for (a = 0; a < n; a++)
>>>{
>>>    for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
>>>    {
>>>        for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
>>>        {
>>>            for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
>>>            {
>>>                for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
>>>                {
>>>                    for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
>>>                        x += tmp;
>>>                }
>>>            }
>>>        }
>>>    }
>>>}
>>>
>>>If you compile this code in C#, the execution time is 6734 ms for .NET 2.0
>>>and 8593 ms for .NET 1.1, versus 3656 ms for unmanaged C++ (8.0) on my
>>>machine. But note that the innermost loop in the C++ is only four
>>>instructions. This hardly corresponds to any real-life algorithm, and I
>>>can assure you that if there were anything more than 'nothing' in the inner
>>>loop, the performance difference would be much less than 80% (I expect it to
>>>drop far below 10% for most real-life algorithms).
>>
>>
>> Since this is not the same code that was posted on the benchmark, it is not a
>> valid test. If you are worried about arithmetic overflow, then define x as a
>> double. I want to know why there was a 450% difference in the original
>> benchmark. Changing the code avoids rather than answers the question. It is
>> reported that 2005 does a much better job of optimizing .NET code, yet only
>> with the C++ compiler, not the C# compiler.
>
> No. I gave you an exact answer as to *why* there is the difference. The C++
> compiler translates the original code logic into something comparable to the
> snippet I have posted. The C# compiler fails to do this optimization. But it
> has nothing to do with nested loops per se. It is an optimization of the
> addition performed in the innermost loop.
>
>>
>>
>>>That said, I wouldn't expect C# (or .NET) to match pure native code
>>>performance for purely computational tasks like the one you describe. C#
>>>can win easily in cases where a lot of dynamic allocation is involved, etc.,
>>>but it will probably never outperform optimized native C++ (or handwritten
>>>assembly) for computational tasks. If you need to squeeze every clock cycle
>>>out of your CPU, you will probably get the best results using an optimizing
>>>compiler targeted at exactly the CPU the code will run on.
>>
>>
>> There is no reason (in theory) why .NET code would need to be any slower than
>> native code. The reason in practice seems to be that Microsoft just hasn't
>> gotten around to implementing every possible optimization in every language
>> yet. The one tricky thing about optimization in .NET is that it is divided
>> into two different processes. Some of the optimizations at the compile step
>> can interfere with some of the optimizations at the JIT step. The problem is
>> that the compile step does not know how many registers the target
>> architecture has.
>
> That's why I used the word *probably*. The JIT is much more time constrained,
> so it *might* never be able to match the optimizations of traditional
> compilers. It *should* certainly give better *average*

http://msdn.microsoft.com/msdnmag/issues/04/05/VisualC2005/
If you read this link you will see that this is not even the way that it works.
As I already said, most of the optimizations are accomplished by the language
compiler, not the JIT.

> results across platforms in the future, because it makes use of any available
> hardware present.
>
>>
>>
>>>I didn't want to write all this, but after reading through the long
>>>discussions on this matter without touching the important points, I just
>>>decided to do so :)
>>>
>>>Just my 2c.
>>>
>>>Stefan
>>
>>
>>
>
> If you really want to get answers to your questions, not just troll around,
> you really can get them here. If you don't...
>
> Stefan
