Re: Why is C# 450% slower than C++ on nested loops ??



Peter Olcott wrote:
"Anders Borum" <anders@xxxxxxxxxxxxxx> wrote in message news:edHCjaSDGHA.3528@xxxxxxxxxxxxxxxxxxxxxxx

Hello!


It must have been after the JIT, because C# beat C++ on some things, just not the nested loops. The author of the link would know relatively obvious things such as this.

Unless somebody documents what methods have been used to measure the benchmarks, I am discarding the data altogether. Benchmarks should be verifiable.


I searched high and low and finally found some stuff on unverified code. The MSDN documentation mentions it yet does not discuss what it means. It could be anything from security checks to tests for memory leaks. It doesn't say what its purpose is.

--
With regards
Anders Borum / SphereWorks
Microsoft Certified Professional (.NET MCP)




I guess you should google for 'define:verify'. Verifiable benchmarks have nothing to do with verifiable code. Anders just wanted the benchmarks to be verifiably *valid*, which they aren't by any means. A nested loop written the way it is in the benchmark measures nothing but a compiler's ability to optimize nested loops that do more or less nothing. Verifiable code means (in the .NET vocabulary) that the CLR can statically verify the code to ensure it will do nothing it is not expected to do. But enough of that, let's see what native assembly the compilers generate for the loop and get it over with.

<...>

Ha! I won't post the complete disassemblies (the C++ one is terribly cryptic, so it would be no help), but I've found the following. The C++ compiler (8.0 in my case, but I suspect 6.0 is doing the same) manages to pre-cache the additions in the outer loops, which the C# compiler doesn't. Thus, in the C# code all the additions that happen at

x+=a+b+c+d+e+f;

are evaluated on every iteration of the innermost loop, while C++ does something like the following:

for (a = 0; a < n; a++)
{
    for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
    {
        for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
        {
            for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
            {
                for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
                {
                    for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
                        x += tmp;
                }
            }
        }
    }
}

If you compile this code in C#, the execution time is 6734 ms for .NET 2.0 and 8593 ms for .NET 1.1 versus 3656 ms for unmanaged C++ (8.0) on my machine. But note that the innermost loop in the C++ is only four instructions. This can hardly be matched to any real-life algorithm, and I can assure you that if there was anything more than 'nothing' in the inner loop, the performance difference would be much less than 80% (I expect it to drop far below 10% for most real-life algorithms).

That said, I wouldn't expect C# (or .NET) to match pure native code performance for purely computational tasks like the one you describe. C# can win easily in cases where a lot of dynamic allocation is involved, etc., but it will probably never outperform optimized native C++ (or handwritten assembly) for computational tasks. If you need to squeeze every clock out of your CPU, you will probably get the best results using an optimizing compiler that targets exactly the CPU the code will run on.

I didn't want to write all this, but after reading through the long discussions on this matter, none of which touched on the important points, I just decided to do so :)

Just my 2c.

Stefan