Re: using MFC VC++ - which is more efficient - float or double?

Joseph M. Newcomer wrote:
See my results below.
Measurements made in debug mode are usually useless. You can only measure results made in
release mode. Otherwise, you have no idea what you are actually measuring. Doing
performance measurement in debug mode is like doing stress testing on a cardboard model of
a bridge. You get some data, but it doesn't tell you anything useful.

Here's the revised code. Note that I disabled /GL and in the linker removed the /LTCG
option. At some point, I need to take the time to do an analysis of the actual code I saw
being executed, as opposed to the code the compiler claimed to have generated; they were
quite different pieces of code, and require some careful study to tell what has happened.
But not this week, because I'm on deadline (and ten minutes ago, the new hardware arrived,
so I'm back to having a live embedded system to work with again)
joe

// fdub.cpp : Defines the entry point for the console application.
//

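#include "stdafx.h"
#include <windows.h> // LARGE_INTEGER, QueryPerformanceCounter, QueryPerformanceFrequency
#include <tchar.h>   // _tmain, _TCHAR, _T, _tprintf
#include <stdio.h>   // printf/wprintf behind _tprintf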

double d = 1.0;
double d2 = 5.0;

float f = 1.0;
float f2 = 5.0;

int I = 5000000;
int I2 = 5;

__int64 I64 = 50000000000;
__int64 I642 = 5;

__declspec(noinline) void NoComputation()
{
    return;
}

__declspec(noinline) void DoubleComputation()
{
    d = d * 2.141632 / (d2 * 3.141592);
    d2 *= 1.1416;
}

__declspec(noinline) void FloatComputation()
{
    f = f * 2.141632f / (f2 * 3.141592f);
    f2 *= 1.1416f;
}

__declspec(noinline) void IntComputation()
{
    I = (I * 2) / (I2 * 3);
    I2 *= 3;
}

__declspec(noinline) void Int64Computation()
{
    I64 = (I64 * 2) / (I642 * 3);
    I642 *= 3;
}

const int MAX_TESTS = 100000;

int _tmain(int argc, _TCHAR* argv[])
{
    LARGE_INTEGER start;
    LARGE_INTEGER end;
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    for(int i = 0; i < MAX_TESTS; i++)
        NoComputation();
    QueryPerformanceCounter(&end);

    double tn = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    _tprintf(_T("No computation: %1.9f\n"), tn);

    QueryPerformanceCounter(&start);
    for(int i = 0; i < MAX_TESTS; i++)
        FloatComputation();
    QueryPerformanceCounter(&end);

    double tf = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    _tprintf(_T("Float computation: %1.9f %1.9f\n"), tf, tf - tn);

    QueryPerformanceCounter(&start);
    for(int i = 0; i < MAX_TESTS; i++)
        DoubleComputation();
    QueryPerformanceCounter(&end);

    double td = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    _tprintf(_T("Double computation: %1.9f %1.9f\n"), td, td - tn);

    QueryPerformanceCounter(&start);
    for(int i = 0; i < MAX_TESTS; i++)
        IntComputation();
    QueryPerformanceCounter(&end);

    double id = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    _tprintf(_T("Int computation: %1.9f %1.9f\n"), id, id - tn);

    QueryPerformanceCounter(&start);
    for(int i = 0; i < MAX_TESTS; i++)
        Int64Computation();
    QueryPerformanceCounter(&end);

    double i64d = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    _tprintf(_T("Int64 computation: %1.9f %1.9f\n"), i64d, i64d - tn);
    return 0;
}
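
For anyone who wants to repeat the measurement without the Win32 timer calls, the start/stop/divide pattern above can be sketched with the standard <chrono> clock. This is only a rough sketch, not the program that produced the numbers in this thread; TimeLoop is a hypothetical helper name, and std::chrono::steady_clock is swapped in for QueryPerformanceCounter.

```cpp
#include <chrono>

// Hypothetical helper, not part of the posted program: calls fn() `iterations`
// times and returns the elapsed wall-clock time in seconds.
template <typename Fn>
double TimeLoop(Fn fn, int iterations)
{
    using Clock = std::chrono::steady_clock;  // monotonic clock
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; i++)
        fn();
    const Clock::time_point end = Clock::now();
    // duration<double> with the default period is expressed in seconds.
    return std::chrono::duration<double>(end - start).count();
}
```

Calling TimeLoop(FloatComputation, MAX_TESTS) and subtracting TimeLoop(NoComputation, MAX_TESTS) would give the same two columns the program prints. The functions passed in would still need to be kept out of line (as with __declspec(noinline) above), otherwise the optimizer may fold the loop body away and the baseline stops being comparable.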


On Thu, 23 Jul 2009 09:47:16 -0500, Stephen Myers
<""StephenMyers\"@discussions@xxxxxxxxxxxxx"> wrote:

Joe,

Please see below.

Joseph M. Newcomer wrote:
More data:

Just for completeness, I added integer computations

Win32, VS2008, release mode

No computation: 0.000001676
Float computation: 0.001314134 0.001312457
Double computation: 0.001352686 0.001351010
Int computation: 0.002558146 0.002556470
Int64 computation: 0.012508294 0.012506617

Hmm..floating point is FASTER than integer (2X faster!), and MUCH FASTER than 64-bit
arithmetic on a 32-bit machine (10X faster). Note that in this case, I had forced a
__declspec(noinline) and disabled LTCG (Link Time Code Generation) because LTCG changed
the behavior, and now double is 1.03x float. LTCG does VERY interesting things to the
code, and essentially does loop unrolling. That's a topic for some future discussion,
because I'm responding to these while some long and repetitive rebuilds go on.

Note also that integer divide is always notoriously slow.

=============================================================================

?IntComputation@@YAXXZ PROC ; IntComputation, COMDAT

; 37 : I = (I * 2) / (I2 * 3);

00000 a1 00 00 00 00 mov eax, DWORD PTR ?I2@@3HA ; I2
00005 8d 0c 40 lea ecx, DWORD PTR [eax+eax*2]
00008 a1 00 00 00 00 mov eax, DWORD PTR ?I@@3HA ; I
0000d 03 c0 add eax, eax
0000f 99 cdq
00010 f7 f9 idiv ecx

; 38 : I2 *= 3;

00012 89 0d 00 00 00
00 mov DWORD PTR ?I2@@3HA, ecx ; I2
00018 a3 00 00 00 00 mov DWORD PTR ?I@@3HA, eax ; I

; 39 : }

0001d c3 ret 0
?IntComputation@@YAXXZ ENDP ; IntComputation

=============================================================================

?Int64Computation@@YAXXZ PROC ; Int64Computation, COMDAT

; 43 : I64 = (I64 * 2) / (I642 * 3);

00000 a1 04 00 00 00 mov eax, DWORD PTR ?I642@@3_JA+4
00005 8b 0d 00 00 00
00 mov ecx, DWORD PTR ?I642@@3_JA
0000b 56 push esi
0000c 57 push edi
0000d 6a 00 push 0
0000f 6a 03 push 3
00011 50 push eax
00012 51 push ecx
00013 e8 00 00 00 00 call __allmul
00018 6a 00 push 0
0001a 8b fa mov edi, edx
0001c 8b 15 04 00 00
00 mov edx, DWORD PTR ?I64@@3_JA+4
00022 6a 02 push 2
00024 8b f0 mov esi, eax
00026 a1 00 00 00 00 mov eax, DWORD PTR ?I64@@3_JA
0002b 52 push edx
0002c 50 push eax
0002d e8 00 00 00 00 call __allmul
00032 57 push edi
00033 56 push esi
00034 52 push edx
00035 50 push eax
00036 e8 00 00 00 00 call __alldiv

; 44 : I642 *= 3;

0003b 89 3d 04 00 00
00 mov DWORD PTR ?I642@@3_JA+4, edi
00041 5f pop edi
00042 89 35 00 00 00
00 mov DWORD PTR ?I642@@3_JA, esi
00048 a3 00 00 00 00 mov DWORD PTR ?I64@@3_JA, eax
0004d 89 15 04 00 00
00 mov DWORD PTR ?I64@@3_JA+4, edx
00053 5e pop esi

; 45 : }

00054 c3 ret 0
?Int64Computation@@YAXXZ ENDP ; Int64Computation
On Wed, 22 Jul 2009 15:02:38 -0400, Joseph M. Newcomer <newcomer@xxxxxxxxxxxx> wrote:

See below...
On Wed, 22 Jul 2009 09:46:02 -0700, JRGlide <JRGlide@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

First, I have to admit that I'm not up on the latest hardware. So everything I'm about to say may be null and void.

I don't think it's true that the hardware does everything as doubles. As far as I know everything is floating point and doubles take extra processing to emulate. I've measured it in the past and doubles were about five times slower than floats. Again, maybe this has now changed.
****
Strike one. Not true. The hardware works with 80-bit floating point representation. So
there's nothing that is 'saved' by using float. Since you have not said under what
context you made the measurements, they have no relevance to the discussion.
****
I can think of three reasons to use floats instead of doubles if you don't need the precision:

1. In case I'm right and doubles are slower. It would be easy enough to do a timing study and find out. Just do two loops 100 million times - multiply two floats in one and two doubles in the other and time them. Even better, do a divide.
****
Floats are slower. See data and sample program below.
****
2. If you are working with large sets of data, then memory. We use MATLAB a lot where I work and MATLAB does everything as doubles. This gets to be a problem sometimes even with a Gig or several Gigs of memory. And if you happen to be storing the data, the files are twice as large. Why waste memory/storage for nothing?
****
Probably because memory is largely irrelevant on nearly every kind of app. Unless you
really need massive arrays, the discussion of size is irrelevant. For large data
structures, it matters a lot. I know of one machine that ran with 8-bit floating point (3
bits exponent, 5 bits mantissa) and all functions were done with table lookup. But it is
a red herring if the discussion is about performance.
****
3. So that the next person looking at your code doesn't scratch their heads wondering why you did everything in double precision.
****
More seriously, I tend to scratch my head and wonder why anyone bothers to do anything in
float!
joe
****
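
To put a number on the memory argument in point 2 above, the storage difference is simply element count times element size. A minimal sketch, assuming the usual 4-byte float and 8-byte double (true for Visual C++); ArrayBytes is a hypothetical helper introduced only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed to store `count` elements of type T
// contiguously (ignores any container or allocator overhead).
template <typename T>
constexpr std::size_t ArrayBytes(std::size_t count)
{
    return count * sizeof(T);
}

// Assumption stated in the text: 32-bit float and 64-bit double.
static_assert(sizeof(float) == 4, "assumes 4-byte float");
static_assert(sizeof(double) == 8, "assumes 8-byte double");
```

For 100 million samples, ArrayBytes<float> gives 400,000,000 bytes and ArrayBytes<double> gives 800,000,000 bytes, which is the "twice as large" figure mentioned for files and large arrays. For a handful of scalars the difference is a few bytes and does not matter.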
"Mechi" wrote:

Hi!
I've been told that even if I only need small numbers (up to 100) I should still use int, since it is more efficient in VC++.
What about float and double? Precision isn't important - which type is "built-in" and will work faster/more efficiently?
Thanks,
Mechi
*****
Tests are run on a 1.81GHz AMD64 dual CPU dual-core system running 32-bit Vista-32
Ultimate SP2.

VS2008, debug mode
TOTAL TOTAL-NULL
No computation: 0.003317740
Float computation: 0.004153321 0.000835581
Double computation: 0.004058616 0.000740876

No computation: 0.003304051
Float computation: 0.004180699 0.000876648
Double computation: 0.004056940 0.000752889

No computation: 0.003267175
Float computation: 0.004173994 0.000906819
Double computation: 0.004041854 0.000774679

VS2008, release mode
/O2 /Oi /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD
/Gy /c /Zi /TP .\fdub.cpp

No computation: 0.000001676
Float computation: 0.001328381 0.001326705
Double computation: 0.000838375 0.000836699

No computation: 0.000002794
Float computation: 0.001325587 0.001322794
Double computation: 0.000836699 0.000833905

No computation: 0.000002794
Float computation: 0.001326146 0.001323353
Double computation: 0.000835022 0.000832229

// fdub.cpp : Defines the entry point for the console application.
//

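#include "stdafx.h"
#include <windows.h> // LARGE_INTEGER, QueryPerformanceCounter, QueryPerformanceFrequency
#include <tchar.h>   // _tmain, _TCHAR, _T, _tprintf
#include <stdio.h>   // printf/wprintf behind _tprintf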

double d = 1.0;
double d2 = 5.0;

float f = 1.0;
float f2 = 5.0;

void NoComputation()
{
    return;
}

void DoubleComputation()
{
    d = d * 2.141632 / (d2 * 3.141592);
    d2 *= 1.1416;
}

void FloatComputation()
{
    f = f * 2.141632f / (f2 * 3.141592f);
    f2 *= 1.1416f;
}

const int MAX_TESTS = 100000;

int _tmain(int argc, _TCHAR* argv[])
{
    LARGE_INTEGER start;
    LARGE_INTEGER end;
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    for(int i = 0; i < MAX_TESTS; i++)
        NoComputation();
    QueryPerformanceCounter(&end);

    double tn = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    _tprintf(_T("No computation: %1.9f\n"), tn);

    QueryPerformanceCounter(&start);
    for(int i = 0; i < MAX_TESTS; i++)
        FloatComputation();
    QueryPerformanceCounter(&end);

    double tf = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    _tprintf(_T("Float computation: %1.9f %1.9f\n"), tf, tf - tn);

    QueryPerformanceCounter(&start);
    for(int i = 0; i < MAX_TESTS; i++)
        DoubleComputation();
    QueryPerformanceCounter(&end);

    double td = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    _tprintf(_T("Double computation: %1.9f %1.9f\n"), td, td - tn);

    return 0;
}
============================================
Now look at the code. The code is IDENTICAL except for the operand sizes! But the float
code is consistently slower!

?FloatComputation@@YAXXZ PROC ; FloatComputation, COMDAT

; 25 : f = f * 2.141632f / (f2 * 3.141592f);

00000 d9 05 00 00 00
00 fld DWORD PTR ?f@@3MA ; f
00006 dc 0d 00 00 00
00 fmul QWORD PTR __real@4001221000000000
0000c d9 05 00 00 00
00 fld DWORD PTR ?f2@@3MA ; f2
00012 dd 05 00 00 00
00 fld QWORD PTR __real@400921fb00000000
00018 d8 c9 fmul ST(0), ST(1)
0001a de fa fdivp ST(2), ST(0)
0001c d9 c9 fxch ST(1)
0001e d9 1d 00 00 00
00 fstp DWORD PTR ?f@@3MA ; f

; 26 : f2 *= 1.1416f;

00024 dc 0d 00 00 00
00 fmul QWORD PTR __real@3ff243fe60000000
0002a d9 1d 00 00 00
00 fstp DWORD PTR ?f2@@3MA ; f2

; 27 : }

00030 c3 ret 0
?FloatComputation@@YAXXZ ENDP ; FloatComputation

?DoubleComputation@@YAXXZ PROC ; DoubleComputation, COMDAT

; 19 : d = d * 2.141632 / (d2 * 3.141592);

00000 dd 05 00 00 00
00 fld QWORD PTR ?d@@3NA ; d
00006 dc 0d 00 00 00
00 fmul QWORD PTR __real@4001220ff540895d
0000c dd 05 00 00 00
00 fld QWORD PTR ?d2@@3NA ; d2
00012 dd 05 00 00 00
00 fld QWORD PTR __real@400921fafc8b007a
00018 d8 c9 fmul ST(0), ST(1)
0001a de fa fdivp ST(2), ST(0)
0001c d9 c9 fxch ST(1)
0001e dd 1d 00 00 00
00 fstp QWORD PTR ?d@@3NA ; d

; 20 : d2 *= 1.1416;

00024 dc 0d 00 00 00
00 fmul QWORD PTR __real@3ff243fe5c91d14e
0002a dd 1d 00 00 00
00 fstp QWORD PTR ?d2@@3NA ; d2

; 21 : }

00030 c3 ret 0
?DoubleComputation@@YAXXZ ENDP ; DoubleComputation
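
A note on reading the two listings above: the __real@... symbol names appear to carry the IEEE-754 bit pattern of each constant as a 64-bit double, which would explain why the float version references __real@4001221000000000 while the double version references __real@4001220ff540895d. The float literal has been rounded to float precision and then widened, so its low bits are zero. The sketch below checks this on a typical IEEE-754 platform; BitsOf is a hypothetical helper, and the interpretation of the symbol names is an assumption based on the listing, not on compiler documentation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw IEEE-754 bit pattern of a double.
std::uint64_t BitsOf(double value)
{
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to reinterpret
    return bits;
}
```

BitsOf(2.141632) should match the constant in the double listing, and BitsOf(static_cast<double>(2.141632f)) the one in the float listing. Both routines therefore multiply by a QWORD constant; only the loads and stores of f, f2, d and d2 differ in operand size.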

Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Now I'm very confused! My tests give quite different results. I'm looking mostly at a debug build, as I assume optimization will cloud the results. I'm seeing 32-bit int timings on the order of 25x faster than floating point.

These results are using long with debug.

No computation: 0.003031508
Float computation: 0.257270727 0.254239218
Double computation: 0.255351279 0.252319770
Int computation: 0.011679471 0.008647963

Obviously, I'm missing something here. Could you post the modified source?

Steve
My results - release mode. VS2008 SP
/O2 /Oi /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /Yu"stdafx.h" /Fp"Release\Computations.pch" /Fo"Release\\" /Fd"Release\vc90.pdb" /W3 /nologo /c /Zi /TP /errorReport:prompt

Intel Core 2 Duo E8400 @ 3.0GHz
No computation: 0.000000147
Float computation: 0.022255369 0.022255221
Double computation: 0.021203043 0.021202896
Int computation: 0.000716712 0.000716565
Int64 computation: 0.006513633 0.006513486

Intel Core 2 Duo T7700 @ 2.4GHz
No computation: 0.000002514
Float computation: 0.083031503 0.083028988
Double computation: 0.039165313 0.039162799
Int computation: 0.000672991 0.000670476
Int64 computation: 0.007552915 0.007550401

I suspect the dismal Int64 performance is the result of calls such as __allmul, which probably emulate 64-bit integer math on a 32-bit processor.
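
To illustrate why a 64-bit multiply costs more on a 32-bit target, here is a rough sketch of the arithmetic a helper like __allmul has to perform: one full 32x32 product plus two cross terms. This is not the CRT's actual implementation (which I have not reproduced here); Mul64Via32 is a hypothetical name, and the sketch assumes unsigned int is 32 bits wide.

```cpp
#include <cstdint>

// Illustrative only: multiplies two 64-bit values using 32-bit halves and
// returns the low 64 bits of the product (wrapping on overflow).
std::uint64_t Mul64Via32(std::uint64_t a, std::uint64_t b)
{
    const std::uint32_t aLo = static_cast<std::uint32_t>(a);
    const std::uint32_t aHi = static_cast<std::uint32_t>(a >> 32);
    const std::uint32_t bLo = static_cast<std::uint32_t>(b);
    const std::uint32_t bHi = static_cast<std::uint32_t>(b >> 32);

    // Full 32x32 -> 64 product of the low halves.
    const std::uint64_t lowProduct = static_cast<std::uint64_t>(aLo) * bLo;

    // The cross terms only affect the upper 32 bits of the result, so their
    // sum can be kept modulo 2^32. The aHi * bHi term falls entirely outside
    // the low 64 bits and is dropped.
    const std::uint32_t cross = aLo * bHi + aHi * bLo;

    return lowProduct + (static_cast<std::uint64_t>(cross) << 32);
}
```

That is three multiplies and some shuffling for what is a single instruction at native width, before counting the function call itself. The 64-bit divide behind __alldiv is more involved still, which fits the Int64 timings above.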

My assumption has been that int math is faster.
The good news for me is that integers win on the processors I'm actually using.

I am surprised at the vast difference between the AMD and Intel processors.

Steve