Re: Rounding of the double
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Sat, 02 Jun 2007 21:45:40 -0400
When in doubt, trust the representation. If I get involved in details of "how many
digits of precision", the first rule is that only the binary bits matter. Decimal
representations do not; they are only approximations of the true value which is stored. So
arguing about precision of binary floating point by showing decimal numbers is usually
suspect.
The formal specification of floating point precision is usually expressed as ±1 LSB (Least
Significant Bit). So if you have a representation like IEEE, which has an implied 1 in
the high-order position, you have nominally 1 additional bit of precision, but it is
always expressed in binary.
So rewriting the program below,
#include "stdafx.h"

// Overlay a double with a 64-bit integer so the raw bits can be inspected.
// (Reading the member that was not last written relies on VC++ behavior.)
typedef union {
    double d;
    unsigned __int64 i;

    // Formats: sign, biased exponent, (unbiased exponent), 52-bit fraction in hex
    CString HexString() {
        CString s;
        s.Format(_T("%c%04I64u (%+04I64d) %013I64x"),
                 (i & 0x8000000000000000) ? _T('-') : _T('+'),
                 (i >> 52) & 0x00000000000007FF,
                 ((__int64)((i >> 52) & 0x00000000000007FF)) - 1023,
                 i & 0x000FFFFFFFFFFFFF);
        return s;
    }
} dint;

int main()
{
    if (!AfxWinInit(::GetModuleHandle(NULL), NULL, ::GetCommandLine(), 0))
    {
        // TODO: change error code to suit your needs
        _tprintf(_T("Fatal Error: MFC initialization failed\n"));
        return 1;
    }
    dint a1;
    a1.d = 4e-15;
    dint a2;
    a2.d = 1 + a1.d;
    dint a3;
    a3.d = 1 - a2.d;
    bool a4 = 1 == a2.d;
    // Cast each CString to LPCTSTR explicitly before passing it through "..."
    _tprintf(_T("A1 = %+.20e %016I64X %s\n"), a1.d, a1.i, (LPCTSTR)a1.HexString());
    _tprintf(_T("A2 = %+.20e %016I64X %s\n"), a2.d, a2.i, (LPCTSTR)a2.HexString());
    _tprintf(_T("A3 = %+.20e %016I64X %s\n"), a3.d, a3.i, (LPCTSTR)a3.HexString());
    _tprintf(_T("A4 = %s\n"), a4 ? _T("TRUE") : _T("FALSE"));
    return 0;
}
I get
A1 = +4.00000000000000030000e-015 3CF203AF9EE75616 +0975 (-048) 203af9ee75616
A2 = +1.00000000000000400000e+000 3FF0000000000012 +1023 (+000) 0000000000012
A3 = -3.99680288865056350000e-015 BCF2000000000000 -0975 (-048) 2000000000000
A4 = FALSE
That is, a1 is 0x1.203af9ee75616 x 2**-48
or
1.0010 0000 0011 1010 1111 1001 1110 1110 0111 0101 0110 0001 0110 x 2**-48
(53 bits of mantissa including the implied high-order 1-bit)
that is,
0.0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0010 0000 0011 1010 1111
1001 1110 1110 0111 0101 0110 0001 0110
I leave the rest as Exercises For The Readers.
joe
On Sat, 02 Jun 2007 22:35:38 GMT, MrAsm <mrasm@xxxxxxx> wrote:
On Sat, 02 Jun 2007 12:59:35 -0500, "Doug Harrison [MVP]"
<dsh@xxxxxxxx> wrote:
#include <stdio.h>
int main()
{
    double a1 = 4e-15;
    double a2 = 1+a1;
    double a3 = 1-a2;
    bool a4 = 1 == a2;
    printf("A1 = %.20e\n", a1);
    printf("A2 = %.20e\n", a2);
    printf("A3 = %.20e\n", a3);
    printf("A4 = %s\n", (a4) ? "TRUE" : "FALSE");
}
The output I get is (VC8, cl a.cpp):
A1 = 4.00000000000000030000e-015
     1 23456789012345xxxxxx
               |    |
               10   15
The digits I put an 'x' under are *not* significant IMHO; in fact, the
IEEE754 double format has a precision of 15 digits.
e.g.:
http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html
And in fact you can see a spurious "3" in one of the 'x' positions
(and in fact you just wrote C++ code: a1=4e-15, so the "3" is actually
spurious).
A2 = 1.00000000000000400000e+000
     1 23456789012345xxxxxx
               |    |
               10   15
Now the 4 is spurious. In fact, the 4 is over the 15th significant
digit.
A4 = FALSE
bool a4 = 1 == a2;
But you get 'false' here because you are comparing floating point
numbers the *wrong* way. IMHO you *can't* use operator== to compare
floating point numbers for equality; I believe that you can only do
"fuzzy" compares with floating points, e.g.
|1 - a2| < tolerance
fabs( 1 - a2 ) < tolerance.
BTW: I would very much like to know also what David Webber (who is a
mathematician, theoretical physicist:
http://www.mozart.co.uk/information/author/authinfo.htm) thinks about that.
Thanks.
MrAsm
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm