Re: floating point mantissa?
From: Nick (SeaNICK_at_Noemail.nospam)
Date: Sat, 18 Dec 2004 15:54:12 -0800
> That implementation is fully standards conforming with respect to the
> precision of long-double: it's not required to actually have more
> precision than double.
So why was long double 80 bits before, and 64 now? Was it more-than-compliant
in the days when it compiled to 80-bit numbers, or compliant to a different
standard?
Incidentally, I have begun to think I will go with the proprietary struct
mentioned in my original mail:
unsigned mantissa : 96; // no assumed leading 1, so the value doesn't have
to be denormalized in order to support smaller numbers with more accuracy
unsigned exponent : 31; // 2^(exp - 1073741823) -- way too large, but it
might be interesting to see how far I could zoom
unsigned sign : 1; // 0 = (+), 1 = (-)
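One wrinkle with writing it exactly like that: C++ bitfields can't be wider than their underlying type, so a 96-bit mantissa field won't compile as declared. A minimal sketch of an equivalent layout (my assumption, not your original code) splits the mantissa across three 32-bit words:

```cpp
#include <cstdint>

// Sketch of the proposed format: 96-bit mantissa with no implied leading 1,
// a 31-bit biased exponent, and a separate sign bit. Bitfields cannot
// exceed the width of their underlying type, so the mantissa is stored as
// three 32-bit words instead (here, word 0 is least significant).
struct BigFloat {
    std::uint32_t mantissa[3];   // 96-bit mantissa, little-endian word order
    std::uint32_t exponent : 31; // biased: scale is 2^(exponent - 1073741823)
    std::uint32_t sign : 1;      // 0 = (+), 1 = (-)
};
```

The exponent and sign still fit together in a single 32-bit word, so the whole struct stays at 16 bytes on typical compilers.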
The difficult pieces now appear to be:
1. Quick verification that the number is not one of the predefined special
values. Probably easily solved with something like a switch statement on the
hiword and/or loword values, doing the majority of the work in the default
section. I'm not sure how well that would be optimized.
2. Optimized normalization and denormalization methods.
What's the quickest way to find the highest set bit in a value? My initial
attempts come to something like 3 instructions per comparison, which makes
every normalization a 146-cycle operation: 48 comparisons for 96 possible
bit positions, one instruction to subtract the detected magnitude from the
required magnitude, and one to shift the value by the number of bits given
by that result. I obviously don't know assembly yet, but IMHO 146 cycles
just to normalize a single floating point number seems extremely slow. How
many CPU cycles does a regular floating point normalization take?
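The linear scan can be replaced by a binary search over bit positions: halve the search range each step instead of testing positions one by one. For a 96-bit mantissa stored as three 32-bit words (an assumed layout), that's picking the highest nonzero word and then about 5 compare-and-shift steps within it, rather than dozens of comparisons. A sketch:

```cpp
#include <cstdint>

// Index of the highest set bit in a nonzero 32-bit word, found by binary
// search: each step halves the range, so 5 compare-and-shift steps total.
inline int highestBit32(std::uint32_t v) {
    int pos = 0;
    if (v >= 1u << 16) { v >>= 16; pos += 16; }
    if (v >= 1u << 8)  { v >>= 8;  pos += 8; }
    if (v >= 1u << 4)  { v >>= 4;  pos += 4; }
    if (v >= 1u << 2)  { v >>= 2;  pos += 2; }
    if (v >= 1u << 1)  {           pos += 1; }
    return pos;
}

// Highest set bit of a 96-bit mantissa stored as three 32-bit words with
// w[0] least significant (assumed layout). Returns -1 if all bits are zero.
inline int highestBit96(const std::uint32_t w[3]) {
    for (int i = 2; i >= 0; --i)
        if (w[i]) return i * 32 + highestBit32(w[i]);
    return -1;
}
```

On x86 there is also a single instruction for this (`bsr`), which compilers expose as an intrinsic (`_BitScanReverse` on VC++), so the portable binary search above is really a fallback.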
3. Addition and subtraction operators seem easy: just use the larger
exponent value of the two operands. But what about multiplication and
division?
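Concretely, "use the larger exponent" means shifting the smaller-exponent operand's mantissa right by the exponent difference before adding. A simplified sketch (my assumption: 64-bit mantissas instead of the full 96, both operands positive, no rounding of shifted-out bits):

```cpp
#include <cstdint>

// Toy format for illustrating exponent alignment: value = mantissa * 2^exponent.
struct Simple { std::uint64_t mantissa; std::int32_t exponent; };

// Align to the larger exponent, then add mantissas. The result may need
// renormalizing afterwards, and shifted-out low bits are simply dropped.
inline Simple alignedAdd(Simple a, Simple b) {
    if (a.exponent < b.exponent) { Simple t = a; a = b; b = t; }
    std::uint32_t shift = static_cast<std::uint32_t>(a.exponent - b.exponent);
    std::uint64_t small = (shift >= 64) ? 0 : (b.mantissa >> shift);
    return Simple{ a.mantissa + small, a.exponent };
}
```

For example, 8*2^1 + 8*2^0: the second mantissa is shifted right by 1 to become 4 at exponent 1, giving 12*2^1 = 24. Multiplication is actually simpler in this respect: multiply the mantissas and add the exponents, no alignment needed.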
4. There are some values that floating point numbers have a very hard time
with. I understand that, for instance, due to the implied leading one, 0 is
not representable without a special case. So if I stored the leading one
explicitly instead of assuming it, would that actually make the floating
point number work for me, or are there still difficulties with things like
odd numbers, etc.? What about an alternate method to make those work, like
binary coded decimal? I imagine I would be throwing away any concept of
performance at that point, though.
I appreciate all your responses so far, and would be grateful for any
further suggestions you might have.
"Carl Daniel [VC++ MVP]" <email@example.com>
wrote in message news:%23u3giss4EHA.2180@TK2MSFTNGP10.phx.gbl...
> "Nick" <SeaNICK@Noemail.nospam> wrote in message
>> how do I force Visual C++ .NET 2003 to use long doubles as they are
>> originally intended to be used?
> Under VC++ the 'long double' type is identical to the 'double' type and
> there's nothing you can do to change that. That implementation is fully
> standards conforming with respect to the precision of long-double: it's
> not required to actually have more precision than double.