Re: floating point mantissa?

From: Jerry Coffin (jcoffin_at_taeus.us)
Date: 12/19/04


Date: Sat, 18 Dec 2004 23:06:49 -0700

In article <#XlJgzV5EHA.208@TK2MSFTNGP12.phx.gbl>,
SeaNICK@Noemail.nospam says...

[ ... ]

> so why was long double 80 bits before,and 64 now? was it more-than-compliant
> in the days when it compiled to 80 bit numbers? or compliant to a different
> standard?

The standards haven't changed in this respect. IIRC, the explanation
I heard for switching to a 64-bit long double during the move to 32-
bit OSes was that at the time they supported various other processors
(e.g. the MIPS, Alpha and PowerPC) most of which didn't/don't
directly support an 80-bit type.

[ ... ]
 
> 2. optimized normalization, denormalization methods.
> whats the quickest way to see the largest bit in a value? my initial
> attempts come to something like 3 instructions per comparison and that makes
> every normalization a 146 cpu cycle operation - 48 comparisons for 96
> possible values, one to subtract the magnitude detected from the magnitude
> required, and one to shift the value the number of bits specified by the
> result to the previous equation. I obviously don't know assembly yet, but
> IMHO 146 cycles just for a normalization of a single floating point number
> seems extremely slow. how many cpu cycles does it take for a regular
> floating point normalization operation?

I suspect your guess at a cycle count is often going to be quite a
ways off. A Pentium 4 does NOT use barrel shifters, so shifting
something by N bits requires (approximately) N clock cycles.

With that given, you might want to consider a simpler design: start
by moving the mantissa into a temporary location with one extra bit.
In a loop, shift the mantissa left a bit (and decrement the exponent)
until the extra bit is set.
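As a rough sketch of that loop (not a definitive implementation): here the top bit of a 32-bit word stands in for the "extra bit", and the Unpacked struct and the name normalize_by_loop are made up purely for illustration.

```cpp
#include <cstdint>

// Hypothetical unpacked form; value = mantissa * 2^exponent.
// The field names are illustrative assumptions, not from the original question.
struct Unpacked {
    std::uint32_t mantissa;  // considered normalized when bit 31 is set
    int exponent;            // unbiased power of two
};

// Shift left one bit per pass (decrementing the exponent each time)
// until the top bit of the mantissa is set.
Unpacked normalize_by_loop(Unpacked v) {
    if (v.mantissa == 0)
        return v;  // zero has no set bit, so the loop would never finish
    while ((v.mantissa & 0x80000000u) == 0) {
        v.mantissa <<= 1;
        --v.exponent;
    }
    return v;
}
```

The loop costs one pass per leading zero, so it is cheapest when values are already nearly normalized, which is the usual case after an addition or multiplication.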

If you have a reasonable expectation that the CPU _will_ have a
barrel shifter, then it's probably better to find the MSB using a
binary search.
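A minimal sketch of the binary search, assuming a 32-bit mantissa word; highest_set_bit and normalize_by_search are hypothetical names. It takes five comparisons and a single shift no matter where the top bit sits.

```cpp
#include <cstdint>

// Returns the index (0..31) of the highest set bit, or -1 if x is zero.
// Each test halves the range still being searched.
int highest_set_bit(std::uint32_t x) {
    if (x == 0) return -1;
    int index = 0;
    if (x & 0xFFFF0000u) { x >>= 16; index += 16; }
    if (x & 0x0000FF00u) { x >>= 8;  index += 8;  }
    if (x & 0x000000F0u) { x >>= 4;  index += 4;  }
    if (x & 0x0000000Cu) { x >>= 2;  index += 2;  }
    if (x & 0x00000002u) { index += 1; }
    return index;
}

// Normalizes with one shift: moves the highest set bit up to bit 31
// and lowers the exponent by the same amount.
std::uint32_t normalize_by_search(std::uint32_t mantissa, int& exponent) {
    int msb = highest_set_bit(mantissa);
    if (msb < 0) return 0;   // zero cannot be normalized
    int shift = 31 - msb;    // always in the range 0..31
    exponent -= shift;
    return mantissa << shift;
}
```

For a 96-bit mantissa the same idea applies: first pick the highest non-zero 32-bit word, then search within it.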

> 3. addition and subtraction operators seem easy, just use the largest
> exponent value of the two operands. but what about multiplication and
> division?

The exponents are handled by adding (for multiplication) or
subtracting (for division). The mantissas are multiplied about like
multiplication is (was?) done by hand on paper.

Assuming the mantissa is an unsigned int, multiplication could come
out something vaguely like this:

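unsigned multiply(unsigned one, unsigned two) {
        unsigned answer = 0;

        // Shift-and-add: one pass per bit of 'one', lowest bit first.
        do {
                if (one & 1)
                        answer += two;  // this bit is set: add the shifted 'two'
                one >>= 1;              // move to the next bit of 'one'
                two <<= 1;              // 'two' is worth twice as much there
        } while (one != 0);
        // Note: bits of the product above the width of unsigned are lost.
        return answer;
}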
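To show how the exponent addition and the mantissa product fit together, here is a hedged sketch. It swaps the bit-at-a-time loop above for a native 64-bit multiply, ignores signs and rounding, and assumes both inputs are already normalized (bit 31 set) or zero. Unpacked and multiply_unpacked are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical unpacked form; value = mantissa * 2^exponent.
struct Unpacked {
    std::uint32_t mantissa;  // normalized when bit 31 is set
    int exponent;            // unbiased power of two
};

// Multiplies two normalized values: add the exponents, multiply the
// mantissas into 64 bits, then keep the top 32 bits of the product.
Unpacked multiply_unpacked(Unpacked a, Unpacked b) {
    std::uint64_t product = static_cast<std::uint64_t>(a.mantissa) * b.mantissa;
    if (product == 0)
        return {0, 0};  // placeholder zero; a real format reserves an encoding

    // Dropping the low 32 bits of the product divides it by 2^32,
    // so the exponent is raised by 32 to compensate.
    int exponent = a.exponent + b.exponent + 32;

    // With normalized inputs the product's top bit is bit 63 or bit 62.
    // In the second case one left shift renormalizes the result.
    if ((product >> 63) == 0) {
        product <<= 1;
        --exponent;
    }
    return {static_cast<std::uint32_t>(product >> 32), exponent};
}
```

The low 32 bits are simply truncated here; a real implementation would use them to round.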

One typical division method is also similar to long division like we
used to do on paper. Here's a quick hack at the general idea:

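unsigned divide(unsigned num, unsigned divisor) {
        unsigned denom = divisor;
        unsigned current = 0x80000000; // only top bit set (assumes 32-bit unsigned).
        unsigned answer = 0;

        // Long division: one quotient bit per pass, most significant first.
        do {
                if (num >= denom) {
                        num -= denom;       // divisor fits: this quotient bit is 1
                        answer |= current;
                }
                current >>= 1;              // next lower quotient bit
                denom >>= 1;                // halve the divisor (its low bit is dropped)
        } while (current != 0);
        return answer;
}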

This division routine assumes both mantissas have already been
normalized. Of course a real version probably needs to check for
things like a zero divisor (0/0 included) first.

> 4. there are some values that floating point numbers have a very hard time
> with. I understand that for instance due to the leading one, 0 is not
> possible. so if I kept the leading one instead of assuming it, would that
> actually make the floating point number work for me, or are there still
> difficulties like odd numbers, etc. ?

The typical method is to create a special value in the _exponent_
that represents 0. In this case, the mantissa is ignored.

> what about an alternate method to make
> those work, like binary coded decimal? I imagine I would probably be
> throwing any concept of performance away at that point though.

Yes, pretty much.

A few other points: exponents are typically represented in biased
notation, i.e. you store an unsigned number, but interpret some
value near the middle of its range as representing an exponent of 0.
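A small sketch of a biased exponent field, combined with the reserved "this number is zero" value mentioned earlier. The 8-bit width, the bias of 127 and the choice of field value 0 for zero are assumptions picked for illustration (they echo IEEE 754 single precision, but this is not an implementation of it), and all the names are hypothetical.

```cpp
#include <cstdint>

// Assumed layout: an 8-bit exponent field with a bias of 127.
constexpr int kBias = 127;

// Assumed convention: field value 0 is reserved to mean "the number is
// zero", in which case the mantissa is ignored.
constexpr std::uint32_t kZeroField = 0;

// Under these assumptions, true exponents -126..128 map to fields 1..255.
std::uint32_t encode_exponent(int true_exponent) {
    return static_cast<std::uint32_t>(true_exponent + kBias);
}

int decode_exponent(std::uint32_t field) {
    return static_cast<int>(field) - kBias;
}

bool is_zero(std::uint32_t exponent_field) {
    return exponent_field == kZeroField;
}
```

One side benefit of the bias is that larger exponents always have larger unsigned field values, so magnitudes can be compared without decoding.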

Unfortunately, C and C++ don't give a bit-field more value bits than
its underlying integer type can hold, so your 'unsigned mantissa: 96'
won't normally work (this is really too bad).

In C++, you'd probably want to create a class template for the
mantissa, and probably another for the exponent. You could overload
the bit manipulation operators and such so the routines above would
need little change to work with 96-bit mantissas instead.
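Something along these lines, as a rough sketch only: WideUnsigned, Mantissa96, low_bit and is_zero are names invented here, the shifts go one bit per pass to keep the code short, and the multiply is the same shift-and-add loop as above with the helper calls substituted.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical multi-word unsigned integer; word[0] is least significant.
template <std::size_t Words>
struct WideUnsigned {
    std::uint32_t word[Words] = {};

    // Left shift by n bits, one bit per pass, carrying between words.
    WideUnsigned& operator<<=(unsigned n) {
        for (; n > 0; --n) {
            std::uint32_t carry = 0;
            for (std::size_t i = 0; i < Words; ++i) {
                std::uint32_t next = word[i] >> 31;
                word[i] = (word[i] << 1) | carry;
                carry = next;
            }
        }
        return *this;
    }

    // Right shift by n bits, starting from the most significant word.
    WideUnsigned& operator>>=(unsigned n) {
        for (; n > 0; --n) {
            std::uint32_t carry = 0;
            for (std::size_t i = Words; i-- > 0;) {
                std::uint32_t next = word[i] & 1u;
                word[i] = (word[i] >> 1) | (carry << 31);
                carry = next;
            }
        }
        return *this;
    }

    // Addition with carry propagation; overflow past the top word is dropped.
    WideUnsigned& operator+=(const WideUnsigned& rhs) {
        std::uint64_t carry = 0;
        for (std::size_t i = 0; i < Words; ++i) {
            std::uint64_t sum = static_cast<std::uint64_t>(word[i]) + rhs.word[i] + carry;
            word[i] = static_cast<std::uint32_t>(sum);
            carry = sum >> 32;
        }
        return *this;
    }

    bool low_bit() const { return (word[0] & 1u) != 0; }

    bool is_zero() const {
        for (std::size_t i = 0; i < Words; ++i)
            if (word[i] != 0) return false;
        return true;
    }
};

using Mantissa96 = WideUnsigned<3>;  // 3 x 32 bits = 96 bits

// Same shift-and-add loop as the unsigned version, on wide operands.
template <std::size_t Words>
WideUnsigned<Words> multiply(WideUnsigned<Words> one, WideUnsigned<Words> two) {
    WideUnsigned<Words> answer;
    do {
        if (one.low_bit())
            answer += two;
        one >>= 1;
        two <<= 1;
    } while (!one.is_zero());
    return answer;
}
```

As with the unsigned version, product bits above the top word are lost, so a full 96 x 96 multiply would need a result type twice as wide.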

-- 
    Later,
    Jerry.
The universe is a figment of its own imagination.

