tango.math.IEEE

Low-level Mathematical Functions which take advantage of the IEEE754 ABI.

License:
BSD style, Digital Mars.

Authors:
Don Clugston, Walter Bright, Sean Kelly

struct IeeeFlags;
IEEE exception status flags

These flags indicate that an exceptional floating-point condition has occurred. They indicate that a NaN or an infinity has been generated, that a result is inexact, or that a signalling NaN has been encountered. The return values of the properties should be treated as booleans, although each is returned as an int, for speed.

Example:
   real a=3.5;
   // Set all the flags to zero
   resetIeeeFlags();
   assert(!ieeeFlags.divByZero);
   // Perform a division by zero.
   a/=0.0L;
   assert(a==real.infinity);
   assert(ieeeFlags.divByZero);
   // Create a NaN
   a*=0.0L;
   assert(ieeeFlags.invalid);
   assert(isNaN(a));

   // Check that calling func() has no effect on the
   // status flags.
   IeeeFlags f = ieeeFlags;
   func();
   assert(ieeeFlags == f);



@property int inexact();
The result cannot be represented exactly, so rounding occurred. (example: x = sin(0.1);)

@property int underflow();
A zero was generated by underflow (example: x = real.min_normal*real.epsilon/2;)

@property int overflow();
An infinity was generated by overflow (example: x = real.max*2;)

@property int divByZero();
An infinity was generated by division by zero (example: x = 3/0.0; )

@property int invalid();
A machine NaN was generated. (example: x = real.infinity * 0.0; )

@property IeeeFlags ieeeFlags();
Return a snapshot of the current state of the floating-point status flags.

void resetIeeeFlags();
Set all of the floating-point status flags to false.

enum RoundingMode: short;
IEEE rounding modes. The default mode is ROUNDTONEAREST.

RoundingMode setIeeeRounding(RoundingMode roundingmode);
Change the rounding mode used for all floating-point operations.

Returns the old rounding mode.

When changing the rounding mode, it is almost always necessary to restore it at the end of the function. Typical usage:
    auto oldrounding = setIeeeRounding(RoundingMode.ROUNDDOWN);
    scope (exit) setIeeeRounding(oldrounding);


RoundingMode getIeeeRounding();
Get the IEEE rounding mode which is in use.

PrecisionControl reduceRealPrecision(PrecisionControl prec);
Set the number of bits of precision used by 'real'.

Returns:
the old precision. This is not supported on all platforms.

real frexp(real value, out int exp);
Separate floating point value into significand and exponent.

Returns:
Calculate and return x and exp such that value = x * 2^exp and 0.5 <= |x| < 1.0

x has same sign as value.

Special Values
value   returns   exp
±0.0    ±0.0      0
+∞      +∞        int.max
-∞      -∞        int.min
±NAN    ±NAN      int.min


real ldexp(real n, int exp);
Compute n * 2^exp

References:
frexp
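
As a rough illustration of how frexp and ldexp relate, the sketch below splits a value and reassembles it. It assumes the signatures documented on this page; roundTrip is a hypothetical helper name, not part of the module.

```d
import tango.math.IEEE : frexp, ldexp;

/// Hypothetical helper: splits value with frexp and rebuilds it with ldexp.
/// For finite, non-zero inputs the round trip is exact.
real roundTrip(real value)
{
    int exp;                               // filled in by frexp (out parameter)
    real significand = frexp(value, exp);  // 0.5 <= |significand| < 1.0
    return ldexp(significand, exp);        // significand * 2^exp
}

unittest
{
    int exp;
    real x = frexp(12.0L, exp);
    assert(x == 0.75L && exp == 4);        // 12 = 0.75 * 2^4
    assert(roundTrip(12.0L) == 12.0L);
    assert(roundTrip(-0.15625L) == -0.15625L);
}
```

Both steps only move the binary exponent, which is why no rounding takes place for normal values.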

int ilogb(real x);
Extracts the exponent of x as a signed integral value.

If x is not a special value, the result is the same as cast(int)logb(x).

Remarks:
This function is consistent with IEEE754R, but it differs from the C function of the same name in the value returned for infinity (in C, ilogb(real.infinity) == int.max). Note that the special return values may all be equal.

Special Values
x     ilogb(x)           Invalid?
0     FP_ILOGB0          yes
±∞    FP_ILOGBINFINITY   yes
NAN   FP_ILOGBNAN        yes


real logb(real x);
Extracts the exponent of x as a signed integral value in floating-point format.

If x is subnormal, it is treated as if it were normalized. For a positive, finite x:

1 <= x * FLT_RADIX^(-logb(x)) < FLT_RADIX

Special Values
x      logb(x)   divide by 0?
±∞     +∞        no
±0.0   -∞        yes
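
The relationship between ilogb, logb and the exponent reported by frexp can be checked with a small sketch. It assumes the signatures documented on this page and only considers finite, non-zero, normal inputs; exponentsAgree is a hypothetical helper name.

```d
import tango.math.IEEE : frexp, ilogb, logb;

/// Hypothetical helper: checks that the three exponent extractors agree
/// for a finite, non-zero, normal x.
bool exponentsAgree(real x)
{
    int e;
    frexp(x, e);                            // only the exponent is needed here
    return ilogb(x) == cast(int) logb(x)    // same exponent, int versus real
        && e == ilogb(x) + 1;               // frexp's exponent is one larger
}

unittest
{
    assert(ilogb(8.0L) == 3);               // 8 = 1.0 * 2^3
    assert(logb(8.0L) == 3.0L);
    assert(exponentsAgree(8.0L));
    assert(exponentsAgree(0.75L));          // 0.75 = 1.5 * 2^-1
}
```

The offset of one comes from the different normalisation: frexp scales the significand into [0.5, 1), while logb and ilogb describe it as lying in [1, 2).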


real scalbn(real x, int n);
Efficiently calculates x * 2^n.

scalbn handles underflow and overflow in the same fashion as the basic arithmetic operators.

Special Values
x      scalbn(x, n)
±∞     ±∞
±0.0   ±0.0


real fdim(real x, real y);
Returns the positive difference between x and y.

If either of x or y is NAN, it will be returned.

Returns:
Special Values
Arguments   fdim(x, y)
x > y       x - y
x <= y      +0.0
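
A short sketch of the two rows of the table, assuming the fdim signature documented here. The helper name excess is hypothetical and stands for a typical "amount over a limit" use.

```d
import tango.math.IEEE : fdim;

/// Hypothetical helper: how far `used` exceeds `limit`,
/// or +0.0 when it stays within the limit.
real excess(real used, real limit)
{
    return fdim(used, limit);
}

unittest
{
    assert(fdim(5.0L, 3.0L) == 2.0L);    // x > y:  x - y
    assert(fdim(3.0L, 5.0L) == 0.0L);    // x <= y: +0.0
    assert(excess(120.0L, 100.0L) == 20.0L);
    assert(excess(80.0L, 100.0L) == 0.0L);
}
```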


real fabs(real x);
Returns |x|

Special Values
x      fabs(x)
±0.0   +0.0
±∞     +∞


real fma(float x, float y, float z);
Returns (x * y) + z, rounding only once according to the current rounding mode.

BUGS:
Not currently implemented - rounds twice.

creal expi(real y);
Calculate cos(y) + i sin(y).

On x86 CPUs, this is a very efficient operation; almost twice as fast as calculating sin(y) and cos(y) separately, and is the preferred method when both are required.

int isNaN(real x);
Returns !=0 if x is a NaN.

int isNormal(X)(X x);
Returns !=0 if x is normalized.

(Need one for each format because subnormal floats might be converted to normal reals)

bool isIdentical(real x, real y);
bool isIdentical(ireal x, ireal y);
bool isIdentical(creal x, creal y);
Is the binary representation of x identical to y?

Same as ==, except that positive and negative zero are not identical, and two NANs are identical if they have the same 'payload'.
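
The two cases where isIdentical and == disagree can be sketched as follows. This assumes the isIdentical and NaN signatures documented on this page; the payload value 42 is arbitrary.

```d
import tango.math.IEEE : isIdentical, NaN;

unittest
{
    real pz = 0.0L, nz = -0.0L;
    assert(pz == nz);                // == treats the two zeros as equal
    assert(!isIdentical(pz, nz));    // but their sign bits differ

    real n = NaN(42);
    assert(n != n);                  // a NaN never compares equal to itself
    assert(isIdentical(n, n));       // same payload, so identical
}
```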

int isSubnormal(float f);
int isSubnormal(double d);
int isSubnormal(real x);
Is number subnormal? (Also called "denormal".) Subnormals have a 0 exponent and a 0 most significant significand bit, but are non-zero.
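
The reason for having one overload per format can be seen in a small sketch: the same numeric value may be subnormal as a float but normal once widened to real. The example assumes the overloads documented above.

```d
import tango.math.IEEE : isSubnormal;

unittest
{
    float f = float.min_normal / 2;   // below the smallest normal float
    assert(isSubnormal(f) != 0);      // float overload: subnormal

    real r = f;                       // widening gives the value a normal exponent
    assert(isSubnormal(r) == 0);      // real overload: same value, now normal

    assert(isSubnormal(real.min_normal / 2) != 0);
    assert(isSubnormal(0.0L) == 0);   // zero is not subnormal
}
```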

int isZero(real x);
Return !=0 if x is ±0.

Does not affect any floating-point flags.

int isInfinity(real x);
Return !=0 if x is ±∞.

real nextUp(real x);
double nextDoubleUp(double x);
float nextFloatUp(float x);
Calculate the next largest floating point value after x.

Return the least number greater than x that is representable as a real; thus, it gives the next point on the IEEE number line.

Special Values
x          nextUp(x)
-∞         -real.max
±0.0       real.min_normal*real.epsilon
real.max   +∞
+∞         +∞
NAN        NAN


Remarks:
This function is included in the IEEE 754-2008 standard.

nextDoubleUp and nextFloatUp are the corresponding functions for the IEEE double and IEEE float number lines.
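
A brief sketch of stepping along the number line around 1.0, assuming the nextUp and nextDown signatures documented on this page. gapAbove is a hypothetical helper name.

```d
import tango.math.IEEE : nextUp, nextDown;

/// Hypothetical helper: width of the gap between x and the next larger real.
real gapAbove(real x)
{
    return nextUp(x) - x;
}

unittest
{
    assert(nextUp(1.0L) == 1.0L + real.epsilon);
    assert(gapAbove(1.0L) == real.epsilon);
    assert(nextDown(nextUp(1.0L)) == 1.0L);             // the two steps cancel
    assert(nextDown(1.0L) == 1.0L - real.epsilon / 2);  // spacing halves below 1.0
    assert(nextUp(real.max) == real.infinity);
}
```

The spacing is not uniform: just below a power of two the representable values are twice as dense as just above it.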

X splitSignificand(X)(ref X x);
Reduces the magnitude of x, so the bits in the lower half of its significand are all zero. Returns the amount which needs to be added to x to restore its initial value; this amount will also have zeros in all bits in the lower half of its significand.

real nextDown(real x);
double nextDoubleDown(double x);
float nextFloatDown(float x);
Calculate the next smallest floating point value before x.

Return the greatest number less than x that is representable as a real; thus, it gives the previous point on the IEEE number line.

Special Values
x           nextDown(x)
+∞          real.max
±0.0        -real.min_normal*real.epsilon
-real.max   -∞
-∞          -∞
NAN         NAN


Remarks:
This function is included in the IEEE 754-2008 standard.

nextDoubleDown and nextFloatDown are the corresponding functions for the IEEE double and IEEE float number lines.

real nextafter(real x, real y);
Calculates the next representable value after x in the direction of y.

If y > x, the result will be the next largest floating-point value; if y < x, the result will be the next smallest value. If x == y, the result is y.

Remarks:
This function is not generally very useful; it's almost always better to use the faster functions nextUp() or nextDown() instead.

IEEE 754 requirements not implemented: The FE_INEXACT and FE_OVERFLOW exceptions will be raised if x is finite and the function result is infinite. The FE_INEXACT and FE_UNDERFLOW exceptions will be raised if the function value is subnormal, and x is not equal to y.

int feqrel(X)(X x, X y);
To what precision is x equal to y?

Returns:
the number of significand bits which are equal in x and y. For example, 0x1.F8p+60 and 0x1.F1p+60 are equal to 5 bits of precision.

Special Values
x     y        feqrel(x, y)
x     x        typeof(x).mant_dig
x     >= 2*x   0
x     <= x/2   0
NAN   any      0
any   NAN      0


Remarks:
This is a very fast operation, suitable for use in speed-critical code.
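
One possible use of feqrel is a tolerance check expressed in bits rather than as an absolute or relative error. The sketch below assumes the template signature documented here; agreeToBits is a hypothetical helper name.

```d
import tango.math.IEEE : feqrel;

/// Hypothetical helper: true when a and b agree in at least
/// `bits` leading significand bits.
bool agreeToBits(real a, real b, int bits)
{
    return feqrel(a, b) >= bits;
}

unittest
{
    assert(feqrel(1.0L, 1.0L) == real.mant_dig);       // identical values
    assert(feqrel(1.0L, 2.0L) == 0);                   // y >= 2*x
    assert(agreeToBits(0x1.F8p+60L, 0x1.F1p+60L, 5));  // the example from the text
}
```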

int signbit(real x);
Return 1 if sign bit of x is set, 0 if not.

real copysign(real to, real from);
Return a value composed of to with from's sign bit.
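
A minimal sketch of signbit and copysign working together, assuming the signatures documented on this page. Negative zero is used because its sign cannot be detected with an ordinary comparison.

```d
import tango.math.IEEE : copysign, signbit;

unittest
{
    assert(signbit(-0.0L) == 1);             // set, although -0.0 == 0.0
    assert(signbit(0.0L) == 0);
    assert(copysign(3.0L, -0.0L) == -3.0L);  // magnitude of `to`, sign of `from`
    assert(copysign(-3.0L, 1.0L) == 3.0L);
}
```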

T ieeeMean(T)(T x, T y);
Return the value that lies halfway between x and y on the IEEE number line.

Formally, the result is the arithmetic mean of the binary significands of x and y, multiplied by the geometric mean of the binary exponents of x and y. x and y must have the same sign, and must not be NaN.

Note:
this function is useful for ensuring O(log n) behaviour in algorithms involving a 'binary chop'.

Special cases: If x and y are within a factor of 2 (i.e., feqrel(x, y) > 0), the return value is the arithmetic mean (x + y) / 2. If x and y are even powers of 2, the return value is the geometric mean, ieeeMean(x, y) = sqrt(x * y).
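
The 'binary chop' mentioned in the note might look like the sketch below: a bisection that splits the bracket at the IEEE midpoint rather than at (lo + hi) / 2. It assumes the ieeeMean signature documented here; bisect, its parameters and the iteration cap are hypothetical choices made for this illustration.

```d
import tango.math.IEEE : ieeeMean;

/// Hypothetical helper: narrows [lo, hi] around a sign change of f by
/// repeatedly splitting at the IEEE midpoint.
/// Assumes 0 < lo < hi, f(lo) < 0 and f(hi) >= 0.
real bisect(real delegate(real) f, real lo, real hi)
{
    // Each step roughly halves the number of representable values in the
    // bracket, so this cap is generous for any real format.
    foreach (step; 0 .. 512)
    {
        real mid = ieeeMean(lo, hi);
        if (mid == lo || mid == hi)
            break;                 // lo and hi are adjacent: nothing left to split
        if (f(mid) < 0)
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}

unittest
{
    // Bracket spanning 200 orders of magnitude around sqrt(2)
    real root = bisect((real x) => x * x - 2.0L, 1e-100L, 1e100L);
    assert(root * root > 1.999999L && root * root < 2.000001L);
}
```

With an arithmetic midpoint, a bracket this wide would spend most of its steps near the upper end; splitting on the number line keeps the step count proportional to the number of bits in the format.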

real NaN(ulong payload);
Create a NAN, storing an integer inside the payload.

For 80-bit or 128-bit reals, the largest possible payload is 0x3FFF_FFFF_FFFF_FFFF. For doubles, it is 0x3_FFFF_FFFF_FFFF. For floats, it is 0x3F_FFFF.

ulong getNaNPayload(real x);
Extract an integral payload from a NAN.

Returns:
the integer payload as a ulong.

For 80-bit or 128-bit reals, the largest possible payload is 0x3FFF_FFFF_FFFF_FFFF. For doubles, it is 0x3_FFFF_FFFF_FFFF. For floats, it is 0x3F_FFFF.
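
Storing and retrieving a payload can be sketched as follows, assuming the NaN, getNaNPayload and isNaN signatures documented on this page. The value 0xABC is arbitrary and small enough to fit the float limit quoted above.

```d
import tango.math.IEEE : NaN, getNaNPayload, isNaN;

unittest
{
    real tagged = NaN(0xABC);                // a NaN carrying the integer 0xABC
    assert(isNaN(tagged) != 0);
    assert(getNaNPayload(tagged) == 0xABC);  // the payload reads back unchanged
}
```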


Page generated by Ddoc. Portions Copyright (C) 2001-2005 Digital Mars.