**Common Lisp the Language, 2nd Edition**

In
general,
computations with floating-point numbers are only approximate.
The *precision* of a floating-point number is not necessarily
correlated at all with the *accuracy* of that number.
For instance, 3.142857142857142857 is a more precise approximation
to π than 3.14159, but the latter is more accurate.
The precision refers to the number of bits retained in the representation.
When an operation combines a short floating-point number with a long one,
the result will be a long floating-point number. This rule is made
to ensure that as much accuracy as possible is preserved; however,
it is by no means a guarantee.
Common Lisp numerical routines do assume, however, that the accuracy of
an argument does not exceed its precision. Therefore
when two small floating-point numbers
are combined, the result will always be a small floating-point number.
This assumption can be overridden by first explicitly converting
a small floating-point number to a larger representation.
(Common Lisp never converts automatically from a larger size to a smaller one.)
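These format rules can be observed directly at the REPL. The following sketch assumes an implementation in which `1.0` reads as an IEEE single-float and `1.0d0` as a double-float, which is the common arrangement:

```lisp
;; Combining a single-float with a double-float yields a double-float:
(+ 1.5 1.5d0)                   ; => 3.0d0

;; Combining two single-floats always yields a single-float,
;; even when the wider format would have preserved more accuracy:
(+ 1.0e-8 1.0)                  ; => 1.0 (the small addend is lost to rounding)

;; Overriding the rule by explicitly widening one argument first:
(+ (float 1.0e-8 1.0d0) 1.0)    ; a double-float slightly greater than 1.0d0
```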

Rational computations cannot overflow in the usual sense (though of course there may not be enough storage to represent one), as integers and ratios may in principle be of any magnitude. Floating-point computations may get exponent overflow or underflow; this is an error.
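A brief illustration of the difference, assuming an implementation whose single-float format has the IEEE range of roughly 3.4e38:

```lisp
;; Integer and ratio results may grow without bound (bignums):
(expt 10 30)            ; => 1000000000000000000000000000000
(/ (expt 10 30) 7)      ; => an exact ratio; no overflow is possible

;; The corresponding floating-point computation can overflow;
;; how that overflow is reported is implementation-dependent:
;; (expt 10.0 39)       ; exceeds the single-float range
```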

X3J13 voted in June 1989 (FLOAT-UNDERFLOW)
to address certain problems relating to floating-point overflow and
underflow, but certain parts of the proposed solution were not adopted, namely
to add the macro `without-floating-underflow-traps` to the language and to
require certain behavior of floating-point overflow and underflow.
The committee agreed that this area of the language requires more
discussion before a solution is standardized.

For the record, the proposal that was considered and rejected
(for the nonce) introduced a macro
`without-floating-underflow-traps`
that would execute its body in such a way that, within its dynamic extent,
a floating-point underflow
must not signal an error but instead must produce
either a denormalized number or zero as the result.
The rejected proposal also specified the following treatment of overflow and underflow:

- A floating-point computation that overflows should signal
an error of type `floating-point-overflow`.

- Unless it occurs within the dynamic extent of a use of
`without-floating-underflow-traps`, a floating-point computation that underflows should signal an error of type `floating-point-underflow`. A result that can be represented only in denormalized form must be considered an underflow in implementations that support denormalized floating-point numbers.

When rational and floating-point numbers are compared or combined by
a numerical function, the rule of *floating-point contagion*
is followed: when a rational meets a floating-point number,
the rational is first converted to a floating-point number of
the same format. For functions such as `+`
that take more than two arguments,
it may be that part of the operation is carried out exactly using
rationals and then the rest is done using floating-point arithmetic.
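The contagion rule for combining is easy to see at the REPL; these examples assume distinct single- and double-float formats:

```lisp
;; The rational is converted to the float's format before combining:
(+ 1/2 0.5)      ; => 1.0    (single-float)
(+ 1/2 0.5d0)    ; => 1.0d0  (double-float)
(* 1/4 4.0)      ; => 1.0    (1/4 converts exactly to 0.25)
```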

X3J13 voted in January 1989
(CONTAGION-ON-NUMERICAL-COMPARISONS)
to apply the rule of floating-point
contagion stated above to the case of *combining* rational and floating-point numbers.
For *comparing*, the following rule is to be used instead:
When a rational number and a floating-point number are to be compared
by a numerical function, in effect the floating-point number is first
converted to a rational number as if by the function `rational`,
and then an exact comparison of two rational numbers is performed.
It is of course valid to use a more efficient implementation than
actually calling the function `rational`, as long as the result
of the comparison is the same. In the case of complex numbers, the
real and imaginary parts are handled separately.
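The exact-comparison rule matters when an integer is too large to be represented exactly in the float's format. This sketch assumes an IEEE single-float with a 24-bit significand, so 2^24 + 1 = 16777217 is the first integer it cannot represent:

```lisp
;; Converting 16777217 to a single-float rounds it down:
(float 16777217 1.0)       ; => 16777216.0

;; Comparison converts the FLOAT to a rational, so it is exact:
(= 16777217 16777216.0)    ; => NIL
(< 16777216.0 16777217)    ; => T

;; Naive contagion (converting the integer to a float instead)
;; would have reported these two values as equal.
```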

For functions that are mathematically associative (and possibly commutative), a Common Lisp implementation may process the arguments in any manner consistent with associative (and possibly commutative) rearrangement. This does not affect the order in which the argument forms are evaluated, of course; that order is always left to right, as in all Common Lisp function calls. What is left loose is the order in which the argument values are processed. The point of all this is that implementations may differ in which automatic coercions are applied because of differing orders of argument processing. As an example, consider this expression:

```lisp
(+ 1/3 2/3 1.0D0 1.0 1.0E-15)
```

One implementation might process the arguments from left to right,
first adding `1/3` and `2/3` to get `1`, then converting that
to a double-precision floating-point number for combination
with `1.0D0`, then successively converting and adding `1.0` and
`1.0E-15`. Another implementation might process the arguments
from right to left, first performing a single-precision floating-point addition
of `1.0` and `1.0E-15` (and probably losing some accuracy
in the process!), then converting the sum to double precision
and adding `1.0D0`, then converting `2/3` to double-precision
floating-point and adding it, and then converting `1/3` and adding that.
A third implementation might first scan all the arguments, process
all the rationals first to keep that part of the computation exact,
then find an argument of the largest floating-point format among all
the arguments and add that, and then add in all other arguments,
converting each in turn (all in a perhaps misguided attempt to make
the computation as accurate as possible). In any case, all three
strategies are legitimate. The user can of course control the order of
processing explicitly by writing several calls; for example:

```lisp
(+ (+ 1/3 2/3) (+ 1.0D0 1.0E-15) 1.0)
```

The user can also control all coercions simply by writing calls to coercion functions explicitly.
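For instance, one way to remove all implementation latitude from the earlier example is to coerce every argument to double-float with `float` before any addition takes place:

```lisp
;; Every argument is a double-float, so no automatic coercion
;; can occur during the addition, whatever the processing order:
(+ (float 1/3 1.0d0)
   (float 2/3 1.0d0)
   1.0d0
   (float 1.0 1.0d0)
   1.0d-15)
```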

In general, then, the type of the result of a numerical function is a floating-point number of the largest format among all the floating-point arguments to the function; but if the arguments are all rational, then the result is rational (except for functions that can produce mathematically irrational results, in which case a single-format floating-point number may result).

There is a separate rule of complex contagion.
As a rule, complex numbers never result from a numerical function
unless one or more of the
arguments is complex. (Exceptions to this
rule occur among the irrational and transcendental functions,
specifically `expt`, `log`, `sqrt`,
`asin`, `acos`, `acosh`, and `atanh`;
see section 12.5.)
When a non-complex number meets a complex number, the non-complex
number is in effect first converted to a complex number by providing an
imaginary part of zero.
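Both halves of the rule can be demonstrated briefly; the exact printed form of complex results may vary by implementation:

```lisp
;; The real 5 is treated as if it were #C(5.0 0.0):
(+ 5 #c(1.0 2.0))    ; => #C(6.0 2.0)

;; sqrt is one of the documented exceptions: a complex result
;; from a non-complex argument.
(sqrt -1.0)          ; a complex number, typically #C(0.0 1.0)
```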

If any computation produces a result that is a ratio of
two integers such that the denominator evenly divides the
numerator, then the result is immediately converted to the equivalent
integer. This is called the rule of *rational canonicalization*.
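Rational canonicalization in action:

```lisp
(/ 10 5)       ; => 2     (not the ratio 10/5)
(/ 10 4)       ; => 5/2   (already in lowest terms; stays a ratio)
(+ 1/3 2/3)    ; => 1     (the ratio 3/3 canonicalizes to the integer 1)
```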

If the result of any computation would be a complex rational
with a zero imaginary part, the result is immediately
converted to a non-complex rational number by taking the
real part. This is called the rule of *complex canonicalization*.
Note that this rule does *not* apply to complex numbers whose components
are floating-point numbers. Whereas `#C(5 0)` and `5` are not
distinct values in Common Lisp (they are always `eql`),
`#C(5.0 0.0)` and `5.0` are always distinct values in Common Lisp
(they are never `eql`, although they are `equalp`).
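A short sketch of both canonicalization cases and the resulting `eql` behavior:

```lisp
;; Rational components: the zero imaginary part is dropped,
;; so squaring the imaginary unit yields the integer -1:
(* #c(0 1) #c(0 1))           ; => -1

;; Floating-point components: no canonicalization occurs:
(* #c(0.0 1.0) #c(0.0 1.0))   ; => #C(-1.0 0.0)

;; Hence:
(eql #c(5 0) 5)               ; => T   (#C(5 0) denotes the integer 5)
(eql #c(5.0 0.0) 5.0)         ; => NIL
(equalp #c(5.0 0.0) 5.0)      ; => T
```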
