
Floating Point & IEEE 754


Some real numbers can never be represented exactly with a finite number of binary digits

$$
\frac{1}{10}, \frac{1}{3}, \frac{1}{5}, \dots
$$
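A minimal sketch of how this shows up in practice, assuming `double` is an IEEE 754 binary64 value (true on most platforms, but not guaranteed by C). The helper names `sum_is_exact` and `show_stored_value` are made up for this note.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 when a + b compares exactly equal to expected in double arithmetic. */
int sum_is_exact(double a, double b, double expected)
{
    volatile double sum = a + b;  /* volatile: force the sum to be stored as a double */
    return sum == expected;
}

/* Prints x with more digits than a double can hold, exposing the stored approximation. */
void show_stored_value(const char *label, double x)
{
    assert(label != NULL);
    printf("%s is stored as %.20f\n", label, x);
}
```

Under that assumption, `sum_is_exact(0.5, 0.25, 0.75)` holds because all three values are sums of powers of two, while `sum_is_exact(0.1, 0.2, 0.3)` does not. Calling `show_stored_value("1/10", 0.1)` prints a long decimal expansion that is close to, but not exactly, 0.1.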

With a limited number of bits, we have to make a trade-off between range and precision

Why not use 2’s complement to encode the exponent part?

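Two 2’s complement encoded numbers can’t be compared directly as unsigned bit patterns, because a negative value has its top bit set and looks larger than a positive one. With the biased encoding the stored field $Exp$ is unsigned and the true exponent is $Exp - \text{bias}$, so comparing the stored fields directly gives a true comparison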
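A small sketch of the difference, assuming an 8-bit exponent field and a bias of 127 as in single precision. The function names are hypothetical and only illustrate the two encodings.

```c
#include <assert.h>
#include <stdint.h>

#define BIAS 127  /* single-precision bias, assumed for this sketch */

/* 8-bit two's complement encoding of exponent e, viewed as an unsigned field. */
uint8_t twos_complement_field(int e)
{
    assert(e >= -128 && e <= 127);
    return (uint8_t)e;  /* conversion wraps modulo 256, so -1 becomes 0xFF */
}

/* 8-bit biased encoding: store e + BIAS as an unsigned field. */
uint8_t biased_field(int e)
{
    assert(e >= -BIAS && e <= 255 - BIAS);
    return (uint8_t)(e + BIAS);
}
```

With exponents -1 and 1, the two's complement fields are `0xFF` and `0x01`, so an unsigned comparison orders them the wrong way round. The biased fields are 126 and 128, which keep the correct order.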

Why do we have the bias?

We want the stored exponents to form a monotonic, non-negative range, which makes comparison easier
$$
[-127, 128] + \text{bias} = [0, 128 + \text{bias}] = [0, 255], \quad \text{bias} = 127
$$
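As a rough illustration on real values, the sketch below pulls the stored exponent field out of a `float`. It assumes `float` is an IEEE 754 single-precision value with an 8-bit exponent in bits 30..23 and a bias of 127. All helper names are invented for this note.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the raw bits of a float into a 32-bit unsigned integer. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);  /* the sketch only makes sense for 32-bit floats */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Stored (biased) exponent field: bits 30..23. */
unsigned exponent_field(float f)
{
    return (float_bits(f) >> 23) & 0xFFu;
}

/* True exponent of a normalized value: stored field minus the bias. */
int true_exponent(float f)
{
    return (int)exponent_field(f) - 127;
}

/* For positive floats, unsigned comparison of the bit patterns matches float ordering. */
int bits_less_than(float a, float b)
{
    return float_bits(a) < float_bits(b);
}
```

For example, `1.0f` has stored field 127 and true exponent 0, and `0.5f` has true exponent -1. Because the exponent field is unsigned and sits above the fraction bits, `bits_less_than(0.5f, 8.0f)` agrees with `0.5f < 8.0f` without any floating-point hardware.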

Halfway in binary

A halfway value is always of the form x.xxxxx1000000…

A value is “halfway” when the bits to the right of the rounding position are exactly $(100\ldots)_2$
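The halfway test can be sketched on plain integers, where `shift` is the number of low bits being rounded away. The tie-breaking rule used here is round-to-nearest-even, the IEEE 754 default mode. The notes above do not name a rule, so treat that choice as an assumption. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* True when the `shift` low bits of value are exactly 100...0 (the halfway pattern). */
int is_halfway(uint32_t value, unsigned shift)
{
    assert(shift >= 1 && shift <= 31);
    uint32_t dropped = value & ((1u << shift) - 1u);
    return dropped == (1u << (shift - 1));
}

/* Drops the `shift` low bits, rounding to nearest with ties going to the even result. */
uint32_t round_nearest_even(uint32_t value, unsigned shift)
{
    assert(shift >= 1 && shift <= 31);
    uint32_t kept = value >> shift;
    uint32_t dropped = value & ((1u << shift) - 1u);
    uint32_t half = 1u << (shift - 1);
    if (dropped > half || (dropped == half && (kept & 1u)))
        kept += 1u;  /* above halfway, or exactly halfway with an odd kept part */
    return kept;
}
```

Rounding away two bits, $(10110)_2$ and $(10010)_2$ are both halfway cases: the first rounds up to $(110)_2$ because $(101)_2$ is odd, the second stays at $(100)_2$ because it is already even.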

Floating Point arithmetic

Multiplication

Addition

Casting in C

When casting, we examine the fraction part of the floating-point number to see whether the value needs to be rounded.
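A short sketch of both directions, assuming a 32-bit `int`, an IEEE 754 single-precision `float` with a 24-bit significand, and the default round-to-nearest-even mode. The helper names are invented for this note.

```c
#include <assert.h>

/* int -> float -> int: exposes rounding when x needs more than 24 significant bits. */
int through_float(int x)
{
    assert(x > -(1 << 30) && x < (1 << 30));  /* keep the round trip inside int range */
    volatile float f = (float)x;              /* volatile: force storage as a real float */
    return (int)f;
}

/* double -> int: the fractional part is discarded (truncation toward zero). */
int truncate_to_int(double d)
{
    assert(d > -2147483648.0 && d < 2147483648.0);  /* out-of-range casts are undefined */
    return (int)d;
}
```

Under those assumptions, `through_float(16777216)` returns its argument unchanged because $2^{24}$ fits exactly, while `through_float(16777217)` returns 16777216: the value lies halfway between two representable floats and the tie goes to the even one. In the other direction, `truncate_to_int(2.9)` gives 2 and `truncate_to_int(-2.9)` gives -2.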