# IEEE 754‐1985, Binary 64, Floating‐Point Examiner

Chris W. Johnson

November 12, 2020

Imprecision is common in floating‐point arithmetic. It is also inevitable: no numeric format can represent every possible number, and floating‐point formats like IEEE 754‐1985 were deliberately designed to preserve magnitude by sacrificing precision. Loss of precision, whatever its cause, can have significant effects, especially when people are unaware of the issue. That unawareness is common (even among computer programmers), in part because decimal representations of these numbers are pervasively generated in ways that hide the underlying inaccuracies.
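A small Java sketch makes the hidden imprecision visible (the class name is illustrative). Java's default `println` for a `double` prints the shortest decimal string that round‐trips, while `new BigDecimal(double)` reveals the exact value actually stored:

```java
import java.math.BigDecimal;

public class HiddenImprecision {
    public static void main(String[] args) {
        // 0.1 cannot be represented exactly in binary64; the stored
        // value is the nearest representable neighbor.
        System.out.println(0.1);                  // prints "0.1" (shortest round-trip form)
        System.out.println(new BigDecimal(0.1));  // the exact stored value:
        // 0.1000000000000000055511151231257827021181583404541015625

        // The accumulated error becomes visible in comparisons.
        System.out.println(0.1 + 0.2 == 0.3);     // prints "false"
    }
}
```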

It is hoped this tool will aid understanding of the causes and effects of those issues.

The “IEEE 754‐1985, binary 64” format is a very common floating‐point format. Java uses it to represent numbers in its `double` primitives and `Double` objects, and for all floating‐point calculations; by extension, any other language executed in the Java Virtual Machine (JVM) uses it as well. It is also used by languages unrelated to Java and the JVM. Notably, C and C++ use it wherever it is the host CPU’s native `double` representation (Intel’s CPUs are prime examples, so it is a very common case), JavaScript uses it for essentially all numbers, C#’s `double` type uses it, Python uses it on most platforms, and so on.
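In Java, the raw binary64 bit pattern behind any `double` can be examined directly with `Double.doubleToRawLongBits`. A minimal sketch (the class name is illustrative):

```java
public class BitExaminer {
    public static void main(String[] args) {
        double d = -0.5;
        // Raw IEEE 754-1985 binary64 bit pattern, as a 64-bit long (bit 63 = sign).
        long bits = Double.doubleToRawLongBits(d);
        // Zero-pad to 64 binary digits / 16 hex digits for readability.
        String bin = String.format("%64s", Long.toBinaryString(bits)).replace(' ', '0');
        String hex = String.format("%16s", Long.toHexString(bits)).replace(' ', '0');
        System.out.println("binary : " + bin);  // 1011111111100000...0 (64 bits)
        System.out.println("hex    : " + hex);  // bfe0000000000000
    }
}
```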

Floating‐Point Format | IEEE 754‐1985, binary 64
---|---
Value, Typical Text Representation (Decimal) |
Value, Actual, Binary | Bits 63–0
Value, Actual, Hexadecimal | Bits 63–0
Sign | Bit 63
Exponent | Bits 62–52
Mantissa | Bits 51–0
Value Calculation | = Full‐Precision Result ⩰ (low‐precision)
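The three bit fields in the table can be extracted with shifts and masks. A minimal sketch in Java (class name and example value are illustrative):

```java
public class FieldExtractor {
    public static void main(String[] args) {
        // 6.5 = 1.101 (binary) × 2^2
        long bits = Double.doubleToRawLongBits(6.5);

        long sign     = (bits >>> 63) & 0x1L;      // bit 63
        long exponent = (bits >>> 52) & 0x7FFL;    // bits 62–52 (11 bits)
        long mantissa = bits & 0xFFFFFFFFFFFFFL;   // bits 51–0  (52 bits)

        System.out.println("sign     = " + sign);      // 0
        System.out.println("exponent = " + exponent);  // 1025 (biased: 2 + 1023)
        System.out.println("mantissa = 0x" + Long.toHexString(mantissa));  // 0xa000000000000
    }
}
```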

## Change the Bits, See What Happens…

Click on the bits to change their values. This page updates immediately.


## Conventions

- All hexadecimal and binary representations are big‐endian.
- Binary (base 2) representations are suffixed with a subscripted “2”. Their bits are identified using the numbers 63 to 0, assigned from left to right.
- Hexadecimal (base 16) representations are suffixed with a subscripted “16”.
- Hexadecimal representations are used when shorter than decimal.
- Steps required to transform a number into the value used in the final floating‐point calculation are separated with right arrow (“→”) symbols. (Knowing the bit positions is not enough; rules govern the use of those bits, and the transformations are those rules applied to each bit field.)
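For a normal (not subnormal, infinite, or NaN) value, those transformations can be sketched in Java as follows: subtract the bias (1023) from the exponent field, and prepend the implicit leading 1 to the mantissa field. The class name and example value are illustrative:

```java
public class ValueCalculation {
    public static void main(String[] args) {
        long bits = Double.doubleToRawLongBits(6.5);

        long sign    = (bits >>> 63) & 0x1L;
        long expBits = (bits >>> 52) & 0x7FFL;
        long manBits = bits & 0xFFFFFFFFFFFFFL;

        // Exponent field → subtract the bias (1023) to recover the true exponent.
        int exponent = (int) expBits - 1023;

        // Mantissa field → prepend the implicit leading 1, then scale the
        // 52 fraction bits by 2^-52.
        double significand = 1.0 + manBits / Math.pow(2, 52);

        // Value calculation: (-1)^sign × significand × 2^exponent
        double value = (sign == 0 ? 1 : -1) * significand * Math.pow(2, exponent);
        System.out.println(value);  // 6.5
    }
}
```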