Introduction

Floating point types have a reputation for being difficult to use compared to integer types. This is partly because they have unintuitive sources of rounding error but it also stems from the kinds of calculations they are used for, which contain plentiful other sources of numeric error.

The float_eq crate provides an API for comparing floating point primitives, structs and collections for equality that allows users to communicate their reasoning and intent with respect to numeric error.

This book is not designed to be read from cover to cover, but instead is a series of categorised articles. If you are unsure of how floats work or why they are considered difficult to use then begin with the Basic Usage guide, otherwise check out one of the categories for more:

Tutorials

Basic usage

So you have attempted to perform a simple floating point calculation in Rust, perhaps an old classic:


#![allow(unused)]
fn main() {
let sum = 0.1 + 0.2;

if sum == 0.3 {
    println!("Decimal math is working as expected!");
} else {
    println!("Something has gone horribly wrong!");
}
}

And it turns out that:

Something has gone horribly wrong!

What's going on? Let's take a closer look at the result of the sum:


#![allow(unused)]
fn main() {
println!("0.1 + 0.2 = {}", sum);
}
0.1 + 0.2 = 0.30000000000000004

That doesn't seem like the right answer at all! It's very close, but why is it off by a tiny amount?

Well, what's happened is that the f64 type being used to calculate sum has a binary numeric representation, and our inputs are specified as decimal numbers. Neither 0.1 nor 0.2 in decimal can be represented exactly in binary and so they have been rounded to the nearest values which can be. The addition is performed using binary arithmetic and finally is converted back into a decimal representation to be printed.

We can see how these values have been rounded by printing them with a high enough precision:


#![allow(unused)]
fn main() {
println!("0.1 -> {:.55}", 0.1);
println!("0.2 -> {:.55}", 0.2);
}
0.1 -> 0.1000000000000000055511151231257827021181583404541015625
0.2 -> 0.2000000000000000111022302462515654042363166809082031250

Close, so very close

Very few decimal numbers may be exactly represented in binary. Only powers of two:


#![allow(unused)]
fn main() {
println!("4.0   -> {:.55}", 4.0);
println!("2.0   -> {:.55}", 2.0);
println!("1.0   -> {:.55}", 1.0);
println!("0.5   -> {:.55}", 0.5);
println!("0.25  -> {:.55}", 0.25);
println!("0.125 -> {:.55}", 0.125);
}
4.0   -> 4.0000000000000000000000000000000000000000000000000000000
2.0   -> 2.0000000000000000000000000000000000000000000000000000000
1.0   -> 1.0000000000000000000000000000000000000000000000000000000
0.5   -> 0.5000000000000000000000000000000000000000000000000000000
0.25  -> 0.2500000000000000000000000000000000000000000000000000000
0.125 -> 0.1250000000000000000000000000000000000000000000000000000

And sums of powers of two:


#![allow(unused)]
fn main() {
println!("{:.55}", 0.5 + 0.25);
println!("{:.55}", 4.0 + 0.5 + 0.125);
}
0.7500000000000000000000000000000000000000000000000000000
4.6250000000000000000000000000000000000000000000000000000

Where:

  • 4.000 = 22
  • 2.000 = 21
  • 1.000 = 20
  • 0.500 = 2-1
  • 0.250 = 2-2
  • 0.125 = 2-3

Most decimal numbers are not sums of powers of two1, so every time we convert a decimal into floating point there is a high chance that it will be rounded. The difference between the input value and the converted f64 floating point value is known as the initial roundoff error.

Accumulating errors

Conversion from decimal to binary is not the only source of error in floating point arithmetic. If the only problem was the initial roundoff error, then we could reasonably expect the sum 0.1 + 0.2 to be computed exactly and be equal to the converted constant 0.3 when we test it with ==.

It is not:


#![allow(unused)]
fn main() {
let sum = 0.1 + 0.2;

println!("sum -> {:.55}", sum);
println!("0.3 -> {:.55}", 0.3);
}
sum -> 0.3000000000000000444089209850062616169452667236328125000
0.3 -> 0.2999999999999999888977697537484345957636833190917968750

Why aren't they the same? Well, sum has been rounded up to the nearest representable f64 number, and 0.3 has been rounded down.

Let's think a bit more about the original inputs. They have both been rounded up compared to their decimal values:

0.1 -> 0.1000000000000000055511151231257827021181583404541015625
0.2 -> 0.2000000000000000111022302462515654042363166809082031250

If we add these rounded values together by hand, we get a number that is just slightly above 0.3:

  0.1000000000000000055511151231257827021181583404541015625
+ 0.2000000000000000111022302462515654042363166809082031250
= 0.3000000000000000166533453693773481063544750213623046775

The results of operations on floating point values are rounded to the nearest representable value2. It just so happens that in this example, the sum of the two rounded values is high enough to be rounded up, whereas 0.3 is low enough to be rounded down. This is why checking they are equal with == returns false.

Nearly every operation on floating point numbers can result in further rounding, amplifying the effect of previous rounding. You may be able to mitigate this somewhat by reordering operations to reduce the magnitude, but it is impossible to avoid error entirely.

Distant relatives

Wait a minute though. If our two input two values are exactly representable as binary numbers then why was the result of adding them together rounded at all? Shouldn't sum be the same as the answer we calculated by hand? It is somewhat higher than we expected:

by hand: 0.3000000000000000166533453693773481063544750213623046775
by f64:  0.3000000000000000444089209850062616169452667236328125000

They are different because of the fundamental design of floating point numbers. The big advantage of floating point types, and the reason that we may be happy to deal with the small errors inherent in using them, is that they can represent a literally astronomically larger range of absolute values than integer types can in the same number of bits.

The trade off is that the granularity of floating point numbers changes with their magnitude. As floating point numbers get larger the absolute distance between adjacent values grows, and as they get smaller it shrinks, whereas the relative distance between values remains constant. This is in contrast to integer types where, regardless of magnitude, adjacent values are always the same absolute distance away from each another (one), but their relative distance changes.

Let's make this more concrete by comparing i32 and f32. Each can represent 4,294,967,296 different values using their 32 bits, but they make very different choices as to what those values are.

The positive range of i32 can represent:


#![allow(unused)]
fn main() {
0
1
...
2147483646
2147483647 // i32::MAX
}

The positive range of f32 can represent:


#![allow(unused)]
fn main() {
0.0
...
0.000000000000000000000000000000000000011754944 // f32::MIN_POSITIVE
0.000000000000000000000000000000000000011754945
...
340282330000000000000000000000000000000.0
340282350000000000000000000000000000000.0 // f32::MAX
∞ // f32::INFINITY                                        
}

As mentioned above, the absolute distance between every adjacent pair of i32 numbers within its range is always one, but their relative distance changes depending on their magnitude:

0 -> 1:
absolute: 1 - 0 = 1
relative: 1 / 0 = ∞

1 -> 2:
absolute: 2 - 1 = 1
relative: 2 / 1 = 2

...

2147483646 -> 2147483647:
absolute: 2147483647 - 2147483646 = 1
relative: 2147483647 / 2147483646 = 1.0000000004656612873077392578125

The f32 number line is a bit more complex. Zero and infinity are special values at the extremes and the tiny range of subnormal values just above zero act differently but the vast majority of f32 values are normal floating point numbers in the range from f32::MIN_POSITIVE to f32::MAX.

The absolute distance between each pair of adjacent normal f32 values varies depending on their size but their relative distance is always very close to 1.0000001. We can illustrate this with some help from the ieee754 crate:


#![allow(unused)]
fn main() {
use ieee754::Ieee754;

let a = f32::MIN_POSITIVE;
let b = a.next();
println!("{} ->\n{}:", a, b);
println!("absolute: {}", b - a);
println!("relative: {}\n", b / a);

let c = f32::MIN_POSITIVE.next();
let d = c.next();
println!("{} ->\n{}:", c, d);
println!("absolute: {}", d - c);
println!("relative: {}\n", d / c);

println!("...\n");

let e = f32::MAX.prev();
let f = f32::MAX;
println!("{} ->\n{}:", e, f);
println!("absolute: {}", f - e);
println!("relative: {}", f / e);
}
0.000000000000000000000000000000000000011754944 ->
0.000000000000000000000000000000000000011754945:
absolute: 0.000000000000000000000000000000000000000000001
relative: 1.0000001

0.000000000000000000000000000000000000011754945 ->
0.000000000000000000000000000000000000011754946:
absolute: 0.000000000000000000000000000000000000000000001
relative: 1.0000001

...

340282330000000000000000000000000000000 ->
340282350000000000000000000000000000000:
absolute: 20282410000000000000000000000000
relative: 1.0000001

Machine epsilon

The absolute distance between adjacent numbers for a floating point type is determined by multiples of that type's machine epsilon. This is the distance between adjacent values in the range 1.0 to 2.0. For f64 this is the constant f64::EPSILON. We can scale it for every other range of powers of two to determine the absolute distance between numbers in those ranges:

  • 0.25 to 0.5 contains 8,388,608 values, all a distance of 0.25 * f64::EPSILON apart.
  • 0.5 to 1.0 contains 8,388,608 values, all a distance of 0.5 * f64::EPSILON apart.
  • 1.0 to 2.0 contains 8,388,608 values, all a distance of f64::EPSILON apart.
  • 2.0 to 4.0 contains 8,388,608 values, all a distance of 2.0 * f64::EPSILON apart.
  • 4.0 to 8.0 contains 8,388,608 values, all a distance of 4.0 * f64::EPSILON apart.

This is why our addition was unexpectedly rounded up:

  • 0.1 is in the range 0.0625 to 0.125, which are all 0.0625 f64::EPSILON apart.
  • 0.2 is in the range 0.125 to 0.25, which are all 0.125 * f64::EPSILON apart.
  • 0.1 + 0.2 is in the range 0.25 to 0.5, which are all 0.25 * f64::EPSILON apart.

The input values 0.1 and 0.2 were both rounded up to the nearest values in their respective ranges. The result of adding them together is within a range with a lower granularity and was not exactly representable, so it needed to be rounded to the nearest value that is.

It gets much worse

But that's not the end of the story.

There are far more significant sources of error in most numerical calculations than the roundoff errors we have described so far. If the inputs are real world values, then there is almost certainly some measurement error in how they were collected. Algorithms like physics simulations that use discrete steps to approximate continuous functions introduce truncation error. Even the underyling mathematics may amplify existing error if it is numerically unstable, for example when dividing large numbers by much smaller ones or if values undergo catastrophic cancellation.

I'll say that again, because it's so important: in most real world programs, floating point roundoff error is so small as to be insignificant compared to other sources of error. Floating point numerics may be inappropriate for values that have strict absolute precision requirements, such as currency, but in general there is no need to shy away from using them because of their rounding behaviour.

Every numerical algorithm is unique and there is no one size fits all solution or set of defaults to account for the error inherent in them and hide it away. In fact, there is an entire field of active mathematical research concerned with computational error, Numerical Analysis. What we can do, however, is learn to reason about these sources of error and provide tools for taking it into account and communicating our thoughts to future readers and maintainers.

Close enough

Bearing that in mind, let's return to our original comparison. Now that we know why the exact == comparison failed, we can instead check that the difference between the expected and actual values lies within a margin of error, known as the tolerance. This is what the float_eq! macro is for.

Absolute tolerance comparison

The simplest algorithm to check if two floating point numbers are equal is an absolute tolerance comparison. This tests that the absolute difference between two values lies within a specified tolerance and is invoked with the syntax abs <= tol. We have calculated that in our very simple example the values may differ by at most 0.25 * f64::EPSILON:


#![allow(unused)]
fn main() {
use float_eq::float_eq;

let sum = 0.1 + 0.2;

if float_eq!(sum, 0.3, abs <= 0.25 * f64::EPSILON) {
    println!("Floating point math is working as expected!");
} else {
    println!("Something has gone horribly wrong!");
}
}
Floating point math is working as expected!

Relative tolerance comparison

Hurray! However, manually scaling the tolerance to the range of the operands is not very elegant. Fortunately, given that we know that we are comparing two normal numbers, we can use a relative tolerance comparison to scale the tolerance for us. The second operand is our expected value, so we may choose to use r2nd <= tol. With a relative tolerance comparison, the tolerance should be specified as if we were testing a value in the range 1.0 to 2.0, so f64::EPSILON indicates we are expecting our operands to be no more than one representable value apart from each other3:


#![allow(unused)]
fn main() {
use float_eq::float_eq;
let sum = 0.1 + 0.2;
if float_eq!(sum, 0.3, r2nd <= f64::EPSILON) {
    println!("Floating point math is working as expected!");
} else {
    println!("Something has gone horribly wrong!");
}
}
Floating point math is working as expected!

ULPs based comparison

If both numbers are normal and the same sign, which is often the case, we can use an ULPs comparison, another form of relative check. This uses a property of the underlying representation of floating point numbers which means that when we interpret their bits as unsigned integers, the adjacent floats are the adjacent integer values above and below. By using ulps <= tol to invoke one of these checks, the tolerance is the maximum number of representable values apart they may be regardless of their magnitude. In our example, we know they may be at most one representable f64 value apart:


#![allow(unused)]
fn main() {
use float_eq::float_eq;
let sum = 0.1 + 0.2;
if float_eq!(sum, 0.3, ulps <= 1) {
    println!("Floating point math is working as expected!");
} else {
    println!("Something has gone horribly wrong!");
}
}
Floating point math is working as expected!

Asserting ourselves

The float_eq library also includes assert_float_eq! to accompany the boolean float_eq! operator. To illustrate their use and show some more advanced comparison techniques, we will define a very simple numerical integrator. This function takes an initial value and step size, then advances the value by an arbitrary number of steps:


#![allow(unused)]
fn main() {
fn integrate(initial: f64, step: f64, count: usize) -> f64 {
    let mut sum = initial;
    for _ in 0..count {
        sum += step;
    }
    sum
}
}

Say we want to unit test a number of different sets of input to this algorithm, we might build ourselves a wrapper for it:


#![allow(unused)]
fn main() {
fn test_integrate(initial: f64, step: f64, count: usize, expected: f64) {
    let actual = integrate(initial, step, count);
    assert_float_eq!(actual, expected, r2nd <= f64::EPSILON);
}
}

Note that the syntax for assert, r2nd <= tol is the same as for the boolean form, and that you may use any of the same algorithms if you wish. Arbitrarily, we have begun with the same tolerance as for our first example comparison. This means that the equivalent of our first simple comparison will pass this test just fine:


#![allow(unused)]
fn main() {
test_integrate(0.2, 0.1, 1, 0.3);
}

Drifting away

Let's add some tests which use an increasing step size. Our first two are within the existing expected tolerance:


#![allow(unused)]
fn main() {
test_integrate(0.0, 0.1, 1, 0.1);
test_integrate(0.0, 0.1, 10, 1.0);
}

But when we look at 100 steps, we find that our test fails:


#![allow(unused)]
fn main() {
test_integrate(0.0, 0.1, 100, 10.0);
}
thread 'main' panicked at 'assertion failed: `float_eq!(left, right, r2nd <= t)`
        left: `9.99999999999998`,
       right: `10.0`,
    abs_diff: `0.000000000000019539925233402755`,
   ulps_diff: `Some(11)`,
    [r2nd] t: `0.000000000000002220446049250313`', src\main.rs:15:9

This assert form prints out the values of the operands, like the standard Rust asserts, but it also provides additional context information to help us make sense of why it failed. Here, t is the tolerance:

  • abs_diff is the absolute difference between the two values: (left - right).abs().
  • ulps_diff is the difference in ULPs between the two values, the count of representable values they are apart. It may be None if two numbers have different signs.
  • [r2nd] t is the tolerance as scaled to the range of right (the second operand).

We can see that our actual and expected values are eleven ULPs from one another. That means that our tolerance of f64::EPSILON, equivalent to one ULP, is inadequate. This is because we are performing count number of additions, and each one of those provides an answer accurate to within 0.5 ULPs, so they have accumulated more error than a single step or ten steps would. We might reason that our tolerance should therefore be some function of the number of steps, perhaps (count * 0.5) * f64::EPSILON:


#![allow(unused)]
fn main() {
fn test_integrate(initial: f64, step: f64, count: usize, expected: f64) {
    let actual = integrate(initial, step, count);
    assert_float_eq!(
        actual,
        expected,
        r2nd <= ((count as f64) * 0.5) * f64::EPSILON
    );
}
}

Absolute zero

Let's take a look at another failure case, when we count backwards to zero:


#![allow(unused)]
fn main() {
test_integrate(10.0, -0.1, 100, 0.0);
}
thread 'main' panicked at 'assertion failed: `float_eq!(left, right, r2nd <= t)`
        left: `0.000000000000018790524691780774`,
       right: `0.0`,
    abs_diff: `0.000000000000018790524691780774`,
   ulps_diff: `Some(4401468191289638912)`,
    [r2nd] t: `0.0`', src\main.rs:15:9

Wow! There are a couple of things to note about this failure. The first is that because we are comparing versus zero, scaling our tolerance doesn't work - zero times anything is zero, so the tolerance does not take into account the step count. Even more noticable though is just how far away our actual value is in terms of representable numbers.

There are two reasons why this has happened. The first is that, as mentioned above, zero is a special number and does not have the same properties as normal floating point numbers. The second is that our final subtraction leaves us with a result many orders of magnitude smaller than the previous sum total, resulting in a catastrophic cancellation.

The solution here is to use a more sophisticated calculation for our tolerance, one that takes into account the nature of the calculation itself. It needs to be an absolute test, since we may be comparing versus zero. It should also scale relative to the largest intermediate value in the calculation and take into account the potential rounding errors from our repeated addition:


#![allow(unused)]
fn main() {
fn test_integrate(initial: f64, step: f64, count: usize, expected: f64) {
    let actual = integrate(initial, step, count);
    let half_count = (count as f64) * 0.5;
    let tol = f64::EPSILON * half_count * f64::max(initial.abs(), actual.abs());
    assert_float_eq!(actual, expected, abs <= tol);
}
}

This further illustates that there is no one right or general way to express the tolerances of numeric algorithms. Every comparison will be based on the specific calculation being performed and frequently the particular inputs.

Custom messages

The assert macros may include a custom message in the same manner as the standard Rust asserts:


#![allow(unused)]
fn main() {
assert_float_eq!(
    actual,
    expected,
    r2nd <= f64::EPSILON,
    "\nWhere: initial: {}, step: {}, count: {}",
    initial,
    step,
    count
);
}
thread 'main' panicked at 'assertion failed: `float_eq!(left, right, r2nd <= t)`
        left: `9.99999999999998`,
       right: `10.0`,
    abs_diff: `0.000000000000019539925233402755`,
   ulps_diff: `Some(11)`,
    [r2nd] t: `0.000000000000002220446049250313`:
Where: initial: 0, step: 0.1, count: 100', src\main.rs:14:9

Staying afloat

Hopefully that's given you some flavour of the issues that crop up when implementing numerical methods and how float_eq may aid you when they do. At this point you may be interested in learning how to perform some specific tasks, reading some more general background explanations or browsing the API documentation.


1

They are in fact all exact sums of multiples of powers of ten.

2

By default. Other rounding modes such as always rounding up, down or toward zero are available.

3

In general, n * fXX::EPSILON as a relative tolerance means "at most n representable values apart", for example you might use a tolerance of 4.0 * f64::EPSILON.

How to

How to compare floating point numbers

  1. Determine which comparison algorithm best suits your purposes.

  2. Determine the tolerance required based on the sources of the error in how the numbers were entered/calculated.

  3. If you need a boolean result, then use float_eq! or float_ne!. The two numbers should be the first and second operands, and then the tolerance is the value after the <=. For example, to compare two f32 numbers stored in a and b using a relative tolerance of tol scaled to the magnitude of the second operand:


#![allow(unused)]
fn main() {
float_eq!(a, b, r2nd <= tol)
}
  1. If instead you wish to assert that the two numbers are equal and panic if they are not, you can use assert_float_eq! or assert_float_ne!. The syntax is the same, and may optionally include formatted text after the comparisons as with standard Rust asserts:

#![allow(unused)]
fn main() {
assert_float_eq!(a, b, r2nd <= tol);
assert_float_eq!(a, b, r2nd <= tol, "Example context: {}", context);
}

How to compare composite types

  1. Composite types such as arrays, tuples and structs may be compared by specifying a per-field tolerance in an instance of the same type:

#![allow(unused)]
fn main() {
let a = [1.0, -2.0, 3.0];
let b = [-1.0, 2.0, 3.5];
assert_float_eq!(a, b, abs <= [2.0, 4.0, 0.5]);

let c = Complex32 { re: 2.0, im: 4.000_002 };
let d = Complex32 { re: 2.000_000_5, im: 4.0 };
assert_float_eq!(c, d, rmax <= Complex32 { re: 0.000_000_25, im: 0.000_000_5 });
assert_float_eq!(c, d, ulps <= Complex32Ulps { re: 2, im: 4 });
}
  1. Homogeneous types may also support the _all variants of the checks, which allow you to specify a single tolerance to use across all fields:

#![allow(unused)]
fn main() {
assert_float_eq!(a, b, abs_all <= 4.0);

assert_float_eq!(c, d, rmax_all <= 0.000_000_5);
assert_float_eq!(c, d, ulps_all <= 4);
}
  1. Checks may be extended over new types by implementing the extension traits.

How to interpret assert failure messages

Assertion failure messages provide context information that hopefully helps in determining how a check failed. For example, this line:


#![allow(unused)]
fn main() {
assert_float_eq!(4.0f32, 4.000_008, rmax <= 0.000_001);
}

Panics with this error message:

thread 'main' panicked at 'assertion failed: `float_eq!(left, right, rmax <= t)`
        left: `4.0`,
       right: `4.000008`,
    abs_diff: `0.000008106232`,
   ulps_diff: `Some(17)`,
    [rmax] t: `0.000004000008`', assert_failure.rs:15:5

Where:

  • rmax <= t - indicates the type of comparison which was carried out.
  • left - the value of the first operand.
  • right - the value of the second operand.
  • abs_diff - the absolute difference between left and right.
  • ulps_diff - the difference between left and right in ULPs. If it is None, that is because they have different signs or at least one is NaN.
  • [rmax] t - the tolerance used in the comparison against the relevant difference, here abs_diff, after it has been scaled relative to an operand, in this case max(left, right) since it is rmax.

How to compare custom types

To extend float_eq functionality over a new type, you should implement the relevant traits:

  1. float_eq! and float_ne! require FloatEqUlpsTol and FloatEq.

  2. If your type is homogeneous, that is if it consists of fields that are all the same underlying floating point type, you should implement the optional FloatEqAll to enable the _all comparison algorithms.

  3. assert_float_eq! and assert_float_ne! require the same traits plus FloatEqDebugUlpsDiff and AssertFloatEq. If you have implemented FloatEqAll you should also implement AssertFloatEqAll.

If your type is a non-generic struct or tuple struct that consists entirely of already supported fields, then the easiest way to implement these traits is to make use of the #[derive_float_eq] helper macro. It is also possible to #[derive] individual traits. If you cannot derive an implementation, then you will need to implement the traits directly.

#[derive_float_eq]

If your type is a non-generic struct or tuple struct then you may derive the relevant traits using this helper macro. Enable the "derive" feature by adding this to your Cargo.toml:

[dependencies.float_eq]
version = "0.6"
features = ["derive"]

Add #[derive_float_eq] to your type. The ulps_tol and debug_ulps_diff parameters are required. They are used to name two new types that match the structure of the type being derived from. The first is used to provide ULPs tolerance values per field, and the second is used to provide debug information for the differerence between values in ULPs.

The all_tol parameter is optional, and ought to be provided if your type is homogeneous and consists of fields that are all the same underlying floating point type. If provided, it will additionally implement the traits required to use the _all variants of checks, using the given tolerance type (usually f32 or f64).


#![allow(unused)]
fn main() {
#[derive_float_eq(
    ulps_tol = "PointUlps", 
    debug_ulps_diff = "PointDebugUlpsDiff",
    all_tol = "f64"
)]
#[derive(Debug, PartialEq, Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}
}

This will generate the following two types in addition to implementing the relevant extension traits. These new types will derive Debug, Clone, Copy and PartialEq:


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq)]
struct PointUlps {
    x: UlpsTol<f64>,
    y: UlpsTol<f64>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct PointDebugUlpsDiff {
    x: DebugUlpsDiff<f64>,
    y: DebugUlpsDiff<f64>,
}
}

This enables the use of your type in the float_eq macros:


#![allow(unused)]
fn main() {
let a = Point { x: 1.0, y: -2.0 };
let b = Point { x: 1.1, y: -2.2 };

assert_float_eq!(a, b, abs <= Point { x: 0.15, y: 0.25 });
assert_float_eq!(a, b, abs_all <= 0.25);

let c = Point { x: 1.000_000_000_000_000_9, y: -2.000_000_000_000_001_3 };
let eps = f64::EPSILON;

assert_float_eq!(a, c, rmax <= Point { x: 4.0 * eps, y: 5.0 * eps });
assert_float_eq!(a, c, rmax_all <= 5.0 * eps);

assert_float_eq!(a, c, ulps <= PointUlps { x: 4, y: 3 });
assert_float_eq!(a, c, ulps_all <= 4);
}

Manually #[derive] traits

If you cannot use #[derive_float_eq] then you may be able to derive individual traits manually. Enable the "derive" feature by adding this to your Cargo.toml:

[dependencies.float_eq]
version = "0.6"
features = ["derive"]

#[derive(FloatEqUlpsTol)]

Add a #[float_eq] attribute and provide ulps_tol, which will be used as the name of a new type. This type will be structurally identical to the type being derived, using the same visibility as the parent type and with identically named fields that use the derived fields' types wrapped by UlpsTol. The new struct derives Debug, Clone, Copy and PartialEq.


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq, FloatEqUlpsTol)]
#[float_eq(ulps_tol = "PointUlps")]
struct Point {
    x: f64,
    y: f64,
}
}

This will generate the following struct:


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq)]
struct PointUlps {
    x: UlpsTol<f64>,
    y: UlpsTol<f64>,
}
}

#[derive(FloatEq)]

Requires FloatEqUlpsTol. Add a #[float_eq] attribute and provide ulps_tol, which should match the name of the FloatEqUlpsTol type. Two instances are equal if all fields are equal, and not equal if any are not.


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq, FloatEqUlpsTol, FloatEq)]
#[float_eq(ulps_tol = "PointUlps")]
struct Point {
    x: f64,
    y: f64,
}
}

#[derive(FloatEqAll)]

Add a #[float_eq] attribute and specify all_tol, which is the type to be used as AllTol, usually f32 or f64. Two instances are equal if all fields are equal, and not equal if any are not.


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq, FloatEqAll)]
#[float_eq(ulps_tol = "PointUlps", all_tol = "f64")]
struct Point {
    x: f64,
    y: f64,
}
}

#[derive(FloatEqDebugUlpsDiff)]

Add a #[float_eq] attribute and provide debug_ulps_diff, which will be used as the name of a new type. This type will be structurally identical to the type being derived, using the same visibility as the parent type and with identically named fields that use the derived fields' types wrapped by DebugUlpsDiff. The new struct derives Debug, Clone, Copy and PartialEq.


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq, FloatEqDebugUlpsDiff)]
#[float_eq(debug_ulps_diff = "PointDebugUlpsDiff")]
struct Point {
    x: f64,
    y: f64,
}
}

This will generate the following struct:


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq)]
struct PointDebugUlpsDiff {
    x: DebugUlpsDiff<f64>,
    y: DebugUlpsDiff<f64>,
}
}

#derive[(AssertFloatEq)]

Requires FloatEqUlpsTol, FloatEq and FloatEqDebugUlpsDiff. Add a #[float_eq] attribute and provide ulps_tol and ulps_debug_diff, which should match the name of the UlpsTol and DebugUlpsDiff types. Each field's tolerance is calculated via a recursive call to the algorithm being used.


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq)]
#[derive(FloatEqUlpsTol, FloatEq, FloatEqDebugUlpsDiff, AssertFloatEq)]
#[float_eq(ulps_tol = "PointUlps", debug_ulps_diff = "PointDebugUlpsDiff")]
struct Point {
    x: f64,
    y: f64,
}
}

#[derive(AssertFloatEqAll)]

Requires FloatEqUlpsTol, FloatEq, FloatEqAll, FloatEqDebugUlpsDiff and AssertFloatEq. Add a #[float_eq] attribute and provide ulps_tol, ulps_debug_diff, and all_tol, which should match the names of the UlpsTol, DebugUlpsDiff and AllTol types. Each field's tolerance is calculated via a recursive call to the algorithm being used.


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq)]
#[derive(
    FloatEqUlpsTol, FloatEq, FloatEqAll,
    FloatEqDebugUlpsDiff, AssertFloatEq, AssertFloatEqAll
)]
#[float_eq(
    ulps_tol = "PointUlps",
    debug_ulps_diff = "PointUlpsDebugUlpsDiff",
    all_tol = "f64",
)]
struct Point {
    x: f64,
    y: f64,
}
}

Implementing the traits directly

If you cannot derive the implementations of the extension traits, then you may implement them manually. These implementations will be based on the same Point type as the derive examples:


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}
}

Implementing FloatEqUlpsTol

Types should provide an UlpsTol representation for each of their fields:


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq)]
struct PointUlps {
    x: UlpsTol<f64>,
    y: UlpsTol<f64>,
}

impl FloatEqUlpsTol for Point {
    type UlpsTol = PointUlps;
}
}

Implementing FloatEq

Requires FloatEqUlpsTol. Implementation is then usually a matter of calling through to an underlying FloatEq method for each field in turn. If not, you will need to take a close look at the descriptions of the algorithms on a method by method basis:


#![allow(unused)]
fn main() {
impl FloatEq for Point {
    type Tol = Point;

    fn eq_abs(&self, other: &Self, tol: &Point) -> bool {
        self.x.eq_abs(&other.x, &tol.x) &&
        self.y.eq_abs(&other.y, &tol.y)
    }

    fn eq_rmax(&self, other: &Self, tol: &Point) -> bool {
        self.x.eq_rmax(&other.x, &tol.x) &&
        self.y.eq_rmax(&other.y, &tol.y)
    }

    fn eq_rmin(&self, other: &Self, tol: &Point) -> bool {
        self.x.eq_rmin(&other.x, &tol.x) &&
        self.y.eq_rmin(&other.y, &tol.y)
    }

    fn eq_r1st(&self, other: &Self, tol: &Point) -> bool {
        self.x.eq_r1st(&other.x, &tol.x) &&
        self.y.eq_r1st(&other.y, &tol.y)
    }

    fn eq_r2nd(&self, other: &Self, tol: &Point) -> bool {
        self.x.eq_r2nd(&other.x, &tol.x) &&
        self.y.eq_r2nd(&other.y, &tol.y)
    }

    fn eq_ulps(&self, other: &Self, tol: &UlpsTol<Point>) -> bool {
        self.x.eq_ulps(&other.x, &tol.x) &&
        self.y.eq_ulps(&other.y, &tol.y)
    }
}
}

Implementing FloatEqAll

Select a tolerance type to compare recursively with each field in your type, likely f32 or f64. Implementation is then usually a matter of calling through to an underlying FloatEqAll method for each field in turn. If not, you will need to take a close look at the descriptions of the algorithms on a method by method basis:


#![allow(unused)]
fn main() {
impl FloatEqAll for Point {
    type AllTol = f64;

    fn eq_abs_all(&self, other: &Self, tol: &f64) -> bool {
        self.x.eq_abs_all(&other.x, tol) &&
        self.y.eq_abs_all(&other.y, tol)
    }

    fn eq_rmax_all(&self, other: &Self, tol: &f64) -> bool {
        self.x.eq_rmax_all(&other.x, tol) &&
        self.y.eq_rmax_all(&other.y, tol)
    }

    fn eq_rmin_all(&self, other: &Self, tol: &f64) -> bool {
        self.x.eq_rmin_all(&other.x, tol) &&
        self.y.eq_rmin_all(&other.y, tol)
    }

    fn eq_r1st_all(&self, other: &Self, tol: &f64) -> bool {
        self.x.eq_r1st_all(&other.x, tol) &&
        self.y.eq_r1st_all(&other.y, tol)
    }

    fn eq_r2nd_all(&self, other: &Self, tol: &f64) -> bool {
        self.x.eq_r2nd_all(&other.x, tol) &&
        self.y.eq_r2nd_all(&other.y, tol)
    }

    fn eq_ulps_all(&self, other: &Self, tol: &UlpsTol<f64>) -> bool {
        self.x.eq_ulps_all(&other.x, tol) &&
        self.y.eq_ulps_all(&other.y, tol)
    }
}
}

Implementing FloatEqDebugUlpsDiff

Types should provide a DebugUlpsDiff representation for each of their fields:


#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq)]
struct PointDebugUlpsDiff {
    x: DebugUlpsDiff<f64>,
    y: DebugUlpsDiff<f64>,
}

impl FloatEqDebugUlpsDiff for Point {
    type DebugUlpsDiff = PointDebugUlpsDiff;
}
}

Implementing AssertFloatEq

Requires FloatEqUlpsTol, FloatEq and FloatEqDebugUlpsDiff. Implementation is then usually a matter of simply calling through to an underlying AssertFloatEq method for each field in turn. If not, you will need to take a close look at the descriptions of the algorithms on a method by method basis:


#![allow(unused)]
fn main() {
impl AssertFloatEq for Point {
    type DebugAbsDiff = Self;
    type DebugTol = Self;

    fn debug_abs_diff(&self, other: &Self) -> Point {
        Point {
            x: self.x.debug_abs_diff(&other.x),
            y: self.y.debug_abs_diff(&other.y),
        }
    }

    fn debug_ulps_diff(&self, other: &Self) -> PointDebugUlpsDiff {
        PointDebugUlpsDiff {
            x: self.x.debug_ulps_diff(&other.x),
            y: self.y.debug_ulps_diff(&other.y),
        }
    }

    fn debug_abs_tol(
        &self,
        other: &Self,
        tol: &Point
    ) -> Point {
        Point {
            x: self.x.debug_abs_tol(&other.x, &tol.x),
            y: self.y.debug_abs_tol(&other.y, &tol.y),
        }
    }

    fn debug_rmax_tol(
        &self,
        other: &Self,
        tol: &Point
    ) -> Point {
        Point {
            x: self.x.debug_rmax_tol(&other.x, &tol.x),
            y: self.y.debug_rmax_tol(&other.y, &tol.y),
        }
    }

    fn debug_rmin_tol(
        &self,
        other: &Self,
        tol: &Point
    ) -> Point {
        Point {
            x: self.x.debug_rmin_tol(&other.x, &tol.x),
            y: self.y.debug_rmin_tol(&other.y, &tol.y),
        }
    }

    fn debug_r1st_tol(
        &self,
        other: &Self,
        tol: &Point
    ) -> Point {
        Point {
            x: self.x.debug_r1st_tol(&other.x, &tol.x),
            y: self.y.debug_r1st_tol(&other.y, &tol.y),
        }
    }

    fn debug_r2nd_tol(
        &self,
        other: &Self,
        tol: &Point
    ) -> Point {
        Point {
            x: self.x.debug_r2nd_tol(&other.x, &tol.x),
            y: self.y.debug_r2nd_tol(&other.y, &tol.y),
        }
    }

    fn debug_ulps_tol(
        &self,
        other: &Self,
        tol: &PointUlps,
    ) -> PointUlps {
        PointUlps {
            x: self.x.debug_ulps_tol(&other.x, &tol.x),
            y: self.y.debug_ulps_tol(&other.y, &tol.y),
        }
    }
}
}

Implementing AssertFloatEqAll

Requires FloatEqUlpsTol, FloatEq, FloatEqAll, FloatEqDebugUlpsDiff and AssertFloatEq. Implementation is then usually a matter of simply calling through to an underlying AssertFloatEqAll method for each field in turn. If not, you will need to take a close look at the descriptions of the algorithms on a method by method basis:


#![allow(unused)]
fn main() {
impl AssertFloatEqAll for Point {
    type AllDebugTol = Self;

    fn debug_abs_all_tol(
        &self,
        other: &Self,
        tol: &Self::AllTol
    ) -> Self::AllDebugTol {
        Point {
            x: self.x.debug_abs_all_tol(&other.x, tol),
            y: self.y.debug_abs_all_tol(&other.y, tol),
        }
    }

    fn debug_rmax_all_tol(
        &self,
        other: &Self,
        tol: &Self::AllTol
    ) -> Self::AllDebugTol {
        Point {
            x: self.x.debug_rmax_all_tol(&other.x, tol),
            y: self.y.debug_rmax_all_tol(&other.y, tol),
        }
    }

    fn debug_rmin_all_tol(
        &self,
        other: &Self,
        tol: &Self::AllTol
    ) -> Self::AllDebugTol {
        Point {
            x: self.x.debug_rmin_all_tol(&other.x, tol),
            y: self.y.debug_rmin_all_tol(&other.y, tol),
        }
    }

    fn debug_r1st_all_tol(
        &self,
        other: &Self,
        tol: &Self::AllTol
    ) -> Self::AllDebugTol {
        Point {
            x: self.x.debug_r1st_all_tol(&other.x, tol),
            y: self.y.debug_r1st_all_tol(&other.y, tol),
        }
    }

    fn debug_r2nd_all_tol(
        &self,
        other: &Self,
        tol: &Self::AllTol
    ) -> Self::AllDebugTol {
        Point {
            x: self.x.debug_r2nd_all_tol(&other.x, tol),
            y: self.y.debug_r2nd_all_tol(&other.y, tol),
        }
    }

    fn debug_ulps_all_tol(
        &self,
        other: &Self,
        tol: &UlpsTol<Self::AllTol>,
    ) -> UlpsTol<Self::AllDebugTol> {
        PointUlps {
            x: self.x.debug_ulps_all_tol(&other.x, tol),
            y: self.y.debug_ulps_all_tol(&other.y, tol),
        }
    }
}
}

Background

Floating point comparison algorithms

Descriptions of the underlying comparison algorithms used by float_eq.

Absolute tolerance comparison

abs <= tol

A check to see how far apart two expressions are by comparing the absolute difference between them to an absolute tolerance. Mathematically, this is:

|a - b| <= tol

Equivalent to, using f32 as an example:


#![allow(unused)]
fn main() {
fn float_eq_abs(a: f32, b: f32, tol: f32) -> bool {
    // the PartialEq check covers equality of infinities
    a == b || (a - b).abs() <= tol
}
}

This is the simplest method of testing the equality of two floats and may be sufficient if you know the absolute margin of error for your calculation given the values being tested. However, absolute tolerance tests do not work well for general comparison of floating point numbers, because they do not take into account that normal values' granularity changes with their magnitude. Thus any given choice of tol is likely to work for one specific exponent's range and poorly outside of it.

In some circumstances absolute tolerance comparisons are required. If you wish to compare against zero, an infinity, or subnormal values then the assumptions that relative tolerance or ULPs based checks make about how neighbouring values are related to one another break down. Similarly, if the underlying mathematics of your algorithm is numerically unstable, for example if it is prone to catastrophic cancellation, then you may find that you need to reach for an absolute tolerance comparison.

Relative tolerance comparison

r1st <= tol
r2nd <= tol
rmax <= tol
rmin <= tol

A check to see how far apart two expressions are by comparing the absolute difference between them to an tolerance that is scaled to the granularity of one of the inputs. Mathematically, this is:

|a - b| <= func(|a|, |b|) * tol

Equivalent to, using f32 as an example:


#![allow(unused)]
fn main() {
fn float_eq_relative(a: f32, b: f32, tol: f32) -> bool {
    // the PartialEq check covers equality of infinities
    a == b || {
        let chosen = func(a.abs(), b.abs());
        (a - b).abs() <= (chosen * tol)
    }
}
}

Where func is one of:

  • r1st: the first input (a)
  • r2nd: the second input (b)
  • rmax: the larger magnitude (aka rel for legacy reasons)
  • rmin: the smaller magnitude

If you are checking for equality versus an expected normal floating point value then you may wish to calculate the tolerance based on that value and so using r1st or r2nd will allow you to select it. If you are generally testing two normal floating point values then rmax is a good general choice. If either number may also be subnormal or close to zero, then you may need to calculate a tolerance based on an intermediate value for an absolute tolerance check instead.

Choice of tol will depend on the tolerances inherent in the specific mathematical function or algorithm you have implemented. Note that a tolerance of n * EPSILON (e.g. f32::EPSILON) will test that two expressions are within n representable values of another. However, you should be aware that the errors inherent in your inputs and calculations are likely to be much greater than the small rounding errors this form would imply.

Units in the Last Place (ULPs) comparison

ulps <= tol

A check to see how far apart two expressions are by comparing the number of representable values between them. This works by interpreting the bitwise representation of the input values as integers and comparing the absolute difference between those. Equivalent to, using f32 as an example:


#![allow(unused)]
fn main() {
fn float_eq_ulps(a: f32, b: f32, tol: u32) -> bool {
    if a.is_nan() || b.is_nan() {
        false // NaNs are never equal
    } else if a.is_sign_positive() != b.is_sign_positive() {
        a == b // values of different signs are only equal if both are zero.
    } else {
        let a_bits = a.to_bits();
        let b_bits = b.to_bits();
        let max = a_bits.max(b_bits);
        let min = a_bits.min(b_bits);
        (max - min) <= tol
    }
}
}

Thanks to a deliberate quirk in the way the underlying format of IEEE floats was designed, this is a measure of how near two values are that scales with their relative granularity. Note that tol is an unsigned integer, so for example ulps <= 4 means "check that a and b are equal to within a distance of four or less representable values".

ULPs comparisons are very similar to relative tolerance checks, and as such are useful for testing equality of normal floats but not for comparisons with zero or infinity. Additionally, because floats use their most significant bit to indicate their sign, ULPs comparisons are not valid for comparing values with different signs.

API documentation

The latest API documentation is available here or at docs.rs.

About this documentation

This book's structure was inspired by the Diátaxis Framework.