Introduction
Floating point types have a reputation for being difficult to use compared to integer types. This is partly because they have unintuitive sources of rounding error but it also stems from the kinds of calculations they are used for, which contain plentiful other sources of numeric error.
The float_eq crate provides an API for comparing floating point primitives, structs and collections for equality that allows users to communicate their reasoning and intent with respect to numeric error.
This book is not designed to be read from cover to cover, but instead is a series of categorised articles. If you are unsure of how floats work or why they are considered difficult to use then begin with the Basic Usage guide, otherwise check out one of the categories for more:
 Tutorials: getting started.
 How to guides: solutions to specific problems.
 Background: explanation and discussion.
 API documentation: detailed technical reference.
Tutorials
Basic usage
So you have attempted to perform a simple floating point calculation in Rust, perhaps an old classic:
#![allow(unused)] fn main() { let sum = 0.1 + 0.2; if sum == 0.3 { println!("Decimal math is working as expected!"); } else { println!("Something has gone horribly wrong!"); } }
And it turns out that:
Something has gone horribly wrong!
What's going on? Let's take a closer look at the result of the sum:
#![allow(unused)] fn main() { println!("0.1 + 0.2 = {}", sum); }
0.1 + 0.2 = 0.30000000000000004
That doesn't seem like the right answer at all! It's very close, but why is it off by a tiny amount?
Well, what's happened is that the f64
type being used to calculate sum
has a
binary numeric representation, and our inputs are specified as decimal
numbers. Neither 0.1 nor 0.2 in decimal can be represented exactly in binary
and so they have been rounded to the nearest values which can be. The addition
is performed using binary arithmetic and finally is converted back into a
decimal representation to be printed.
We can see how these values have been rounded by printing them with a high enough precision:
#![allow(unused)] fn main() { println!("0.1 > {:.55}", 0.1); println!("0.2 > {:.55}", 0.2); }
0.1 > 0.1000000000000000055511151231257827021181583404541015625
0.2 > 0.2000000000000000111022302462515654042363166809082031250
Close, so very close
Very few decimal numbers may be exactly represented in binary. Only powers of two:
#![allow(unused)] fn main() { println!("4.0 > {:.55}", 4.0); println!("2.0 > {:.55}", 2.0); println!("1.0 > {:.55}", 1.0); println!("0.5 > {:.55}", 0.5); println!("0.25 > {:.55}", 0.25); println!("0.125 > {:.55}", 0.125); }
4.0 > 4.0000000000000000000000000000000000000000000000000000000
2.0 > 2.0000000000000000000000000000000000000000000000000000000
1.0 > 1.0000000000000000000000000000000000000000000000000000000
0.5 > 0.5000000000000000000000000000000000000000000000000000000
0.25 > 0.2500000000000000000000000000000000000000000000000000000
0.125 > 0.1250000000000000000000000000000000000000000000000000000
And sums of powers of two:
#![allow(unused)] fn main() { println!("{:.55}", 0.5 + 0.25); println!("{:.55}", 4.0 + 0.5 + 0.125); }
0.7500000000000000000000000000000000000000000000000000000
4.6250000000000000000000000000000000000000000000000000000
Where:
 4.000 = 2^{2}
 2.000 = 2^{1}
 1.000 = 2^{0}
 0.500 = 2^{1}
 0.250 = 2^{2}
 0.125 = 2^{3}
Most decimal numbers are not sums of powers of two^{1}, so every time we
convert a decimal into floating point there is a high chance that it will be
rounded. The difference between the input value and the converted f64
floating
point value is known as the initial roundoff error.
Accumulating errors
Conversion from decimal to binary is not the only source of error in floating
point arithmetic. If the only problem was the initial roundoff error, then we
could reasonably expect the sum 0.1 + 0.2 to be computed exactly and be equal
to the converted constant 0.3 when we test it with ==
.
It is not:
#![allow(unused)] fn main() { let sum = 0.1 + 0.2; println!("sum > {:.55}", sum); println!("0.3 > {:.55}", 0.3); }
sum > 0.3000000000000000444089209850062616169452667236328125000
0.3 > 0.2999999999999999888977697537484345957636833190917968750
Why aren't they the same? Well, sum
has been rounded up to the nearest
representable f64
number, and 0.3 has been rounded down.
Let's think a bit more about the original inputs. They have both been rounded up compared to their decimal values:
0.1 > 0.1000000000000000055511151231257827021181583404541015625
0.2 > 0.2000000000000000111022302462515654042363166809082031250
If we add these rounded values together by hand, we get a number that is just slightly above 0.3:
0.1000000000000000055511151231257827021181583404541015625
+ 0.2000000000000000111022302462515654042363166809082031250
= 0.3000000000000000166533453693773481063544750213623046775
The results of operations on floating point values are rounded to the nearest
representable value^{2}. It just so happens that in this example, the sum
of the
two rounded values is high enough to be rounded up, whereas 0.3 is low enough to
be rounded down. This is why checking they are equal with ==
returns false.
Nearly every operation on floating point numbers can result in further rounding, amplifying the effect of previous rounding. You may be able to mitigate this somewhat by reordering operations to reduce the magnitude, but it is impossible to avoid error entirely.
Distant relatives
Wait a minute though. If our two input two values are exactly representable as
binary numbers then why was the result of adding them together rounded at all?
Shouldn't sum
be the same as the answer we calculated by hand? It is somewhat
higher than we expected:
by hand: 0.3000000000000000166533453693773481063544750213623046775
by f64: 0.3000000000000000444089209850062616169452667236328125000
They are different because of the fundamental design of floating point numbers. The big advantage of floating point types, and the reason that we may be happy to deal with the small errors inherent in using them, is that they can represent a literally astronomically larger range of absolute values than integer types can in the same number of bits.
The trade off is that the granularity of floating point numbers changes with their magnitude. As floating point numbers get larger the absolute distance between adjacent values grows, and as they get smaller it shrinks, whereas the relative distance between values remains constant. This is in contrast to integer types where, regardless of magnitude, adjacent values are always the same absolute distance away from each another (one), but their relative distance changes.
Let's make this more concrete by comparing i32
and f32
. Each can represent
4,294,967,296 different values using their 32 bits, but they make very different
choices as to what those values are.
The positive range of i32
can represent:
#![allow(unused)] fn main() { 0 1 ... 2147483646 2147483647 // i32::MAX }
The positive range of f32
can represent:
#![allow(unused)] fn main() { 0.0 ... 0.000000000000000000000000000000000000011754944 // f32::MIN_POSITIVE 0.000000000000000000000000000000000000011754945 ... 340282330000000000000000000000000000000.0 340282350000000000000000000000000000000.0 // f32::MAX ∞ // f32::INFINITY }
As mentioned above, the absolute distance between every adjacent pair of i32
numbers within its range is always one, but their relative distance changes
depending on their magnitude:
0 > 1:
absolute: 1  0 = 1
relative: 1 / 0 = ∞
1 > 2:
absolute: 2  1 = 1
relative: 2 / 1 = 2
...
2147483646 > 2147483647:
absolute: 2147483647  2147483646 = 1
relative: 2147483647 / 2147483646 = 1.0000000004656612873077392578125
The f32
number line is a bit more complex. Zero and infinity are special
values at the extremes and the tiny range of subnormal values just above zero
act differently but the vast majority of f32
values are normal floating point
numbers in the range from f32::MIN_POSITIVE
to f32::MAX
.
The absolute distance between each pair of adjacent normal f32
values varies
depending on their size but their relative distance is always very close to
1.0000001. We can illustrate this with some help from the ieee754 crate:
#![allow(unused)] fn main() { use ieee754::Ieee754; let a = f32::MIN_POSITIVE; let b = a.next(); println!("{} >\n{}:", a, b); println!("absolute: {}", b  a); println!("relative: {}\n", b / a); let c = f32::MIN_POSITIVE.next(); let d = c.next(); println!("{} >\n{}:", c, d); println!("absolute: {}", d  c); println!("relative: {}\n", d / c); println!("...\n"); let e = f32::MAX.prev(); let f = f32::MAX; println!("{} >\n{}:", e, f); println!("absolute: {}", f  e); println!("relative: {}", f / e); }
0.000000000000000000000000000000000000011754944 >
0.000000000000000000000000000000000000011754945:
absolute: 0.000000000000000000000000000000000000000000001
relative: 1.0000001
0.000000000000000000000000000000000000011754945 >
0.000000000000000000000000000000000000011754946:
absolute: 0.000000000000000000000000000000000000000000001
relative: 1.0000001
...
340282330000000000000000000000000000000 >
340282350000000000000000000000000000000:
absolute: 20282410000000000000000000000000
relative: 1.0000001
Machine epsilon
The absolute distance between adjacent numbers for a floating point type is
determined by multiples of that type's machine epsilon. This is the distance
between adjacent values in the range 1.0 to 2.0. For f64
this is the constant
f64::EPSILON
. We can scale it for every other range of powers of two to
determine the absolute distance between numbers in those ranges:
 0.25 to 0.5 contains 8,388,608 values, all a distance of
0.25 * f64::EPSILON
apart.  0.5 to 1.0 contains 8,388,608 values, all a distance of
0.5 * f64::EPSILON
apart.  1.0 to 2.0 contains 8,388,608 values, all a distance of
f64::EPSILON
apart.  2.0 to 4.0 contains 8,388,608 values, all a distance of
2.0 * f64::EPSILON
apart.  4.0 to 8.0 contains 8,388,608 values, all a distance of
4.0 * f64::EPSILON
apart.
This is why our addition was unexpectedly rounded up:
0.1
is in the range 0.0625 to 0.125, which are all0.0625 f64::EPSILON
apart.0.2
is in the range 0.125 to 0.25, which are all0.125 * f64::EPSILON
apart.0.1 + 0.2
is in the range 0.25 to 0.5, which are all0.25 * f64::EPSILON
apart.
The input values 0.1 and 0.2 were both rounded up to the nearest values in their respective ranges. The result of adding them together is within a range with a lower granularity and was not exactly representable, so it needed to be rounded to the nearest value that is.
It gets much worse
But that's not the end of the story.
There are far more significant sources of error in most numerical calculations than the roundoff errors we have described so far. If the inputs are real world values, then there is almost certainly some measurement error in how they were collected. Algorithms like physics simulations that use discrete steps to approximate continuous functions introduce truncation error. Even the underyling mathematics may amplify existing error if it is numerically unstable, for example when dividing large numbers by much smaller ones or if values undergo catastrophic cancellation.
I'll say that again, because it's so important: in most real world programs, floating point roundoff error is so small as to be insignificant compared to other sources of error. Floating point numerics may be inappropriate for values that have strict absolute precision requirements, such as currency, but in general there is no need to shy away from using them because of their rounding behaviour.
Every numerical algorithm is unique and there is no one size fits all solution or set of defaults to account for the error inherent in them and hide it away. In fact, there is an entire field of active mathematical research concerned with computational error, Numerical Analysis. What we can do, however, is learn to reason about these sources of error and provide tools for taking it into account and communicating our thoughts to future readers and maintainers.
Close enough
Bearing that in mind, let's return to our original comparison. Now that we know
why the exact ==
comparison failed, we can instead check that the difference
between the expected and actual values lies within a margin of error, known as
the tolerance. This is what the float_eq!
macro is for.
Absolute tolerance comparison
The simplest algorithm to check if two floating point numbers are equal is an
absolute tolerance comparison. This tests that the absolute difference between
two values lies within a specified tolerance and is invoked with the syntax
abs <= tol
. We have calculated that in our very simple example the values may
differ by at most 0.25 * f64::EPSILON
:
#![allow(unused)] fn main() { use float_eq::float_eq; let sum = 0.1 + 0.2; if float_eq!(sum, 0.3, abs <= 0.25 * f64::EPSILON) { println!("Floating point math is working as expected!"); } else { println!("Something has gone horribly wrong!"); } }
Floating point math is working as expected!
Relative tolerance comparison
Hurray! However, manually scaling the tolerance to the range of the operands is
not very elegant. Fortunately, given that we know that we are comparing two
normal numbers, we can use a relative tolerance comparison to scale the
tolerance for us. The second operand is our expected value, so we may choose to
use r2nd <= tol
. With a relative tolerance comparison, the tolerance should be
specified as if we were testing a value in the range 1.0 to 2.0, so
f64::EPSILON
indicates we are expecting our operands to be no more than one
representable value apart from each other^{3}:
#![allow(unused)] fn main() { use float_eq::float_eq; let sum = 0.1 + 0.2; if float_eq!(sum, 0.3, r2nd <= f64::EPSILON) { println!("Floating point math is working as expected!"); } else { println!("Something has gone horribly wrong!"); } }
Floating point math is working as expected!
ULPs based comparison
If both numbers are normal and the same sign, which is often the case, we can
use an ULPs comparison, another form of relative check. This uses a property
of the underlying representation of floating point numbers which means that
when we interpret their bits as unsigned integers, the adjacent floats are the
adjacent integer values above and below. By using ulps <= tol
to invoke one of
these checks, the tolerance is the maximum number of representable values apart
they may be regardless of their magnitude. In our example, we know they may be
at most one representable f64
value apart:
#![allow(unused)] fn main() { use float_eq::float_eq; let sum = 0.1 + 0.2; if float_eq!(sum, 0.3, ulps <= 1) { println!("Floating point math is working as expected!"); } else { println!("Something has gone horribly wrong!"); } }
Floating point math is working as expected!
Asserting ourselves
The float_eq
library also includes assert_float_eq!
to accompany the
boolean float_eq!
operator. To illustrate their use and show some more
advanced comparison techniques, we will define a very simple numerical
integrator. This function takes an initial value and step size, then advances
the value by an arbitrary number of steps:
#![allow(unused)] fn main() { fn integrate(initial: f64, step: f64, count: usize) > f64 { let mut sum = initial; for _ in 0..count { sum += step; } sum } }
Say we want to unit test a number of different sets of input to this algorithm, we might build ourselves a wrapper for it:
#![allow(unused)] fn main() { fn test_integrate(initial: f64, step: f64, count: usize, expected: f64) { let actual = integrate(initial, step, count); assert_float_eq!(actual, expected, r2nd <= f64::EPSILON); } }
Note that the syntax for assert, r2nd <= tol
is the same as for the boolean
form, and that you may use any of the same algorithms if you wish. Arbitrarily,
we have begun with the same tolerance as for our first example comparison. This
means that the equivalent of our first simple comparison will pass this test
just fine:
#![allow(unused)] fn main() { test_integrate(0.2, 0.1, 1, 0.3); }
Drifting away
Let's add some tests which use an increasing step size. Our first two are within the existing expected tolerance:
#![allow(unused)] fn main() { test_integrate(0.0, 0.1, 1, 0.1); test_integrate(0.0, 0.1, 10, 1.0); }
But when we look at 100 steps, we find that our test fails:
#![allow(unused)] fn main() { test_integrate(0.0, 0.1, 100, 10.0); }
thread 'main' panicked at 'assertion failed: `float_eq!(left, right, r2nd <= t)`
left: `9.99999999999998`,
right: `10.0`,
abs_diff: `0.000000000000019539925233402755`,
ulps_diff: `Some(11)`,
[r2nd] t: `0.000000000000002220446049250313`', src\main.rs:15:9
This assert form prints out the values of the operands, like the standard Rust asserts, but it also provides additional context information to help us make sense of why it failed. Here, t is the tolerance:
abs_diff
is the absolute difference between the two values:(left  right).abs()
.ulps_diff
is the difference in ULPs between the two values, the count of representable values they are apart. It may beNone
if two numbers have different signs.[r2nd] t
is the tolerance as scaled to the range ofright
(the second operand).
We can see that our actual and expected values are eleven ULPs from one another.
That means that our tolerance of f64::EPSILON
, equivalent to one ULP, is
inadequate. This is because we are performing count
number of additions, and
each one of those provides an answer accurate to within 0.5 ULPs, so they have
accumulated more error than a single step or ten steps would. We might reason
that our tolerance should therefore be some function of the number of steps,
perhaps (count * 0.5) * f64::EPSILON
:
#![allow(unused)] fn main() { fn test_integrate(initial: f64, step: f64, count: usize, expected: f64) { let actual = integrate(initial, step, count); assert_float_eq!( actual, expected, r2nd <= ((count as f64) * 0.5) * f64::EPSILON ); } }
Absolute zero
Let's take a look at another failure case, when we count backwards to zero:
#![allow(unused)] fn main() { test_integrate(10.0, 0.1, 100, 0.0); }
thread 'main' panicked at 'assertion failed: `float_eq!(left, right, r2nd <= t)`
left: `0.000000000000018790524691780774`,
right: `0.0`,
abs_diff: `0.000000000000018790524691780774`,
ulps_diff: `Some(4401468191289638912)`,
[r2nd] t: `0.0`', src\main.rs:15:9
Wow! There are a couple of things to note about this failure. The first is that because we are comparing versus zero, scaling our tolerance doesn't work  zero times anything is zero, so the tolerance does not take into account the step count. Even more noticable though is just how far away our actual value is in terms of representable numbers.
There are two reasons why this has happened. The first is that, as mentioned above, zero is a special number and does not have the same properties as normal floating point numbers. The second is that our final subtraction leaves us with a result many orders of magnitude smaller than the previous sum total, resulting in a catastrophic cancellation.
The solution here is to use a more sophisticated calculation for our tolerance, one that takes into account the nature of the calculation itself. It needs to be an absolute test, since we may be comparing versus zero. It should also scale relative to the largest intermediate value in the calculation and take into account the potential rounding errors from our repeated addition:
#![allow(unused)] fn main() { fn test_integrate(initial: f64, step: f64, count: usize, expected: f64) { let actual = integrate(initial, step, count); let half_count = (count as f64) * 0.5; let tol = f64::EPSILON * half_count * f64::max(initial.abs(), actual.abs()); assert_float_eq!(actual, expected, abs <= tol); } }
This further illustates that there is no one right or general way to express the tolerances of numeric algorithms. Every comparison will be based on the specific calculation being performed and frequently the particular inputs.
Custom messages
The assert macros may include a custom message in the same manner as the standard Rust asserts:
#![allow(unused)] fn main() { assert_float_eq!( actual, expected, r2nd <= f64::EPSILON, "\nWhere: initial: {}, step: {}, count: {}", initial, step, count ); }
thread 'main' panicked at 'assertion failed: `float_eq!(left, right, r2nd <= t)`
left: `9.99999999999998`,
right: `10.0`,
abs_diff: `0.000000000000019539925233402755`,
ulps_diff: `Some(11)`,
[r2nd] t: `0.000000000000002220446049250313`:
Where: initial: 0, step: 0.1, count: 100', src\main.rs:14:9
Staying afloat
Hopefully that's given you some flavour of the issues that crop up when
implementing numerical methods and how float_eq
may aid you when they do. At
this point you may be interested in learning how to perform some specific
tasks, reading some more general background explanations or browsing the API
documentation.
They are in fact all exact sums of multiples of powers of ten.
By default. Other rounding modes such as always rounding up, down or toward zero are available.
In general, n * fXX::EPSILON
as a relative tolerance means "at
most n representable values apart", for example you might use a tolerance of
4.0 * f64::EPSILON
.
How to
 Compare floating point numbers
 Compare composite types
 Interpret assert failure messages
 Compare custom types
How to compare floating point numbers

Determine which comparison algorithm best suits your purposes.

Determine the tolerance required based on the sources of the error in how the numbers were entered/calculated.

If you need a boolean result, then use
float_eq!
orfloat_ne!
. The two numbers should be the first and second operands, and then the tolerance is the value after the<=
. For example, to compare twof32
numbers stored ina
andb
using a relative tolerance oftol
scaled to the magnitude of the second operand:
#![allow(unused)] fn main() { float_eq!(a, b, r2nd <= tol) }
 If instead you wish to assert that the two numbers are equal and panic if
they are not, you can use
assert_float_eq!
orassert_float_ne!
. The syntax is the same, and may optionally include formatted text after the comparisons as with standard Rust asserts:
#![allow(unused)] fn main() { assert_float_eq!(a, b, r2nd <= tol); assert_float_eq!(a, b, r2nd <= tol, "Example context: {}", context); }
How to compare composite types
 Composite types such as arrays, tuples and structs may be compared by specifying a perfield tolerance in an instance of the same type:
#![allow(unused)] fn main() { let a = [1.0, 2.0, 3.0]; let b = [1.0, 2.0, 3.5]; assert_float_eq!(a, b, abs <= [2.0, 4.0, 0.5]); let c = Complex32 { re: 2.0, im: 4.000_002 }; let d = Complex32 { re: 2.000_000_5, im: 4.0 }; assert_float_eq!(c, d, rmax <= Complex32 { re: 0.000_000_25, im: 0.000_000_5 }); assert_float_eq!(c, d, ulps <= Complex32Ulps { re: 2, im: 4 }); }
 Homogeneous types may also support the
_all
variants of the checks, which allow you to specify a single tolerance to use across all fields:
#![allow(unused)] fn main() { assert_float_eq!(a, b, abs_all <= 4.0); assert_float_eq!(c, d, rmax_all <= 0.000_000_5); assert_float_eq!(c, d, ulps_all <= 4); }
 Checks may be extended over new types by implementing the extension traits.
How to interpret assert failure messages
Assertion failure messages provide context information that hopefully helps in determining how a check failed. For example, this line:
#![allow(unused)] fn main() { assert_float_eq!(4.0f32, 4.000_008, rmax <= 0.000_001); }
Panics with this error message:
thread 'main' panicked at 'assertion failed: `float_eq!(left, right, rmax <= t)`
left: `4.0`,
right: `4.000008`,
abs_diff: `0.000008106232`,
ulps_diff: `Some(17)`,
[rmax] t: `0.000004000008`', assert_failure.rs:15:5
Where:
 rmax <= t  indicates the type of comparison which was carried out.
 left  the value of the first operand.
 right  the value of the second operand.
 abs_diff  the absolute difference between
left
andright
.  ulps_diff  the difference between
left
andright
in ULPs. If it is None, that is because they have different signs or at least one isNaN
.  [rmax] t  the tolerance used in the comparison against the relevant
difference, here
abs_diff
, after it has been scaled relative to an operand, in this casemax(left, right)
since it isrmax
.
How to compare custom types
To extend float_eq
functionality over a new type, you should implement the
relevant traits:

float_eq! and float_ne! require FloatEqUlpsTol and FloatEq.

If your type is homogeneous, that is if it consists of fields that are all the same underlying floating point type, you should implement the optional FloatEqAll to enable the
_all
comparison algorithms. 
assert_float_eq! and assert_float_ne! require the same traits plus FloatEqDebugUlpsDiff and AssertFloatEq. If you have implemented FloatEqAll you should also implement AssertFloatEqAll.
If your type is a nongeneric struct or tuple struct that consists entirely of
already supported fields, then the easiest way to implement these traits is to
make use of the #[derive_float_eq]
helper macro. It is also possible to
#[derive]
individual traits. If you cannot derive an implementation, then you
will need to implement the traits directly.
#[derive_float_eq]
If your type is a nongeneric struct or tuple struct then you may derive the relevant traits using this helper macro. Enable the "derive" feature by adding this to your Cargo.toml:
[dependencies.float_eq]
version = "0.6"
features = ["derive"]
Add #[derive_float_eq]
to your
type. The ulps_tol
and debug_ulps_diff
parameters are required. They are
used to name two new types that match the structure of the type being derived
from. The first is used to provide ULPs tolerance values per field, and the
second is used to provide debug information for the differerence between values
in ULPs.
The all_tol
parameter is optional, and ought to be provided if your type
is homogeneous and consists of fields that are all the same underlying floating
point type. If provided, it will additionally implement the traits required to
use the _all
variants of checks, using the given tolerance type (usually f32
or f64
).
#![allow(unused)] fn main() { #[derive_float_eq( ulps_tol = "PointUlps", debug_ulps_diff = "PointDebugUlpsDiff", all_tol = "f64" )] #[derive(Debug, PartialEq, Clone, Copy)] struct Point { x: f64, y: f64, } }
This will generate the following two types in addition to implementing the
relevant extension traits. These new types will derive Debug
, Clone
, Copy
and PartialEq
:
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq)] struct PointUlps { x: UlpsTol<f64>, y: UlpsTol<f64>, } #[derive(Debug, Clone, Copy, PartialEq)] struct PointDebugUlpsDiff { x: DebugUlpsDiff<f64>, y: DebugUlpsDiff<f64>, } }
This enables the use of your type in the float_eq
macros:
#![allow(unused)] fn main() { let a = Point { x: 1.0, y: 2.0 }; let b = Point { x: 1.1, y: 2.2 }; assert_float_eq!(a, b, abs <= Point { x: 0.15, y: 0.25 }); assert_float_eq!(a, b, abs_all <= 0.25); let c = Point { x: 1.000_000_000_000_000_9, y: 2.000_000_000_000_001_3 }; let eps = f64::EPSILON; assert_float_eq!(a, c, rmax <= Point { x: 4.0 * eps, y: 5.0 * eps }); assert_float_eq!(a, c, rmax_all <= 5.0 * eps); assert_float_eq!(a, c, ulps <= PointUlps { x: 4, y: 3 }); assert_float_eq!(a, c, ulps_all <= 4); }
Manually #[derive] traits
If you cannot use #[derive_float_eq]
then you may be able to derive individual
traits manually. Enable the "derive" feature by adding this to your Cargo.toml:
[dependencies.float_eq]
version = "0.6"
features = ["derive"]
#[derive(FloatEqUlpsTol)]
Add a #[float_eq]
attribute and provide ulps_tol
, which will be used as
the name of a new type. This type will be structurally identical to the type
being derived, using the same visibility as the parent type and with identically
named fields that use the derived fields' types wrapped by UlpsTol
. The
new struct derives Debug
, Clone
, Copy
and PartialEq
.
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq, FloatEqUlpsTol)] #[float_eq(ulps_tol = "PointUlps")] struct Point { x: f64, y: f64, } }
This will generate the following struct:
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq)] struct PointUlps { x: UlpsTol<f64>, y: UlpsTol<f64>, } }
#[derive(FloatEq)]
Requires FloatEqUlpsTol. Add a #[float_eq]
attribute and provide
ulps_tol
, which should match the name of the FloatEqUlpsTol
type. Two
instances are equal if all fields are equal, and not equal if any are not.
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq, FloatEqUlpsTol, FloatEq)] #[float_eq(ulps_tol = "PointUlps")] struct Point { x: f64, y: f64, } }
#[derive(FloatEqAll)]
Add a #[float_eq]
attribute and specify all_tol
, which is the type to be
used as AllTol, usually f32
or f64
. Two instances are equal if all
fields are equal, and not equal if any are not.
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq, FloatEqAll)] #[float_eq(ulps_tol = "PointUlps", all_tol = "f64")] struct Point { x: f64, y: f64, } }
#[derive(FloatEqDebugUlpsDiff)]
Add a #[float_eq]
attribute and provide debug_ulps_diff
, which will be used
as the name of a new type. This type will be structurally identical to the type
being derived, using the same visibility as the parent type and with identically
named fields that use the derived fields' types wrapped by DebugUlpsDiff
. The
new struct derives Debug
, Clone
, Copy
and PartialEq
.
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq, FloatEqDebugUlpsDiff)] #[float_eq(debug_ulps_diff = "PointDebugUlpsDiff")] struct Point { x: f64, y: f64, } }
This will generate the following struct:
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq)] struct PointDebugUlpsDiff { x: DebugUlpsDiff<f64>, y: DebugUlpsDiff<f64>, } }
#derive[(AssertFloatEq)]
Requires FloatEqUlpsTol, FloatEq and FloatEqDebugUlpsDiff. Add a
#[float_eq]
attribute and provide ulps_tol
and ulps_debug_diff
, which
should match the name of the UlpsTol
and DebugUlpsDiff
types. Each
field's tolerance is calculated via a recursive call to the algorithm being used.
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq)] #[derive(FloatEqUlpsTol, FloatEq, FloatEqDebugUlpsDiff, AssertFloatEq)] #[float_eq(ulps_tol = "PointUlps", debug_ulps_diff = "PointDebugUlpsDiff")] struct Point { x: f64, y: f64, } }
#[derive(AssertFloatEqAll)]
Requires FloatEqUlpsTol, FloatEq, FloatEqAll, FloatEqDebugUlpsDiff
and AssertFloatEq. Add a #[float_eq]
attribute and provide ulps_tol
,
ulps_debug_diff
, and all_tol
, which should match the names of the
UlpsTol
, DebugUlpsDiff
and AllTol
types. Each field's tolerance is
calculated via a recursive call to the algorithm being used.
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq)] #[derive( FloatEqUlpsTol, FloatEq, FloatEqAll, FloatEqDebugUlpsDiff, AssertFloatEq, AssertFloatEqAll )] #[float_eq( ulps_tol = "PointUlps", debug_ulps_diff = "PointUlpsDebugUlpsDiff", all_tol = "f64", )] struct Point { x: f64, y: f64, } }
Implementing the traits directly
If you cannot derive the implementations of the extension traits, then you may
implement them manually. These implementations will be based on the same Point
type as the derive examples:
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq)] struct Point { x: f64, y: f64, } }
Implementing FloatEqUlpsTol
Types should provide an UlpsTol representation for each of their fields:
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq)] struct PointUlps { x: UlpsTol<f64>, y: UlpsTol<f64>, } impl FloatEqUlpsTol for Point { type UlpsTol = PointUlps; } }
Implementing FloatEq
Requires FloatEqUlpsTol. Implementation is then usually a matter of calling through to an underlying FloatEq method for each field in turn. If not, you will need to take a close look at the descriptions of the algorithms on a method by method basis:
#![allow(unused)] fn main() { impl FloatEq for Point { type Tol = Point; fn eq_abs(&self, other: &Self, tol: &Point) > bool { self.x.eq_abs(&other.x, &tol.x) && self.y.eq_abs(&other.y, &tol.y) } fn eq_rmax(&self, other: &Self, tol: &Point) > bool { self.x.eq_rmax(&other.x, &tol.x) && self.y.eq_rmax(&other.y, &tol.y) } fn eq_rmin(&self, other: &Self, tol: &Point) > bool { self.x.eq_rmin(&other.x, &tol.x) && self.y.eq_rmin(&other.y, &tol.y) } fn eq_r1st(&self, other: &Self, tol: &Point) > bool { self.x.eq_r1st(&other.x, &tol.x) && self.y.eq_r1st(&other.y, &tol.y) } fn eq_r2nd(&self, other: &Self, tol: &Point) > bool { self.x.eq_r2nd(&other.x, &tol.x) && self.y.eq_r2nd(&other.y, &tol.y) } fn eq_ulps(&self, other: &Self, tol: &UlpsTol<Point>) > bool { self.x.eq_ulps(&other.x, &tol.x) && self.y.eq_ulps(&other.y, &tol.y) } } }
Implementing FloatEqAll
Select a tolerance type to compare recursively with each field in your type,
likely f32
or f64
. Implementation is then usually a matter of calling
through to an underlying FloatEqAll method for each field in turn. If not, you
will need to take a close look at the descriptions of the algorithms on a method
by method basis:
#![allow(unused)] fn main() { impl FloatEqAll for Point { type AllTol = f64; fn eq_abs_all(&self, other: &Self, tol: &f64) > bool { self.x.eq_abs_all(&other.x, tol) && self.y.eq_abs_all(&other.y, tol) } fn eq_rmax_all(&self, other: &Self, tol: &f64) > bool { self.x.eq_rmax_all(&other.x, tol) && self.y.eq_rmax_all(&other.y, tol) } fn eq_rmin_all(&self, other: &Self, tol: &f64) > bool { self.x.eq_rmin_all(&other.x, tol) && self.y.eq_rmin_all(&other.y, tol) } fn eq_r1st_all(&self, other: &Self, tol: &f64) > bool { self.x.eq_r1st_all(&other.x, tol) && self.y.eq_r1st_all(&other.y, tol) } fn eq_r2nd_all(&self, other: &Self, tol: &f64) > bool { self.x.eq_r2nd_all(&other.x, tol) && self.y.eq_r2nd_all(&other.y, tol) } fn eq_ulps_all(&self, other: &Self, tol: &UlpsTol<f64>) > bool { self.x.eq_ulps_all(&other.x, tol) && self.y.eq_ulps_all(&other.y, tol) } } }
Implementing FloatEqDebugUlpsDiff
Types should provide a DebugUlpsDiff representation for each of their fields:
#![allow(unused)] fn main() { #[derive(Debug, Clone, Copy, PartialEq)] struct PointDebugUlpsDiff { x: DebugUlpsDiff<f64>, y: DebugUlpsDiff<f64>, } impl FloatEqDebugUlpsDiff for Point { type DebugUlpsDiff = PointDebugUlpsDiff; } }
Implementing AssertFloatEq
Requires FloatEqUlpsTol, FloatEq and FloatEqDebugUlpsDiff. Implementation is then usually a matter of simply calling through to an underlying AssertFloatEq method for each field in turn. If not, you will need to take a close look at the descriptions of the algorithms on a method by method basis:
#![allow(unused)] fn main() { impl AssertFloatEq for Point { type DebugAbsDiff = Self; type DebugTol = Self; fn debug_abs_diff(&self, other: &Self) > Point { Point { x: self.x.debug_abs_diff(&other.x), y: self.y.debug_abs_diff(&other.y), } } fn debug_ulps_diff(&self, other: &Self) > PointDebugUlpsDiff { PointDebugUlpsDiff { x: self.x.debug_ulps_diff(&other.x), y: self.y.debug_ulps_diff(&other.y), } } fn debug_abs_tol( &self, other: &Self, tol: &Point ) > Point { Point { x: self.x.debug_abs_tol(&other.x, &tol.x), y: self.y.debug_abs_tol(&other.y, &tol.y), } } fn debug_rmax_tol( &self, other: &Self, tol: &Point ) > Point { Point { x: self.x.debug_rmax_tol(&other.x, &tol.x), y: self.y.debug_rmax_tol(&other.y, &tol.y), } } fn debug_rmin_tol( &self, other: &Self, tol: &Point ) > Point { Point { x: self.x.debug_rmin_tol(&other.x, &tol.x), y: self.y.debug_rmin_tol(&other.y, &tol.y), } } fn debug_r1st_tol( &self, other: &Self, tol: &Point ) > Point { Point { x: self.x.debug_r1st_tol(&other.x, &tol.x), y: self.y.debug_r1st_tol(&other.y, &tol.y), } } fn debug_r2nd_tol( &self, other: &Self, tol: &Point ) > Point { Point { x: self.x.debug_r2nd_tol(&other.x, &tol.x), y: self.y.debug_r2nd_tol(&other.y, &tol.y), } } fn debug_ulps_tol( &self, other: &Self, tol: &PointUlps, ) > PointUlps { PointUlps { x: self.x.debug_ulps_tol(&other.x, &tol.x), y: self.y.debug_ulps_tol(&other.y, &tol.y), } } } }
Implementing AssertFloatEqAll
Requires FloatEqUlpsTol, FloatEq, FloatEqAll, FloatEqDebugUlpsDiff and AssertFloatEq. Implementation is then usually a matter of simply calling through to an underlying AssertFloatEqAll method for each field in turn. If not, you will need to take a close look at the descriptions of the algorithms on a method by method basis:
#![allow(unused)] fn main() { impl AssertFloatEqAll for Point { type AllDebugTol = Self; fn debug_abs_all_tol( &self, other: &Self, tol: &Self::AllTol ) > Self::AllDebugTol { Point { x: self.x.debug_abs_all_tol(&other.x, tol), y: self.y.debug_abs_all_tol(&other.y, tol), } } fn debug_rmax_all_tol( &self, other: &Self, tol: &Self::AllTol ) > Self::AllDebugTol { Point { x: self.x.debug_rmax_all_tol(&other.x, tol), y: self.y.debug_rmax_all_tol(&other.y, tol), } } fn debug_rmin_all_tol( &self, other: &Self, tol: &Self::AllTol ) > Self::AllDebugTol { Point { x: self.x.debug_rmin_all_tol(&other.x, tol), y: self.y.debug_rmin_all_tol(&other.y, tol), } } fn debug_r1st_all_tol( &self, other: &Self, tol: &Self::AllTol ) > Self::AllDebugTol { Point { x: self.x.debug_r1st_all_tol(&other.x, tol), y: self.y.debug_r1st_all_tol(&other.y, tol), } } fn debug_r2nd_all_tol( &self, other: &Self, tol: &Self::AllTol ) > Self::AllDebugTol { Point { x: self.x.debug_r2nd_all_tol(&other.x, tol), y: self.y.debug_r2nd_all_tol(&other.y, tol), } } fn debug_ulps_all_tol( &self, other: &Self, tol: &UlpsTol<Self::AllTol>, ) > UlpsTol<Self::AllDebugTol> { PointUlps { x: self.x.debug_ulps_all_tol(&other.x, tol), y: self.y.debug_ulps_all_tol(&other.y, tol), } } } }
Background
Floating point comparison algorithms
Descriptions of the underlying comparison algorithms used by float_eq.
Absolute tolerance comparison
abs <= tol
A check to see how far apart two expressions are by comparing the absolute difference between them to an absolute tolerance. Mathematically, this is:
a  b <= tol
Equivalent to, using f32
as an example:
#![allow(unused)] fn main() { fn float_eq_abs(a: f32, b: f32, tol: f32) > bool { // the PartialEq check covers equality of infinities a == b  (a  b).abs() <= tol } }
This is the simplest method of testing the equality of two floats and may be
sufficient if you know the absolute margin of error for your calculation given
the values being tested. However, absolute tolerance tests do not work well for
general comparison of floating point numbers, because they do not take into
account that normal values' granularity changes with their magnitude. Thus any
given choice of tol
is likely to work for one specific exponent's range and
poorly outside of it.
In some circumstances absolute tolerance comparisons are required. If you wish to compare against zero, an infinity, or subnormal values then the assumptions that relative tolerance or ULPs based checks make about how neighbouring values are related to one another break down. Similarly, if the underlying mathematics of your algorithm is numerically unstable, for example if it is prone to catastrophic cancellation, then you may find that you need to reach for an absolute tolerance comparison.
Relative tolerance comparison
r1st <= tol
r2nd <= tol
rmax <= tol
rmin <= tol
A check to see how far apart two expressions are by comparing the absolute difference between them to an tolerance that is scaled to the granularity of one of the inputs. Mathematically, this is:
a  b <= func(a, b) * tol
Equivalent to, using f32
as an example:
#![allow(unused)] fn main() { fn float_eq_relative(a: f32, b: f32, tol: f32) > bool { // the PartialEq check covers equality of infinities a == b  { let chosen = func(a.abs(), b.abs()); (a  b).abs() <= (chosen * tol) } } }
Where func
is one of:
r1st
: the first input (a
)r2nd
: the second input (b
)rmax
: the larger magnitude (akarel
for legacy reasons)rmin
: the smaller magnitude
If you are checking for equality versus an expected normal floating point value
then you may wish to calculate the tolerance based on that value and so using
r1st
or r2nd
will allow you to select it. If you are generally testing two
normal floating point values then rmax
is a good general choice. If either
number may also be subnormal or close to zero, then you may need to calculate a
tolerance based on an intermediate value for an absolute tolerance check
instead.
Choice of tol
will depend on the tolerances inherent in the specific
mathematical function or algorithm you have implemented. Note that a tolerance
of n * EPSILON
(e.g. f32::EPSILON
) will test that two expressions are within
n
representable values of another. However, you should be aware that the
errors inherent in your inputs and calculations are likely to be much greater
than the small rounding errors this form would imply.
Units in the Last Place (ULPs) comparison
ulps <= tol
A check to see how far apart two expressions are by comparing the number of
representable values between them. This works by interpreting the bitwise
representation of the input values as integers and comparing the absolute
difference between those. Equivalent to, using f32
as an example:
#![allow(unused)] fn main() { fn float_eq_ulps(a: f32, b: f32, tol: u32) > bool { if a.is_nan()  b.is_nan() { false // NaNs are never equal } else if a.is_sign_positive() != b.is_sign_positive() { a == b // values of different signs are only equal if both are zero. } else { let a_bits = a.to_bits(); let b_bits = b.to_bits(); let max = a_bits.max(b_bits); let min = a_bits.min(b_bits); (max  min) <= tol } } }
Thanks to a deliberate quirk in the way the underlying format of IEEE floats
was designed, this is a measure of how near two values are that scales with
their relative granularity. Note that tol
is an unsigned integer, so for
example ulps <= 4
means "check that a and b are equal to within a distance of
four or less representable values".
ULPs comparisons are very similar to relative tolerance checks, and as such are useful for testing equality of normal floats but not for comparisons with zero or infinity. Additionally, because floats use their most significant bit to indicate their sign, ULPs comparisons are not valid for comparing values with different signs.
API documentation
The latest API documentation is available here or at docs.rs.
About this documentation
This book's structure was inspired by the Diátaxis Framework.