Rust and numeric computation

background & motivation

As an engineer working in advanced robotics, I have to write a lot of computation code involving 3D geometry, statistics, computer vision and machine learning, using vectors, matrices, raw buffers, series interpolation, etc. I was working with Python, but for code performing heavy computations and complex algorithms, I needed a second, faster language. Among the languages I knew, I chose Rust, and I have used it for the last 2 years.

Did I find Rust good for it? I am writing this report as feedback on what a scientific programmer can think of Rust. It is not about the features and frameworks available for Rust in this domain (I know they will grow with time), but about the language itself and its capabilities. I did not find a similar synthesis on the internet, despite the several places where I saw the following points mentioned.

no genericity of size

problem

This problem mostly affects code working with multiple dimensions, or with buffers of multiple channels (in fact most scientific applications involve one or the other). In such code, writing generic types with a generic size is extremely useful for readability, and optimizing performance requires static sizes. However, a lot of operations need to play between dimensions and produce objects whose dimensionality is deduced from the inputs (think of concatenation or homogeneous matrices, for instance). The problem is that Rust does not allow const generic expressions to be deduced.
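
For instance, here is a sketch of concatenating two fixed-size arrays (concat is a hypothetical function, not an existing API); stable Rust rejects the N + M in the return type:

// Hypothetical sketch: concatenating two fixed-size arrays.
// Stable Rust rejects [f32; N + M], because const generic
// expressions cannot be written or deduced here; this is what the
// nightly generic_const_exprs feature experiments with.
fn concat<const N: usize, const M: usize>(a: [f32; N], b: [f32; M]) -> [f32; N + M] {
    let mut out = [0.0; N + M];
    out[..N].copy_from_slice(&a);
    out[N..].copy_from_slice(&b);
    out
}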

There are a few ways to deal with this problem, but none is satisfying:

  • crates featuring matrix and vector types often rely on type-based dimensions and use macros to recursively generate rules of dimension concatenation, as nalgebra does. This is extremely verbose and hard to master for anyone wanting to write code using such generics. It also cannot cover all the possible dimension combinations, so it usually stops at dimension increments of 1
  • some crates restrict themselves to low dimensions (up to 4) and duplicate their operator implementations 4 times, like cgmath.
  • some crates restrict themselves to dynamic dimensions, as ndarray does, making dynamic allocations necessary in most operations, preventing some compiler AVX optimizations, and bringing a memory and performance overhead for small objects.
  • My personal workaround is to always have 2 linear algebra crates in my projects: nalgebra for small arrays, and ndarray for large arrays. Unfortunately their APIs and feature sets do not match, so this introduces a lot of conversions and confusion.

solution

I have nothing to propose beyond what is already on the tracks: a way to express relations between usize generics in Rust is already being tested, but it unfortunately does not seem to progress fast. C++ actually features this, allowing the existence of user-friendly libraries like Eigen.

no genericity of numbers

problem

When I speak of numbers, I mean integers and floating-point numbers. Each of them is a primitive type implementing almost exactly the same methods as the others, but there is no trait to generalize over them.

  • There are 10 different integer types (and there might be more in other crates), but no trait Int.
  • There are 2 different float types (and there might be more in the future, in core or in other crates), but no trait Float.

This means any function performing calculations has to be specific to one combination of integers and floats. In practice, because of this:

  • some crates choose a specific numeric precision: they pick between f32 and f64 and then use the matching signed and unsigned integers. For instance parry3d or collider, but many others do so as well.

  • some other crates implement the same functions 2 or more times, once per primitive type

  • some crates rely on num_traits, which tries to provide traits for numbers, but these traits often have fewer methods than the bare floats and ints. Using them thus leads to some functions using only methods from core and others using only methods from the traits, which brings confusion.

    As it is a third-party crate and not the standard, any crate relying on it may be incompatible with another crate using a different dependency for its numeric traits.

proposal

Traits are meant to generalize the properties of structs where possible. I suggest adding new traits to standard Rust. They would not be intended to represent the mathematical details of the primitives but simply their structural similarities (a sketch follows the list below):

  • Int providing the current methods of the primitive integers

    implementable by any user type that is strictly an integer in the mathematical sense

  • Float providing the current methods of the primitive floats

    implementable by any user type that is strictly a floating-point number in the mathematical sense

  • Unsigned providing Int, only for unsigned integers

  • Signed providing Int, only for signed integers
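
To illustrate, here is a minimal sketch of what such a Float trait could look like (the method set shown is my own sample of what the primitives already share):

// Sketch of a structural Float trait: nothing mathematical, just the
// methods f32 and f64 already have in common (only a few shown here).
pub trait Float:
    Copy
    + core::ops::Add<Output = Self>
    + core::ops::Sub<Output = Self>
    + core::ops::Mul<Output = Self>
    + core::ops::Div<Output = Self>
{
    fn sqrt(self) -> Self;
    fn powi(self, n: i32) -> Self;
    fn abs(self) -> Self;
}

impl Float for f32 {
    fn sqrt(self) -> Self { f32::sqrt(self) }
    fn powi(self, n: i32) -> Self { f32::powi(self, n) }
    fn abs(self) -> Self { f32::abs(self) }
}
// ... and the same impl for f64

// any numeric function can then be written once, for all floats:
fn norm2<T: Float>(x: T, y: T) -> T {
    (x * x + y * y).sqrt()
}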

On the other hand, the following traits could stay in dedicated crates, as they cover more mathematics-specific aspects of numbers and do not provide methods redundant with core:

  • One providing a constant 1 (neutral element for Mul)

  • Zero providing a constant 0 (neutral element for Add)

  • Number providing Add<Self> + Sub<Self> + Mul<Self> + Div<Self> + One + Zero

Of course, programmers should be advised to always prefer generics rather than dynamic dispatch for numbers, so that the compiler can inline the matching processor instructions instead of calling through a virtual table.

no genericity of literals

problem

Assuming we had traits for numbers (like those proposed above or those of num_traits), writing code with them leads to the following problem: literals are not supported for generics.

pub fn hermite3<T: Float>(a: (T,T), b: (T,T), x: T) -> T
{
	let x2 = x.powi(2);
	let x3 = x.powi(3);
	
    // none of these lines is accepted by the compiler, because the literal constants are f32 or f64 and thus not compatible with T
		(a.0 * (2.*x3 - 3.*x2 + 1.))
	+	(b.0 * (-2.*x3 + 3.*x2))
	+	(a.1 * (x3 - 2.*x2 + x))
	+	(b.1 * (x3 - x2))
}

The problem here is that most numeric computation code uses hardcoded constants in math expressions (and it is the right way to do it, not an ugly habit), which prevents it from using generic numbers.

To overcome this limitation, the community has developed tools like numeric_literals that trick the Rust parser in an ugly way: crates rewriting the AST to replace literals with conversion expressions.

#[replace_float_literals(T::from_float(literal))]
pub fn hermite3<T: Float>(a: (T,T), b: (T,T), x: T) -> T    {...}

There are still limitations though:

  • it needs a conversion attribute on top of each function
  • we can only hope the compiler will optimize the conversion away and not keep it at runtime
  • we can no longer mix multiple number types T and V in the same function, because all literals are now converted to T

proposal

As a minimalistic change, Rust could implicitly determine the type of a literal constant based on the type of its operand, then parse the literal accordingly, as it already does when a type suffix like _f32 is present.

As a more generic and aesthetic solution, Rust could use a compile-time constant constructor T::from_literal(&str) for every type supporting literals. Primitive types would then implement a trait FromLiteral providing this method. The compiler would determine the type of each literal constant based on its type suffix if present, or based on the type of its operands.
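
A minimal sketch of that hypothetical trait (the exact signature is my assumption):

// Hypothetical trait: the compiler would evaluate from_literal at
// compile time on the source text of each literal whose type is
// inferred to be T.
pub trait FromLiteral: Sized {
    fn from_literal(literal: &str) -> Self;
}

With such support, the hermite3 function above would compile unchanged, keeping its plain literals, for any T: Float.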

difficulty of conversions

problem

The implementation From<f32> for f64 exists; however, From<f64> for f32 does not, and std does not even provide TryFrom<f64> for f32, so an as cast has to be used instead.

Likewise, From<i32> for f32 (and for bigger ints) and From<usize> for f32 (and for bigger ints) do not exist either.

Because of this, a lot of code using floats for computations but involving integers for e.g. array sizes or indices tends to rely on as conversions, which are neither generic nor really recommended in idiomatic Rust. Because as is not generic, it does not extend to structs containing the numbers we want to cast: [u32] does not cast to [f32] the way u32 does; likewise, ndarray<u32> does not cast to ndarray<f32>.

It is easy to understand why such conversions are not implemented in core: they would be lossy. However:

  • With integers, every computation is exact (except when overflowing), so any lossy operation is a non-intuitive step and thus must be explicit, with failures reported in a Result. Between integers of different sizes, TryFrom is therefore relevant.
  • With floating points, however, every computation is lossy; this is not unexpected but desired behavior. It is the tradeoff for handling computations at any order of magnitude. Nothing unexpected can come from this loss of precision, because the IEEE standard specifies the behavior. So lossy computation is not merely usual, it is constant, and TryFrom between floats and ints is not relevant.

proposal

Since, the moment the programmer opts for a float, precision loss is always expected and no corner case can happen, I consider that From should be available from any primitive number to any float. It would nicely complete the genericity of numbers and would avoid writing _ as _ everywhere, allowing T::from instead, in a more idiomatic and functional style.
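
As an illustration, here is the kind of generic code this would enable (a sketch: it does not compile today, because f32 and f64 implement From only for the small integer types):

use core::ops::{Add, Div};

// a mean generic over the float type; the From<usize> bound is what
// standard Rust currently refuses for f32 and f64
fn mean<T>(values: &[T]) -> T
where
    T: Copy + Default + Add<Output = T> + Div<Output = T> + From<usize>,
{
    let sum = values.iter().fold(T::default(), |acc, &v| acc + v);
    sum / T::from(values.len())
}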

If From is meant only for exact conversions (which is not enforced at the moment), then a lossy conversion trait (one that does not return a Result) should be added to standard Rust and implemented from any primitive number to any float. I personally think using From would be much clearer than yet another conversion trait.

no function overloading

problem

This point is only an issue because of the lack of genericity of numbers. Overloaded functions are how some other languages deal with the previous limitations.

Function overloading allows the same function name to be used by different implementations with different signatures. It is relevant when an operation has to be implemented for different types with conceptually the same meaning, but with different implementations:

fn foo(value: f32) -> f32 {...}
fn foo(value: f64) -> f64 {...}

foo(1.2f32);
foo(1.2f64);

No genericity here, but at least if you want to write a function bar implemented for several float types using foo, you can copy-paste bar's implementation, changing only the argument type. (Not very aesthetic, but as I said it is a workaround; C works like this.) The point is that it allows providing functions supporting different types without generics. That could be useful in Rust if the genericity of numbers is not fixed.

For base math functions, Rust uses methods rather than overloaded functions, pretending this removes the need for overloading and genericity. But it is only a partial answer, since from other crates we cannot add inherent methods to foreign types from std; we can only write extension traits, which must be imported at every use site and still duplicate the implementation per type.
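
For illustration, here is a sketch of the extension-trait workaround (smoothstep is just an example function of mine):

// an extension trait does add a method to a foreign type, but it must
// be imported at every use site, and the implementation is still
// duplicated for each primitive
trait Smoothstep {
    fn smoothstep(self) -> Self;
}
impl Smoothstep for f32 {
    fn smoothstep(self) -> f32 { self * self * (3.0 - 2.0 * self) }
}
impl Smoothstep for f64 {
    fn smoothstep(self) -> f64 { self * self * (3.0 - 2.0 * self) }
}

fn demo() {
    let _ = 0.3f32.smoothstep();
    let _ = 0.3f64.smoothstep();
}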

Using overloaded functions of multiple arguments we can also cover the issue of conversions:

// with function overloading, one overload for each combination of number precisions
fn pow(a: f32, b: f32) -> f32
fn pow(a: f32, b: i32) -> f32
fn pow(a: f64, b: f64) -> f64
fn pow(a: f64, b: i32) -> f64

pow(1f32, 0.5f32);

// without it, as in standard Rust, we duplicate names for each combination
impl f32 {
	fn powf(self, b: f32) -> f32
	fn powi(self, b: i32) -> f32
}
impl f64 {
	fn powf(self, b: f64) -> f64
	fn powi(self, b: i32) -> f64
}

1f32.powf(0.5f32);

proposal

Function overloading would not really be needed if the genericity of numbers and conversions were fixed. It is less idiomatic and much less powerful than traits, so I would prefer the previous issues to be fixed rather than function overloading to be added. So it can stay the way it is.

conclusion

To the question I raised in the introduction: did I find Rust good for science? My answer is another question: did the Rust developers forget that numeric computation is half the use cases of a fast compiled language?

I think Rust truly lacks the following features:

  • genericity of size
  • genericity of numbers
  • genericity of literals
  • lossy conversions for floating points

And despite the fact that many crates are trying to patch this, it is still a pain to write numeric computational code in Rust. These issues and their workarounds greatly increase the complexity of the code written, and this is something a scientist really does not need when implementing algorithms that are already complex. Of course, I am also using Rust for networking and data management code, which Rust is really good at. Sadly, at the moment, it is easier to write computational code in C or C++.

I personally will continue to use Rust for computational code, only because I like not having to fear memory corruption and do not want to mix more languages in the same projects. One may be tempted to argue that the pain brought by Rust in computational code is the price of safety (it is the mainstream answer when someone points out difficulties with Rust). But looking at the proposals above: none of them has anything to do with safety, hence they could be implemented in Rust without sacrificing safety.

Rust has great potential in computation thanks to its safety, performance and abstraction capabilities. However, if the Rust core team never improves these points, I really fear that this wonderful language will not be able to become a first-choice language for numeric computation, and will never be able to fully replace C++.

24 Likes

I completely agree

Disclaimer: I’m the main author of nalgebra so my opinions can be biased.

These points make a lot of sense and I'm generally in agreement. There have been some attempts to address these limitations but, as you point out, some of them cannot be fixed without improved language support.

Regarding generics wrt. size and function overloading, they should both be solved by complete language support for const generics and specialization. It would also make the nalgebra codebase significantly simpler. Unfortunately, these language features have not been fully implemented yet and haven't been of very high priority (or at least, it doesn't feel like it; I'm not part of the lang team so I don't really know). But I agree they are essential for scientific computing.

  • My personal workaround is to always have 2 linear algebra crates in my projects: nalgebra for small arrays, and ndarray for large arrays. Unfortunately their APIs and feature sets do not match, so this introduces a lot of conversions and confusion.

Note that nalgebra has the DVector/DMatrix types (which are actually the Matrix type with the Dynamic dimension type parameter) for heap-allocated, dynamically sized vectors and matrices. But it doesn't have higher-order tensors like ndarray.
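
For example (a sketch against nalgebra's current API, to the best of my knowledge):

use nalgebra::{DMatrix, DVector};

// heap-allocated, dynamically sized matrix and vector
let m = DMatrix::<f64>::identity(3, 3);
let v = DVector::<f64>::from_element(3, 2.0);
let w = &m * &v; // a DVector<f64> of length 3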

no genericity of numbers

There have been some attempts to improve the situation in third-party crates. The alga crate took the exhaustive approach, where every mathematical construct gets a trait (magma, semigroup, quasigroup, group, monoid, ring, field, etc.). This was never much appreciated by the community, given the complexity of these concepts. The simba crate takes a more pragmatic approach and defines big complex-field and real-field traits (as well as some abstraction over SIMD). But it lacks good abstractions for integer types.

Having these in the standard library would certainly help with the traits' adoption, but at the same time I can understand that the general community is divided regarding the preferred balance between granularity and ease of use.

difficulty of conversions

I agree with these points. As an attempt to overcome this, the simba crate defines the SubsetOf trait (and its counterpart SupersetOf, which is auto-implemented, similar to how From and Into work), which defines conversions in a more pragmatic, set-theoretic way. To quote its documentation:

- f32 and f64 are both supposed to represent reals and are thus considered equal (even if in practice f64 has more elements).
- u32 and i8 are respectively supposed to represent natural and relative numbers. Thus, i8 is a superset of u32.
- A quaternion and a 3x3 orthogonal matrix with unit determinant are both sets of rotations. They can thus be considered equal.
In other words, implementation details due to machine limitations are ignored (otherwise we could not even, e.g., convert a u64 to an i64). If considering those limitations are important, other crates allowing you to query the limitations of given types should be used.

no genericity of literals

Agreed. The way nalgebra does it is by calling na::convert on the literal, which is itself based on the aforementioned SubsetOf/SupersetOf traits. Having an implicit conversion for number literals built into the language would be fantastic.
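
For example (a sketch; it assumes T: RealField, which brings the SupersetOf<f64> bound that convert needs):

fn double<T: nalgebra::RealField>(x: T) -> T {
    // converts the f64 literal into T through SubsetOf/SupersetOf
    let two: T = nalgebra::convert(2.0);
    x * two
}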

11 Likes

On the contrary, I would say you are among those who have confronted the problem the most!

5 Likes

Both of them are consistently among the most commonly desired features in the State of Rust surveys, so I doubt they're not a priority. It's just that they are very difficult to implement right. As far as I know, even the min_specialization feature is unsound and cannot be salvaged – an entirely new approach is required. Never mind "full" specialization.

With regard to generic_const_exprs the difficulty as I understand it lies in reasoning about value-level equality in type checker/trait solver context and specifically the fact that most expressions are non-total functions – depending on semantics chosen even something as innocent as N - 1, where N: u32 is a const generic variable, is ill-formed when N == 0. And critically this would be a post-monomorphization error, something Rust has very much tried to avoid. One would have to be able to express the constraint N != 0 in a where clause [1], and it quickly gets more complicated when you have more complex expressions and several variables.


  1. currently possible on nightly with the funky syntax where [(); N - 1]:, which reads as "[(); N - 1] is a valid type", exploiting the fact that arrays are Special™ ↩︎
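
For illustration, a nightly-only sketch using that syntax (assuming the generic_const_exprs feature):

#![feature(generic_const_exprs)]

// "drop the first element": only well-formed when N >= 1, which is
// expressed by requiring that [(); N - 1] is a valid type
fn tail<T: Copy + Default, const N: usize>(a: [T; N]) -> [T; N - 1]
where
    [(); N - 1]:,
{
    let mut out = [T::default(); N - 1];
    out.copy_from_slice(&a[1..]);
    out
}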

9 Likes

I would love to see a const ParseLiteral trait. One thing I can foresee being a blocker to ergonomics for some of what you have listed is default types, and intelligently propagating those defaults. Even without genericity:

let i: BigInt = 5;
let j = i + 2; // Does `2` default to BigInt?
let k = 2 + i; // what is the type of `2`?

I'm mostly waiting for #[sealed] to get an initial implementation, but a while back I posted a pre-RFC here on irlo for core::primitive::Int and core::primitive::Float sealed traits defined as "the functionality macro-pasted across the different primitive types." I'm still interested in doing so, and it's a decent first push into this while remaining fairly self-evident in the provided functionality, thus hopefully less complicated than the generality of num, alga, or simba.

(funty is another alternative which is deliberately limited to the primitive types similar to this.)

Everyone would love extending literals with the trait system, but IIUC it's blocked on better fallback/inference behavior, which is blocked on the new trait resolver work. (I usually see it discussed with i__ and f__ types as a bit of cute visual abstraction over the concrete types.)

5 Likes

I would strongly prefer that user-defined literals have custom suffixes and unsuffixed literals never resolve to user types. The current polymorphism in literals is magical enough; no need to mix things up even more. For that last bit of convenience, one can always impl Op<i32> for MyInt.

2 Likes

Sorry :sweat_smile: This is 100% waiting on me.

7 Likes

The fundamental problem with this is that the standard library wants to add new things -- like how ilog2 was added recently -- but letting users implement the trait means that it's a breaking change to add new things, since those existing implementations wouldn't have ilog2.

There's just not a great way to deal with that in the standard library. It works much better in a crate, which can do major version breaks.

9 Likes

This is a very good point!

But does that mean that the standard library should provide only the fewest traits possible, in order to avoid any possible breaking change?

If not, adding new things could also eventually be done by

  • adding new traits (bringing a bit of mess into the std)
  • adding the new features as specific methods rather than trait methods (ugly and not generic)
  • giving the new trait items a default implementation that panics with unimplemented!("recent feature") (ugly but forward compatible, until the crates' maintainers implement it)

The third option seems preferable, because it would be ugly only until the maintainers implement the new features.
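
A sketch of that third option (reusing ilog2 from above as the example of a late addition):

// the method added in a later release gets a default body that
// panics, so existing implementors keep compiling until they
// provide the real implementation
trait Int: Sized {
    fn count_ones(self) -> u32;

    fn ilog2(self) -> u32 {
        unimplemented!("ilog2 requires a more recent implementation")
    }
}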

A fourth option could also be to consider that these traits will not often be implemented in crates other than core (and std):

  • implementations in core will necessarily follow the evolution of the trait, which is also in core
  • implementations in community crates will be rare, and those who write them will be expected to keep up to date with the core trait definitions.

I think this fourth option is a good tradeoff, so forward compatibility should not discourage the use of numeric traits in standard Rust.

I think that this is very true. I think that implicit casts as C++ often does them, e.g. from int to double, will always be unacceptable without explicitly calling for them (e.g. by using From), but this should not present any runtime overhead after optimizations.

Which Trait to use?

However, I totally agree that TryFrom is the wrong trait in this case. This is even true for integer types whose Self::MAX value might exceed what the floating-point type into which they are cast can represent, since the IEEE standard explicitly includes +infinity in the floating-point specification (and the same for negative numbers, of course).

The documentation of From specifies that conversions are expected to be infallible and lossless.

I think that the main goal of From is to be lossless. But I also think that TryFrom is completely incorrect here, since our conversion might be lossy but would never return an error.

Let's say you want to convert from [u64; N] to nalgebra::SVector<f32, N>. This is a lossy conversion but valid without error: if we simply implement From<u64> for f32, we are allowed to do this (by chaining From implicitly):

let x = nalgebra::SVector::<f32, N>::from([0_u64; N]);

But this is not explicit and is, in my opinion, against the standard library's conventions.

Possible Solution

So maybe another conversion trait such as LossyFrom (for lack of a better name) could be proposed (essentially From under a different name):

pub trait LossyFrom<T>: Sized {
    // Required method
    fn lossy_from(value: T) -> Self;
}

And for anyone who does not care whether the conversion is exact, one could also use another trait:

pub trait AnyFrom<T>: Sized {
    // Required method
    fn any_from(value: T) -> Self;
}

Implementing this last trait generically to work on types which implement LossyFrom and/or From would require specialization, which we are currently lacking, however.

Discussion

I know that this introduces yet another trait (undesirable in general), but we also have an uncountable number of String types which are only relevant for very specific cases but still carry forward their distinctions. This would be a clear addition, but crate authors would have to rewrite their try_from(...).unwrap() into lossy_from() or any_from(), which I believe to be much better.

Please let me know your thoughts on this. :smiling_face:

Edit: messed up the order of u32 and f64 and changed it in the example above.

This is maybe not precisely the best example, but the rest of the idea is sound.

I think it means a combination of things is best.

For example, one side of the problem is fixed by doing what CAD said above and offering a sealed trait with all the methods.
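
A sketch of that sealed-trait pattern (the names are mine):

// outside crates can name and use the trait, but cannot implement it,
// so adding methods to it later is not a breaking change
mod sealed {
    pub trait Sealed {}
    impl Sealed for f32 {}
    impl Sealed for f64 {}
}

pub trait Float: sealed::Sealed {
    fn sqrt(self) -> Self;
}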

Then the one that can be implemented by others can be in a crate where it has the option of doing a major version bump every year, say, to add the new things. Plus if the language learns more about how sealed traits will impact coherence, that trait could even have a blanket impl from the core trait.

But also the core trait will plausibly end up doing things like assuming Copy + Freeze + 'static and such. A trait designed for arbitrary things -- like a BigInteger -- will plausibly work somewhat differently and thus want different signatures in the trait.

2 Likes

See Conversions: `FromLossy` and `TryFromLossy` traits by dhardy · Pull Request #2484 · rust-lang/rfcs · GitHub for some long-lasting conversation about this.

1 Like

Oops :see_no_evil: I messed up the order completely. Fixing it now. Thanks for pointing that out.

Thanks so much. This is really interesting.

Sounds like a need for a domain-specific language for mathematics. I am thinking of Maple or Mathematica... Of course the initial efforts needn't be that elaborate :slight_smile:

let z: usize = math! { 1 + 1 - 2 };
let z: i32 = math! { 1 + 1 - 2 };
let z: f64 = math! { 1 + 1 - 2 };

No reason why the formulas could not be read from some pretty-printed format and/or be wrapped in functions injecting values.

@jimy-byerley I haven't seen anyone else mention this: as a bit of history, before Rust 1.0 there were traits like the ones you propose defined in std::num. They were removed in this PR, and spun out into the num crate (eventually finding a home in num-traits): Remove more deprecated functionality by alexcrichton · Pull Request #24636 · rust-lang/rust · GitHub

It's probably a good thing they were removed in the state they were in. They have several design issues, identified over the years, which would require breaking changes to fix (just look at the num-traits issue tracker).

I think it'd be great to see at least minimal traits added back to core, and @CAD97's aforementioned proposal sounds interesting, but hopefully any such proposal carefully studies what went wrong with previous attempts to define such traits in the standard library.

4 Likes

before Rust 1.0 there were traits like the ones you propose defined in std::num

I was not aware of this! It is stunning that they went in the direction of removing these traits instead of fixing the design issues o.O

But yes, I agree that it would be good to have minimal traits back in core.