Why Rust fails hard at scientific computing


#1

This was posted to reddit this week and there are some interesting discussions going on in the comments:

Since we don’t have a /r/rust_meta, I am going to ask here:

  • Are there any actionable items that we can extract out of this discussion?

For example, there seems to be some confusion about how multi-dimensional arrays work, which might hint that the docs could be improved.


#2

Well… once

https://github.com/rust-lang/rust/issues/44580

is implemented, Rust’s arrays can finally be first class citizens. We’ll be able to properly implement traits for not just all kinds of array sizes, but also traits that allow yielding arrays of different sizes.
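As a rough sketch of what that issue (const generics) makes possible, the snippet below uses the syntax that was later stabilized. The trait `Dot` and the function `outer` are hypothetical names chosen for illustration, not an existing API.

```rust
/// Hypothetical trait, implemented once for `f64` arrays of every length.
pub trait Dot {
    fn dot(&self, other: &Self) -> f64;
}

impl<const N: usize> Dot for [f64; N] {
    fn dot(&self, other: &Self) -> f64 {
        // Both operands have the same length N, enforced by the type system.
        self.iter().zip(other.iter()).map(|(a, b)| a * b).sum()
    }
}

/// Hypothetical function that yields an array whose shape depends on
/// the lengths of both inputs (an N x M outer product).
pub fn outer<const N: usize, const M: usize>(a: [f64; N], b: [f64; M]) -> [[f64; M]; N] {
    let mut out = [[0.0; M]; N];
    for i in 0..N {
        for j in 0..M {
            out[i][j] = a[i] * b[j];
        }
    }
    out
}
```

Passing arrays of mismatched lengths to `dot` is a compile-time error here, which is the kind of guarantee that macro-generated impls for a fixed list of sizes cannot give for arbitrary lengths.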

Before that, other than the doc improvements you mentioned, I don’t think there is much that we can do.


#3

Maybe it would be worth adding some examples to the cookbook for multidimensional arrays, which as Steve Klabnik says are not actually language constructs, but rather derived ones.


#4

More examples of arrays-of-arrays certainly sound good. Maybe Vecs-of-arrays too, and perhaps the common IRC question of “how do I turn a Vec<[f32;3]> into a Vec<f32> (without copying)”.
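For the IRC question, one possible answer is sketched below. It relies on `unsafe` and on the fact that `[f32; 3]` has the same alignment as `f32` and no padding. The function names `flatten_slice` and `flatten_vec` are hypothetical; this is an illustration under those assumptions, not a vetted library routine.

```rust
use std::mem::ManuallyDrop;

/// View a slice of 3-element arrays as one flat slice, without copying.
pub fn flatten_slice(points: &[[f32; 3]]) -> &[f32] {
    // SAFETY: `[f32; 3]` is laid out as three contiguous `f32` values with
    // the alignment of `f32`, so `points.len() * 3` floats live at this address.
    unsafe { std::slice::from_raw_parts(points.as_ptr() as *const f32, points.len() * 3) }
}

/// Turn a `Vec<[f32; 3]>` into a `Vec<f32>`, reusing the same allocation.
pub fn flatten_vec(points: Vec<[f32; 3]>) -> Vec<f32> {
    // Prevent the original Vec from freeing the buffer we are about to reuse.
    let mut points = ManuallyDrop::new(points);
    let (ptr, len, cap) = (points.as_mut_ptr(), points.len(), points.capacity());
    // SAFETY: the allocation holds `cap * 3` floats with the alignment of
    // `f32`, of which the first `len * 3` are initialized.
    unsafe { Vec::from_raw_parts(ptr as *mut f32, len * 3, cap * 3) }
}
```

The multiplication by 3 on both length and capacity is what keeps the allocation size identical before and after the conversion.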

But here are the bullets from the article:

  1. Too much symbols & <> :: {}
  2. Arrays in Rust are a second-class citizens, actually I think they don’t even have their visas
  3. Rust is still “discussing” integer as generic type parameter (since 2015)

I’m not sure there’s all that much new there, and they feel more generic than particularly-scientific-related to me.

I don’t think there’s much to be done about the first one. Certainly we can’t get rid of the need for & (however it’s spelled). I find it ironic that the symbols complaint also talks about Box, since it was changed to not be a symbol.

The third is that the fix for the second isn’t done yet, so hopefully things are already on the right track here.


#5

To me it feels like Rust isn’t complete yet. There are a lot of unstable features, which means you either pay extra performance overhead (returning Box<Trait> instead of impl Trait, using Rc for self-referential or cyclic structures, HashMap::entry needing to clone the keys, etc.), or you ignore the safety stuff and basically write C. And in some cases, like CoerceUnsized, you’d have to throw it all out and transmute a pointer, or give up that behavior entirely. I only use nightly for these reasons; on stable I spend too much time working around missing features. And I’m still confused about what the plan is for things like CoerceUnsized, which has been working but “unstable” for at least a year or two.
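To make the first trade-off concrete, here is a small sketch of the two return styles, written with the `impl Trait` syntax that was later stabilized. The function names are hypothetical and the iterator is only a stand-in for a real workload.

```rust
/// Boxed trait object: one heap allocation, calls go through dynamic dispatch.
pub fn squares_boxed(n: u32) -> Box<dyn Iterator<Item = u32>> {
    Box::new((1..=n).map(|x| x * x))
}

/// `impl Trait`: the concrete iterator type is hidden from the caller,
/// but it is returned by value and dispatched statically.
pub fn squares_impl(n: u32) -> impl Iterator<Item = u32> {
    (1..=n).map(|x| x * x)
}
```

Both functions produce the same values; the difference is only in how the returned iterator is stored and called.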

All that said, I think Rust is incredibly innovative, has an awesome, smart, hard-working team and community, and I really admire the community design/RFC process. It’s probably the best overall language community I know of right now, and I’m excited to see where it goes. I guess I just wish there was more of a push for stabilizing features, especially anything that’s necessary to write the stdlib - I think that would make it feel like a much more complete language. I can see how everything will be more coherent by waiting, but I also think a lot of people like this author are going to miss that point and just say “not capable” or “too many workarounds” until that happens.

Just my thoughts. I wish I could use five-years-from-now Rust; that’s gonna be awesome. I think any time your project makes someone feel impatient you’ve gotta be doing something right.


#6

Sure, this wasn’t meant to discuss this topic in general. I just wanted to collect some “actionable” ideas from that particular user experience to be able to file some issues like this one. @dhardy I’ve added it to Rust by Example instead of the Cookbook since I think it fits better there.

I think that for the cookbook we would need to come up with a more concrete use case of how they work in general.


#7

C# has both rectangular (int[,]) and jagged (int[][]) arrays, maybe it’d help to show both styles ([[i32; N]; M] and [Box<[i32]>; N]) and explain the differences? Of course, in C# it’s that simple because of the GC, in Rust you still have all kinds of choices around sharing and mutability.
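A minimal sketch of the two styles, using small fixed sizes for illustration (the function names are hypothetical):

```rust
/// Rectangular: a single contiguous block with no indirection;
/// every row has exactly 3 elements, checked at compile time.
pub fn rectangular() -> [[i32; 3]; 2] {
    [[1, 2, 3], [4, 5, 6]]
}

/// Jagged: each row is its own heap allocation and may have a different length.
pub fn jagged() -> [Box<[i32]>; 2] {
    [vec![1, 2, 3].into_boxed_slice(), vec![4].into_boxed_slice()]
}

/// Row lengths of a jagged array are only known at run time.
pub fn row_lengths(rows: &[Box<[i32]>]) -> Vec<usize> {
    rows.iter().map(|row| row.len()).collect()
}
```

The rectangular form is `Copy` and lives wherever it is declared, while the jagged form owns one allocation per row, so cloning or sharing it involves the usual ownership choices.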

It also might help to list common complaints and misconceptions (e.g. not being able to clone big arrays, which was recently fixed), how they’re going to be addressed and why they aren’t yet.


#8

This is sorta what the impl period is about! It’s basically just started though, so not a ton of results yet.


#9

There are various things that should be very useful for scientific computing in Rust. One of them is a syntax to denote the end of the array/slice, like the “$” symbol in the D language plus its operator overloading.


#10

You could kind of do that with a RangeOffset type and a ..$ shortcut operator, but you’d need two range types for front/back offsets and back/back offsets. Maybe if there was a way to specify an inverted range without reversing the collection? That could be done right now with a trait, although only on nightly until RangeArgument or its replacement is stabilized.
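The trait-based variant could look roughly like the sketch below. The trait `SliceFromEnd` and its method `from_end` are hypothetical names; the method takes both offsets measured from the back, mimicking D’s `a[$ - i .. $ - j]`, rather than integrating with Rust’s range syntax.

```rust
/// Hypothetical extension trait for indexing relative to the end of a slice.
pub trait SliceFromEnd<T> {
    /// Returns the elements from `len - start_back` up to (excluding)
    /// `len - end_back`. Panics if either offset exceeds the length.
    fn from_end(&self, start_back: usize, end_back: usize) -> &[T];
}

impl<T> SliceFromEnd<T> for [T] {
    fn from_end(&self, start_back: usize, end_back: usize) -> &[T] {
        let len = self.len();
        let start = len
            .checked_sub(start_back)
            .expect("start offset exceeds slice length");
        let end = len
            .checked_sub(end_back)
            .expect("end offset exceeds slice length");
        &self[start..end]
    }
}
```

With this in scope, `data.from_end(3, 1)` selects the third-from-last and second-from-last elements of `data`.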


#11

Another thing that is definitely needed for scientific (number-crunching) computing is SIMD and assembly. AFAIK there are no concrete plans in sight for stabilizing the latter.

As for slices and arrays, it would definitely be convenient to be able to slice arrays out of slices (so that buf[10..20] has type &[T; 10]) and to perform assignment operations on slices. Yes, there is array_ref for the former, but this functionality IMHO should be in the standard library.


#12

You can do a little better than this, if you introduce some slice length analysis, similar to the Value Range Analysis I’ve discussed elsewhere. This has to be part of the type system.


#13

Of course there is a plan for the stabilization of inline assembly, and many ideas about how to improve it! The only reason you don’t see much progress here is that nobody cares enough to put in the work. If you want to work on this, ping the people in the thread.


#14

This is a very long way away. You’d need a new type for ranges that lifts their ends to consts when their ends are const expressions, and then the output array’s length would be End - Start, which is allowed in the first pass but will be extremely frustrating because we can basically never unify it with anything.


#15

Well, one straightforward workaround would be to implement TryFrom<&[T]> and TryFrom<&mut [T]> generically for &[T; N] and &mut [T; N] when we get const generics. I think From<&[T]> which will panic with mismatched lengths will be useful too, but not sure if others will feel the same way. This way we will be able to write:

let array_ref: &[u8; 16] = (&buf[100..116]).into();

Of course it will be a bit less convenient than slicing directly into an array, but it will be something.
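The fallible half of this idea can be sketched with the `TryFrom<&[T]>` implementation for `&[T; N]` that the standard library later gained. The helper `block_at` is a hypothetical name, and the block size of 16 simply mirrors the example above.

```rust
use std::convert::TryFrom;

/// Borrow 16 bytes starting at `offset` as a fixed-size array reference.
/// Returns `None` if the buffer is too short, instead of panicking.
pub fn block_at(buf: &[u8], offset: usize) -> Option<&[u8; 16]> {
    // `get` performs the bounds check; `checked_add` guards against overflow.
    let window = buf.get(offset..offset.checked_add(16)?)?;
    // The conversion only succeeds when the slice length is exactly 16.
    <&[u8; 16]>::try_from(window).ok()
}
```

No bytes are copied: the returned reference points into `buf`, and the length check happens once at the conversion.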


#16

Aside from the language features, I’m thinking about a hypothetical crate “request/recruitment” mechanism:

Maybe there can be something like a bulletin board, where people can write down what crates they are looking for and what exactly they want. Then other people can exhibit their ideas, prototype designs and works. People can know who’s working on a subject, and how it’s going. And people working on the same topic can learn from each other… People solving problems could even be rewarded somehow (I’m not sure)…


#17

Into must not fail.

Note there are plenty of crates on crates.io for converting slices to arrays. Hell, even I have a published crate that includes such functionality (in my defense, it does other things).

All these crates differ in subtle ways due to the authors’ various use cases. arrayref, which extracts an array prefix, was clearly written with parsing in mind. Meanwhile, in my case, I wanted my public APIs to be able to use familiar and obvious types like &[[f64; 3]] for a list of 3d positions, so my crate (intended as adapters into/from such APIs) requires exact sizes and loves to panic.
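The “exact sizes, loves to panic” style might look something like the sketch below. The function `nest3` is a hypothetical stand-in written for illustration and is not taken from any of the crates mentioned.

```rust
/// Reinterpret a flat list of coordinates as a list of 3d positions,
/// without copying. Panics unless the length is an exact multiple of 3.
pub fn nest3(flat: &[f64]) -> &[[f64; 3]] {
    assert!(
        flat.len() % 3 == 0,
        "length {} is not a multiple of 3",
        flat.len()
    );
    // SAFETY: `[f64; 3]` has the same alignment as `f64` and no padding,
    // and the assertion above guarantees the slice splits into whole triples.
    unsafe { std::slice::from_raw_parts(flat.as_ptr() as *const [f64; 3], flat.len() / 3) }
}
```

A parsing-oriented helper would instead take a prefix and ignore the remainder, which is the kind of subtle difference in intent described above.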


#18

It might be good to ask ourselves why libraries like numpy and scipy took off in the Python community for doing numeric tasks to see if there are lessons we can learn from them.

I’m a massive fan of Rust and have been using it for over a year, both in my own time and at work, yet I still use Python when I need to do anything numeric because Rust tends to get in my way, particularly in how it gives you control over everything and makes you put thought into your types, mutability, and borrowing.

It may also be a good idea to specify what facet of scientific computing Rust wants to target. When I hear the term “scientific computing” I usually think of ipython notebooks and doing data analysis for a lab at uni, but that’s probably completely different from what others mean when they talk about scientific computing.


#19

There are two kinds of scientific programmers, those who use scientific libraries and those who write them. Users of scientific libraries are usually scientists, not professional programmers, and therefore need a simple language. Simple implies interpreted, which implies slow, but this can be mitigated by having the core algorithms written in a lower-level language and exposed as a vectorized library in the higher-level language. Numpy and Scipy do this for Python. R and Matlab have a similar design built in.

Rust will never compete as an environment to use a scientific library. (Biologists will not be writing Rust code anytime soon.) However, Rust has the potential to change the landscape for those who write scientific libraries, which are currently written in C, C++, and Fortran.

I won’t address the specific issues in the article, but my experience with Rust so far is that it is not quite there for writing scientific libraries. I tried to write a simple linear algebra library in Rust (mainly to learn Rust and test its suitability). I succeeded in learning Rust thanks to the helpful community and excellent docs, but failed in writing the library. I ultimately couldn’t find a way around a compiler oddity (posted on SO and Discourse) that shows up in a definition of an abstract Matrix type. My final impression, perhaps inaccurate, was that the code paths needed for scientific programming were not as well traveled as those needed for systems programming. I come back every couple of months to update my copy of nightly to see if the error changes.


#20

This very same error was the final nail in the coffin for my effort to do… well, kind of the same thing!

I was trying to design wrapper structs which would make linear algebra operations as painless as possible on standard language types like slices and arrays. With specialization still far off on the distant horizon (and unlikely to help for the reasons I would like it to be able to), it took me days to finally draft out a non-overlapping set of impls for ::std::ops operators and conversion traits into and back from these wrappers, and to implement them via macros. This error was my reward.

I eventually settled on a utility designed solely for elementwise ops on 1D slices and vecs.

let V(new_pos) = v(pos) + alpha * v(&direction); 

// or, more frequently, since I often have Vec<[f64; 3]>:
let new_pos = {
    use ::slice_of_array::prelude::*;  // aye aye aye
    let V(new_pos) = v(pos.flat()) + alpha * v(direction.flat());
    new_pos.nest().to_vec()  // blecchh
};

and even then I occasionally need to write things like let V(pos): V<Vec<_>> = ... because if I give &pos to a function that takes &[T] then it may infer the type of pos to be [T]. (I have no minimized test case of this; it only shows up when you least expect it, and feeds on the tears of those who thought they were finally free)
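For readers unfamiliar with the pattern, a heavily simplified sketch of such an elementwise wrapper follows. The names `V` and `v` echo the snippet above, but this is a hypothetical reconstruction, not the actual utility: it only covers `f64`, and it copies the input into a `Vec` rather than borrowing it.

```rust
use std::ops::{Add, Mul};

/// Newtype wrapper that opts a vector into elementwise arithmetic.
#[derive(Debug, Clone, PartialEq)]
pub struct V<T>(pub T);

/// Wrap a slice (copied into a Vec here, purely for simplicity).
pub fn v(slice: &[f64]) -> V<Vec<f64>> {
    V(slice.to_vec())
}

/// Elementwise addition; panics on mismatched lengths.
impl Add for V<Vec<f64>> {
    type Output = V<Vec<f64>>;
    fn add(self, rhs: Self) -> Self::Output {
        assert_eq!(self.0.len(), rhs.0.len(), "length mismatch");
        V(self.0.iter().zip(&rhs.0).map(|(a, b)| a + b).collect())
    }
}

/// Scalar-times-vector, so that `alpha * v(..)` reads naturally.
impl Mul<V<Vec<f64>>> for f64 {
    type Output = V<Vec<f64>>;
    fn mul(self, rhs: V<Vec<f64>>) -> Self::Output {
        V(rhs.0.iter().map(|x| self * x).collect())
    }
}
```

With these impls, `let V(new_pos) = v(&pos) + alpha * v(&direction);` compiles and destructures the result back into a plain `Vec<f64>`. Supporting borrowed operands, other element types, and arrays without overlapping impls is where the difficulty described above begins.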