Focus of Rust evolution

This is to continue the discussion started by @leonardo in the effects system thread, which I think deserves more thought than fits in a quick OT:

As someone who also works on scientific computations, I can totally understand this perception/frustration. I think it ultimately boils down to several mutually related factors:

  1. The current population of Rust designers and users
  2. The design and evolution philosophy of the Rust language

When it comes to population, I hope I will surprise no one by saying that the current Rust community is not heavily focused on number crunching. From my totally biased view of the u.r.l.o discussion topics, I get the impression that the community of Rust users is currently largely focused on...

  1. Web development (mainly back-end, but with growing front-end interest)
  2. System infrastructure (CLI tools, multimedia libraries, or more radically embedded/OSdeving)
  3. Game development (and more generally interactive GPU-intensive programs)

The composition of the user base naturally impacts the focus of the language's development in every area: which features are requested, how likely each feature is to be welcome, who is ready to design and implement stuff, and in the end what actually gets done.

It also affects library development, and eventually leads to survivor bias: the users who are most likely to stick around are those whose profile best fits Rust's current community and development directions, which in turn further increases the language's momentum in that direction.

To phrase it otherwise, since arrays are a pain in Rust right now, people who care a lot about them are unlikely to stick around long enough in order to push the language in a direction that makes them more pleasant to use. Unless, of course, they have other interests that Rust caters to better. For example, speaking personally, I love lock-free algorithms and system tools, and Rust is a joy for that.


When it comes to technical direction, Rust has so far taken the Haskell approach of trying to do as much as possible in libraries using few powerful language abstractions, rather than the Ada/C# approach of bundling direct support for identified use cases into the language using many focused abstractions.

What this effectively means is that use cases like number-crunching cannot be directly improved upon, and must instead await the completion of more complex language features that will take a lot more time to get done, such as const generics. More radically, some of Rust's broadly beneficial design choices can be directly antagonistic to "direct" number-crunching usability, interactions of borrow checking and bounds checking with array indexing being an easy example. If you're doing something like 2D stencil computations in Rust, you'll probably want to do it via high-level abstractions rather than talk to the language directly.

This "library-first" strategy of Rust has some justifications though. Languages which evolve through single-purpose features, like Ada and C#, tend to age badly. Related but disjoint features quickly pile up and increase the language's spec complexity, resulting in unpredictable interactions, that make the languages hard to learn and hard to implement. This, in turn, leads the community to take a very defensive stance towards language additions, which ultimately causes stagnation since the missing features cannot be easily implemented in libraries. Ada is unfortunately already there, and I expect C# to get there when it will reach the same age.

Hopefully, Rust's focus on library-building tools and will lead it to age better than that. If it does not become an expert-only language whose complex abstract concepts are impenetrable to all but a very small elite beforehand, that is.


To summarize, I don't think that improving Rust's number-crunching story is impossible. But for that dream to become true you will need to sacrifice a generation or two of early adopters like us who are ready to deal with Rust's current warts in this department.

These early adopters will need, in turn, to move forward the development of a good library-based infrastructure for Rust numerics, and to push the development of language features which are needed to make such library development easier. With time, this process will hopefully lead the language to a point where it is more readily usable in this area, leading more users to join, and eventually to the Rust community becoming more active in this department.

However, I don't think there is much to hope from requesting direct core language support for these use cases. Rust is not a numerics-focused language like R or Fortran. Innovation in this area will only be able to come from library additions, not language ones. Which is where the high-level discussions that you are hesitant about come into play :slight_smile:

6 Likes

I didn't mean to show disrespect toward the Rust designers/implementators. Rust already avoids several design mistakes I've seen in older languages.

C# seems to add some features for specific use cases, but I still think C# is well designed, it's aging well enough, and I think its future is much brighter than Ada. If you look at the used of C#, you see that the single-purpose features aren't hurting much the Unity communities, etc. They are even designing a subset of C# that can be optimized better for game-related purposes (https://www.youtube.com/watch?v=NF6kcNS6U80 ).

As someone not currently working in number-crunching, can you explain some use cases in which arrays are a pain to use? I think that identifying precise use cases it will help people outside the problematic area to be more aware of the problem.

4 Likes

Some examples of array/slice pains:

  • Arrays are only well-supported by the language (with trait impls, etc.) until a certain length.
  • Rust’s slice/iterator ecosystem is heavily biased towards dynamically sized containers, and interoperates poorly with statically sized arrays. For example, collecting an iterator into an array is a pain.
  • Using variable-size arrays as function parameters or struct members is a pain.
  • The borrow checker does not understand that array elements are distinct, and this makes references to array elements or parallelization of array computations unpleasant.
  • Arrays are bound-checked by default, and the syntax for unchecked manipulations (which are often necessary in index-based algorithms like stencils) is a pain.
  • Array/slice manipulation is heavily geared towards reference-based access, which leads to the aforementioned borrow checking issues in spite of being unnecessary for the Copy types that are commonplace in computational use cases.

These are just examples, many of which are already being worked on. Const generics and functions in particular will address many of these issues, at the expense of potentially causing a big language community split between compile-time and run-time Rust. But they will not solve every problem.

16 Likes

In my quotation I was also talking about numbers. But if we keep the discussion focused on just arrays and slices I can add some points to your list:

  • More reliable, efficient and succinct slice <-> Array conversions
  • Indexing with types other than usize
  • Optionally strong types for array/slice indexing
  • Better ways to initialize arrays and avoid double initializations
  • A contextual and overload-able way to denote the end of an array/slice
  • Reliable, efficient and succinct slice copies and assignments

I don’t think a better handling of arrays, slices and number are “single-purpose features”, I think they are useful in many situations.

3 Likes

It would be nice to see some code examples demonstrating the bulleted issues along with proposed syntax/semantics that would correct the issue (for those of us not in the know). Something like that might help to move things forward.

4 Likes

I think this should be solved not by making unsafe unchecked indexing more ergonomic (buffer overflows in Rust? No, thank you...), but by making Rust smarter around proving bounds checks redundancy (e.g. sometimes you have to use explicit slices instead of assert_eq!) and by developing an ergonomic tool which will be able to highlight indexing operations for which LLVM can't remove bounds checks. (second should help with the first) Ideally we need something like godbolt compiler explorer with helper annotations for unfamiliar with assembly.

2 Likes

I think that’s it’s really important – and not just for scientific computing – to implement the concept of a slice-with-stride. That is, not just a ref into a contiguous area of memory of the form n, n+1, n+2… n+m, but a ref into an area of the form

contiguous of length m – gap of length p – contiguous of length m – gap of length p …

This is a language construct, not something for libraries. I suspect that it necessitates original research in algorithmics, and a lot of PRs for getting things like reborrowing into optimal shape. But then a lot could be done with just that.

Could you give some examples of how/where this is useful and syntax you would envision for it? I’m guessing for things like you have an mxn matrix (could be pixels or anything else) and you want to take a “slice” that consists of a…bxc…d where a <= b <= m and c<=d<=n, no?

If I understand correctly, basically an arbitrary length borrow that is validated by a compiler? That sounds like a lot of work, that probably boils down to an impossible solution.

I think that these two approaches are complementary, in the same way that everyone agrees that compiler auto-vectorization is a good thing, but no experienced user thereof would suggest that it is a full replacement for manual vectorization primitives.

Optimizer magic is awesome, but brittle. All it takes to break it are things which, in a developer’s eye, are ultimately minor/irrelevant annoyances, such as a function not being inlined, bounds check not realizing that a Vec’s length cannot change during iteration, or alias analysis failing to determine that two memory accesses cannot overlap. When that happens, getting the auto-optimization magic to work is an unpleasant game of trial and error, which requires significant knowledge of the inner workings of a compiler, makes the code less readable, and is not guaranteed to keep working as the optimizer is updated in new compiler releases.

We need auto-optimization because it works in the simple cases, and leaves the developer free to focus on the complex cases. But beyond a certain level of auto-optimization failure, one has to admit that the trade-off becomes favorable to unchecked array accesses. A good optimizer can shift the threshold, but none that I know of ever managed to eliminate it entirely.

The main difference between SIMD and unchecked indexing is that later can lead to horrible bugs and security vulnerabilities, which radically goes against Rust goals. Maybe some functionality of safe “explicitly provable unchecked indexing” could be an alternative option. In other words for given new syntax Rust compiler (not LLVM back-end) will eliminate checks if necessary proof is available, and if not it will produce compile-time error. But I have no ideas how this functionality could look.

UPD: I think in early days of Rust it had a capability to list restrictions on input arguments (a la Ada?), but it was removed due to the noisiness. Maybe its revival in some form could help?

1 Like

Can’t one do unchecked indexing effectively with “unsafe” and pointer manipulation? Couldn’t this be put in an unsafe trait/trait with unsafe methods? It you really want to do “unchecked” indexing, that is clearly “unsafe”.

EDIT: Might a possible syntax for this be?

let i = computeNextIndexToUseBasedOnMyAlgorithm( ... );
unsafe { let el = a[!:i]; }

where the “[!:]” is the "unchecked/unsafe index operator (bike-shedding on syntax welcome). Basically, you are asserting that your algorithm is such that the index you’ve computed is guaranteed to be in-bounds for the given array and that the compiler does not need to check. It needs to be in an unsafe block because you are asking the compiler to trust you.

Some compilers have pragmas which basically state “please vectorize this loop if you can, and throw a compiler error otherwise”. Rust could probably take a similar approach for bounds checks, with some kind of #[elide_bound_checks] loop attribute.

The main problem with this approach is that like -Werror, it turns aspects of a compiler which are not covered by any strong stability guarantee into sources of compiler error. Therefore, it is only suitable for debugging, and should not be used in production builds, which somewhat undermines its usefulness.

2 Likes

Unchecked array accesses are most certainly unsafe, there is no disputing that. They can cause serious bugs which are hard to track down, such as data races, and should thus be kept in small and focused code regions (e.g. inner loop, high-level abstraction implementations).

What I am saying is just that names like “get_unchecked” and “ptr::offset” are often much too long for use in numerical computations, where having tens of array accesses per line in the heart of a kernel is not uncommon. A shorter unsafe indexing syntax, such as the one that you are proposing would be needed to make this kind of use cases practical. Even a simple “at()” unsafe method, which can be added today by a trait, would help already :slight_smile:

Better yet:

let el = unsafe { a[!:i] }

OK, so, something like this would be a solution?

let result = unsafe { a[!:i] * a[!:i+3] / a[!:i*2+5] } 

EDIT: Of course, in the above, for it not to be a disaster at run-time, our "algorithm" would've needed to compute i such that i, i+3, and i*2+5 are guaranteed in-bounds for our array.

As I said, that would be one workable approach, assuming that the syntax is available (possible issue: can you define a unary not operator on a type which is usable as an array index?) :slight_smile:

Another approach which has been suggested in the past would be to annotate blocks of code as using unchecked indexing. Not sure if that is better/worse.

Well, that's why I suggested, "[!:]" and not just "[!]" to avoid the problem of unary ! (I hope). Alternatively, what about?

let el = unsafe { a[#i] }

Where now the unsafe index operator is, "[#]"

But that's just as easy to add yourself, no?