Goals and priorities for C++

So what we're talking about is Pareto efficiency, but we each have slightly different contours for our value functions, which means that our frontiers/level sets are different (just to be clear, I'm agreeing with & paraphrasing what @withoutboats has noted).

Unfortunately, there is no solution to the problem; by definition, all points along a given Pareto frontier have equal value. That said, there are things that we can do to make it possible to advance all of our goals at the same time, in spite of the fact that our frontiers are different.

First, we need to fully define Rust so that its meaning is mathematically unambiguous. This will give us a firm foundation for anything else we decide to do. One of the things that holds C/C++ back is that the compiler has to be conservative in deciding what can or can't be done, which means that performance is not as good as it could be. In my opinion, Rust should be specified well enough that it is at least theoretically possible to do formal proofs about what it does; the work that @RalfJung et al. have done (and are doing) on RustBelt should be invested in more heavily.

Once the formal underpinnings are in place, we can check programs and decide whether certain things are true at certain points in the program, which allows further performance gains. This helps address @matklad's performance concern. As an example, consider the lowly floating-point number. As far as I know, the Rust compiler today cannot do proofs over floating-point numbers, so if I have code like the following (not tested, may have errors):

let foo = vec![23.0f32, 45.0, -89.0];
if foo.iter().all(|&x| x != 0.0) {
    let bar: Vec<f32> = foo.iter().map(|&x| 1.0f32 / x).collect();
}

it can't do something 'smart' with the division. In tiny code like this, it doesn't matter, but if there are repeated calculations, it would be nice if the compiler had the option of doing something smarter (which can only be done if it can prove that the smart thing to do is correct).
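For illustration, here is the kind of rewrite a programmer currently has to make by hand (the function and numbers are made up): hoisting the division out of the loop slightly changes rounding, so the compiler will not do it on its own, and today it also has no way to rely on a proof that the divisor is non-zero.

// Hand optimization: one division up front, then only multiplications.
// Only acceptable if the caller guarantees a non-zero divisor and the
// small change in rounding is tolerable for this use case.
fn scale_all(values: &[f32], x: f32) -> Vec<f32> {
    assert!(x != 0.0, "caller must pass a non-zero divisor");
    let inv = 1.0 / x;
    values.iter().map(|&v| v * inv).collect()
}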

However, the formal underpinnings also help with maintainability by reducing the false positives that clippy and rustc can produce. For example, a hypothetical 'divide by zero' lint that could perform logic over floating point numbers could safely ignore the division by x above, but could flag it if there was a path to the code that didn't protect against division by 0. That lets a programmer focus in on the real issues, rather than wade through large piles of cruft.
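A sketch of what such a hypothetical lint might distinguish (both functions are invented for illustration):

// Every path to the division is guarded, so a lint that can reason over
// floating-point values could stay quiet here.
fn guarded_reciprocal(x: f32) -> Option<f32> {
    if x != 0.0 {
        Some(1.0 / x)
    } else {
        None
    }
}

// Here nothing rules out x == 0.0, so the hypothetical lint would flag
// the division.
fn unguarded_reciprocal(x: f32) -> f32 {
    1.0 / x
}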

2 Likes

Rust also includes the base libraries though - so far I'm 0 for 2 in my ability to use std::collections for anything latency-sensitive and more complex than the simple insert/retrieve use case. You can't just always tell somebody "fork your own" or "crates.io might have something" and expect them to want to use the language for something important.

Which goes back to the 80/20 rule; for 80% of the use cases out there, std::collections is good enough, but for that other 20%, you really have to roll your own because without being able to profile your particular use case it isn't possible to make a general purpose library that can outperform a specialized crate.

That said, I'm sure that if you can figure out ways of improving the runtime of the standard library without breaking any compatibility guarantees, everyone would be interested!

1 Like

The problem with 80/20 is: 80 percent of whom? If Rust attracted a large number of Wasm people initially, are you going to say 80% of them? Then it just becomes self-reinforcing, because the high-performance systems people will use the language less, and Rust just turns into an overly complex web-development language if you keep focusing on that group as your 80%.

Or are you just saying that 80% of people find it usable? 80% of large projects? 80% of public projects? It is such a nebulous target that it is useless.

It is also, at times, a false dichotomy, where adding a few things doesn't detract from the 80%. If there are performance gains to be made, why leave them on the floor out of some false notion that it is "good enough"?

If the constant cry is to roll your own data structures, Rust is going to be a second-tier language in many of the areas it was supposed to do well in, e.g., systems and performance/latency-sensitive work. With the continuing assault on unsafe code too, rolling your own becomes less and less appealing.

It is extremely difficult to write clear data structures in Rust - I gave up trying to play the Cell games, and now just go straight to raw pointers for everything under the hood. If you want to tell people to write their own data structures, then you also need to give them better tools and constructs to do it.

By far the most important factor in my continuing to try Rust once a year, until it becomes usable for me, is performance. If I basically have to rewrite std::collections (and probably many other things, if 80/20 were the dominant paradigm), I'm out.

1 Like

I would really prefer to continue this discussion in a new thread (maybe a mod could split it into a new thread?), but I'll just say that I'm a core member of the Rust Wasm working group. I am well aware of how Wasm works.

Yes, which is exactly the same as running your Rust code in an OS process (for memory isolation) and something like SELinux or AppArmor (to prevent calls to OS APIs).

I hope we can agree that "running your Rust code in a secure OS process makes unsafe safe" is an incorrect statement.

Wasm does not really mitigate any security issues within your Wasm code.

The only mitigation it has is that the host can provide the minimum set of capabilities to Wasm. This minimizes the harm that Wasm can cause to the outside world, but it does not remove it.

This is the exact same situation as running your Rust process under SELinux or AppArmor.

But that's somewhat of a tangent, because we're not talking about security (which is a broad topic), we're talking about unsafe and undefined behavior (which Wasm does not protect against).

Interface types aren't needed for dynamic linking of Rust code (which could theoretically be done now, if Rust supported it). They're needed for dynamic linking with the host OS and non-Rust languages.

Dynamic linking of Wasm modules is equivalent to using multiple OS processes and using IPC to communicate between them. That certainly does increase security, but it still does not make unsafe safe.

Interface types simply define a common ABI that allows Wasm modules to communicate with each other using shared types. You can think of it as being like the C ABI, but much higher level.

But it's possible for Wasm modules to communicate right now, by using the C ABI. They just need a common ABI to communicate, regardless of whether it's the C ABI or interface types.
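As a minimal sketch of that (assuming the crate is built for a Wasm target such as wasm32-unknown-unknown), the shared surface is just a flat C-ABI signature that both sides agree on:

// Exported with the C ABI; any other Wasm module or host that agrees on
// this signature can call it. Only flat integer/float types cross the
// boundary, which is exactly the low-level gap interface types aim to fill.
#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}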

The big benefit of interface types is that they are standardized, so you aren't relying on an ad-hoc and low-level C ABI.

Interface types are a big deal, but they aren't really a security feature (since everything they do can already be done today). Instead they're a convenience and compatibility feature.

Yes, it's definitely exciting, but it's not fundamentally any different from using multiple processes and IPC. Wasm modules are basically just lightweight processes. Their behavior isn't any different, they're just more efficient than processes, that's all.

The efficiency definitely enables cool new things, but let's not kid ourselves into thinking that Wasm is more secure or safe.

It is compiled as one large statically linked Wasm binary. "Modularizing" the Rust code would be equivalent to dynamic linking (which isn't currently supported by Rust).

That's correct: dynamic linking with Rust + Wasm is hard (it's an unsolved problem), there is no Wasm sandboxing within the Rust code, and your statement does still stand.

2 Likes

I don't think anyone says to just leave performance gains on the floor. libstd data structures and algorithms constantly see improvements. To name just two examples, we entirely swapped out the HashMap implementation, and also BTreeMap got a bit faster recently. This happens after a lot of benchmarking to make sure that overall, the gains outweigh the losses.

All it takes is someone willing and able to actually put in the work, both for the implementation and the benchmarking to convincingly demonstrate that the change does not regress things elsewhere.
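For anyone weighing such a change, a minimal benchmarking sketch (using the third-party criterion crate and an invented workload, not how libstd itself is benchmarked) might look like this:

// benches/map_insert.rs (assumes criterion is a dev-dependency and
// Cargo.toml has a [[bench]] entry with harness = false).
use criterion::{criterion_group, criterion_main, Criterion};
use std::collections::HashMap;

// Invented workload: insert and then look up a fixed number of keys.
fn bench_insert_lookup(c: &mut Criterion) {
    c.bench_function("hashmap_insert_lookup_1k", |b| {
        b.iter(|| {
            let mut map = HashMap::new();
            for i in 0..1_000u32 {
                map.insert(i, i * 2);
            }
            (0..1_000u32).map(|i| map[&i]).sum::<u32>()
        })
    });
}

criterion_group!(benches, bench_insert_lookup);
criterion_main!(benches);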

8 Likes

Can I ask what is wrong with using third party libraries for data structures and algorithms? Especially if you have specific needs.

It seems to me that std can never be all things to all people but libraries can by virtue of there being multiple libraries that can each have implementations tailored to a specific use case. If Rust's ecosystem isn't yet robust enough then that would seem to be the real issue. Not the std library.

4 Likes

Aside from other concerns mentioned above, I want to add that Interface Types so far are a bit of a mess.

So far, most effort is focused on being able to exchange slices of raw bytes to and from the host. Beyond that, the theoretical work on this feature has somewhat stalled, in part because there's not much common ground between the abstractions that make sense for language interop and the abstractions that make sense for wasm.

In particular, there's no real effort being made to represent complex type hierarchies (e.g., arrays of structs of arrays, or anything involving references) in interface types, and it's up in the air whether there will ever be an effort in that direction. After all, it's not really what wasm is about.

So even in the long term (think 5 or 10 years from now), I think wasm will be very useful for coarse library sandboxing; things like what Firefox is already doing now, compiling an entire video codec to wasm for cheap sandboxing, with untrusted chunks of bytes as input and output.

I really doubt wasm will ever get to the point that dynamic linking of mutually-untrusted rust modules is remotely as easy as static linking using a single language.

Can you give a concrete example where this is true? What are you having problems with?

I have a great example: hashbrown, which is now the HashMap implementation in the standard library. The code is split into two parts:

  • RawTable which has a completely unsafe API and uses raw pointers everywhere. Almost all of its code consists of unsafe fn: it is basically C code.
  • HashMap and HashSet which are safe wrappers around RawTable, and deal with all the lifetime-related issues.

When it comes down to writing high-performance data structures, Rust's "no mutable aliasing" rule can hinder more than it helps. It is possible to write code that follows this rule but more often than not you end up with code that is twice as long as the equivalent unsafe code (I personally believe this is the case with the BTreeMap code).

With that said, it is important that all this unsafe code stay internal to the crate: the public API should only consist of safe functions (except where absolutely necessary).
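A much-reduced sketch of that layering (nothing like the real RawTable/HashMap code, just the shape of the pattern):

// Unsafe core: raw pointers, no invariants enforced by the type system.
struct RawBuf {
    ptr: *mut u8,
    len: usize,
}

impl RawBuf {
    /// # Safety
    /// `index` must be less than `self.len`, and `ptr` must point to a
    /// live allocation of at least `len` bytes.
    unsafe fn get_unchecked(&self, index: usize) -> u8 {
        unsafe { *self.ptr.add(index) }
    }
}

// Safe wrapper: the public API re-establishes the invariants the raw
// core relies on, so callers never need `unsafe`.
struct Buf {
    raw: RawBuf,
}

impl Buf {
    fn get(&self, index: usize) -> Option<u8> {
        if index < self.raw.len {
            // SAFETY: bounds checked against `len` just above.
            Some(unsafe { self.raw.get_unchecked(index) })
        } else {
            None
        }
    }
}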

8 Likes

I fully understand your feelings; here are my thoughts.

Yes, but...

This is true to the point that it is physically painful to me (I live the 'bad metrics' problem every single day of my life).

Where possible, std::collections should be improved, but because it is a general-purpose library, it is unlikely to be tuned for a particular need. The analogy is my car vs. a race car vs. a dump truck. My car is an ordinary 4-door; it isn't fast, and it doesn't carry a lot of dirt, but it is faster than a dump truck, and can carry more than a race car. This fits the vast majority of use cases out there for the vast majority of people (Camrys are popular for a reason). However, it will never win a race against a race car, and it will never haul 20 tons of dirt at a time; the people who really need to do either of those jobs will need to use the appropriate vehicle. But those people aren't going for std::collections; they're going for highly tuned GPU-based parallel architectures, or something else where performance is more important than just getting into the car and cranking the engine.

And here is an example where performance matters enough that someone is willing to spend the time and effort to tune it (make a race car). I'm lucky enough that they are willing to share said race car with me! :grin:.

In the end, it really comes down to economics; while I (and, I suspect, most people) want improved performance, we have to balance the costs of those performance improvements against their benefits.

2 Likes
