"Just make it work" types (ref-counting)

Motivation

Rust can be difficult to get into compared to other languages:

  • C often simply assumes what you're doing makes sense (often resulting in undefined behavior for inexperienced programmers).
  • Most other high-level languages use reference counting and runtime checks, often combined with a garbage collector that can pause the whole program to clean up memory.

I don't want to propose changing Rust's approach (I really like it), but what if it were easier to opt out of it and fall back to the approach of other languages: everything is behind a pointer (like a class) and has interior mutability? In Rust that would be Rc<RefCell<T>> or Arc<Mutex<T>>. Another "pain point" for new developers seems to be the existence of multiple types that seemingly do the same thing (String and str, Vec<u8> and [u8], ...), and possibly stricter casting rules compared to most other languages.
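For readers who haven't used these types yet, here is a minimal sketch of the class-like behavior Rc<RefCell<T>> provides (the `Counter` type and both functions are hypothetical). Cloning the `Rc` copies the pointer rather than the value, so a mutation made through one handle is visible through every other handle, and the borrow rules are checked at runtime by the `RefCell` instead of at compile time.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type standing in for a C# class.
#[derive(Default)]
struct Counter {
    count: u32,
}

/// Mutates through a shared handle, like passing a class instance in C#.
fn increment(counter: &Rc<RefCell<Counter>>) {
    // RefCell enforces "one writer or many readers" at runtime.
    counter.borrow_mut().count += 1;
}

/// Creates two handles to one Counter; returns (final count, number of strong handles).
fn shared_counter_demo() -> (u32, usize) {
    let a = Rc::new(RefCell::new(Counter::default()));
    // Cloning an Rc only bumps the reference count; both handles point to the same Counter.
    let b = Rc::clone(&a);
    increment(&a);
    increment(&b);
    let count = a.borrow().count;
    (count, Rc::strong_count(&a))
}
```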

Proposal

What if we let the user opt out of the ownership system (and the compile-time errors it produces) with a simpler syntax?

use std::cell::RefCell;
use std::rc::Rc;

/// Ideal for performance and stating ownership behavior
fn foo(value: &str) {...}
/// What many probably do in the beginning before running into the borrow checker.
fn foo(value: String) {...}
/// What they often intended
fn foo(value: &mut str) {...}
fn foo(value: &mut String) {
    *value = "World".to_string();
}
/// What you currently have to do to opt-out and have it behave more like
/// other languages (think classes in C#).
/// I had to look this up to get it right
fn foo(value: Rc<RefCell<String>>) {
    // Or other functions that mutate instead of creating a new owned String.
    *value.borrow_mut() = "World".to_owned();
}

// Proposal (implemented like the Rc<RefCell<_>> variant above)
// Syntax is bike-sheddable of course but this feels intuitive.
fn foo(value: !String) {
    value = "World";
}
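Without new syntax, part of this can already be approximated with a library type. The sketch below assumes a hypothetical `Shared<T>` wrapper (not an existing crate) around Rc<RefCell<T>>: its `set` method accepts `impl Into<T>`, so `set("World")` works on a `Shared<String>` without `.to_owned()`, and the closure-based `with`/`with_mut` methods confine each borrow to a single call. It cannot provide plain `value = ...` assignment; that part would need compiler support.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical library type approximating the proposed `!T`:
/// a cloneable handle to a reference-counted, interior-mutable value.
pub struct Shared<T>(Rc<RefCell<T>>);

impl<T> Clone for Shared<T> {
    // Cloning copies the handle, not the value (class-like semantics),
    // so no `T: Clone` bound is needed.
    fn clone(&self) -> Self {
        Shared(Rc::clone(&self.0))
    }
}

impl<T> Shared<T> {
    pub fn new(value: T) -> Self {
        Shared(Rc::new(RefCell::new(value)))
    }

    /// Replaces the value. Accepting `impl Into<T>` lets `set("World")`
    /// work for `Shared<String>` without an explicit `.to_owned()`.
    pub fn set(&self, value: impl Into<T>) {
        *self.0.borrow_mut() = value.into();
    }

    /// Runs `f` with a mutable borrow that ends when `f` returns.
    pub fn with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        f(&mut *self.0.borrow_mut())
    }

    /// Runs `f` with a shared borrow that ends when `f` returns.
    pub fn with<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        f(&*self.0.borrow())
    }
}

/// The proposal's example written against the hypothetical `Shared` type.
fn foo(value: Shared<String>) {
    value.set("World");
}
```

Scoping borrows to closures is one possible design choice; returning the `Ref`/`RefMut` guards directly would be more flexible, but makes it easier to hold a borrow longer than intended.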

Proposed behavior of !T types

  • Implemented as Rc<RefCell<T>>
  • borrow_mut() is automatically added where needed.
  • Relaxed assignment rules that "hide" the heap allocation (see the to_owned() example above), or that use more appropriate functions in the background to reuse the existing owned value.
  • Can be used in struct/enum fields, not just function arguments
  • Any type can be used in it (except perhaps for primitives and some traits)
  • If a trait is used (!MyTrait), it is implemented as Rc<RefCell<dyn MyTrait>> (I think that's possible). Obviously this can't work with every trait, only with dyn-compatible ones.
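The trait case does work in today's Rust through an unsizing coercion, as long as the trait is dyn-compatible. A minimal sketch, with a hypothetical `Greeter` trait and `English` type standing in for `MyTrait` and one of its implementors:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical dyn-compatible trait standing in for `MyTrait`.
trait Greeter {
    fn set_name(&mut self, name: &str);
    fn greet(&self) -> String;
}

/// Hypothetical implementor.
struct English {
    name: String,
}

impl Greeter for English {
    fn set_name(&mut self, name: &str) {
        self.name = name.to_owned();
    }
    fn greet(&self) -> String {
        format!("Hello, {}!", self.name)
    }
}

/// What `!Greeter` could desugar to under the proposal.
fn make_greeter() -> Rc<RefCell<dyn Greeter>> {
    let concrete: Rc<RefCell<English>> = Rc::new(RefCell::new(English {
        name: "World".to_owned(),
    }));
    // Unsizing coercion at the return: Rc<RefCell<English>> -> Rc<RefCell<dyn Greeter>>.
    concrete
}
```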

Advantages

  • With one tiny change, the type behaves similarly to types in other languages (familiarity).
  • You still have access to all the other options and can (as long as you don't actually need shared ownership or interior mutability) transition to the more specific and performant options.
  • You don't even need to know how to use Rc<RefCell<T>>.
  • It's easy to write and read.
  • With more implicit casts/calls enabled (e.g. the to_owned() above) you don't need to worry about String vs str and similar.
  • With implicit borrowing from the RefCell, you don't need to worry about how the borrowing is implemented.
  • You still get to choose when you want to use runtime reference counting, with a simple syntax, instead of always getting it.

Downsides

  • You opt-out of a bunch of compile-time checks and get runtime panics instead
  • You have a (small?) performance penalty.
  • Refactoring may become hard if you start relying on shared ownership or interior mutability.
  • It may point new devs in a non-optimal direction (though they can still get their work done, just with an increased risk of runtime errors).
  • It becomes really easy to introduce reference cycles and thus memory leaks, which are hard to detect without a garbage-collected runtime.
  • We use up a potentially useful syntax position (though I can't think of anything else that would make use of !T; I'm not 100% sure whether there are parsing conflicts with patterns).
  • Technically, !! would be a valid type (Rc<RefCell<!>>, using the never type). Not really useful, though.
  • Another piece of syntax to remember, though I think this one is relatively easy to remember, since it just means "behaves like other languages, using ref-counting and interior mutability".
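The cycle downside can be made concrete. In this sketch (the `Node` type and the helper functions are hypothetical), two nodes holding strong `Rc` references to each other stay alive after every local handle is dropped, while making the back-link a `Weak` lets both be freed. A `Weak` probe is used only to observe whether the first node was deallocated.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node type. The fields are only written, never read,
// so the dead_code lint is silenced for this demo.
#[allow(dead_code)]
struct Node {
    next: Option<Rc<RefCell<Node>>>, // strong link: keeps its target alive
    prev: Weak<RefCell<Node>>,       // weak back-link: does not keep its target alive
}

fn new_node() -> Rc<RefCell<Node>> {
    Rc::new(RefCell::new(Node { next: None, prev: Weak::new() }))
}

/// Strong links in both directions form a cycle.
/// Returns true if the first node is still alive after all local handles are gone (a leak).
fn strong_cycle_leaks() -> bool {
    let a = new_node();
    let b = new_node();
    a.borrow_mut().next = Some(Rc::clone(&b));
    b.borrow_mut().next = Some(Rc::clone(&a));
    let probe = Rc::downgrade(&a);
    drop(a);
    drop(b);
    // Each node still owns the other, so neither strong count reaches zero.
    probe.upgrade().is_some()
}

/// Same structure, but the back-link is weak. Returns true if the first node was freed.
fn weak_back_link_frees() -> bool {
    let a = new_node();
    let b = new_node();
    a.borrow_mut().next = Some(Rc::clone(&b));
    b.borrow_mut().prev = Rc::downgrade(&a);
    let probe = Rc::downgrade(&a);
    drop(a); // a's strong count hits zero; dropping it also releases its link to b
    drop(b);
    probe.upgrade().is_none()
}
```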

Alternatives

  • You don't always need or want interior mutability. If there is no desire to use !!T for thread safety, it could instead distinguish between Rc<T> (!T) and Rc<RefCell<T>> (!!T). On the other hand, that could make this syntax too verbose to be useful to developers new to Rust.
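To illustrate the difference between the two variants: without the `RefCell`, shared Rc<T> handles are effectively read-only. In the sketch below (hypothetical function names), `Rc::get_mut` only grants `&mut T` while the handle is the sole owner, and `Rc::make_mut` clones the value when it is shared, so other handles do not observe the change, unlike with Rc<RefCell<T>>.

```rust
use std::rc::Rc;

/// Returns whether exclusive access was granted (while shared, then once unique).
fn try_mutate_shared() -> (bool, bool) {
    let mut a = Rc::new(String::from("Hello"));
    let b = Rc::clone(&a);
    // Two strong handles exist, so `get_mut` refuses to hand out `&mut String`.
    let while_shared = Rc::get_mut(&mut a).is_some();
    drop(b);
    // `a` is the only handle again, so in-place mutation is allowed.
    let when_unique = Rc::get_mut(&mut a).is_some();
    (while_shared, when_unique)
}

/// `Rc::make_mut` is clone-on-write: because `a` is shared, it clones the String
/// first, so `b` still sees the old value.
fn clone_on_write() -> (String, String) {
    let mut a = Rc::new(String::from("Hello"));
    let b = Rc::clone(&a);
    Rc::make_mut(&mut a).push_str(", World");
    (a.to_string(), b.to_string())
}
```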

Future extensions

  • This could be extended to thread safety: !!String would become Arc<Mutex<String>> instead. But even though mutexes are well known, there is a high risk of deadlocks if locking is implicit (especially when it's hidden behind just 2 characters), so that's probably not very useful to have.
  • To avoid the leak issues an optional garbage collection specifically for these types could be added (e.g. provided in a crate).
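For comparison, this is roughly what Arc<Mutex<String>> looks like with explicit locking, as a sketch with hypothetical function names. The second function shows the kind of conflict an implicit lock could hide: locking a std::sync::Mutex again on the thread that already holds it will not return (the standard library documents that it may deadlock or panic), so `try_lock` is used here to observe the conflict safely.

```rust
use std::sync::{Arc, Mutex, TryLockError};
use std::thread;

/// What `!!String` would desugar to under this extension: each `lock()` is a visible locking point.
fn append_from_threads(n: usize) -> usize {
    let shared = Arc::new(Mutex::new(String::new()));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // The guard is dropped at the end of this statement, releasing the lock.
                shared.lock().unwrap().push('x');
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Bind first so the guard is dropped before `shared` goes out of scope.
    let len = shared.lock().unwrap().len();
    len
}

/// A second lock while a guard is still alive: `lock()` here would never return,
/// while `try_lock` reports `WouldBlock` instead.
fn relock_while_held() -> bool {
    let m = Mutex::new(0);
    let _guard = m.lock().unwrap();
    let would_block = matches!(m.try_lock(), Err(TryLockError::WouldBlock));
    would_block
}
```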

I want to reiterate: this is not about changing Rust, but about making the learning curve easier by providing an easier-to-use option that is familiar to those coming from other languages. Sometimes you (think you) don't care about avoiding panics, or don't need the highest performance.

What are your thoughts on this? Currently we're hiding one of the most familiar approaches from other languages behind a hard-to-remember type because it isn't the most performant one [1].


  1. Yes, there are problems with it, see the leaking issue.

I don't think it'd be a good idea to provide an automated way to wrap everything in Rc<RefCell<..>> (or anything else designed to "just make it work", like Arc<Mutex<..>>), especially for beginners, since that gets them used to doing things in a way that's unidiomatic and doesn't take advantage of Rust's strengths. I could also easily see this turning into "dialects" of Rust, where some codebases reject it (any codebase I control will turn on lints to disallow this syntactic sugar) while others, run by people used to it, rely on it.

When it comes to helping people learn Rust, I think it would be better to focus on helping people overcome the existing difficulties by providing better error messages and other learning resources. As someone who knows Rust pretty well, I'm long past the point where I can tell where new learners get stuck, but when I interact with learners who struggle with certain things, I try to open issues for places where we can improve the diagnostics, like rust-lang/rust issue #140178, "Suboptimal diagnostic E0599 on `impl AsRef<T>`" (which, I will note, your proposal would have done nothing to help my friend with).

Especially if you have examples in your head of people struggling to figure out the ownership aspects of Rust, it'd be great if you could provide examples of the situation they faced, what they did, and what rustc/cargo/online docs could have done differently to better help them.


While in theory you can, often it's not easy to perform such transitions because using Rc<RefCell<T>> will have put you in a situation where you can't easily get references into the objects you already have. There's no shortcut to a good architecture.
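A minimal sketch of that problem, with a hypothetical `Person` type: with plain ownership a getter can return `&str` into the object, but through Rc<RefCell<_>> the reference has to stay tied to the runtime borrow guard. The signature therefore changes to return a `Ref` (here built with `Ref::map`), and any code still holding that guard blocks mutable borrows.

```rust
use std::cell::{Ref, RefCell};
use std::rc::Rc;

/// Hypothetical type for illustration.
struct Person {
    name: String,
}

/// With plain ownership you can hand out a reference into the object.
fn name_plain(p: &Person) -> &str {
    &p.name
}

// The direct equivalent does NOT compile: the returned reference would outlive
// the temporary `Ref` guard that keeps the RefCell's borrow flag set.
// fn name_shared(p: &Rc<RefCell<Person>>) -> &str { &p.borrow().name }

/// Instead, the guard type leaks into the signature (or the data has to be cloned).
fn name_shared(p: &Rc<RefCell<Person>>) -> Ref<'_, str> {
    Ref::map(p.borrow(), |person| person.name.as_str())
}
```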


I 100% agree, and transitioning a project back becomes really difficult, as with any large architectural change. Yet lots of people choose (or are forced to use) languages that force exactly this on you. I wouldn't recommend that anyone use this over normal Rust (and yes, making it easier to use increases the risk of "dialects"), but which of the following would you consider the best option (let's say that, for some reason, you can't write it fully in normal Rust due to lack of experience, especially with designing a good architecture):

  • Write the entire application/library in a different language and avoid the problems of multi-language projects, but then you also can't make use of the ownership system when you need or want it in the future.
  • FFI via C: can result in annoying/fragile build setups and mismatches between the two sides, and has many other restrictions on what you can use (because everything has to be represented as a C type).
  • Hope for improved FFI to higher level languages in the future.
  • (exaggerated) Spend 2+ months/person to learn the language and how to architect a good application in it.
  • Stay in one language (which may become easier to learn because of familiarity) and opt out of maximum performance, compile-time errors and ownership, while still being able to use all existing Rust libraries, since you can always go from Rc<RefCell<T>> to the required types (at the risk of runtime panics, of course). In other languages you often wouldn't even get a runtime panic (e.g. when multiple threads are involved).
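As a sketch of the point about existing libraries (the "library" functions below are hypothetical stand-ins): a value held in Rc<RefCell<String>> can be lent to ordinary `&str` / `&mut String` APIs for the duration of a call via `borrow()` / `borrow_mut()`, with a runtime panic if an incompatible borrow of the same cell is still alive.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical "existing library" functions written in ordinary Rust.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn append_signature(text: &mut String) {
    text.push_str("\n\nSent from Rust");
}

/// Lends shared, class-like state to the ordinary APIs above.
fn use_library(doc: &Rc<RefCell<String>>) -> usize {
    // `borrow_mut()` yields a RefMut<String>; `&mut *` turns it into the `&mut String` the API expects.
    append_signature(&mut *doc.borrow_mut());
    // `borrow()` yields a Ref<String>; `&*` gives `&String`, which deref-coerces to `&str`.
    let words = word_count(&*doc.borrow());
    // Each guard is dropped at the end of its statement, so these two borrows don't conflict.
    // If another borrow of the same RefCell were still alive, the calls above would panic.
    words
}
```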

I don't have such an example. It was just an idea I had while developing in C#: I was wondering why many projects use a language in which this is basically the default (and which lacks some of Rust's functionality), why people choose LINQ even though it likely has performance downsides, and why Rust is considered difficult to learn (better errors always help, of course).

Regarding "dialects": not to dismiss your argument, but the same could be said about other language features: async/await, #![no_std] compatibility, LINQ in C#, C#'s nullable reference types, catchable panics vs. Result.

I'm not fully convinced myself that this is a good idea (thanks for the critical response).

The main issue with such a "just works" type is the automatically inserted borrow_mut().

It's the same underlying reason RefCell requires you to write .borrow_mut() instead of just providing access to the wrapped value[1]: the extent of the borrow may be "obvious" in "simple" cases, but as soon as you start using more of the language than just getting and setting fields, the extent of a borrow is no longer simple.
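A minimal sketch of why the extent matters (hypothetical function names): nothing at the point where the mutable borrow is requested shows that the shared borrow taken for the loop is still alive. `try_borrow_mut` is used so the conflict comes back as an error; `borrow_mut()`, or a borrow inserted implicitly by the compiler, would panic at that point.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Tries to append a copy of every element to the same list.
/// Returns false because the shared borrow for the loop overlaps the mutable borrow.
fn duplicate_overlapping(list: &Rc<RefCell<Vec<i32>>>) -> bool {
    let items = list.borrow(); // this shared borrow lives until the end of the function
    for &x in items.iter() {
        match list.try_borrow_mut() {
            Ok(mut v) => v.push(x),
            Err(_) => return false, // conflict: `items` is still borrowed
        }
    }
    true
}

/// Same operation, with the shared borrow ended before mutating.
fn duplicate_scoped(list: &Rc<RefCell<Vec<i32>>>) {
    // `.clone()` resolves to Vec::clone; the temporary `Ref` is dropped at the end of this statement.
    let snapshot: Vec<i32> = list.borrow().clone();
    list.borrow_mut().extend(snapshot);
}
```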

Just to touch on these,

  • #![no_std] compatibility isn't a dialect, it's a subset. Yes, it is something that libraries have to deliberately provide, but splitting the parts of your library that require alloc/std from the parts that don't (at least in theory) creates a better overall API design that gives more control to the consumer. This is a “sans IO” kind of way of considering things.
  • Panics are only meant to be caught at the task boundary. Using panics to unwind sub-task units is a possible dialect of Rust, but not one that you'll see in public code. Even the most notable reliance on unwinding panics I'm aware of (rust-analyzer via salsa) does so at the task level for cancellation.
  • Calling async/await a dialect isn't wrong, but it isn't quite right either. Calling tokio a dialect is perhaps a bit more accurate. It's common practice to panic when starting an async executor while one is already running, since that is rarely what you actually want (you should spawn on the existing one instead), but both the sync-in-async bridge (spawn_blocking) and the async-in-sync bridge (block_on) are conceptually quite simple.
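To illustrate how simple the async-in-sync direction is conceptually, here is a minimal `block_on` built only from the standard library, following the thread-parking pattern shown in the std::task::Wake documentation. It is a sketch, not tokio's implementation: it polls the future, parks the thread while the future is pending, and relies on the waker to unpark it.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread which is blocking on the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Runs a single future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks us (spurious wakeups just cause another poll).
            Poll::Pending => thread::park(),
        }
    }
}
```

Real executors add task queues, timers and I/O integration on top, but the core poll-and-park loop is about this small.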

  1. In a theoretical alternate Rust that allows temporary deref proxies in order to support such.