Same old issues recurring (self-referential types, generics less flexible than C++ templates)

Here's some thoughtful feedback on problems with Rust:

These are the issues that I keep seeing over and over elsewhere too.

  • self-referential types. I know it's super hard to solve in the borrow checker, but it's also a massive PITA. Could Rust at least have a better stop-gap solution (for when real references are needed and indices or Rc don't cut it)?

  • lack of duck-typed generics and specialisation (and in their case, fields in traits would also help).

    Their specific example isn't the worst case of mismatch between the goals of Rust generics and C++ templates, but the problem is real. Rust favors early checking and good error messages, but sometimes this is a bad tradeoff. In some cases (like numeric code that wants to operate on mixed integer and float types), the required trait bounds become very burdensome, and operating on abstractly generic types gets so inconvenient that the cure is worse than the disease. As a side effect, code operating on generic types also loses direct access to fields (and with it, disjoint field borrows), making borrow checking harder. And at some point, errors from Rust's generics stop being any better than C++ template errors: blanket impls, associated types, type inference, and the lack of coercions can conspire to produce absolutely inscrutable errors. Macros have a whole other set of tradeoffs, so Rust is missing something in the middle.

  • special cases where lifetimes are too restrictive, such as changing the lifetime of an empty Vec. This case can be fixed by adding a library function (there is an existing safe workaround that takes advantage of libstd's obscure implementation details, but it's unreasonable to expect users to know about it). I wonder if there's a possibility of a more general solution, with some typestate for empty objects, or super let-like magic for recycling objects in loops.
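To make the numeric-code complaint concrete, here is a rough sketch (the functions are illustrative, not from the linked article) of how quickly bounds accumulate even in tiny generic helpers, and how mixing integer and float inputs runs into missing conversions:

```rust
use std::ops::{Add, Mul};

/// Horner evaluation of c0 + c1*x + c2*x^2 + ..., generic over the number type.
/// Even this needs three bounds, and a literal `0` cannot be written for a
/// generic `T`, so the zero is passed in (or taken from an extra trait such as
/// num-traits' `Zero`).
fn horner<T>(coeffs: &[T], x: T, zero: T) -> T
where
    T: Copy + Add<Output = T> + Mul<Output = T>,
{
    coeffs.iter().rev().fold(zero, |acc, &c| acc * x + c)
}

/// Mixing integer and float inputs needs a conversion bound. `Into<f64>` is
/// implemented for i32 and f32 but not for i64 or u64 (that conversion can
/// lose precision), so `mean(&[1i64, 2])` would not compile.
/// An empty slice yields NaN (0.0 / 0.0).
fn mean<T: Copy + Into<f64>>(xs: &[T]) -> f64 {
    let sum: f64 = xs.iter().map(|&x| -> f64 { x.into() }).sum();
    sum / xs.len() as f64
}
```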

I doubt that would have helped here. Their Set_X solution seems very suspicious, and it's nothing that couldn't be done some other way. My first thought is something like a macro or an enum, possibly a macro initializer that does type checking at compile time.

Then they should use unsafe. If you have a weird usage pattern that the borrow checker doesn't support, use a bit of unsafe. But make sure you really know what you're doing.

The problem isn't that these issues can't be solved or worked around in Rust, but that the existing solutions are unsatisfactory. These are recurring problems that many users run into in some form.

Self-referential structs are especially painful. The compiler is currently unable to directly detect and explain the problem. Users who aren't already familiar with the issue struggle to understand the errors, and usually end up going in circles trying various lifetime annotations that can't work.

Working around self-references with unsafe is also surprisingly tricky, because seemingly simple solutions break aliasing rules and/or create unsound APIs. There's a graveyard of unsound self-referential libraries on crates.io. Rust could at least have a proper one in libstd.

Ok, but their preferred implementations go very much against the spirit of Rust, which is safety first.

I think those are tricky to get right in any non-GC language. It's just that other languages don't bug you about it. Does postponing a visit to the doctor cure your ailment? No, it just makes the problem worse.

It would be fine if Rust “bugged you” about it to help you get it right. But instead, the base rules of the language prohibit it. Rust adds borrow checking, and then contains a case that borrow checking prohibits, not just in a “write this differently” way, but in a “you cannot use this type here; use a different type; the library doesn't provide one? Too bad” way.

In the simple, common case of some thing T and one or more shared borrowers using &T, which all stick around together, the only rules actually needed for correct run-time behavior are:

  • drop the borrowers before the borrowed
  • don't let the borrowed move
  • don't mutate the borrowed

In a language without a borrow checker, where all pointer usage is unsafe, this would work perfectly fine — in my limited experience of C and C++, programs in those languages do this all the time (except for the 'don't mutate' part), particularly at library API boundaries, without thinking about it much.

Then people translate those APIs to Rust, think “yay zero-overhead safety”, and leave users stuck with problems like sdl2::render::Texture (which very many people have run into "…you mean I have to put the TextureCreator in a variable in main() and cannot abstract this away at all?").

IMO, the inability to express this is a missing piece of Rust's design. It limits abstraction, because it forces certain cases to be expressed as multiple variables in a stack frame, prohibiting them from ever being bundled together in a struct. Given the status quo, the advice I give to users is: “because Rust works this way, library authors need to make their types generic over the pointer type, or provide versions that work with Rc/Arc, not just &”. But it would be better if Rust filled in this missing piece and allowed anyone to wrap up a borrowing type so that it acted like a refcounting one. For example, std::str::Chars is a borrowing iterator with no owning equivalent. You should be able to write something nearly this short:

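```rust
// Proposed syntax: `#[pinned_for_self_reference]` and `'self` do not exist in Rust today.
struct ArcChars {
    #[pinned_for_self_reference]
    origin_string: Arc<String>,
    iter: std::str::Chars<'self>,
}
impl Iterator for Pin<&mut ArcChars> { /* delegate to self.iter */ }
```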
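For comparison, a minimal sketch of the usual workaround today, assuming an `Arc<str>` instead of `Arc<String>`: store a byte offset instead of a `Chars<'_>`, so nothing borrows from the struct. It works, but it re-implements `Chars` by hand, which is exactly the duplication the proposal would avoid (`chars_of` is a hypothetical helper showing that the iterator can be returned from a function):

```rust
use std::sync::Arc;

/// Owning counterpart to `std::str::Chars`: keeps the shared string plus a
/// byte offset instead of a `Chars<'_>` borrowing from it.
struct ArcChars {
    origin: Arc<str>,
    pos: usize, // byte offset of the next char; always on a char boundary
}

impl ArcChars {
    fn new(origin: Arc<str>) -> Self {
        ArcChars { origin, pos: 0 }
    }
}

impl Iterator for ArcChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // Build a short-lived borrowing `Chars` on every call and advance the
        // offset by the UTF-8 length of the char it yields.
        let c = self.origin[self.pos..].chars().next()?;
        self.pos += c.len_utf8();
        Some(c)
    }
}

/// The owning iterator can escape the function that created the string.
fn chars_of(s: String) -> ArcChars {
    ArcChars::new(Arc::from(s))
}
```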

Do you mean something like .into_iter().map(|_| unreachable!()).collect()? Or if not, what workaround are you alluding to here?

I think everyone is on the same page that we should have a mechanism for self-referential types. The issue is not one of "do we want this", it's "how do we support this"; it requires design.

And there are multiple potential designs, including pinning vs relative pointers. I get the impression the hardest part is teaching the borrow checker itself to understand the self-referential lifetimes.

Maybe “everyone” on T-lang thinks this, but lots of people on Rust forums come up with the position that “no, this is not in the spirit of Rust; if you want self-referential types then your design is bad”. That’s the position I'm writing in opposition to.

Is there any document that says “yes, we want self-referential types”? It would be useful to have such a thing I can cite, like how MCP 475 says “yes, we want build sandboxing”.

Then there's the question of how this should be prioritized against all the other things Rust could have, and what the solution could look like.

AFAIK first-class support for self-referential types requires large and complex changes in the borrow checker, and such a project is unlikely to even start before the switch to a next-gen borrow checker implementation, which is already a difficult project. So a proper solution won't be feasible anytime soon. Maybe partial solutions are possible in the meantime? (Built-in pin with pinned fields? super let? OwningRef in std?)

Yes, this one. It's a neat trick, but it's an exceptional case only possible thanks to unstable specialization in libstd. It would be unfair to expect an average Rust user to discover this trick at the bottom of the trait implementations page, under a 'Beware of the Leopard' sign, and trust that it will keep working as intended, especially since this pattern had been considered undesirable in other contexts. But this one is still the smallest of the problems, with a relatively easy solution.
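For readers who haven't seen it, the trick looks roughly like this. The `recycle` and `words_per_line` names are made up for illustration. Whether the allocation is actually reused depends on std's in-place `collect` specialization, which is an implementation detail, but the result is correct either way because the closure never runs on an empty vector:

```rust
/// Empties `v` and returns it as a vector of references with an unrelated
/// lifetime. The `map` closure is never called, since `v` is empty.
fn recycle<'a, 'b>(mut v: Vec<&'a str>) -> Vec<&'b str> {
    v.clear();
    v.into_iter().map(|_| -> &'b str { unreachable!() }).collect()
}

/// Reuses one scratch buffer across loop iterations, even though each
/// iteration borrows from a different, short-lived `String`.
fn words_per_line(lines: &[&str]) -> Vec<usize> {
    let mut scratch: Vec<&'static str> = Vec::new();
    let mut counts = Vec::new();
    for raw in lines {
        let line = raw.to_lowercase(); // owned String, freed at the end of this iteration
        let mut words: Vec<&str> = recycle(std::mem::take(&mut scratch));
        words.extend(line.split_whitespace());
        counts.push(words.len());
        scratch = recycle(words); // give up the borrows of `line` before it is freed
    }
    counts
}
```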

First off, my comment wasn't really about borrow checking. It was more about their examples not lining up or not being idiomatic. Why use a trait when you want an initializer? What happens when you have V3, which changes other fields or whatnot? What happens to process_data(&buffer) when you do buffer.clear()?

But to get to your point: by adding and using borrow checking, some possibly safe programs will be rejected. That's expected.

The answer isn't to give up and go to C++, but to drop to unsafe and try to contain it in a safe wrapper. Then, when those things are proven sound and useful, adopt them in the std.

From what I've seen, the borrow checker is limited to tree-shaped flow: it handles graphs badly, and there isn't a known solution for self-referential structs. The best available solution is to give every element in an arena the same lifetime and let them reference each other. Or use Pin *shudder*.
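A minimal sketch of the "same lifetime for everything" approach, using plain locals in one scope in place of an arena crate such as typed-arena (all names are illustrative). `Cell` lets nodes be linked after creation, and because every node has type `Node<'a>` for one shared `'a`, even cycles are accepted:

```rust
use std::cell::Cell;

/// A node that may point at any other node sharing the same lifetime `'a`.
struct Node<'a> {
    name: &'static str,
    next: Cell<Option<&'a Node<'a>>>,
}

/// Follows up to `steps` links starting at `start`, collecting node names.
fn walk<'a>(start: &'a Node<'a>, steps: usize) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut cur = Some(start);
    for _ in 0..steps {
        let Some(node) = cur else { break };
        out.push(node.name);
        cur = node.next.get();
    }
    out
}

/// Builds a two-node cycle whose nodes live in the same scope, then walks it.
fn cycle_demo(steps: usize) -> Vec<&'static str> {
    let a = Node { name: "a", next: Cell::new(None) };
    let b = Node { name: "b", next: Cell::new(None) };
    a.next.set(Some(&b));
    b.next.set(Some(&a)); // a -> b -> a: a cycle of plain references
    walk(&a, steps)
}
```

This relies on `Node` having no `Drop` impl; adding one makes drop check reject the mutual borrows.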

Having a way to express safe, self-referential types would be great, but the biggest hurdle is proving it is possible.

I think this is in part the IRLO vs URLO split. On the former, there's a desire to find a way to do it eventually. On the latter, it's perfectly reasonable to say that such a design is just going to cause you misery currently and thus it's a "bad" design.

I think people overreact. They have probably come across "object soup" code in other languages, which is a real problem and one that Rust naturally discourages. I have worked in a C++ code base filled with wild shared_ptrs pointing at other objects. Just understanding what was going on could be difficult at times.

On the other hand, there are many self-referential patterns that are fine and easy to understand, for example those that yoke, ouroboros, etc. allow. I commonly reach for the latter crate when I have some "raw" data and a deserialized / zero-copy / indexed view of it, and I want to carry those together in one object.
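A std-only sketch of carrying raw data together with an indexed view, assuming byte ranges are an acceptable substitute for the `&str` slices a crate like ouroboros would let you store directly (the `IndexedText` type is hypothetical):

```rust
use std::ops::Range;

/// Raw text plus an index of its lines, stored as byte ranges instead of
/// `&str` borrows, so the struct can be moved and returned freely.
struct IndexedText {
    raw: String,
    lines: Vec<Range<usize>>,
}

impl IndexedText {
    fn new(raw: String) -> Self {
        let mut lines = Vec::new();
        let mut start = 0;
        for line in raw.split('\n') {
            lines.push(start..start + line.len());
            start += line.len() + 1; // skip the '\n' separator
        }
        IndexedText { raw, lines }
    }

    /// Re-derives the borrowed view on demand; every call pays for a bounds
    /// and char-boundary check that a stored `&str` would not need.
    fn line(&self, i: usize) -> Option<&str> {
        self.lines.get(i).map(|r| &self.raw[r.clone()])
    }
}
```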

Another case is general graphs. Yes, you can store indices into vectors of nodes/edges instead. It is annoying, and you are basically rolling your own bounds-checked relative pointers without the borrow checker to help you.
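The index-based approach, sketched with a hypothetical `NodeId` newtype and an adjacency list. The `usize` inside `NodeId` is the hand-rolled relative pointer: a stale or foreign id is only caught by a runtime bounds check, or not at all if it happens to be in range:

```rust
/// Index into `Graph::names`; acts as a hand-rolled "relative pointer".
#[derive(Clone, Copy, Debug)]
struct NodeId(usize);

struct Graph {
    names: Vec<String>,
    edges: Vec<Vec<NodeId>>, // edges[i] lists the successors of node i
}

impl Graph {
    fn new() -> Self {
        Graph { names: Vec::new(), edges: Vec::new() }
    }

    fn add_node(&mut self, name: &str) -> NodeId {
        self.names.push(name.to_string());
        self.edges.push(Vec::new());
        NodeId(self.names.len() - 1)
    }

    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        // Panics on an out-of-range id: a runtime check standing in for what
        // a reference would guarantee statically.
        self.edges[from.0].push(to);
    }

    fn successors(&self, n: NodeId) -> impl Iterator<Item = &str> + '_ {
        self.edges[n.0].iter().map(move |id| self.names[id.0].as_str())
    }
}
```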

There are likely many more reasonable use cases for self-referential types than those, but those are the ones I have mostly run into so far in Rust.

But I also think discouraging object soup is a good idea, so perhaps it should always require an extra attribute or something, so that Rust makes you stop for a few seconds and reflect on whether you really want self-references.

With &'static mut T, which I often use, you can very often write safe self-referential types that compile. This is because 'static does not create connections between lifetimes, so the borrow checker will not prevent self-references. I find it very useful from time to time.
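A small sketch of what this can look like, assuming leaking the memory is acceptable (it is never freed, so this only suits long-lived data). `Box::leak` returns a `&'static mut`, and once the owner is `'static`, a borrow of one of its fields is `'static` too. The `Leaked` type and `make_leaked` function are illustrative:

```rust
use std::cell::Cell;

/// Owns some text and a view into that same text, with no lifetime parameter.
struct Leaked {
    text: String,
    first_word: Cell<Option<&'static str>>,
}

fn make_leaked(text: &str) -> &'static Leaked {
    // `Box::leak` yields `&'static mut Leaked`; coerce it to a shared reference.
    let r: &'static Leaked = Box::leak(Box::new(Leaked {
        text: text.to_owned(),
        first_word: Cell::new(None),
    }));
    // Because `r` is `&'static`, this borrow of `r.text` is `'static` as well.
    r.first_word.set(r.text.split_whitespace().next());
    r
}
```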

I've got some thoughts about making something like

```rust
// Proposed syntax: `T: _` (an inferred bound) is not valid Rust today.
fn foo<T: _>(t: T) {
    println!("{t}");
}
```

expressible for crate-local items, and I would like to prototype an implementation at some point.
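For comparison, this is what has to be written today (the function names are illustrative, and they return a `String` rather than printing so the result can be checked). Even a one-line body turns into a list of bounds once it does more than format its argument:

```rust
use std::fmt::Display;
use std::ops::Add;

/// Today's spelling of `foo`: the `Display` bound must be written out.
fn show<T: Display>(t: T) -> String {
    format!("{t}")
}

/// One extra operation in the body means more bounds in the signature.
fn describe_sum<T>(a: T, b: T) -> String
where
    T: Display + Add<Output = T> + Copy,
{
    format!("{a} + {b} = {}", a + b)
}
```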

That case is an issue because of the interaction with lifetimes themselves. I would say that the correct answer today is indeed a little bit of unsafe, but the long term answer is that recycle_vec belongs in the std (although I'd call it something like v.clear_and_take()), which has the nice side benefit of also avoiding the potential foot-gun of forgetting to .clear() the Vec.
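A sketch of the "little bit of unsafe" version, written as a free function named after the hypothetical `clear_and_take` (not an existing std API). It reuses the allocation by rebuilding the `Vec` from its raw parts with length zero:

```rust
use std::mem::ManuallyDrop;

/// Clears `v` and reuses its allocation for references with another lifetime.
fn clear_and_take<'a, 'b, T: ?Sized>(mut v: Vec<&'a T>) -> Vec<&'b T> {
    v.clear();
    // Take ownership of the allocation so it is not freed when `v` goes away.
    let mut v = ManuallyDrop::new(v);
    let (ptr, cap) = (v.as_mut_ptr(), v.capacity());
    // SAFETY: `&'a T` and `&'b T` have identical size and alignment, the buffer
    // was allocated by a `Vec` with capacity `cap`, and the new length is 0, so
    // no `'a` reference is ever reinterpreted as a `'b` one.
    unsafe { Vec::from_raw_parts(ptr.cast::<&'b T>(), 0, cap) }
}
```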

In the same spirit as some other replies, I would argue that the response is more "Rust doesn't allow this, and it won't allow it anytime soon, so rethink your approach", and that is perfectly valid. Replying to someone in a forum that "this might be a feature in the future" is tantamount to telling them to go pound sand.

It is good for people to learn how to design programs that work smoothly within the language as it is, but what I am taking issue with is the conflation of “Rust doesn't let you do this” with “it is good that Rust doesn’t let you do this”, especially in cases where it is actually “Rust doesn't let you do this (safely, yet)”.

Regarding the lack of duck-typed generics: I really like the macro fn idea, but it seems to have gone nowhere.

Meta: TBH it feels more like a manpower (read "funds") problem now. I'm getting the sense that there just aren't enough people to work on rustc. There are ideas here and there getting discussed, but mostly by regular users.

For me the key issue is, how do you know you can implement it safely? And more importantly, how do you prove it?

My biggest fear is that this is an undecidable problem, which means that even if safe, such patterns will never pass the borrow checker. So I'm making my peace with it.

For me the key issue is, how do you know you can implement it safely?

Existence proof: async generated futures are self-referential any time they contain one let or temporary that borrows another:

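```rust
async {
    let x = String::new();
    let y = &x;      // one local borrows another
    f().await;       // `f` is any async fn; while suspended here, the future stores both `x` and `y`
    println!("{y}");
}
```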

is a self-referential value (after it is polled at least once). Therefore, some cases of self-reference in safe Rust code can be sound, without needing to know what the specific involved types are. The problem is not starting to have self-reference in the language at all, the problem is generalizing existing capabilities to something more powerful than the Future interface, which doesn't let you interact with the borrowing value except by

  • polling the future until it produces a single output value
  • communicating with the future by interior mutability (channels, shared mutexes)
  • communicating with the future by sneaking data through the Waker

Now, it’s possible that generalizing past Future will always lead to soundness issues; I just doubt that's true, because it would mean that async just happened to introduce exactly the amount of self-reference Rust can ever support, despite that not being its goal.
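A std-only sketch that makes this concrete (the `YieldOnce` future, `noop_waker` and `block_on` are illustrative helpers, not part of the thread). The async block holds `x` and a reference to `x` across a suspension point, so its state is self-referential, and it can only be polled through a `Pin`:

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Returns `Pending` once, forcing the enclosing async block to suspend.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker whose operations do nothing; enough for a busy-polling demo.
fn noop_waker() -> Waker {
    fn clone_waker(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_waker, noop, noop, noop);
    // SAFETY: every vtable function ignores the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Minimal busy-loop executor. `pin!` fixes the future in place, which is
/// what makes it sound for the future to hold pointers into itself.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// After the first poll, this future's state contains both `x` and `y = &x`.
fn self_referential() -> impl Future<Output = usize> {
    async {
        let x = String::from("hello");
        let y = &x;
        YieldOnce(false).await;
        y.len()
    }
}
```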

My biggest fear is that this is an undecidable problem, which means that even if safe, such patterns will never pass the borrow checker.

Undecidability does not mean that programs cannot be checked at all. It means that some programs that would not execute UB will be rejected by the borrow checker. This is already the case; we live with it every time we write a Rust program. New features such as self-reference (or NLL, a previously-added feature) expand the set of accepted programs; undecidability (or rather Rice's theorem) only tells us that we can never expand that set to “all programs except ones with memory bugs”.

In my mind, the minimal solution for self-referential borrowing is a library type that looks something like this; I'll call it Backed here.

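```rust
use std::ops::Deref;
// `StableDeref` comes from the `stable_deref_trait` crate: its implementors
// (Box, String, Vec, Rc, Arc, ...) keep their target at a fixed address even
// when the owner itself is moved.
use stable_deref_trait::StableDeref;

struct Backed<O, B: ?Sized + 'static> {
    owned: O,
    // Really borrows from `owned`; the `'static` is never exposed to callers.
    backed: &'static B,
}

impl<B: ?Sized + 'static, O: StableDeref> Backed<O, B> {
    fn try_new<E>(owned: O, process: impl FnOnce(&O::Target) -> Result<&B, E>) -> Result<Self, E> {
        let backed = process(owned.deref())?;
        Ok(Self {
            // SAFETY: `backed` points into the stable target of `owned`, which
            // is neither mutated nor dropped while `self` exists.
            backed: unsafe { &*(backed as *const B) },
            owned,
        })
    }
}

impl<O, B: ?Sized + 'static> Deref for Backed<O, B> {
    type Target = B;
    fn deref(&self) -> &Self::Target {
        // Shortens the fake `'static` to the lifetime of `&self`.
        self.backed
    }
}
```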

Clearly this is not the be-all and end-all, but I think it provides a good start for some of the common issues, and I think you can probably build a cascade of Backeds to solve the general case?
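To try the idea without the stable_deref_trait dependency, here is a std-only variant specialized to `String` owners and `&str` views (the `BackedStr` name and its methods are hypothetical). The view stays valid when the struct moves because it points into the `String`'s heap buffer, not into the struct itself:

```rust
/// A `String` bundled with a `&str` view into its heap buffer.
struct BackedStr {
    owned: String,
    // Really borrows from `owned`; only handed out with a shortened lifetime.
    view: &'static str,
}

impl BackedStr {
    fn new(owned: String, process: impl FnOnce(&str) -> &str) -> Self {
        let view = process(owned.as_str());
        // SAFETY: `view` points into the heap buffer of `owned` (or is
        // unrelated to it), and that buffer does not move when `owned` moves
        // and is never mutated or freed while `self` exists, since no `&mut`
        // access to `owned` is exposed.
        let view: &'static str = unsafe { &*(view as *const str) };
        BackedStr { owned, view }
    }

    /// The view, with its lifetime shortened to the borrow of `self`.
    fn view(&self) -> &str {
        self.view
    }

    /// The backing data.
    fn owned(&self) -> &str {
        &self.owned
    }
}
```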

In my mind, the OP was not very convincing and mostly told me that the authors did not have very experienced Rust folks around to help them.