Improving self-referential structs


#21

Sure. The latest work is happening here. Still under heavy development, but the core functionality is in and working.


#22

Bleh, spoke too soon I suppose. I appear to have slammed hard into the HRTB-associated-item-unification brick wall against which even transmute is powerless. Unless I can find some clever way to coax the compiler into unifying the types, there’s not much else I can do for now. I’ll probably just park this until the trait system refactor. To be fair though, I’m silently relying on drop order right now anyway, so I didn’t want to release until that’s specified either, so it’s not that big of a deal I suppose.
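For readers wondering what "relying on drop order" means in practice, here is a minimal sketch. It assumes the behavior documented in the current Rust reference, where struct fields are dropped in declaration order; `Noisy`, `Pair`, and `observed_drop_order` are hypothetical names used only for illustration, and `borrower` merely stands in for a field that would point into `owner`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Hypothetical pair: `borrower` is declared first so it is dropped first.
#[allow(dead_code)]
struct Pair {
    borrower: Noisy,
    owner: Noisy,
}

/// Builds a `Pair`, drops it, and returns the order in which fields were dropped.
fn observed_drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let pair = Pair {
        borrower: Noisy { name: "borrower", log: Rc::clone(&log) },
        owner: Noisy { name: "owner", log: Rc::clone(&log) },
    };
    drop(pair);
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", observed_drop_order());
}
```

A self-referential wrapper would need the borrowing field to be destroyed before the field it borrows from, which is why the declaration order matters here.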


#23

Ok, after a few more days of banging on this, I’m going to fall back to extracting the type information I need syntactically instead of relying on an associated type. This is a terrible hack, but at least the current associated type machinery in place allows me to ensure that the syntactically extracted type is correct. This means incorrect uses won’t compile, but some correct uses won’t compile either. That’s an acceptable compromise for now, though, and I’ll lift it as soon as the compiler allows me to.


#24

I’ve reached what I’m confident in calling the 0.4 milestone. It should now be in a much more stable state for examination. More changes will occur as enabling language features land, but the bulk of the work is done, and further features will be relatively minor changes.


#25

I just wanted to chime in here that this is a major problem for me. I’ve run into multiple cases where I have a struct that should have ownership of an object and things derived from that object, but I can’t do it without workarounds or contorting my API to introduce extra layers, which increases the complexity to users of that API.


#26

I dwelled a bit on the idea of existential lifetimes and I think it’s actually feasible and (surprisingly) safe. Unsafety should only be necessary if you try to store a short-lived reference inside a long-lived existential object, which requires a transmute.

I’m not totally sure whether this holds water, but it made sense in my head when I wrote it. I would appreciate any comments and reviews!


#27

Can’t we introduce the following syntax (or something alike)?

struct Foo {
    // this field references other owned field and should be dropped before it
    a: &'a [u8],
    // 'a is "defined" here and tied to owned object stored in this field
    b: 'a Vec<u8>,
}

This way we define an ordered acyclic graph of lifetimes, which generalizes the notion of a portable stack frame mentioned by @jpernst.


#28

Can we introduce the following syntax (or something alike)?

struct Foo {
    // this field references other owned field and should be dropped before it
    a: &'a [u8],
    // 'a is "defined" here and tied to owned object stored in this field
    b: 'a Vec<u8>,
}

This way we define an ordered acyclic graph of lifetimes, which generalizes the notion of a portable stack frame mentioned by @jpernst.

Sorry, I haven’t followed this discussion, but seeing this syntax my first thought was: why not use the field name as a lifetime?

struct Foo {
    // the lifetime is named after the owning field `b`
    a: &'b [u8],
    b: Vec<u8>,
}

#29

Rental takes the approach of just using the field name as the lifetime name, and I think it’s been working quite well so far. My hope is that an eventual built-in language feature will work similarly.


#30

This thread is another good case in point for this, and the conclusion is exactly the kind of thing I was afraid would start happening in the ecosystem. Nothing really new about this one, but I just think it’s important to maintain awareness that this is an ongoing issue with practical implications.


#31

This is once again a blocking problem for me. Literally spent hours today solely because of this.

This is making me feel very conflicted about using Rust. The problem comes up a lot, especially where the desirable pattern is to consume / take ownership of an object. Since self-referential structs are not directly possible in Rust, this implicitly forces me to pull in Rental and use special syntax to work with such structs. I’m legit not sure how more people aren’t running into this (or maybe they are and they just never get to this ticket because they give up before then).

If there is an intended alternative or people simply don’t believe it’s an issue, please let me know. At the point that I have to reference count everything anyway, then I start to wonder if it’s really something I should be trying to write in Rust or if I’m trying to bend the language in a way that it wasn’t meant to go.

Thanks.


#32

Can you be more specific about your use case? People do run into this, which is why rental exists at all, but it is relatively uncommon.

My suspicion, from personal experience, is that most problems can be solved a slightly different way that doesn’t involve self-reference.


#33

Suffice it to say I disagree that a small adjustment is all it usually takes to fix the issue. The core problem is that rust lifetimes are immune to encapsulation. Once a struct exposes a lifetime param, any structs that contain it will also be infected with that param. There’s no way (other than something like rental) to just encapsulate such a relationship as an implementation detail and wrap it in an opaque object that the user doesn’t need to worry about.
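A minimal sketch of the "infection" described above, using hypothetical types (`Tokenizer`, `Parser`, `Session` are made up for illustration and come from no real crate): once the innermost struct borrows, every wrapper has to repeat the lifetime parameter, and the caller ends up holding the owner.

```rust
/// Borrows its input, so it needs a lifetime parameter.
struct Tokenizer<'a> {
    input: &'a str,
}

/// Anything that stores a `Tokenizer` must also name that lifetime ...
struct Parser<'a> {
    tokenizer: Tokenizer<'a>,
}

/// ... and so must anything that stores a `Parser`.
struct Session<'a> {
    parser: Parser<'a>,
}

impl<'a> Session<'a> {
    fn new(input: &'a str) -> Self {
        Session { parser: Parser { tokenizer: Tokenizer { input } } }
    }

    /// Returns the first whitespace-separated word of the borrowed input.
    fn first_word(&self) -> Option<&'a str> {
        let input: &'a str = self.parser.tokenizer.input;
        input.split_whitespace().next()
    }
}

fn main() {
    // The caller has to keep `text` alive for as long as `session` exists.
    let text = String::from("hello world");
    let session = Session::new(&text);
    println!("{:?}", session.first_word());
}
```

There is no way for `Session` to own `text` itself and hide `'a` from its users without something like rental.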

The cleanest way out of this situation is to avoid lifetime params altogether in favor of something like Rc or Arc. This is fine if you control the full vertical stack of your app, but that’s not necessarily true. That thread I just linked above gives an example of someone willing to fork their dependency to resolve this issue. This talk at RustFest Zürich also makes a solid case for not using lifetimes when your objects are “long lived”. I completely agree with his conclusion, but the problem is there’s no way to clearly decide if a type is truly “long lived” or not.

The library author has to guess what scenarios their crate is likely to be used in and how long people are likely to hold onto the structs, but it’s just that, a guess. If I want to play it safe and allow my library to be fully general I have to use Rc/Arc, since using those doesn’t preclude any particular use case, whereas using a lifetime relationship does. Even in cases where a lifetime relationship seems clearly the correct choice, it still prevents encapsulation.
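As a rough illustration of the Rc route, here is a sketch in which the dependent type holds a shared handle instead of a reference. `Context`, `Enumerator`, and `Manager` are hypothetical stand-ins and do not reflect any real library's API; the point is only that no lifetime parameter appears anywhere.

```rust
use std::rc::Rc;

/// Stand-in for a library context (hypothetical, not a real crate's type).
struct Context {
    name: String,
}

/// Holds a shared handle to the context instead of `&'a Context`,
/// so the type carries no lifetime parameter.
struct Enumerator {
    context: Rc<Context>,
}

/// Owns the context and something derived from it without self-reference.
struct Manager {
    context: Rc<Context>,
    enumerator: Enumerator,
}

impl Manager {
    fn new(name: &str) -> Self {
        let context = Rc::new(Context { name: name.to_string() });
        let enumerator = Enumerator { context: Rc::clone(&context) };
        Manager { context, enumerator }
    }

    /// Reads the context through the enumerator's shared handle.
    fn context_name(&self) -> &str {
        &self.enumerator.context.name
    }
}

fn main() {
    let manager = Manager::new("udev");
    println!(
        "{} ({} handles)",
        manager.context_name(),
        Rc::strong_count(&manager.context)
    );
}
```

The cost is a heap allocation and reference counting, and it only works when the library author chose `Rc`/`Arc` in the first place.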

I don’t think it’s any kind of anti-pattern to want to encapsulate an ownership relationship, and making this possible will allow crate authors to use lifetimes in cases where they seem natural, without concern for what impact that will have on downstream’s ability to build abstractions. Rental is the best I could manage with the current language, but it’s still painfully awkward and unergonomic enough that forking dependencies still feels like a better choice in some cases.


#34

I’m not disputing any of that, other than the frequency with which I run into it. I just wanted to hear more about @spease’s particular use case. “Encapsulating lifetimes” is so general that it’s essentially just restating the problem.

But to give my equally-general solution, I prefer to expose the owner and the reference separately as far up the stack as possible. When this works out (apparently more often for me than for you) it is far more flexible than either Rc/Arc or self-reference, both of which preclude some use cases.
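One way to read "expose the owner and the reference separately" is sketched below. `Document` and `Words` are hypothetical names: the owner is an ordinary value the caller holds, and the borrowing view is created from it on demand rather than stored next to it.

```rust
/// Owns the data (hypothetical stand-in for a context or buffer).
struct Document {
    text: String,
}

/// Borrows from a `Document` that the caller keeps alive.
struct Words<'d> {
    words: Vec<&'d str>,
}

impl Document {
    fn new(text: &str) -> Self {
        Document { text: text.to_string() }
    }

    /// Creates the borrowing view; its lifetime is tied to `&self`.
    fn words(&self) -> Words<'_> {
        Words { words: self.text.split_whitespace().collect() }
    }
}

fn main() {
    // The owner lives in the caller's stack frame ...
    let doc = Document::new("one two three");
    // ... and the view is a separate value borrowing from it.
    let view = doc.words();
    println!("{}", view.words.len());
}
```

This keeps everything in safe Rust, at the price that `doc` and `view` cannot be bundled into one value and returned together.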


#35

Sure. This is how the struct stands right now:

rental! {
  pub mod rent_manager {
    use libudev;

    #[rental]
    pub struct Manager {
      context: Box<libudev::Context>,
      enumerator: Box<libudev::Enumerator<'context>>,
      monitor_socket: libudev::MonitorSocket<'context>,
    }
  }
}

New looks like this:

fn new() -> Result<Self> {
  rent_manager::Manager::try_new(
    Box::new(libudev::Context::new()?),
    |context| {
      let mut enumerator = libudev::Enumerator::new(&context)?;
      enumerator.match_subsystem(Self::UDEV_SUBSYSTEM)?;
      enumerator.match_property("DEVTYPE", Self::UDEV_DEVTYPE)?;
      Ok(Box::new(enumerator))
    },
    |_, context| {
      let mut monitor = libudev::Monitor::new(context)?;
      monitor.match_subsystem_devtype(Self::UDEV_SUBSYSTEM, Self::UDEV_DEVTYPE)?;
      Ok(monitor.listen()?)
    }
  ).map_err(|e: rental::TryNewError<Error,_>| e.into())
}

The general idea here is a simple application that automatically adds new wireless interfaces to wpa_supplicant when they’re plugged into my laptop.

The problem that’s complicating all of this is that the udev library requires a context.

I’ve run into this more often than not for Rust projects. It’s very common for a hardware API to require the creation of a context as the first step, which will provide enumeration of the devices, then the devices will provide objects. Similarly, network APIs will often go something like socket -> connection -> session -> objects. Since Rust’s pattern is generally for a connection to take ownership of a socket, you immediately get hit with a self-referential struct problem. The alternative is to push the problem away by just taking a reference, but this then forces the problem onto the user of the library, who has to figure out how to keep a variable around that owns the item that the reference is to.

This has also come up with parsing and iterators. In parsing, you might be parsing some text data that is going to get split up into string slices that serde can do zero-copy deserialization with, but then you need to keep around the original text data. In iteration, you might want to take ownership of some data and retain a reference or slice back to the data.

I suspect you could force these to work out somehow with Rc and unsafe, but this is much less intuitive (and to varying degrees less performant).

EDIT: Also note that the program as shown above cannot compile, because I can’t access the middle variable with rental. Presumably I’d need to refactor such that the last variable is a struct or tuple which contains all the items which depend upon the context, which I haven’t gotten around to trying yet.

With proper self-referential structs, I’d expect that the above code could be refactored as follows:

pub struct Manager {
  context: libudev::Context,
  enumerator: libudev::Enumerator<'context>,
  monitor_socket: libudev::MonitorSocket<'context>,
}

fn new() -> Result<Self> {
  // Hypothetical syntax: later fields borrow the `context` field directly.
  let mut manager = Manager {
    context: libudev::Context::new()?,
    enumerator: libudev::Enumerator::new(&context)?,
    monitor_socket: {
      let mut monitor = libudev::Monitor::new(&context)?;
      monitor.match_subsystem_devtype(Self::UDEV_SUBSYSTEM, Self::UDEV_DEVTYPE)?;
      monitor.listen()?
    },
  };

  manager.enumerator.match_subsystem(Self::UDEV_SUBSYSTEM)?;
  manager.enumerator.match_property("DEVTYPE", Self::UDEV_DEVTYPE)?;

  Ok(manager)
}

#36

Yes, it would be nice to have this in the language.

I think that with immovable types and the placement box work, it should be possible to make @spease’s syntax just work as long as it’s constructing a box (I think we only need to add an annotation on context specifying whether it’s to be assumed to be borrowed mutably or not, and probably add some struct qualification for the fields).

Basically, start constructing the first fields in the placement place, let the initialization of subsequent fields borrow them, and then end up with a Box/Rc/Arc/etc. containing an immovable type.

If a panic (or return/break) happens, have the compiler inject custom code to drop just the fields that have already been constructed.
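Today's Rust already does something comparable for ordinary struct literals: if a later field initializer returns early, the values built so far are dropped. The sketch below shows only that existing behavior, not the proposed feature; `Part`, `Whole`, `make_part`, and `build` are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Records its name in the log when dropped.
struct Part {
    name: &'static str,
    log: Log,
}

impl Drop for Part {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

#[allow(dead_code)]
struct Whole {
    first: Part,
    second: Part,
}

/// Builds a part, or fails if `fail` is set.
fn make_part(name: &'static str, log: &Log, fail: bool) -> Result<Part, String> {
    if fail {
        Err(format!("could not build {}", name))
    } else {
        Ok(Part { name, log: Rc::clone(log) })
    }
}

fn build(log: &Log, fail_second: bool) -> Result<Whole, String> {
    Ok(Whole {
        first: make_part("first", log, false)?,
        // If this `?` returns early, the already-built `first` is dropped.
        second: make_part("second", log, fail_second)?,
    })
}

/// Returns the names of the parts dropped by a failed `build`.
fn dropped_after_failed_build() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let result = build(&log, true);
    assert!(result.is_err());
    let dropped = log.borrow().clone();
    dropped
}

fn main() {
    println!("{:?}", dropped_after_failed_build());
}
```

The proposal would extend this cleanup to the case where later fields hold borrows of the earlier ones.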

It should also be possible to allow assignment after construction of & borrows of not-&mut-borrowed fields (to support e.g. a Vec of & references into parts of another field of the structs).

It would be something like this:

pub struct Manager { // immovable type due to the "ref" field
  ref /* could have "mut" here */ context: libudev::Context,
  enumerator: libudev::Enumerator<'context>,
  monitor_socket: libudev::MonitorSocket<'context>,
}

impl Manager {
  fn new() -> Result<Box<Self>> {
    let mut manager = box Manager {
      context: libudev::Context::new()?,
      enumerator: libudev::Enumerator::new(&struct.context)?,
      monitor_socket: {
        let mut monitor = libudev::Monitor::new(&struct.context)?;
        monitor.match_subsystem_devtype(Self::UDEV_SUBSYSTEM, Self::UDEV_DEVTYPE)?;
        monitor.listen()?
      }
    };

    manager.enumerator.match_subsystem(Self::UDEV_SUBSYSTEM)?;
    manager.enumerator.match_property("DEVTYPE", Self::UDEV_DEVTYPE)?;

    Ok(manager)
  }
}

It might be possible to make the type movable by storing pointers as either offsets or pointers depending on whether they point inside the struct, but I don’t think it should be the default mode anyway since it changes FFI, requires at least one additional branch per pointer, struct nesting has additional complications, you can’t borrow the fields since they have a non-standard pointer representation, etc.
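The offset idea can be approximated at the library level today, which may help make the trade-off concrete. This is a sketch of that analogue, not of the compiler feature being discussed: `Record` is a hypothetical type that stores byte ranges into its own buffer and turns them back into slices on access, so the value stays valid when moved.

```rust
use std::ops::Range;

/// Owns a buffer and "points" into it with byte offsets rather than references.
struct Record {
    buffer: String,
    key: Range<usize>,
    value: Range<usize>,
}

impl Record {
    /// Splits `key=value` input once and remembers where each part lives.
    fn parse(buffer: String) -> Option<Record> {
        let eq = buffer.find('=')?;
        let len = buffer.len();
        Some(Record { key: 0..eq, value: (eq + 1)..len, buffer })
    }

    /// Offsets are resolved against the buffer's current location on each access.
    fn key(&self) -> &str {
        &self.buffer[self.key.clone()]
    }

    fn value(&self) -> &str {
        &self.buffer[self.value.clone()]
    }
}

fn main() {
    let record = Record::parse(String::from("DEVTYPE=wlan")).expect("input contains '='");
    // Moving the record (here into a Box) does not invalidate the offsets.
    let moved = Box::new(record);
    println!("{} -> {}", moved.key(), moved.value());
}
```

Every access pays for the offset-to-slice conversion and its bounds checks, and the fields cannot be handed out as plain references stored elsewhere, which mirrors the drawbacks listed above.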


#37

Unfortunately, immovable types don’t help as much as it seems like they should. See this thread on the subreddit for a bit of discussion on why. The short version is that unless the field lifetimes can be tied to the exact struct instance they came from, it’s very easy to create unsound programs. If Rust were to also get generative existential lifetimes, though, then we’d be in business and a solution should be possible.


#38

In general, “field lifetimes” need to be transformed to the lifetime of the borrow of the struct that the field is accessed with, exactly like rental does (with its fn<'a> get_field() -> &'a T accessors).

However, doing it in the language means that, rather than being transformed right away, they can be represented as a sort of “dependent lifetimes” that depend on the struct variable. That would allow, for instance, having a Vec of self-borrowed references in the structure and adding borrows to it at runtime. This would be allowed because the borrows are assigned to the struct matching the variable their lifetime depends on.

But in a first implementation the less powerful rental approach can be used.


#39

I’m almost utterly ignorant of the compiler internals. In terms of full-time days, roughly how much development is this likely to involve?


#40

I’m not sure what you’re referring to about the rental accessors, since rental provides no direct field accessors at all, precisely because doing so would be unsafe. Converting a self-ref lifetime to the borrow lifetime is only safe for shared borrows with covariant lifetimes. There’s no way to detect that in a proc-macro, so it assumes any lifetimes are invariant and allows them to be accessed only through HRTB closures to prevent the self-ref lifetimes from unifying with anything.
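The closure-based access pattern can be sketched in plain safe Rust. This shows only the shape of the HRTB bound and is not rental's actual generated code; `Owner` and `rent` are hypothetical names, and in rental the borrowed value would be a self-referential field rather than a plain `String`.

```rust
/// Hypothetical owner of some data that callers may only inspect via a closure.
struct Owner {
    data: String,
}

impl Owner {
    /// The closure must work for every lifetime `'a`, and `R` is chosen
    /// outside that `for<'a>`, so `R` cannot contain a `&'a str`.
    fn rent<F, R>(&self, f: F) -> R
    where
        F: for<'a> FnOnce(&'a str) -> R,
    {
        f(&self.data)
    }
}

fn main() {
    let owner = Owner { data: String::from("hello") };
    // Returning owned values derived from the borrow is fine.
    let len = owner.rent(|s| s.len());
    let copy = owner.rent(|s| s.to_owned());
    // `owner.rent(|s| s)` is rejected: the borrow cannot escape the closure.
    println!("{} {}", len, copy);
}
```

Because the borrow's lifetime is only ever visible inside the closure, it cannot unify with any lifetime the caller already has.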

I agree though that language support would be able to know whether the lifetimes are covariant or not, which would have huge ergonomic gains. Invariant lifetimes could still be handled with generative existentials. In the end, any language support will be vastly superior to anything rental can do as a mere crate, and I look forward to the day when I can deprecate it completely.