Why does Borrow have to return a reference?

I’m writing a CBOR library with two main data types: Cbor<'a> is a wrapper around a &'a [u8] while CborOwned is a wrapper around SmallVec<[u8; 16]>. The wrappers assert that the wrapped bytes are actually valid CBOR.

It would be nice if I could impl<'a> ToOwned for Cbor<'a> but I can’t, and the reason seems incidental — the required impl Borrow for CborOwned can’t be written because I can’t return a reference to a local value (CborOwned doesn’t contain a Cbor that I could borrow).

Now before I embark on the journey of making a library with an improved ToOwned/Borrow combo, perhaps there are reasons for the choice in borrow()’s signature, which is why I ask here. Thanks in advance for any answers!

I think this is a type system limitation, or at least it was. If it were possible to allow more general but equivalent borrowing situations in the trait's method signature, it would have been written more generically.

In your case, is Cbor<'a> just a &'a [u8]? If so, you could use &'a Cbor instead, with struct Cbor([u8]) defined as #[repr(transparent)].
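For what it's worth, here is a minimal sketch of how that suggestion might look, so that Borrow and ToOwned both become implementable. Several things here are assumptions and not from the original post: Vec<u8> stands in for SmallVec<[u8; 16]> to keep the example std-only, is_valid is a placeholder for the real CBOR validation, and from_bytes_unchecked is a hypothetical private helper name.

```rust
use std::borrow::Borrow;

/// Borrowed form: an unsized newtype over the raw bytes.
#[repr(transparent)]
pub struct Cbor([u8]);

/// Owned form. Assumption: Vec<u8> replaces SmallVec<[u8; 16]> here.
pub struct CborOwned(Vec<u8>);

/// Placeholder for real CBOR validation (hypothetical): accept any non-empty slice.
fn is_valid(bytes: &[u8]) -> bool {
    !bytes.is_empty()
}

impl Cbor {
    /// Checked constructor: the only public way to obtain a &Cbor from bytes.
    pub fn new(bytes: &[u8]) -> Option<&Cbor> {
        if is_valid(bytes) {
            Some(Self::from_bytes_unchecked(bytes))
        } else {
            None
        }
    }

    /// Private: callers must already know that the bytes are valid.
    fn from_bytes_unchecked(bytes: &[u8]) -> &Cbor {
        // SAFETY: Cbor is #[repr(transparent)] over [u8], so both pointer
        // types have the same layout and metadata.
        unsafe { &*(bytes as *const [u8] as *const Cbor) }
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

impl Borrow<Cbor> for CborOwned {
    fn borrow(&self) -> &Cbor {
        // The owned bytes were validated when the CborOwned was created.
        Cbor::from_bytes_unchecked(&self.0)
    }
}

impl ToOwned for Cbor {
    type Owned = CborOwned;

    fn to_owned(&self) -> CborOwned {
        CborOwned(self.0.to_vec())
    }
}

fn main() {
    let borrowed = Cbor::new(&[0x01u8]).expect("non-empty input");
    let owned: CborOwned = borrowed.to_owned();
    let again: &Cbor = owned.borrow();
    assert_eq!(again.as_bytes(), &[0x01u8][..]);
}
```

Because the borrowed type is now the unsized Cbor, borrow() can hand out a reference into the owned buffer instead of needing to return a freshly built wrapper value.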

Hmm, good points, that makes sense. It looks like higher-kinded associated types would be needed, indeed. I’ll try out the Cbor definition you suggest, that might work well enough in the meantime.

Thanks a lot!

This has always felt like a sort of "safe transmute" case that should be allowed but presently isn't.

If you have a #[repr(transparent)] newtype wrapper, it feels like methods of that wrapper should be able to transmute/cast &Inner to &NewType. They would be responsible for ensuring that &Inner follows &NewType's invariants.

Otherwise, this sort of thing can be done today using unsafe. I believe that's how the standard library implements, e.g., Path and PathBuf.
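As a small illustration of what that pairing buys you from the outside, here is a sketch that uses only std. The lookup helper and the file path are made up for the example; the point is that PathBuf: Borrow<Path> lets a map keyed by the owned type be queried with the borrowed one.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical helper: look up an owned-key map using only a borrowed &Path.
/// This works because HashMap::get accepts any &Q where K: Borrow<Q>.
fn lookup<'m>(sizes: &'m HashMap<PathBuf, u64>, key: &Path) -> Option<&'m u64> {
    sizes.get(key)
}

fn main() {
    let mut sizes = HashMap::new();
    sizes.insert(PathBuf::from("/tmp/a.cbor"), 42u64);

    let owned = PathBuf::from("/tmp/a.cbor");
    // Borrow: owned -> reference to the unsized borrowed form.
    let borrowed: &Path = owned.borrow();
    assert_eq!(lookup(&sizes, borrowed), Some(&42));

    // ToOwned: borrowed form -> a fresh owned value.
    let back: PathBuf = borrowed.to_owned();
    assert_eq!(back, owned);
}
```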

This sounds like something unsafe is meant to call out, so keeping it as unsafe still makes sense to me.

But those invariants don't necessarily have anything to do with memory safety.

These use cases tend to occur in pairs of owned/borrowed types like MyTypeBuf and MyTypeRef, à la PathBuf and Path in the standard library.

Both types effectively maintain the same invariants; the reference type is just the Borrowed form of the owned type.

bytemuck adds the required derive and conversions for transparent wrappers, so it's at least out there in the ecosystem.

There’s also TransparentNewtype in core_extensions::transparent_newtype, but neither would fit: they allow turning an Inner into Self indiscriminately, which makes it easy to violate the invariants. Right now it seems like the only way is to use unsafe in the Borrow implementation to transmute the reference.

The compiler doesn't know that and needs to be convinced. That's what unsafe is for. Even so, non-memory safety errors can turn into memory safety errors later. Take NonZeroI8 for instance. There's nothing (AFAIK) "memory unsafe" about actually stuffing a 0 into one via mem::transmute, but, due to other guarantees, if an Option gets constructed as Some(actually_0), it is now a None which is definitely in UB territory.
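A short sketch of the "other guarantees" in play, using only safe std APIs. The helper name checked is made up for the example. Option<NonZeroI8> is documented to have the same size as i8, which is only possible because the all-zero bit pattern is reserved for None.

```rust
use std::mem::size_of;
use std::num::NonZeroI8;

/// Hypothetical helper: the safe, checked way into NonZeroI8.
fn checked(n: i8) -> Option<NonZeroI8> {
    NonZeroI8::new(n)
}

fn main() {
    // The zero bit pattern is used as the niche for None,
    // so the Option costs no extra space.
    assert_eq!(size_of::<Option<NonZeroI8>>(), size_of::<i8>());

    // The checked constructor refuses 0 instead of producing a bogus value.
    assert!(checked(0).is_none());

    let five = checked(5).expect("5 is non-zero");
    assert_eq!(five.get(), 5);

    // By contrast, `unsafe { NonZeroI8::new_unchecked(0) }` or a transmute
    // from 0 would break the invariant the niche relies on, so it is
    // deliberately not executed here.
}
```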

Oh, that's a good point. That's a whole use case that bytemuck could support, but doesn't.

Can you describe specifically where given the following:

#[repr(transparent)]
struct MyNewType<'a>(&'a [u8]);

...that a method like the following could cause a memory safety error:

impl<'a> MyNewType<'a> {
    pub fn new_ref(bytes: &'a [u8]) -> Result<&'a MyNewType, Error> {
        // ...enforce invariants and return errors here...

        // ...assuming that `bytes` is valid for our domain invariant...
        // Hypothetical "safe transmute" cast; this does not compile today.
        Ok(bytes as &'a MyNewType)
    }
}

This obviously isn't a valid "safe transmute" today, but AFAICT it could be with the addition of a new compiler feature.


In this specific instance? No, I don't think so (because MyNewType doesn't have any documented additional requirements; I'd expect it to have MyNewType::new() -> Result<Self, Error> and MyNewType::new_unchecked() -> Self /* may panic */ constructors with any requirements).

I think it needs to be shown for any such repr(transparent) structure (or the compiler somehow knows what invariants are required for the wrapper type). NonZero* is one such structure where such an Ok(i as &'a NonZeroI8) is not inherently safe and (IMO) must use an unsafe {} block.

To the extent that the NonZero* types do have some compiler magic associated with them I would argue the compiler does know or at least participate in their invariants.

Sure, but that's not to say that there isn't pure library code doing bit stuffing in over-aligned pointers or something like that, which would also require unsafe {} here. I think any RFC to add safe repr(transparent) transmutation would need to make the rules under which it can be used without unsafe quite strict (loosening them later is easier than taking mistakes back).

You've definitely done something wrong in that example; MyNewType here contains &[u8], and you're converting &[u8] to &MyNewType.

It generally sounds like you're asking for ref-cast.

Ref casts can't be safe in general, because newtypes introduce arbitrary safety invariants.

The example I use is Utf8Char([u8]). As a safety invariant, you can only construct a &Utf8Char that points to a valid code unit sequence that encodes a single codepoint.

It's not sound to convert an arbitrary &[u8] to &Utf8Char, because that violates the safety requirements of Utf8Char, and safe APIs might now cause UB. (You could also represent Utf8Char as Utf8Char(str), but [u8] communicates the point more clearly.)
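To make that concrete, here is a rough sketch of such a Utf8Char. The method names new and to_char are illustrative assumptions, not from any existing crate. The safe to_char relies on the invariant via from_utf8_unchecked, which is exactly why an unchecked public ref cast from &[u8] would let safe code reach UB.

```rust
/// Invariant: the bytes are valid UTF-8 encoding exactly one codepoint.
#[repr(transparent)]
pub struct Utf8Char([u8]);

impl Utf8Char {
    /// Checked constructor: validates before doing the ref cast.
    pub fn new(bytes: &[u8]) -> Option<&Utf8Char> {
        let s = std::str::from_utf8(bytes).ok()?;
        let mut chars = s.chars();
        match (chars.next(), chars.next()) {
            // Exactly one codepoint: the cast upholds the invariant.
            // SAFETY: Utf8Char is #[repr(transparent)] over [u8].
            (Some(_), None) => Some(unsafe { &*(bytes as *const [u8] as *const Utf8Char) }),
            // Empty input or more than one codepoint.
            _ => None,
        }
    }

    /// Safe API that trusts the invariant instead of re-validating.
    pub fn to_char(&self) -> char {
        // SAFETY: `new` is the only constructor and it checked the bytes.
        let s = unsafe { std::str::from_utf8_unchecked(&self.0) };
        s.chars().next().expect("invariant: exactly one codepoint")
    }
}

fn main() {
    let e_acute = Utf8Char::new("é".as_bytes()).expect("single codepoint");
    assert_eq!(e_acute.to_char(), 'é');

    // Two codepoints and invalid UTF-8 are both rejected.
    assert!(Utf8Char::new(b"ab").is_none());
    assert!(Utf8Char::new(&[0xFF]).is_none());
}
```

Since the field is private and the cast only happens after validation, the unsafe stays contained inside the module that owns the invariant.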

str is still a primitive, but barely so. str being well formed UTF-8 is now a safety invariant, not a validity invariant, so it could be defined as just str([u8]). Path is just Path(OsStr).

The ability to add library level invariants is key to Rust's safety model. This holds just the same for types which have guaranteed #[repr(transparent)] layout and are ref cast, so long as the ref cast is private.

What could potentially be made available is a ref cast if and only if you have visibility of the sole field. This would allow privacy to handle the safety barrier, as with other types. That, I would be for having a safe way to do.

I specifically said:

To be very clear, I am not suggesting that it be possible anywhere else, because, as I specifically highlighted myself, one of the main reasons for having newtypes in the first place is to enforce invariants on the inner data.

I think we may be on the same page?

There's a validity invariant, so it's insta-UB to do that.

Per MIRI:

error: Undefined Behavior: type validation failed: encountered 0, but expected something greater or equal to 1
 --> src/main.rs:3:9
  |
3 |         std::mem::transmute::<u32, std::num::NonZeroU32>(0);
  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered 0, but expected something greater or equal to 1
  |
  = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
  = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=acb802d0234dc731d37a98aedf44ff2d
