[Pre-RFC] Constructing transparent wrapper types

HeroicKatora · October 24, 2020, 8:12pm

This is an update with feedback from [Pre-RFC] Patterns allowing transparent wrapper types. Thanks for all the input. The motivation is the same so you can skip it if you've read the original.

Summary

Provide a way to construct transparent wrappers from a reference to the wrapped, non-zero size field.

Motivation

Custom dynamically sized types (wrapping one of the native DSTs) are in an unfortunate place right now. It is almost entirely impossible to create them within a safe context and since they exist only (maybe soon mostly) behind references, #[repr(transparent)] is also mostly useless for this job. Custom newtype wrappers around usual types are also not optimal when interacting with code that instead targets the underlying type.

However, being able to have new invariants on (unsized) types without changing representation has many real advantages:

str can be regarded as such. While an internal type for the moment (and likely longer), a custom type in its likeness for other encodings can be useful.
Network programming and other forms of communication rarely deal with fixed size structures but have highly predictable content and internal invariants.
[impl Ord] that actually maintains order in the slice.
Avoids one level of indirection and a lifetime. Current code many times provides wrappers some with generic impl AsRef<[T]> instead of encapsulating the actual memory region.
Ascribing additional meaning to raw data:
```
struct RGB([u8; 3]);
```

Calling a &self method on a type wrapping a reference to unsized data suddenly has two lifetimes to care about: the one of the container struct, which is just struct _ { inner: C, }, and the actually relevant one of the memory. This makes it unnecessarily unergonomic to store a borrowed result from such a type.

It also ties two separate concepts strongly together. The struct for encapsulating an inner invariant is many times also in charge of owning that data. This is, however, unnecessary, especially (but not only) if the data need only be accessed immutably. At that point library authors opt to introduce illusive bounds of C: AsRef<[T]> + AsMut<[T]> instead. This creates new problems if they actually do care about the memory location, since those two methods need not return the 'same' slices at all times. And note how the AsMut<[T]> bound always conditionally pops up despite the receiving method already declaring itself to be &mut self.
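
To make the AsRef point concrete, here is a minimal sketch in today's Rust. The names Flaky and Header are hypothetical and only serve as illustration; the point is merely that nothing in the AsRef contract forces two calls to return the same memory.

```rust
use std::cell::Cell;

// Hypothetical container whose `as_ref` alternates between two buffers.
struct Flaky {
    a: [u8; 2],
    b: [u8; 2],
    toggle: Cell<bool>,
}

impl AsRef<[u8]> for Flaky {
    fn as_ref(&self) -> &[u8] {
        let use_a = self.toggle.get();
        self.toggle.set(!use_a);
        if use_a { &self.a[..] } else { &self.b[..] }
    }
}

// Hypothetical wrapper in the style criticized above: it is generic over
// the owner instead of being a view of the memory region itself.
struct Header<C> {
    inner: C,
}

impl<C: AsRef<[u8]>> Header<C> {
    // Any invariant checked on one call says nothing about the next call.
    fn first(&self) -> Option<u8> {
        self.inner.as_ref().first().copied()
    }
}
```

A wrapper that is itself the unsized memory region cannot run into this, as there is only one place the data can live.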

A far preferable solution would be to provide a native, standard internal type on which such guarantees can be made and which will also produce the desired (co-)variance: &'_ T. Note that the standard library already follows this pattern! Vec<T> is the owning abstraction while [T] is the representational one. And there are plenty of other owners of [T], suggesting that these concepts should indeed be addressed at different levels of abstraction. The same holds for String and str, where we have the additional accommodation of being able to convert between [u8] and str due to internal, unsafe magic. (Pretty literally, currently, since they are lang items. Some code casts pointers, some does union access; it's a bit all over the place.)

The type of a value can change through coercion or an explicit constructor. The type itself can only be changed within an expression and can not be changed when matching a reference although matching by value may instead produce multiple values of other types. This presents a problem for dynamically sized types which are exclusively visible behind references (and pointers). Creating a reference to a custom DST thus always involves unsafe code for

Converting the original reference to a pointer.
Casting that pointer to a pointer to the custom DST.
unsafe: Dereferencing that pointer and reborrowing it.

This is error prone since the lifetime information is lost, and the involved types may change without compilation errors, introducing misalignment or size mismatches. Using a transmute of the reference directly can have an even larger impact while still compiling if any of the involved types changes. Not requiring unsafe at this point would also encourage using it more sparingly, improve code quality, and allow more #[deny(unsafe_code)] usage. Note that the reverse, converting to a reference of a contained field, is easy since it is a basic field access. The layout of the value denoted by the inner field is after all already ensured through the usual means, and so is the reference creation to it (disregarding #[repr(packed)]).
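
For reference, the three steps listed above look roughly like this in today's Rust. This is a sketch only; the type Ascii and the methods from_bytes and as_bytes are illustrative names, not part of the proposal.

```rust
#[repr(transparent)]
struct Ascii([u8]);

impl Ascii {
    // Checked constructor: the invariant is "all bytes are ASCII".
    fn from_bytes(bytes: &[u8]) -> Option<&Ascii> {
        if !bytes.is_ascii() {
            return None;
        }
        // Step 1: convert the original reference to a pointer.
        let raw: *const [u8] = bytes;
        // Step 2: cast the pointer to a pointer to the custom DST.
        let cast = raw as *const Ascii;
        // Step 3 (unsafe): dereference and reborrow. The lifetime of the
        // result is only tied to `bytes` through the function signature.
        // SAFETY: `Ascii` is repr(transparent) over `[u8]`.
        Some(unsafe { &*cast })
    }

    // The reverse direction is a plain field access and needs no unsafe.
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

Nothing in the cast itself would complain if the field type of Ascii were later changed, which is the maintenance hazard described above.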

Guide-level explanation

Usually types can be constructed and destructed via their struct syntax:

struct Foo { bar: usize }

let foo = Foo { bar: 0 };
let Foo { bar, } = foo;

This does not usually work for unsized types since fields are values and values need to be Sized. But when we immediately reference the value and the constructed type is a transparent wrapper, then of course no such value needs to be moved or constructed at all; it only borrows the original value wrapped in a new type. Thus, in a context where the constructed value is immediately borrowed, we add a marker to initialize the field from a place expression without moving the value from it. This marker is the repr(transparent) attribute applied to the constructor expression.

#[repr(transparent)]
struct ascii { inner: [u8] };

let byte_slice: &[u8] = ...;
let val: &ascii = &ascii {
    #![repr(transparent)]
    inner: *byte_slice
};

Reference-level explanation

A struct-expression annotated with the repr(transparent) attribute must occur within a reference expression and its type itself must also be annotated with the same attribute. Its single non-zero sized field is initialized by a place expression instead of a value expression. This has the effect of re-borrowing the place of the field with a pointer cast.

For readability purposes the author suggests using an inner attribute instead.

#[repr(transparent)]
struct ascii([u8]);

let byte_slice: &[u8] = ...;
let val: &ascii = &ascii(#![repr(transparent)] *byte_slice);
// Equivalent:
let val: &ascii = &#[repr(transparent)] ascii(*byte_slice);

The reference expression around the struct expression determines the kind of borrowing that takes place.

#[repr(transparent)]
struct ascii([u8]);

let byte_slice: &mut [u8] = ...;
let val: &mut ascii = &mut ascii(#![repr(transparent)] *byte_slice);
let _ = byte_slice[0]; 
//      ^ ERROR: cannot use `byte_slice[_]` because it was mutably borrowed
// some later use of `val`.

let byte_slice: &[u8] = ...;
let val: &mut ascii = &mut ascii(#![repr(transparent)] *byte_slice);
//  ERROR: can not borrow data in a `&_` reference mutably.

The field does not need to be a dynamically sized type. For example:

#[repr(transparent)]
struct AsciiChar(u8);

impl AsciiChar {
    fn as_ascii_char(ch: &u8) -> Option<&AsciiChar> {
        match ch {
            ch @ 0..=127 => Some(&AsciiChar(#![repr(transparent)] *ch)),
            _ => None,
        }
    }
}
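
For comparison, a sketch of how the same sized wrapper has to be written today, with the pointer cast spelled out. The method names from_byte and get are illustrative only.

```rust
#[repr(transparent)]
struct AsciiChar(u8);

impl AsciiChar {
    // Borrow an existing byte as an `AsciiChar` without copying it.
    fn from_byte(ch: &u8) -> Option<&AsciiChar> {
        match *ch {
            // SAFETY: `AsciiChar` is repr(transparent) over `u8`, and the
            // range check establishes the ASCII invariant.
            0..=127 => Some(unsafe { &*(ch as *const u8 as *const AsciiChar) }),
            _ => None,
        }
    }

    fn get(&self) -> u8 {
        self.0
    }
}
```

The returned reference points at the caller's byte, not at a temporary, which is what the annotated constructor is meant to express without unsafe.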

A transparent type can contain any number of 1-aligned, zero-sized fields. These must also be initialized by value. For example, here we have a wrapper around a slice that enforces that its elements are sorted according to some (statically known) comparison function.

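use std::cmp::Ordering;
use std::marker::PhantomData;

trait Order<T> {
   const PARTIAL_CMP: fn(&T, &T) -> Option<Ordering>;
}

#[repr(transparent)]
struct Sorted<T, U: Order<T>> {
  // The unsized field has to come last.
  sorter: PhantomData<U>,
  elements: [T],
}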

impl<T, U: Order<T>> Sorted<T, U> {
  pub fn new(slice: &[T]) -> Option<&Self> {
    if slice.iter().is_sorted_by(U::PARTIAL_CMP) {
      Some(&Sorted {
        #![repr(transparent)]
        elements: *slice,
        sorter: PhantomData,
      })
    } else {
      None
    }
  }
}
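
A sketch of what this wrapper looks like in today's Rust, assuming the unsized field is placed last and the struct is marked repr(transparent). The Ascending type, the cmp_i32 helper and the as_slice accessor are hypothetical additions for the example, and the sortedness check is written with windows(2) instead of is_sorted_by to keep it independent of that method's signature.

```rust
use std::cmp::Ordering;
use std::marker::PhantomData;

trait Order<T> {
    const PARTIAL_CMP: fn(&T, &T) -> Option<Ordering>;
}

#[repr(transparent)]
struct Sorted<T, U: Order<T>> {
    sorter: PhantomData<U>,
    elements: [T],
}

impl<T, U: Order<T>> Sorted<T, U> {
    pub fn new(slice: &[T]) -> Option<&Self> {
        // Every adjacent pair must compare as Less or Equal.
        let in_order = slice.windows(2).all(|w| {
            matches!(
                (U::PARTIAL_CMP)(&w[0], &w[1]),
                Some(Ordering::Less | Ordering::Equal)
            )
        });
        if in_order {
            // SAFETY: repr(transparent) over `[T]`; the only other field
            // is a zero-sized PhantomData.
            Some(unsafe { &*(slice as *const [T] as *const Self) })
        } else {
            None
        }
    }

    pub fn as_slice(&self) -> &[T] {
        &self.elements
    }
}

// Hypothetical ordering marker for the example.
struct Ascending;

fn cmp_i32(a: &i32, b: &i32) -> Option<Ordering> {
    a.partial_cmp(b)
}

impl Order<i32> for Ascending {
    const PARTIAL_CMP: fn(&i32, &i32) -> Option<Ordering> = cmp_i32;
}
```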

In MIR the final move in the initialization of zero-sized fields need not be translated. The expression computing the value has to be carried out, of course, since it might check or rely on a safety invariant of the type that may come from an unrelated module. But the value can then be ignored. Instead, the construction is translated the same way as the current pointer cast.

#[repr(transparent)]
struct ascii([u8]);

let byte_slice: &mut [u8] = ...;
let val: &mut ascii = &mut ascii(#![repr(transparent)] *byte_slice);
// Semantically equivalent unsafe solution:
let val: &mut ascii = unsafe { &mut *(byte_slice as *mut _ as *mut ascii) };

However, we still require an unsafe block when the borrow of the place expression itself would require one. That also improves the safety of that initialization, since changing the input type from &[u8] to *const [u8] will error until an explicit unsafe block is put in place. In that regard, a more faithful translation of the expression might be:

let byte_ptr: *mut [u8] = ...;
let val: &mut ascii = &mut ascii(#![repr(transparent)] *byte_ptr);

let val: &mut ascii = {
    let _ = &mut *byte_ptr; // Safety check.
// ERROR: dereference of raw pointer is unsafe and requires unsafe function or block
    unsafe { &mut *(byte_ptr as *mut _ as *mut ascii) }
};

// Fine:
let val: &mut ascii = unsafe { &mut ascii(#![repr(transparent)] *byte_ptr) };

Drawbacks

It uses an attribute to manipulate the meaning of a particular syntax which complicates handling of particular expressions.

Rationale and alternatives

We could also imagine a syntax where a field initializer that is a place expression is treated separately and its semantics depend on the context of the struct expression.

let byte_slice: &[u8] = ...;
let val: &ascii = &ascii(*byte_slice);
// OR
let val: &ascii = &*ascii(byte_slice);

However, this comes with additional compatibility risks. In cases where the wrapped type is not a dynamically sized type this is already permitted syntax that compiles when the type is Copy. Permitting this would impact readability as the lifetime and borrows of the reference differ. The currently allowed case introduces a temporary and references it, while the new functionality would borrow the field. The extra verbosity to distinguish these two cases seems fine, in particular since it's not expected to appear very often, mainly in a small number of wrapper constructors.
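
The difference between the two meanings can be observed in today's Rust by comparing addresses. A small sketch, with Meters, copies_value and borrows_field as made-up names:

```rust
#[repr(transparent)]
struct Meters(u32);

// Today's meaning of `&Meters(*r)` for a Copy field: the value is copied
// into a temporary and the reference points at that temporary.
fn copies_value(r: &u32) -> bool {
    let wrapped: &Meters = &Meters(*r);
    std::ptr::eq(&wrapped.0, r)
}

// The meaning the proposal is after, written with the current pointer
// cast: the reference points at the original place.
fn borrows_field(r: &u32) -> bool {
    // SAFETY: `Meters` is repr(transparent) over `u32`.
    let wrapped: &Meters = unsafe { &*(r as *const u32 as *const Meters) };
    std::ptr::eq(&wrapped.0, r)
}
```

Here copies_value reports a different address while borrows_field reports the same one, so giving the unannotated syntax the second meaning would silently change where such references point.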

Another alternative was discussed in a thread on internals using as and patterns instead. However, this is a poor match as it extends patterns by quite a bit while wanting to express an expression. As such it composes poorly and interacts with privacy in new ways.

#[repr(transparent)]
struct ascii([u8]);

let byte_slice: &[u8] = ...;
let (val as ascii(val)): &ascii = byte_slice;

One variation here is to use a simple cast but this does not support ZST fields very well.

let val: &ascii = byte_slice as &ascii;

Another alternative is to use safe transmutes.

let foo: &ascii = transmute!(byte_slice)

However, note that this is only safe for the caller. The declaration of this transmutation must be unsafe. Additionally, such a transmutation is a trait and as such can be used by all crates without regard to privacy. The new type can not introduce any additional invariants as its constructor can be bypassed.

Prior art

C permits casting of types that have ‘compatible layout’ but the compiler undertakes no effort of validating it, silently permitting undefined behaviour to occur. There is also no concept of privacy for the purpose of safety invariants.

Unresolved questions

Should a slice of the wrapper type also be constructible from a slice of the wrapped type? This seems useful in avoiding duplicating every definition for a slice variant. But for unsized original types or other use cases this is not even applicable, so further syntactic or semantic complications would be suspect.

Future possibilities

This does not address all 'simple' wrappers. For example, a wrapper that requires its inner field to be aligned more strictly can not be annotated repr(transparent) as it already uses repr(align). It would not be safe to construct it, but it could be seen as a counterpart to repr(packed). The latter makes accessing a field by reference unsafe while the former makes constructing from a reference unsafe. Maybe this should be permitted with a similar syntax.
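
To illustrate why construction from a reference is the unsafe direction for such a wrapper, here is a sketch in today's Rust with a runtime alignment check. Aligned4 and from_ref are hypothetical names for this example.

```rust
use std::mem::align_of;

// Wrapper that demands stricter alignment than its field.
#[repr(C, align(4))]
struct Aligned4([u8; 4]);

impl Aligned4 {
    // An arbitrary `&[u8; 4]` is only guaranteed to be 1-aligned, so the
    // address has to be checked before reinterpreting it.
    fn from_ref(bytes: &[u8; 4]) -> Option<&Aligned4> {
        let ptr = bytes as *const [u8; 4];
        if (ptr as usize) % align_of::<Aligned4>() == 0 {
            // SAFETY: same size, single field at offset 0, and the
            // alignment requirement was just verified.
            Some(unsafe { &*(ptr as *const Aligned4) })
        } else {
            None
        }
    }
}
```

Going the other way, from &Aligned4 to its field, is always fine, which mirrors the repr(packed) situation with the roles swapped.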

atagunov · October 24, 2020, 10:10pm

To me &* feels very confusing after C/Java..

Wouldn't spelling out the full type in `as` coercion work better?
It's just a reference re-interpretation...

slice as &Sorted<T, U>

P.S. perhaps type ascription would work just as good as coercion: slice : &Sorted<T, U>

Update: seems like creating &Sorted<T, !> isn't that good for soundness..

scottmcm · October 24, 2020, 10:34pm

I think safe transmute will be able to do this, right?

It seems like let foo: &Wrapper = transmute!(slice_ref); would be fine since Wrapper being repr(transparent) would be enough to make it defined...

HeroicKatora · October 24, 2020, 10:35pm

Safe transmute is safe for the consumer, not the definition. In particular the goal is that a crate defining such a wrapper can declare #[forbid(unsafe_code)], which it could not with safe transmute. I'll add it to the alternatives with that reasoning.

HeroicKatora · October 24, 2020, 10:42pm

In addition, this makes it possible for everyone to construct the wrapper from the wrapped type since there is no notion of a private trait impl. That is, the wrapper can not introduce any safety invariants as its constructors can be bypassed. In particular the Ordered wrapper or ascii couldn't rely on their content being valid if they declared such a binary compatibility.

atagunov · October 25, 2020, 10:51am

Actually postfix type ascription/coercion notation can as easily be used to "reinterpret"/move Box-es:

let bu : Box<[u8]> = ...;
let b = bu as Box<Sorted<[u8], U>>; // re-interpret and move the box

Update: as I have been made aware, the suggested rules would allow one to instantiate types such as Box<Sorted<[u8], !>>. I guess that's not good..

comex · October 26, 2020, 8:24pm

@HeroicKatora

The current safe-transmute RFC both works without any unsafe at the definition site, and adds what amounts to a mechanism for private trait impls (Here!()) in order to solve exactly that problem.

However, Here!() has come under heavy criticism and will probably be removed, at least for now.

Still, your use case could be seen as motivation to add it back or add something similar in the future. In general, this RFC does feel to me like a special case of safe transmute, rather than something that needs its own dedicated syntax.

atagunov · October 26, 2020, 9:58pm

I'd say this RFC is safe transmute + privacy: say we wanted to write our own &str
If only our module could transmute &[u8] into our &str we'd be able to verify it's valid UTF-8 first

HeroicKatora · October 26, 2020, 10:02pm

I did not know how far that RFC was expanded (wasn't it intended to be somewhat minimal initially?). Can you demonstrate it? I want to achieve both privacy in this transmute, such that I can check invariants, and safety in that a third-party crate supplies a non-trivial ZST. If I read the discussion correctly then even the powerful, criticized module needs to be placed in the private module of the ZST type, which might be third party. As a point of view, transmute is about trivial constructibility foremost. If you want, this is the opposite, which focuses only on trivial layout conversion (that already is arguably a special case by virtue of having a dedicated repr attribute) without generality of traits but with full privacy considerations.

comex · October 26, 2020, 11:01pm

Well… how does your proposal address this? It says the ZSTs "must also be initialized by value" but the value expression is not actually evaluated ("need not be translated")? That would be unsafe to do for arbitrary expressions, as if a third-party module exposes, e.g., pub fn get_zst() -> Zst, you shouldn't be able to pull Zsts out of thin air without actually calling get_zst().

In your example, the expression is just a constructor, so it's safe – but that also implies it would work with safe-transmute (at least the version with the criticized feature).

I suppose you could safely extend this to arbitrary expressions producing ZSTs if you:

changed the semantics to evaluate the expression (once) rather than ignoring it, and
required the ZST to be Copy, since by inferring the ZST to exist in an arbitrarily-sized slice, you're effectively creating an arbitrary number of copies of it.

With that extension, it would exceed the capabilities of what the safe transmute RFC can currently do. But I'm kind of confused what the actual use case for this is. Do you have a concrete example of why you'd want to add a third-party, non-trivially-constructible ZST in this manner?

HeroicKatora · October 27, 2020, 12:13am

Oh no, I guess I should have been more specific and that is likely the source of the misunderstanding. The initialization expression should be translated and executed as a usual expression. That is, in your example if get_zst panics because the Zst is not inhabited then the initialization should panic as well. It's not intended to produce things from thin-air. Only the final step of moving the value into the newly created struct can be elided in translation as it would move+write+borrow an existing ZST value.
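
As a sketch of those intended semantics in today's Rust: the initializer expression runs exactly once, and only then is the reference reinterpreted. The module third_party, the Token type, the call counter and the Tagged wrapper are all made up for this illustration.

```rust
use std::sync::atomic::Ordering;

mod third_party {
    use std::sync::atomic::{AtomicUsize, Ordering};

    // Counts how often the constructor below really ran.
    pub static CALLS: AtomicUsize = AtomicUsize::new(0);

    // ZST with a private field: only this module can create one.
    pub struct Token(());

    pub fn get_token() -> Token {
        CALLS.fetch_add(1, Ordering::SeqCst);
        Token(())
    }
}

#[repr(transparent)]
struct Tagged {
    token: third_party::Token,
    bytes: [u8],
}

impl Tagged {
    fn new(bytes: &[u8]) -> &Tagged {
        // The initializer is evaluated as a usual expression; if it
        // panicked, construction would panic as well.
        let token = third_party::get_token();
        // Only the final move of the ZST into the struct is elided.
        std::mem::forget(token);
        // SAFETY: repr(transparent) over `[u8]`, and a `Token` was
        // legitimately obtained above.
        unsafe { &*(bytes as *const [u8] as *const Tagged) }
    }

    fn bytes(&self) -> &[u8] {
        &self.bytes
    }
}

fn token_calls() -> usize {
    third_party::CALLS.load(Ordering::SeqCst)
}
```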

comex · October 27, 2020, 1:03am

Okay. And, ugh, disregard everything I said about Copy. Sorry. I thought the ZST was going into the slice element type.

atagunov · October 27, 2020, 12:05pm

As noted in private exchange

I believe for this to work in generic code there would need to be an automatic "trait bound" which could perhaps be called ZST. It would be deemed satisfied only for struct-s with no fields and for struct-s where all fields themselves satisfy the bound.
It would be nice if the syntax worked in similar fashion for references and boxes

Here's another attempt at such syntax:

#[repr(transparent)] struct Slice<Zst: ZST> { data : [u8], zst : Zst }
let ru : &[u8] =...;
let bu : Box<[u8]> = ..;
let r : &Slice<Zst1> = ru as &Slice{zst : getZst1()}
let b : Box<Slice<Zst1>> = bu as Box<Slice{zst : getZst1()}>; // move

I find it interesting to consider what - if anything interesting - would happen if we wrap a transparent wrapper type into another transparent wrapper type
