TLDR:
Currently there are various parts of code generation that cannot be opted out of. This adds additional options to safe code, even when unsafe code would prefer not to expose such things, and in fact cannot soundly do so. Even more frustrating, sometimes the compiler, while not willing to let you opt out of the code generation, also is incapable of successfully doing the code generation, leaving one in the position of going "Stop forbidding my code because you don't know how to compile that part. I don't want you to compile that part! That part only makes things worse! You are complaining that you don't know how to make things worse! Just stop!".
While there are hackish and limited ways to get around this by using invariant lifetimes as non-lifetimes, they either massively compromise type safety, requiring us to resort to unsafe functions that manipulate untyped raw pointers, and then trying to create safe wrapper around these without compiler aid, or interfere with documentation and high quality error messages. Or both.
This is similar to what is brought up here, but expands on it and I think goes in somewhat different directions (that permits basically implicit Pointee, I think?)
As work continues on the Sized/DynSized/Pointee/etc, this seems like a good time to specify the top of it, to say a type parameter opts out of all implicit knowledge of the type.
Such a type parameter is inherently extremely limited, which enables more code generation options and safety options. This is not a conservative extension to safe code, and in fact enables safe code to represent things like powerful type equalities (see below).
The proposed opt out is ?Concrete
, which seems natural to me and in fact I came up with independently of the linked post, but of course bikeshedding is possible.
While adding new ?
bounds has been shown time and time again to be a thing that risks major churn, this type bound avoids that: I believe the only type parameter currently used in the stable standard library that could have it relaxed is PhantomData
, so unlike ?Move
or ?Drop
or ?Leak
the churn would be minimal to non-existent, with the possible exception of things like ghost-cell
wanting to enable some of the nicer options this enable (they would continue to function fine without such an upgrade, and in a quite interoperable with code that wants to use the advanced features way).
In many ways a ?Concete
type is like a lifetime: they can be used in generics without blocking dyn safety, you can't specialize on them, and so on. Many of the ghost
tricks would perhaps be improved by switching to ?Concrete
types (and it also would permit the theoretical extension of lifetimes in the form of 'a: Concrete
, and the possible sound extension of Any
to such, though I am skeptical of the ergonomics of such).
However, there are a few differences:
- There is only one named lifetime possible: 'static. With
?Concrete
one can assign top level names to be used. This could be used for things like phantom tags on units of measure and such, and with the next difference it unlocks some very powerful tools. - One can have
?Concrete + TRAIT
. While concrete opts out of all implicit bounds, present and future, one can then opt back into them.
For example, if one has T: ?Concrete
one cannot pass a &T
, as it has opted out of the Pointee
trait, and so the metadata is not available. However, with ?Concrete + Pointee<Metadata=()>
one winds up with the exact assertion "I will use this as a type safe reference and nothing else". Such a generic type can be used safely in dyn types, as the code is the same for all: we need not worry about passing in information about the size of the type, how to drop it, nor anything else. It is equivalent to a type checked version of passing around *const ()
.
This would make I think a far stronger base for refining the hierarchy of what options various types have. There has been concern about types that cannot be dropped without async code, the whole question of exern types, and while almost any option would create some churn (either in the form of blocked compiles or misbehaving code), being able to start writing some of the low level functions to only ask for exactly what they need would future proof things far better.
This also, as mentioned above, enables more powerful abstractions to be created in purely safe code by enabling more precisely selected balances between "checked at compiletime" and "what code gen results". For example, the safe creation of type equality proofs, as demonstrated by the following (explanation of how this works further below):
trait TyFunc<Arg: ?Concrete> {
type Output: ?Concrete;
}
trait IsEqualityProof<A: ?Concrete, B: ?Concrete> {
fn run<'a, F: ?Concrete>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<B>>::Output
where
F: TyFunc<A>,
F: TyFunc<B>,
<F as TyFunc<A>>::Output: Pointee<Metadata = ()>,
<F as TyFunc<B>>::Output: Pointee<Metadata = ()>;
}
impl<A: ?Concrete> IsEqualityProof<A, A> for () {
fn run<'a, F: ?Concrete>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<A>>::Output
where
F: TyFunc<A>,
<F as TyFunc<A>>::Output: Pointee<Metadata = ()>,
{
input
}
}
struct EqualityProof<A: ?Concrete, B: ?Concrete>(Box<dyn IsEqualityProof<A, B>>);
impl<A: ?Concrete> Default for EqualityProof<A, A> {
fn default() -> Self {
EqualityProof(Box::new(()))
}
}
impl<A: ?Concrete, B: ?Concrete> IsEqualityProof<A, B> for EqualityProof<A, B> {
fn run<'a, F: ?Concrete>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<B>>::Output
where
F: TyFunc<A>,
F: TyFunc<B>,
<F as TyFunc<A>>::Output: Pointee<Metadata = ()>,
<F as TyFunc<B>>::Output: Pointee<Metadata = ()>,
{
self.0.run::<F>(input)
}
}
Do note that this is, in haskell terms, a 'lifted' equality proof. It may panic or enter an infinite loop if you try to use it, including doing so non-deterministically, and requires more than the ZST to store that fact. Unsafe code (or direct compiler support) would still be required to generate an 'unlifted' version. However, it is possible to safely turn a lifted
equality proof into an unlifted or at least guaranteed to evaluate one simply by using the original one a single time. Without a ?Concrete
bound, however, trying to translate this becomes extremely difficult:
pub trait TyFunc<Arg: ?Sized> {
type Output: ?Sized;
}
unsafe trait IsEqualProofRaw<A: ?Sized, B: ?Sized> {
fn run_raw(&self, input: *const ()) -> *const ();
}
unsafe impl<'a, A: ?Sized, B: ?Sized, T: IsEqualProofRaw<A, B> + ?Sized> IsEqualProofRaw<A, B>
for &'a T
{
fn run_raw(&self, input: *const ()) -> *const () {
T::run_raw(self, input)
}
}
unsafe impl<'a, A: ?Sized, B: ?Sized, T: IsEqualProofRaw<A, B> + ?Sized> IsEqualProofRaw<A, B>
for &'a mut T
{
fn run_raw(&self, input: *const ()) -> *const () {
T::run_raw(self, input)
}
}
unsafe impl<A: ?Sized, B: ?Sized, T: IsEqualProofRaw<A, B> + ?Sized> IsEqualProofRaw<A, B>
for Box<T>
{
fn run_raw(&self, input: *const ()) -> *const () {
T::run_raw(self, input)
}
}
trait IsEqualProof<A: ?Sized, B: ?Sized> {
fn run<'a, F: TyFunc<A> + TyFunc<B>>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<B>>::Output
where
<F as TyFunc<A>>::Output: Sized,
<F as TyFunc<B>>::Output: Sized;
}
impl<A: ?Sized> IsEqualProof<A, A> for () {
fn run<'a, F: TyFunc<A>>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<A>>::Output
where
<F as TyFunc<A>>::Output: Sized,
{
input
}
}
#[repr(transparent)]
struct FromRawEqualProof<T: ?Sized>(T);
impl<A: ?Sized, B: ?Sized, T: IsEqualProofRaw<A, B> + ?Sized> IsEqualProof<A, B>
for FromRawEqualProof<T>
{
fn run<'a, F: TyFunc<A> + TyFunc<B>>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<B>>::Output
where
<F as TyFunc<A>>::Output: Sized,
<F as TyFunc<B>>::Output: Sized,
{
let input: *const <F as TyFunc<A>>::Output = input;
let input: *const () = input.cast();
let output: *const () = self.0.run_raw(input);
let output: *const <F as TyFunc<B>>::Output = output.cast();
unsafe { output.as_ref().unwrap_unchecked() }
}
}
impl<'lt, A: ?Sized, B: ?Sized, T: IsEqualProof<A, B> + ?Sized> IsEqualProof<A, B> for &'lt T {
fn run<'a, F: TyFunc<A> + TyFunc<B>>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<B>>::Output
where
<F as TyFunc<A>>::Output: Sized,
<F as TyFunc<B>>::Output: Sized,
{
T::run::<F>(self, input)
}
}
impl<'lt, A: ?Sized, B: ?Sized, T: IsEqualProof<A, B> + ?Sized> IsEqualProof<A, B> for &'lt mut T {
fn run<'a, F: TyFunc<A> + TyFunc<B>>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<B>>::Output
where
<F as TyFunc<A>>::Output: Sized,
<F as TyFunc<B>>::Output: Sized,
{
T::run::<F>(self, input)
}
}
impl<A: ?Sized, B: ?Sized, T: IsEqualProof<A, B> + ?Sized> IsEqualProof<A, B> for Box<T> {
fn run<'a, F: TyFunc<A> + TyFunc<B>>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<B>>::Output
where
<F as TyFunc<A>>::Output: Sized,
<F as TyFunc<B>>::Output: Sized,
{
T::run::<F>(self, input)
}
}
#[repr(transparent)]
struct AsRawEqualProof<T: ?Sized>(T);
unsafe impl<A: ?Sized, B: ?Sized, T: IsEqualProof<A, B>> IsEqualProofRaw<A, B>
for AsRawEqualProof<T>
{
fn run_raw(&self, input: *const ()) -> *const () {
enum Helper {}
impl<Ty: ?Sized> TyFunc<Ty> for Helper {
type Output = *const ();
}
*self.0.run::<Helper>(&input)
}
}
struct EqProof<A: ?Sized, B: ?Sized>(Box<dyn IsEqualProofRaw<A, B>>);
impl<A: ?Sized, B: ?Sized> IsEqualProof<A, B> for EqProof<A, B> {
fn run<'a, F: TyFunc<A> + TyFunc<B>>(
&self,
input: &'a <F as TyFunc<A>>::Output,
) -> &'a <F as TyFunc<B>>::Output
where
<F as TyFunc<A>>::Output: Sized,
<F as TyFunc<B>>::Output: Sized,
{
let prf: FromRawEqualProof<&Box<dyn IsEqualProofRaw<A, B>>> = FromRawEqualProof(&self.0);
IsEqualProof::<A, B>::run::<F>(&prf, input)
}
}
impl<A: ?Sized> Default for EqProof<A, A> {
fn default() -> Self {
EqProof(Box::new(AsRawEqualProof(())))
}
}
In fact, the above code has a tricky error due to lack of parametricity.
Specifically, the impl AsRawEqualProof for AsRawEqualProof<T>
is unsound. While the obvious implementation for an attempted IsEqualProof
is to require A
and B
to be the same and then just pass back what you are given, since the full information about what <F as TyFunc<A>>::Output
and <F as TyFunc<B>>::Output
are passed in via monomorphization (the very reason it is not dyn safe!), there is nothing stopping safe code from checking if the two types are equal, and if they are passing back the output, but otherwise panicking or entering an infinite loop. Or checking if the output is a raw pointer and if so returning a null pointer. To make it actually work, I believe one must make run_raw
actually only call run
with a type that guarantees that A
and B
are equal if successfully returned, and then pass along the pointer separately, bypassing the entire point of the system by requiring us to recreate the whole thing!
In fact, in the absence of ?Concrete
, I believe any attempt to use non-lifetimes for advanced type level proofs in a way that enables safe extensions to be filled with peril at best. An attempt to make something like ghost-cell
more user friendly by naming various parameters risks safe code examining your labels and misbehaving.
And the very unsafety that occurs is the same problem that blocks dyn safety: That we cannot decline to pass various information to function calls without a huge pile of unsafety and removing automatic type checking. I have found this to be particularly painful: It is one thing that the safe EqualityProof
takes extra space to include the possibility of calling it having side effects or non-termination, neither of which I desire to use, but it is far more painful to be told the compiler is refusing to compile my code because it cannot figure out a way to pass around far more damaging and subtle misbehaviors!
Even if the compiler was extended in its ability to develop dyn-safety, such that it could figure out how to pass around these misbehaviors, I would vastly prefer it simply not do so anyway!
Why the type equality proof system works:
What the system there does is it builds a value of a specific type of function. It could also be done with generic closures, IsEqualityProof
is basically just a closure type. It uses the fact that if one has a type of term (using higher kinded types and parametricity): for<F> fn(F<A>) -> F<B>
, then (in the absence of panics) A
and B
must be the same type.
That such a term exists if they are the same type is easy to show: one can simply pass the input as the output.
However, if they aren't the same type, such a function cannot reliably return. Unlike fn(A) -> B
, where you can satisfy it by having a B
on hand, we have asserted that anything you make up must hold for both A
and B
. One could function types to make it so that one must also be able to generate A
s. One could assert that we must be able to convert hashsets. Writing something that handles every possible case quickly seems impossible, and in fact can be easily proven to be impossible. If the types were different, we could do something like this:
trait CheckIfA {
type Helper;
}
impl CheckIfA for A {
type Helper = ();
}
impl CheckIfA for B {
type Helper = !;
}
struct IsA<T: CheckIfA>(T::Helper)
Providing a value of IsA<A>
is trivial: IsA(())
. But the function must then give back an IsA<B>
. And the field there must then be of type !
, the canonical example of impossibility. Only when A
and B
are the same type can this be ruled out. Almost. One might raise lifetimes as a concern, since we aren't actually allowed to match on those in the same way. However, we could still write something that extracts the lifetime and applies it to other things, so if the lifetimes differed we could create things that must cast the lifetimes on things like references without creating dangling references (as safe code cannot do such a thing), which is once again a contradiction, though more difficult to witness.
Other options:
- We could enable creating more structure in lifetimes, so that one could add labels to your 'lifetimes'. This would at least fix the labeling issue, but sometimes what one wants is actually part of the code gen, such as reference types. One could write new unsafe abstractions with this, but it would basically consist of creating a new
Ref<'a, 'b>
where'b
is standing in for a type rather than simply using&'a T
in the cases where one can. - We could add a third new type of generic parameter, neither a lifetime nor a type. In some ways this actually makes more sense. The
F
s there aren't exactly types, they are standing in for type constructors, things in haskell are of typeType -> Type
. However, having a bucket just for "something else" is dubious, and it would require duplicating a lot of machinery. Also we would have to decide on a further way to distinguish it, something like$F
or such. This would also then, like the previous, fail to permit reusing some of the type hierarchy for when we want to enable parts of the code gen. - We could create an entire user extendable kind system ala haskell. While this would I suppose make things more composable, it also seems like a lot of work, and the existing lifetime vs type distinction would either have to somehow be retrofitted into it, or we would have to somehow have both at once. I am unable to see how either of these could reasonably work (something about different defaults when you have a generic parameter in one syntax or the other? Possible, but it seems annoying.). IMO, the existing trait system works well enough.