[RFC] basic 'unsafe lifetime

This is written informally because imho this should only be added if it can be explained trivially and informally. It is possible I've overlooked a simple thing which could be added in to fix, but the important part is that a beginner should be able to get a basic explanation of 'unsafe and what it is / why it's used that is at least as good as with *const T, without making normal named lifetimes more complicated.

TL;DR

Add a keyword lifetime, 'unsafe. 'unsafe is a stand-in for a lifetime which is not compiler enforced. Any use of a type including the 'unsafe lifetime requires an unsafe block. Said usage is only allowed if there exists some dynamic lifetime which is valid from creation to last use; in other words: you're the borrow checker now.

Why not pointers?

For a number of reasons:

  • &'unsafe T has all of the benefits[1] of references other than safe dereferencing, such as the fact that it is known to be aligned and non-null. As such, it can be niche optimized.
  • Access to method calls, autoref/autoderef, and functions which take references without converting pointer to reference.
  • 'unsafe can be used in struct lifetime generics. Doing so is wildly unsafe, but sometimes beneficial (e.g. for encapsulation of lifetimes).
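
To make the niche point concrete, here is a small sketch using today's types (nothing here depends on 'unsafe existing; the helper name pointer_sizes is made up for illustration). Option<&T> is guaranteed to be the same size as &T, while Option<*const T> needs extra room in current rustc because a raw pointer may legitimately be null:

```rust
use std::mem::size_of;

/// Returns the sizes of `&u8`, `Option<&u8>`, `*const u8` and `Option<*const u8>`.
/// (Hypothetical helper, only here to make the comparison testable.)
fn pointer_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<&u8>(),
        size_of::<Option<&u8>>(),
        size_of::<*const u8>(),
        size_of::<Option<*const u8>>(),
    )
}

fn main() {
    let (reference, opt_reference, raw, opt_raw) = pointer_sizes();

    // References are never null, so `None` can reuse the null bit pattern.
    assert_eq!(reference, opt_reference);

    // A raw pointer may be null, so current rustc stores a separate discriminant.
    assert!(opt_raw > raw);

    println!("&u8: {reference}, Option<&u8>: {opt_reference}");
    println!("*const u8: {raw}, Option<*const u8>: {opt_raw}");
}
```

Presumably &'unsafe T would land on the reference side of this comparison.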

Pointers are still very useful! 'unsafe serves a distinct but related purpose. If in doubt, use pointers instead; they're the safer option due to being more permissive with what you can do.

Define "signatures that mention 'unsafe"

If the fully expanded type signature contains the token 'unsafe, the type mentions 'unsafe and faces the consequences. This includes types which contain the 'unsafe in positions which do not represent actual stored data, such as ::std::marker::PhantomData<&'unsafe ()> and impl ::std::ops::Fn(&'unsafe ()).

At runtime, the 'unsafe lifetime is erased like any other. It only impacts what operations are considered unsafe, not runtime semantics in any way.

What about Drop?

Dropping types that include 'unsafe is unsafe. As such, any type that mentions 'unsafe has its drop glue suppressed, and must be manually dropped[3].

Types which have multiple fields, only some of which mention 'unsafe, have drop glue that drop all fields that do not mention 'unsafe.
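
For comparison, the closest thing available today to "drop glue suppressed, must be dropped manually" is std::mem::ManuallyDrop. A minimal sketch, assuming a made-up Noisy type that counts its drops:

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Hypothetical type that bumps a counter every time it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns (drops seen after an implicit scope exit, drops seen after a manual drop).
fn drops_observed() -> (u32, u32) {
    let count = Cell::new(0);
    {
        // Wrapped in ManuallyDrop: leaving the scope runs no destructor.
        let _suppressed = ManuallyDrop::new(Noisy(&count));
    }
    let after_scope = count.get();

    let mut manual = ManuallyDrop::new(Noisy(&count));
    // SAFETY: `manual` is dropped exactly once and never touched again.
    unsafe { ManuallyDrop::drop(&mut manual) };

    (after_scope, count.get())
}

fn main() {
    let (after_scope, after_manual) = drops_observed();
    assert_eq!(after_scope, 0);
    assert_eq!(after_manual, 1);
    println!("implicit: {after_scope}, manual: {after_manual}");
}
```

The difference is that ManuallyDrop is visible at the use site, whereas under this proposal the suppression would follow from the type merely mentioning 'unsafe.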

The one semiformal bit: Patching Stacked Borrows

is trivial[4]: don't ever retag or emit protectors for a reference with the 'unsafe keyword lifetime.

Bonus

This is emulated today by lying (...kinda; 'static is complicated): using 'static and just being really careful, especially about exposing that false 'static to external code. 'unsafe encodes existing practice and makes, well... not the code safer, but the extent of the unsafety more obvious. (Dis?)Honorable mention goes to yoke for making 'unsafe properly safe. You might also know owning_ref and/or rental.
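
A rough sketch of that "lie with 'static" pattern, for readers who haven't seen it. The FirstWord type and its invariants are made up for illustration and this is not how yoke, owning_ref, or rental are implemented; it only shows the shape of the trick:

```rust
/// Owns a `String` and caches a slice of it under a pretend `'static` lifetime.
/// (Hypothetical type; the invariants below are assumptions of this sketch.)
struct FirstWord {
    // Really borrows from `text`. The `'static` is a lie and must never
    // escape this type with that lifetime attached.
    word: &'static str,
    text: String,
}

impl FirstWord {
    fn new(text: String) -> Self {
        let word: &str = text.split_whitespace().next().unwrap_or("");
        // SAFETY (assumed invariants): the bytes live in `text`'s heap buffer,
        // which stays put when the `String` value itself is moved, and `text`
        // is never mutated or handed out mutably while `self` exists.
        let word: &'static str = unsafe { std::mem::transmute::<&str, &'static str>(word) };
        FirstWord { word, text }
    }

    /// Hands the slice out with the honest lifetime: tied to `&self`.
    fn word(&self) -> &str {
        self.word
    }

    fn text(&self) -> &str {
        &self.text
    }
}

fn main() {
    let cached = FirstWord::new(String::from("hello world"));
    assert_eq!(cached.word(), "hello");
    assert_eq!(cached.text(), "hello world");
    println!("{} / {}", cached.word(), cached.text());
}
```

With the proposal, the field would presumably be spelled &'unsafe str, so the lie is at least written down in the type.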


  1. And drawbacks[2], like subobject slicing (still contentious) and shared XOR mutable!

  2. Drawbacks are just benefits to the compiler, not to you :smiling_imp:

  3. This gives us a way to define ManuallyDrop: (PhantomData<&'unsafe ()>, T).

  4. And if it isn't, this proposal should be rejected immediately.


I'm a bit hesitant about the aliasing rules. I do like the idea of an unsafe pointer that gets whatever niche optimizations come along and has the same validity invariants as references (for example, always initialized, if that becomes the case for references), but the aliasing rules feel too strong.

This sounds bad. What about generic code? This could also easily become a terrible footgun.

At the very least I'd expect a lint to be emitted when it is dropped.

What a "use" is exactly is hard for me to understand.

I wonder if, instead of applying to all uses, there could be some new magic functions. One safe one that turns all lifetimes in the input to 'unsafe, and an unsafe one that replaces all the 'unsafes with unbound lifetimes (like you get from &*p)?

This has previously been called ptr::WellFormed, and I personally think both it and &'unsafe are useful independently.

While I think saying "all uses are unsafe" is still useful as a simple explainer, how about this (derived from your suggestion) as a more formal definition of what that means:

Any type containing lifetimes can be coerced to replace any contained lifetimes with 'unsafe. This coercion is safe, but must be explicitly requested with as. For any expression at a type mentioning 'unsafe, the 'unsafe lifetime(s) are treated as unbound lifetime(s). If the lifetime is resolved to a concrete lifetime and not 'unsafe, the expression is unsafe. Any function call where the function type mentions 'unsafe is unsafe to call.

This likely needs significant further refinement to avoid causing inference issues, but I think provides a reasonable definition for how "uses" of 'unsafe can be made unsafe.
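
To get a feel for that rule, here is an emulation in today's Rust, a sketch only. UnsafeRef, erase, and assume_live are invented names standing in for &'unsafe T, the safe coercion, and the unsafe resolution to a concrete lifetime; none of them are proposed API:

```rust
use std::ptr::NonNull;

/// Hypothetical stand-in for `&'unsafe T`: non-null, lifetime erased.
struct UnsafeRef<T> {
    ptr: NonNull<T>,
}

impl<T> UnsafeRef<T> {
    /// Safe direction: forget the lifetime (the "coerce to 'unsafe" step).
    fn erase(reference: &T) -> Self {
        UnsafeRef { ptr: NonNull::from(reference) }
    }

    /// Unsafe direction: the caller picks an unbound lifetime `'a`.
    ///
    /// SAFETY contract: the referent must be alive and not mutably aliased
    /// for all of `'a`. You're the borrow checker now.
    unsafe fn assume_live<'a>(&self) -> &'a T {
        unsafe { self.ptr.as_ref() }
    }
}

fn round_trip(value: i32) -> i32 {
    let local = value;
    let erased = UnsafeRef::erase(&local);
    // SAFETY: `local` is still in scope and is not mutated while `r` is used.
    let r: &i32 = unsafe { erased.assume_live() };
    *r
}

fn main() {
    assert_eq!(round_trip(5), 5);
    println!("round trip ok");
}
```

This is the explicit-conversion flavor; the rule above would instead make the unsafe step happen wherever the lifetime gets resolved.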

Personally, I think a major benefit of &'unsafe T is being fairly transparently usable as &T, just unsafely. 'unsafe is likely still useful if the conversion is only done by intrinsics, but a lot less convenient. If conversion functions are needed, &'unsafe isn't any more convenient than ptr::WellFormed.

This idea is great!

I have some questions about the idea:

First, how does NLL work with 'unsafe?

unsafe {
    let mut x = 0i32;
    let a = &'unsafe x;
    println!("{}", a);
    println!("{}", a); // Without `'unsafe`, NLL accepts this: the borrow held by `a` ends after this last use.
    let b = &'unsafe mut x;
    println!("{}", b);
    // println!("{}", a); // This won't compile under NLL, but what about `'unsafe`?
}
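
For reference, this is the same snippet with ordinary references, which compiles today (the function name nll_demo is just for illustration). It is only the baseline; it doesn't answer what 'unsafe should do:

```rust
/// Ordinary-reference baseline for the question above.
fn nll_demo() -> i32 {
    let mut x = 0i32;
    let a = &x;
    println!("{}", a);
    println!("{}", a); // Last use of `a`: under NLL its borrow ends here.
    let b = &mut x; // Accepted, because `a` is never used again.
    *b += 1;
    println!("{}", b);
    // println!("{}", a); // Uncommenting this is rejected with error[E0502].
    x
}

fn main() {
    assert_eq!(nll_demo(), 1);
}
```

With 'unsafe, the commented-out line would presumably be accepted by the compiler, and whether it is actually allowed becomes the programmer's problem.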

There must be a lot of things to do with 'unsafe.

So, I consider the "transparently" to be actually a detriment, because it's invisible.

That means two things to me:

  1. There's no good thing to point at in the error message.
  2. It's unsafe potentially multiple times in the same method.

The combination of those means, for example, that there's no nice place to put the // SAFETY: reason comment.

Hmm, is there any place where you specifically need to get a &'unsafe T from a slice::Iter<'unsafe, T>, as opposed to a &'a T?


That said, because it's safe I agree that there doesn't need to be a specific callout point for the 'a-to-'unsafe conversion. That could even be a coercion, and it'd be fine.


Does this mean that the type is just not dropped, like ManuallyDrop, or that implicitly dropping it will cause a compile error? The former sounds like a horrible footgun, and looking like an innocuous reference only makes it worse. The latter doesn't have a precedent in the language and, as The Pain Of Real Linear Types in Rust discusses, likely just won't work.

I.e. it misses the primary benefit for both the user and the compiler. Together with the Drop issue, this means that &'unsafe cannot be blindly used as if it was just another lifetime. In particular, I cannot pass it into any lifetime-generic function, since it doesn't abide by the usual interface of references. Or do you mean that &'unsafe is converted into some unbound reference when passed into fn foo<'a>(_: &'a Bar){} ? That would negate any benefits from a separate unsafe lifetime, since most code handles lifetimes through such generic functions. Only lifetime-less code and code which concretely accepts &'unsafe (which doesn't currently exist) will be able to pass it around.

Actually, how would that even interact with generics? There is no bound on fn drop<T>() {} which excludes &'unsafe T, but it should be impossible to pass &'unsafe T into it, since dropping &'unsafe T is itself unsafe. That would just cause resource leaks. Do we need to use a separate T: ?'unsafe lifetime bound?

Overall, this reminds me of the proposal to add lifetimes to raw references. I think something in that direction would be a better choice. Unsafe lifetime isn't really a lifetime, references with unsafe lifetime aren't references, so they should be a separate clearly distinguishable concept, if they are ever added.


Including even std::mem::size_of? How about std::any::type_name?

I think at this point we've sufficiently motivated my bailout clause. This RFC exists to ask if 'unsafe can exist to "just" push borrow checking onto the developer, and I think we've fairly shown that it's unfortunately significantly more complicated than any simple rules can capture.

I do think something like 'unsafe would be beneficial, but it'll require much more careful design, which this RFC unsuccessfully tried to replace with just (not enough) downscoping (to be trivial).

Explicit conversions may be enough to descope to trivial, but I'll leave that to someone else to pick up.


(NOT A CONTRIBUTION)

I considered the 'unsafe lifetime when looking at reworking raw pointers. The core idea is that 'unsafe is the "top" lifetime, which all other lifetimes outlive.

I'm not convinced it makes sense to allow this lifetime to be used with regular references, for the reasons named above. The idea was instead a new kind of raw pointer ("unsafe references") which would be non-null, but not necessarily aligned or pointing to valid data, and which would carry a lifetime like normal references do. If omitted, this lifetime would always elide to 'unsafe. Converting an unsafe reference to a safe reference would then mean asserting both that it was aligned and that it was valid for some real lifetime.

The nice thing about letting raw references have a lifetime is that then users could create raw references with a real lifetime and track it for some given period of time, for example with an unsafe indexing operator, so an unsafe reference would be able to be asserted to live for a given lifetime and then passed around and operated on, with other unsafe references derived from it. The unsafe lifetime would exist to allow them to omit the lifetime if they don't want to use it. It would still be unsafe to dereference.

This may not make sense in practice, I didn't get very far into this. I know Gankra's more recent work is also relevant to this.


So the purpose of this third (fourth? fifth?) reference type is as I understand it: a reference with raw pointer aliasing semantics (“anything goes”), which makes it unsafe to dereference (as a place deref trivially becomes a reference), and additionally may be unaligned (to support use for unaligned accesses).

The benefit of these reference-pointer hybrids is that they can carry a lifetime, which serves purely as a note to the developer that the reference is intended to be valid (as nothing enforces that it is).

Of course, there's nothing theoretically stopping us from adding a lifetime to *'lifetime const T. At that point the only difference is that the new reference is non-null.

The idea of some “&raw” along those lines is still at least somewhat on the community mind; in fact, I just brought up the concept as the remembered reason why &raw const $place syntax hasn't stabilized yet.


(Off topic meta recommendation: you might consider making “NOT A CONTRIBUTION” a link to explain that it's a disclaimer required by your employer and elaborating slightly on what's being disclaimed / what the definition of contribution is there. This could help avoid other people asking what it means. Feel free to completely ignore.)


I'd love CString::as_ptr() to return *'self c_char.
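
Until something like that exists, the effect can be approximated with a PhantomData wrapper. A sketch, where BorrowedCPtr and c_len are made-up names and the wrapper is only a hypothetical stand-in for *'a c_char:

```rust
use std::ffi::{CStr, CString};
use std::marker::PhantomData;
use std::os::raw::c_char;

/// Hypothetical stand-in for `*'a c_char`: a raw pointer that the borrow
/// checker treats as borrowing from the `CString` it came from.
struct BorrowedCPtr<'a> {
    raw: *const c_char,
    _owner: PhantomData<&'a CString>,
}

impl<'a> BorrowedCPtr<'a> {
    fn new(owner: &'a CString) -> Self {
        BorrowedCPtr { raw: owner.as_ptr(), _owner: PhantomData }
    }

    fn as_raw(&self) -> *const c_char {
        self.raw
    }
}

fn c_len(text: &str) -> usize {
    let owner = CString::new(text).expect("no interior NUL bytes");
    let ptr = BorrowedCPtr::new(&owner);
    // drop(owner); // Rejected: `owner` is still borrowed through `ptr`.

    // SAFETY: `owner` is alive, so the pointer refers to a valid
    // NUL-terminated buffer.
    let c_str = unsafe { CStr::from_ptr(ptr.as_raw()) };
    c_str.to_bytes().len()
}

fn main() {
    assert_eq!(c_len("hello"), 5);
    println!("length: {}", c_len("hello"));
}
```

Dereferencing is still unsafe; the lifetime only rules out the classic mistake of letting the pointer outlive a temporary CString.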

So basically you have a reference without aliasing or mutability guarantees?

Something currently happens when a reference is converted to a raw pointer (we need to retag since we want to, e.g., weaken Unique to SharedReadWrite). I presume you want something similar for &'unsafe? Basically they should behave like *mut/*const currently do?
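
As a reminder of what raw pointers permit today, here is a small sketch (bump_twice is an illustrative name). Once the &mut has been turned into a raw pointer, copies of that pointer may alias and be used interleaved, which two &mut references could not do:

```rust
/// Increments the target twice through two aliasing raw pointers.
fn bump_twice(target: &mut i32) -> i32 {
    // Reference-to-raw conversion: this is where the retag in question happens.
    let first: *mut i32 = target;
    // Raw pointers are `Copy`; both copies are derived from the same pointer.
    let second = first;

    // SAFETY: both pointers come from the live `&mut i32` above, and
    // `target` itself is not used again while they are in use.
    unsafe {
        *first += 1;
        *second += 1;
        *first
    }
}

fn main() {
    let mut value = 40;
    assert_eq!(bump_twice(&mut value), 42);
    assert_eq!(value, 42);
    println!("value = {value}");
}
```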

Actually,

the original intent really was that it would have the tag of a reference, not of a raw pointer, the only difference being that its tag could be revoked before it's statically unusable.

IIUC, it's impossible to actually have two references that share a Unique tag by construction. Now that I'm thinking a little more carefully than the “trivially obvious” semantics... this would probably be completely disastrous.

So I think the semantics I was originally imagining was that &'unsafe mut T would actually still do Unique retagging, but instead of a retag failing being immediate UB, it produces an invalid tag.

What happens when you use a raw pointer with an expired lifetime? From a function argument boundary this is prevented, but in a function body without a bounding signature, what?

Does it just work with unsafe? Normal use is already unsafe. Is it a hard error? What if the lifetime was conservative, and it's still a valid pointer that you want to use? Magic for<'a, T> fn(*'a T) -> *T where it's not an error? Deny-by-default lint?

In a perfect system, using a pointer with expired lifetime should be one more step unsafe than using one with a valid lifetime. We don't have a double unsafe... unless... unsafe unsafe {}? IIRC unsafe in expression position has to be followed by a block; this has been used to suggest allowing unsafe $expr as a shorthand for unsafe { $expr } before.

Whereas 'static is the bottom lifetime, which outlives all other lifetimes.
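
In code, "outlives all other lifetimes" means a 'static reference can be used wherever any shorter lifetime is expected. A tiny sketch (the function shorten and its witness parameter are made up to pin down the shorter lifetime):

```rust
/// Shrinks a `'static` borrow to the caller-chosen lifetime `'a`.
/// `_witness` exists only to tie `'a` to some local borrow.
fn shorten<'a>(long_lived: &'static str, _witness: &'a u8) -> &'a str {
    // Accepted without any cast: `'static` outlives every `'a`.
    long_lived
}

fn main() {
    let local = 0u8;
    let short: &str = shorten("hello", &local);
    assert_eq!(short, "hello");
    println!("{short}");
}
```

The opposite direction, returning a &'a str as &'static str, is rejected by the compiler. A "top" lifetime would sit at the other end: everything could be weakened to it, and nothing safely recovered from it.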

But wait, the lifetime of pointers is effectively *'static mut T, since their “lifetime” never “expires”. That means k#new_pointer_type 'unsafe T is both unsafe to dereference (because they always are, no matter the lifetime), and carries an invalid lifetime, making it... more unsafe to use.

I guess the point is that k#new_pointer_type 'static T is a pointer with a proof that it's valid until program end, but with raw pointer aliasing rules.


..... Rather than &'a raw or &'unsafe or *'a or anything, what's wrong with just

&unsafe 'a T
&unsafe 'a mut T

(Literally: unsafe reference.)

If 'a is omitted, it defaults to 'static (or 'unsafe, I don't know which is better)... modulo figuring out if lifetime elision should apply like to normal safe references.

............... I'm one of the grammar people, I should immediately know that it's the same reason for the turbo fish: this is only a possible syntax in type position ... unless we apply the same thing done to &raw const to disambiguate ... or maybe it's actually okay because unsafe in expression position must be followed by { ... except trying to take advantage of that to make this a valid type is probably breaking to some usage of decl macros binders ...


I apologize if any of this is incoherent. I'm still adjusting to a new medication. I feel like I'm maybe making less sense today and yet, I cannot convince myself to log off.

(NOT A CONTRIBUTION)

Also that they're non-null by default, so you get NPO by default and can use Option without worrying about the representation getting bigger. It also would've been bundled up with other syntactic changes to make writing correct unsafe code easier: a rethinking of the pointer types based on what we've learned over the last (now) 7 years. Giving them a lifetime was just one avenue of extension, but an intriguing one.

The point of 'unsafe in this scenario was to have a sensible answer for all the times you want raw pointers but don't have a lifetime to give them (such as almost always when they appear in a struct definition). That way the definition of (e.g.) a box-like type isn't something awful like &'static unsafe mut T.

An example of where lifetimes on raw pointers could be useful: I sketched out a type that would be like RawVec in semantics, a heap allocated contiguous array of maybe-initialized values. Here, going from &'a RawVec<T> to &'a unsafe [T] was interesting; you need unsafe code to dereference (since you can't dereference them if they're not initialized) but if your code doesn't contain any of the method/operator/whatever that extends a lifetime, you know this unsafe reference doesn't outlive the RawVec-like type. So a certain class of possible bugs is eliminated.

Clever. In one possible version, this was the syntax, and the 'unsafe lifetime was just a compiler-internal concept for what the lifetime of &unsafe T inside a struct definition was, or something like this, you couldn't actually write 'unsafe (or maybe you could, but would really never have a reason to).

Well, at least that's still better than it is now, when 'static is actually used like this in practice :upside_down_face: cf. rental, owning_ref, yoke

Additionally, a Box implementation is actually better off implemented as &'static mut T than as ptr::NonNull<T>, because the former actually communicates correctly that a box is a unique dereferenceable owner, and alignment can be used for niching.


It's not really impossible and that's fine. What is impossible is to forge a tag, i.e., to create a pointer with an existing old tag that you haven't been given. That is the key property.

You can have two &mut with the same tag by making an exact bit-wise copy of one, e.g. with transmute_copy. All the automatic retag insertion makes it hard to not immediately overwrite that tag with a different one, but there are tricks you can use and it is important to me that this is not load-bearing.

That, on the other hand, is a huge change. It would need very careful evaluation whether that is still sufficient for all the optimizations.

We have a version of Stacked Borrows where a load with a tag that is not in the borrow stack returns poison, rather than introducing UB. However, stores still have to do their full set of checks; I am not aware of a good way to weaken that. (And &mut retagging is basically a store.)


To me this sounds like something that is better represented via &'a [MaybeUninit<T>]. Or is the idea that this would also basically replace MaybeUninit?
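
Roughly what I have in mind, as a sketch. RawBuf is a made-up toy and not the real RawVec; it never drops its elements, so it is only reasonable for types without destructors:

```rust
use std::mem::MaybeUninit;

/// Hypothetical RawVec-like buffer: a fixed number of maybe-initialized slots.
struct RawBuf<T> {
    slots: Box<[MaybeUninit<T>]>,
}

impl<T> RawBuf<T> {
    fn with_capacity(len: usize) -> Self {
        let slots: Vec<MaybeUninit<T>> = (0..len).map(|_| MaybeUninit::uninit()).collect();
        RawBuf { slots: slots.into_boxed_slice() }
    }

    /// Initializing a slot is safe (an overwritten value is leaked, not dropped).
    fn write(&mut self, index: usize, value: T) {
        self.slots[index].write(value);
    }

    /// The borrow checker ties this view to `&self`, so it cannot outlive the buffer.
    fn slots(&self) -> &[MaybeUninit<T>] {
        &self.slots
    }
}

fn first_initialized() -> u32 {
    let mut buf = RawBuf::with_capacity(4);
    buf.write(0, 7u32);
    let view = buf.slots();
    // SAFETY: slot 0 was initialized by the `write` call above.
    unsafe { *view[0].assume_init_ref() }
}

fn main() {
    assert_eq!(first_initialized(), 7);
    println!("slot 0 = {}", first_initialized());
}
```

Reading still needs unsafe, and the view's lifetime is checked, which seems to cover the "certain class of possible bugs" mentioned above.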


(NOT A CONTRIBUTION)

I was just playing around with ideas and maybe it wouldn't work out, but the idea was that if you had a reference type that doesn't guarantee it's valid for dereference, you could have a more natural way of working with uninitialized memory by reference. So you'd still need MaybeUninit for actually constructing an uninitialized value, but MaybeUninit<T> would implement some sort of UnsafeDeref to give you an &unsafe T (and same for slices of MaybeUninit<T>).

More abstractly, I was trying to see if we could make unsafe code easier to read and write if we took the lessons that we've encoded in std types like MaybeUninit and NonNull and incorporated them into the design of the unsafe language. As well as other lessons we've learned like making typecasting and mutability casting syntactically distinct instead of using as for both; we had a call about this at one point if you recall.

That makes sense; so basically we'd have a type that is "implicitly" MaybeUninit behind the pointer.