Is synthesizing zero sized values safe?

RustyYato · December 20, 2019, 9:23pm

You can look into my generic-field-projection crate, where I use a mixture of type-level lists and runtime checks (which get optimized away) to get safe unique paths.

(look for ProjectToSet, Field, FindOverlap, project::from_mut)

Here's how it works: ProjectToSet projects a reference to a set of fields (which are represented as types that implement the Field trait). The Field trait is unsafe to implement and provides a function, name. This function returns a unique path that identifies the field, as an iterator (for example, foo.bar.x would be ["bar", "x"] as an iterator). This iterator is built from std::iter::once and std::iter::Chain, making it really easy for LLVM to see what is going on.

project::from_mut then calls FindOverlap to see if any of these "names" overlap (meaning that the fields would alias). FindOverlap is a type that simulates generic closures. It goes through every field type in the set pair-wise and figures out if the names overlap. If any of the names overlap, then it returns true. If FindOverlap returns true, then project::from_mut will raise a panic and be on its way. Otherwise it will do the projection.
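To make the overlap check concrete, here is a minimal sketch of the idea, not the crate's actual code. The helpers path1, path2 and overlaps are hypothetical names made up for this example, and it assumes two paths alias when one is a prefix of the other (or they are equal).

```rust
use std::iter::{once, Chain, Once};

// Hypothetical stand-ins for what a `Field::name`-style function might return;
// the real crate's types and signatures may differ.
type Path1 = Once<&'static str>;
type Path2 = Chain<Once<&'static str>, Once<&'static str>>;

fn path1(a: &'static str) -> Path1 {
    once(a)
}

fn path2(a: &'static str, b: &'static str) -> Path2 {
    once(a).chain(once(b))
}

/// Assumption for this sketch: two field paths overlap (alias) when one is a
/// prefix of the other, or when they are identical.
fn overlaps<A, B>(mut a: A, mut b: B) -> bool
where
    A: Iterator<Item = &'static str>,
    B: Iterator<Item = &'static str>,
{
    loop {
        match (a.next(), b.next()) {
            // Same segment so far: keep walking both paths.
            (Some(x), Some(y)) if x == y => continue,
            // The paths diverge: the fields are disjoint.
            (Some(_), Some(_)) => return false,
            // At least one path ended without diverging: prefix or equal.
            _ => return true,
        }
    }
}
```

With this, overlaps(path2("bar", "x"), path1("bar")) reports an overlap, while the sibling fields bar.x and bar.y do not. Because the iterators are built only from once and chain, everything is known at compile time, which is presumably why such a check can be folded away.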

For all cases that I tried (up to 16 fields), LLVM was able to see through all the checks and reduce project::from_mut to just the pointer arithmetic (through simple inlining and constant propagation).

This is fairly complex, but in the end you get robust, efficient, and safe code which I find to be worth the cost.

matt1985 · December 20, 2019, 9:59pm

My crate mostly uses the fp!( .a.b, .a.c ) macro (the first . is optional) to construct disjoint field paths safely, since it's intended for emulating structural types, rather than for generically manipulating fields.

"Emulating structural types" in this case means that the name of the field is concrete, while the type that you get the field from is generic or a dyn Trait.

In the future I might do something like your crate if I decide to improve support for generic operations on field paths.

Lokathor · December 20, 2019, 10:26pm

I was assuming no such thing, I was simply wrong.

RalfJung · December 22, 2019, 1:26pm

As has been said already, in general conjuring a ZST is UB, as is evident from the fact that ! is a ZST. So you need to have special knowledge that your ZST is inhabited to be allowed to conjure it.
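A small sketch of why a size check alone proves nothing about inhabitedness. Void, Token and is_zst are hypothetical names for illustration only; the sizes are what current compilers report.

```rust
use std::convert::Infallible;
use std::mem::size_of;

// Hypothetical uninhabited type: no value of `Void` can ever exist.
enum Void {}

// Hypothetical inhabited ZST: exactly one value, `Token`.
struct Token;

/// A size check cannot distinguish the two kinds of type above.
fn is_zst<T>() -> bool {
    size_of::<T>() == 0
}
```

Both is_zst::<Void>() and is_zst::<Token>() return true, so "the size is 0" has to be combined with some other argument that a value of the type really exists.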

Closure types can be uninhabited if they capture a ! (or Void-style empty enum). I'm afraid I won't have time to do an in-depth review of a full crate. But if there's a small-ish self-contained code snippet demonstrating the key pattern I could take a look at that.

This sounds to me like you want MaybeUninit; this is exactly the kind of pattern it works well for: delaying the point when we actually assert to have a valid inhabitant of a type.
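For readers unfamiliar with it, a minimal sketch of that delaying pattern. init_later is a hypothetical helper written for this example, not something from the crate under discussion.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: reserve storage for a `T` first, and only claim to
/// hold a valid `T` after a real value has been written into it.
fn init_later<T>(make: impl FnOnce() -> T) -> T {
    // No claim is made here that `slot` contains a valid `T`.
    let mut slot = MaybeUninit::<T>::uninit();
    // Writing a real value is what makes the later assertion true.
    slot.write(make());
    // SAFETY: `slot` was fully initialized by the `write` above.
    unsafe { slot.assume_init() }
}
```

The point is that assume_init is the only place where validity is asserted, so an uninhabited T can flow through the generic code as long as that line is never reached for it.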

felix.s · December 22, 2019, 2:25pm

What if the closure captures a zero-sized proof token that happens to only be valid in the current process? The closure would be zero-sized as well and it would slip past your check.

Also, it seems that ~~if a closure captures an uninhabited type, then~~ it would be possible to trigger UB just by launching the binary with an appropriate command-line argument~~, even if all code paths that would lead to this from within the program are dead~~.

Not sure how seriously the latter problem deserves to be treated. It reminds me of one time when I wanted to write a program that uses mmap to access a read-only file and was wondering what would happen if the file was to be modified after all, while the program is running.

(Edit: I took a better look at how it works. The problem reduces to receiving a bogus pointer over the IPC channel; uninhabited types don't make things any worse here.)

RalfJung · December 22, 2019, 4:50pm

Indeed, that is the more subtle alternative to "closure that captures uninhabited ZST". There's little you can conclude from making sure that the size of the closure environment is 0.

Manishearth · December 22, 2019, 6:15pm

It can't capture an uninhabited type, I have an instance of it, I just don't have an instance of it in the spawned process.

felix.s · December 23, 2019, 10:20am

I was thinking of a situation where, with some uninhabited F, an attacker would be able to pass the command-line argument and an address of run_func<F, A, B> and tell your binary to run that. But then, if F is uninhabited, the compiler may choose not to monomorphise run_func<F, A, B> in the first place (because it's only referred to in spawn<F, A, B>, which is dead code because it receives an argument of uninhabited type F), so it just reduces to the question of whether you should be able to trust function pointers coming from outside the process.

The crate still assumes that .text section of the executable is a monolith whose layout is the same in each loaded image (even if it may be loaded at different offsets each time); it would fail to work in the presence of a strong form of ASLR that randomises all functions' locations relative to one another each time the executable is loaded (so that the relative address of run_func differs between the parent and the child process), or with a hypothetical JIT implementation of Rust that performs monomorphisation at runtime (the pointer may not even exist in the child process).

Manishearth · December 23, 2019, 4:32pm

Again, uninhabited types are not a problem here. The spawn function takes an instance of the type as an argument.
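A single-process sketch of the pattern being discussed, as I understand it from this thread: take an instance of F (which shows F is inhabited), check that it is zero-sized, and later re-create it from the type alone. to_fn_pointer is a hypothetical name, and this run_func only borrows the name used above; the real crate's signatures are not shown in this thread.

```rust
use std::mem::{forget, size_of, ManuallyDrop};
use std::ptr::NonNull;

/// Monomorphized per closure type `F`; conjures an `F` and calls it.
/// Private on purpose: it must only be reachable through `to_fn_pointer`.
fn run_func<F: Fn(i32) -> i32>(x: i32) -> i32 {
    // SAFETY (assumptions of this sketch): `F` is zero-sized, and an instance
    // of it was handed to `to_fn_pointer`, so `F` is inhabited. Reading a ZST
    // from a dangling but aligned pointer touches no memory.
    let f = ManuallyDrop::new(unsafe { NonNull::<F>::dangling().as_ptr().read() });
    // `ManuallyDrop` keeps the conjured value from being dropped repeatedly.
    (*f)(x)
}

/// Hypothetical entry point: turns a capture-less closure into a plain fn pointer.
pub fn to_fn_pointer<F: Fn(i32) -> i32>(f: F) -> fn(i32) -> i32 {
    assert_eq!(size_of::<F>(), 0, "closure must not capture any data");
    // Having `f` in hand is the evidence that `F` is inhabited.
    forget(f);
    run_func::<F>
}
```

For example, to_fn_pointer(|x| x * 2) yields a fn pointer that doubles its argument. This only addresses the language-level validity question; it does not address closures that capture a zero-sized proof token, which is the concern raised earlier in the thread.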

dhm · December 23, 2019, 7:29pm

Hmm, this looks like a good opportunity to suggest something that has been on my mind for a while: a trait to generalize over capture-less closures and fn "items":

trait FnItem<Args> : Fn<Args> {
    const fn_item: Self;
}

This way, "synthesizing" and then calling such a closure from its type F only, would be as simple as doing: F::fn_item(...).
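The Fn<Args> form above relies on unstable syntax, so here is a rough approximation of the same idea on stable Rust. Everything in it (the two-parameter FnItem, FN_ITEM, call, Double, call_by_type) is a hypothetical stand-in for illustration, not an existing or proposed std API.

```rust
/// Stable-Rust approximation of the idea: a type that can produce its own
/// (single) value as an associated constant, plus a way to call it.
trait FnItem<A, R>: Sized {
    const FN_ITEM: Self;
    fn call(&self, arg: A) -> R;
}

/// Hypothetical capture-less "function item" type.
struct Double;

impl FnItem<i32, i32> for Double {
    const FN_ITEM: Self = Double;
    fn call(&self, arg: i32) -> i32 {
        arg * 2
    }
}

/// Calls the function knowing only its type `F`, with no unsafe code.
fn call_by_type<F: FnItem<i32, i32>>(x: i32) -> i32 {
    F::FN_ITEM.call(x)
}
```

Here call_by_type::<Double>(21) evaluates to 42. Since the implementor has to write out the constant, an uninhabited type simply cannot implement the trait, which is what makes the synthesis safe.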

RalfJung · December 26, 2019, 3:46pm

Oh I see. Well, in terms of the validity invariant, there are only two possible ZSTs: inhabited and uninhabited. (After all, there is no data that the invariant could depend on.)

So if you know that the type is inhabited, I think you won't cause UB by synthesizing instances of it. But of course there might still be user-defined invariants attached to it that cannot actually be transported across process boundaries (that's what you are doing here, right?). For example, I could imagine a ZST that serves as a witness that some singleton has been initialized -- presenting the ZST means re-initialization can be skipped. But of course there is nothing ensuring that the singleton was initialized in the target process as well.

So I don't think that in general you can safely send any ZST to another process.
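A sketch of the witness example, with hypothetical names throughout (READY, Ready, init, is_ready, use_singleton). A real API would skip the runtime check when handed the witness; here use_singleton reports the actual state so the broken invariant is visible.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for "some singleton has been initialized" in this process.
static READY: AtomicBool = AtomicBool::new(false);

/// Zero-sized witness; the private field means only `init` can create one safely.
pub struct Ready(());

pub fn init() -> Ready {
    READY.store(true, Ordering::SeqCst);
    Ready(())
}

pub fn is_ready() -> bool {
    READY.load(Ordering::SeqCst)
}

/// A real API would trust the witness and skip the check; this sketch returns
/// the true state so a forged witness can be observed.
pub fn use_singleton(_proof: &Ready) -> bool {
    is_ready()
}
```

Forging a Ready with std::mem::transmute::<(), Ready>(()) before init has run is not a validity violation, since the type is an inhabited ZST, but use_singleton then returns false: the witness exists while the thing it witnesses does not. A freshly spawned process is in exactly that state.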

RustyYato · December 26, 2019, 4:32pm

For an example of something similar to this, qcell::TCellOwner must be unique per process. It updates a global when it is initialized and dropped to check this invariant. If you send it to another process, you could safely obtain two instances, and that can lead to aliasing unique references.
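A rough sketch of that kind of uniqueness check, assuming a single global flag. UniqueOwner and OWNER_EXISTS are hypothetical names; this is not qcell's implementation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Process-wide flag: true while an owner is alive in *this* process.
static OWNER_EXISTS: AtomicBool = AtomicBool::new(false);

/// Hypothetical zero-sized owner that is meant to be unique per process.
pub struct UniqueOwner(());

impl UniqueOwner {
    /// Returns `None` if another owner is currently alive in this process.
    pub fn try_new() -> Option<Self> {
        OWNER_EXISTS
            .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .ok()
            .map(|_| UniqueOwner(()))
    }
}

impl Drop for UniqueOwner {
    fn drop(&mut self) {
        // Allow a new owner to be created once this one is gone.
        OWNER_EXISTS.store(false, Ordering::SeqCst);
    }
}
```

The flag lives in the memory of one process, so an owner that arrives from elsewhere bypasses it: the receiving process can still call try_new successfully and end up with two owners.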

Manishearth · March 25, 2020, 4:32pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
