Pre-RFC: I/O Safety

phlopsi · May 3, 2021, 7:44am

IMHO, UnwindSafe is language bloat that doesn't serve any real purpose other than making the language more complex and harder to understand. You cannot design sound unwind-unsafe data structures due to the lack of unsafe. Being able to convert !UnwindSafe to UnwindSafe via AssertUnwindSafe in safe Rust means that practically everything has to be UnwindSafe.

In conclusion, data structures that would benefit from being able to transition, if a panic occurs, into a state that behaves well when dropped but must not be operated on in any other way are impossible to design in Rust, except for panic=abort programs. You must add additional code to ensure that such dangerous operations don't happen, most likely by forcing a manual abort, but that requires additional branches and probably additional state-tracking as well.
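The "additional branches and state-tracking" mentioned above might look roughly like the following sketch. `Ledger`, its `poisoned` flag, and `demo` are hypothetical names invented for illustration; the point is only that a flag plus a check in every operation stands in for what `UnwindSafe` cannot enforce, since `AssertUnwindSafe` is available in safe code.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical structure whose update happens in two steps.
pub struct Ledger {
    a: i64,
    b: i64,
    /// Set while an update is in flight; stays set if the update panics.
    poisoned: bool,
}

impl Ledger {
    pub fn new(a: i64, b: i64) -> Self {
        Ledger { a, b, poisoned: false }
    }

    /// Moves `amount` from `a` to `b`. `fail_midway` simulates a panic
    /// between the two halves of the update.
    pub fn transfer(&mut self, amount: i64, fail_midway: bool) {
        self.check();
        self.poisoned = true; // the extra state-tracking
        self.a -= amount;
        if fail_midway {
            panic!("simulated failure in the middle of an update");
        }
        self.b += amount;
        self.poisoned = false;
    }

    pub fn total(&self) -> i64 {
        self.check();
        self.a + self.b
    }

    /// The extra branch that every operation has to pay for.
    fn check(&self) {
        if self.poisoned {
            // A stricter design could call `std::process::abort()` here.
            panic!("Ledger used after an interrupted update");
        }
    }
}

/// Returns whether the interrupted update and the later use each panicked.
pub fn demo() -> (bool, bool) {
    let mut ledger = Ledger::new(10, 0);
    // A closure capturing `&mut Ledger` is not `UnwindSafe`,
    // but safe code can simply assert that it is.
    let first = panic::catch_unwind(AssertUnwindSafe(|| ledger.transfer(3, true)));
    // The ledger is now half-updated; only the flag protects later callers.
    let second = panic::catch_unwind(AssertUnwindSafe(|| ledger.total()));
    (first.is_err(), second.is_err())
}
```

Both calls in `demo` end in a panic: the first from the simulated failure, the second from the poison check.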

Thomasdezeeuw · May 3, 2021, 10:06am

I haven't read the entire discussion here, so sorry if this is off-topic, but @RalfJung suggested I share my design for a safer AsRawFd trait here (from the discussion in "`SockRef::from`, `Socket::sendfile` and other functions that operate on arbitrary file descriptors or `SOCKET`s potentially should be unsafe", Issue #218 in rust-lang/socket2 on GitHub).

struct BorrowedFd<'fd> {
    fd: RawFd,
    _lifetime: PhantomData<&'fd RawFd>, // Lifetime of the file descriptor we're borrowing from
}

trait AsBorrowedFd {
    // Perhaps some types need mutable access, so we would need `&'fd mut self` instead.
    fn as_borrow_fd<'fd>(&'fd self) -> BorrowedFd<'fd>;
}
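
To make the proposal concrete, here is a minimal Unix-only sketch of how the trait might be implemented for `File` and consumed. The `as_raw` accessor, the `describe` function, and `demo` are assumptions added for illustration and are not part of the design posted above.

```rust
use std::fs::File;
use std::marker::PhantomData;
use std::os::unix::io::{AsRawFd, RawFd};

/// A raw fd tied to the lifetime of whatever owns it (as in the post above).
#[derive(Clone, Copy)]
pub struct BorrowedFd<'fd> {
    fd: RawFd,
    _lifetime: PhantomData<&'fd RawFd>,
}

impl<'fd> BorrowedFd<'fd> {
    /// Hypothetical accessor for the underlying raw value.
    pub fn as_raw(&self) -> RawFd {
        self.fd
    }
}

pub trait AsBorrowedFd {
    fn as_borrow_fd<'fd>(&'fd self) -> BorrowedFd<'fd>;
}

impl AsBorrowedFd for File {
    fn as_borrow_fd<'fd>(&'fd self) -> BorrowedFd<'fd> {
        BorrowedFd { fd: self.as_raw_fd(), _lifetime: PhantomData }
    }
}

/// Hypothetical consumer: it cannot keep the fd longer than the borrow
/// of its owner lasts.
pub fn describe(fd: BorrowedFd<'_>) -> String {
    format!("fd {}", fd.as_raw())
}

pub fn demo() -> bool {
    let file = File::open("/dev/null").expect("open /dev/null");
    let borrowed = file.as_borrow_fd();
    let same = borrowed.as_raw() == file.as_raw_fd();
    // drop(file); // would not compile: `file` is still borrowed by `borrowed`
    let _ = describe(borrowed);
    same
}
```

The lifetime parameter is what lets the borrow checker reject closing the file while a `BorrowedFd` derived from it is still in use.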

matklad · May 3, 2021, 10:44am

I think that UnwindSafe machinery serves a different goal.

To use C++ terminology, there’s “no exception safety”, “basic exception safety”, “full exception safety”.

First means that a type is unsafe to use after unwinding. Second means that the type is safe to use, but might give wrong results, and third means that the type has transactional semantics and is always valid after unwinding.

You seem to say that “it’s impossible to use UnwindSafe for the unsafe code to opt out of basic guarantee”. This is true: all code in rust should provide at least basic exception safety.

But my understanding is that UnwindSafe serves a different purpose: it differentiates between full and basic exception safety guarantees. This is definitely not useless. For example, in rust-analyzer, where we use catch_unwind, there is a case where the UnwindSafe bound prevents us from (wrongly) re-using chalk instances whose caches become inconsistent after unwinding.

So, unwind safety does work as intended, and serves its purpose. It’s unclear though if this is worth the effort (plus, there are ergonomic issues like Send not implying UnwindSafe or the traits being in std rather than core).
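
The difference between the basic and the full guarantee can be illustrated with a small sketch. `SumCache` and its methods are hypothetical and have nothing to do with chalk or rust-analyzer; they only show a value that stays memory-safe but becomes logically inconsistent after a panic, next to a transactional variant that does not.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical cache: `sum` is supposed to equal the sum of `items`.
#[derive(Clone, Default)]
pub struct SumCache {
    items: Vec<u32>,
    sum: u32,
}

impl SumCache {
    pub fn is_consistent(&self) -> bool {
        self.items.iter().sum::<u32>() == self.sum
    }

    /// Basic guarantee: a panic halfway leaves the value memory-safe but
    /// logically inconsistent (`items` updated, `sum` stale).
    pub fn push_basic(&mut self, value: u32, validate: impl Fn(u32)) {
        self.items.push(value);
        validate(value); // may panic
        self.sum += value;
    }

    /// Full (transactional) guarantee: work on a copy, commit only on success.
    pub fn push_full(&mut self, value: u32, validate: impl Fn(u32)) {
        let mut next = self.clone();
        next.push_basic(value, validate);
        *self = next;
    }
}

fn reject_odd(value: u32) {
    if value % 2 == 1 {
        panic!("odd value rejected");
    }
}

/// Returns the consistency of each cache after a panicking update.
pub fn demo() -> (bool, bool) {
    let mut basic = SumCache::default();
    let mut full = SumCache::default();
    // Without `AssertUnwindSafe` these closures are rejected, because they
    // capture `&mut SumCache`, which is not `UnwindSafe`.
    let _ = panic::catch_unwind(AssertUnwindSafe(|| basic.push_basic(3, reject_odd)));
    let _ = panic::catch_unwind(AssertUnwindSafe(|| full.push_full(3, reject_odd)));
    (basic.is_consistent(), full.is_consistent())
}
```

After the panics, `basic` holds an item that its `sum` does not account for, while `full` is unchanged. The `UnwindSafe` bound exists to make the caller think about the first case before reusing the value.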

sunfishcode · May 3, 2021, 5:04pm

Consider code which simplifies down to this:

struct StuffDoer {
    owned: Box<dyn AsRawFd>, 
    ffi_thing: FFIThing,
}

impl StuffDoer {
    pub fn new(owned: impl AsRawFd) -> Self {
        let raw_fd = owned.as_raw_fd();
        Self {
            owned: Box::new(owned),
            ffi_thing: unsafe { new_ffi_thing(raw_fd) },
        }
    }

    pub fn do_stuff(&self) -> io::Result<()> {
        unsafe { ffi_thing_do_stuff(&self.ffi_thing as *const FFIThing) }
    }
}

Of course, we don't know what's in the box. But we own it, and have encapsulated it, so we know we're not doing any other operations on it. And of course, this is an I/O analog of a self-referential type, but unlike with memory, I/O objects aren't safely movable, so in practice, this kind of thing can work reliably. This code should be ok with AsRawFd + DanIoSafe, but it wouldn't be ok with AsRawFd + RalfIoSafe.

My assumption was that, at this point in Rust's maturity, the first step here would need to be to introduce the minimum requirements to achieve I/O safety, so that it's as easy as possible to migrate existing code, and after that, design new and better (and more opinionated) opt-in features. Would Rust be ok saying that code patterns like the above example are discouraged or deprecated at this point?

phlopsi · May 5, 2021, 9:15am

Your explanation was short and clear and your arguments are compelling. If I ever argue about UnwindSafe again, it'll be about Rust not having offered no-exception-safety handling rather than completely dismissing UnwindSafe. The other part I dislike about UnwindSafe is that it is an auto trait, but that is more of a topic for another thread about auto traits in general.

RalfJung · May 8, 2021, 2:03pm

If written with AsBorrowedFd, then your example is an instance of the problem where one field of a struct wants to borrow from another field of the struct. This occurs with memory safety as well; I don't think IO safety should have some ad-hoc approach to avoiding this problem such as DanIoSafe. This pattern currently requires unsafe code when talking about references and memory; I think it is perfectly fine (expected even) that it would then also require unsafe code when talking about FDs.

For example, unsafe code could first create the Box, use Box::into_raw, turn that raw pointer back into a &'static T and use that to get a BorrowedFd<'static>, and keep around the raw pointer to still be able to drop the Box in the destructor. (And since this works with BorrowedFd, it also works with RalfIoSafe; there are just fewer types to guide us there.)
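
A minimal Unix-only sketch of that recipe, applied to the earlier `StuffDoer` example, might look as follows. `BorrowedFd` here is a local stand-in for the type proposed earlier in the thread, `FfiThing` is a placeholder that merely remembers the fd, and `raw_fd` and `demo` are hypothetical helpers; none of this is taken from a real FFI library.

```rust
use std::fs::File;
use std::marker::PhantomData;
use std::os::unix::io::{AsRawFd, RawFd};

/// Local stand-in for the `BorrowedFd` proposed earlier in the thread.
pub struct BorrowedFd<'fd> {
    fd: RawFd,
    _lifetime: PhantomData<&'fd RawFd>,
}

/// Placeholder for the FFI object; it only remembers the borrowed fd.
struct FfiThing {
    fd: BorrowedFd<'static>,
}

pub struct StuffDoer {
    /// `Option` so that `Drop` can tear this down before freeing `owned`.
    ffi_thing: Option<FfiThing>,
    /// Raw pointer obtained from `Box::into_raw`; freed in `Drop`.
    owned: *mut dyn AsRawFd,
}

impl StuffDoer {
    pub fn new(owned: impl AsRawFd + 'static) -> Self {
        let ptr: *mut dyn AsRawFd = Box::into_raw(Box::new(owned) as Box<dyn AsRawFd>);
        // SAFETY: `ptr` comes from `Box::into_raw`, so it is valid and
        // non-null. The allocation is only freed in `Drop`, after the
        // `'static` borrow held by `ffi_thing` has been dropped.
        let pinned: &'static dyn AsRawFd = unsafe { &*ptr };
        let fd = BorrowedFd { fd: pinned.as_raw_fd(), _lifetime: PhantomData };
        StuffDoer { ffi_thing: Some(FfiThing { fd }), owned: ptr }
    }

    /// Hypothetical accessor, used only to observe the stored fd.
    pub fn raw_fd(&self) -> Option<RawFd> {
        self.ffi_thing.as_ref().map(|thing| thing.fd.fd)
    }
}

impl Drop for StuffDoer {
    fn drop(&mut self) {
        // Drop the borrower first, then the owner it borrows from.
        self.ffi_thing.take();
        // SAFETY: `owned` was produced by `Box::into_raw` in `new` and is
        // freed exactly once, here.
        unsafe { drop(Box::from_raw(self.owned)) };
    }
}

pub fn demo() -> bool {
    let file = File::open("/dev/null").expect("open /dev/null");
    let expected = file.as_raw_fd();
    let doer = StuffDoer::new(file);
    doer.raw_fd() == Some(expected)
}
```

The `unsafe` blocks are where the author takes responsibility for the self-referential borrow, which is the same obligation that applies when one field borrows memory from another.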

RalfJung · May 27, 2021, 10:37am

Looks like there is a proper RFC now:

github.com/rust-lang/rfcs

RFC: I/O Safety

rust-lang:master ← sunfishcode:main

opened 01:00AM - 25 May 21 UTC

sunfishcode

+429 -0

[Pre-RFC on IRLO](https://internals.rust-lang.org/t/pre-rfc-i-o-safety/14585) … Raw OS handles such as `RawFd` and `RawHandle` have hazards similar to raw pointers; they may be bogus or may dangle, leading to broken encapsulation boundaries and code whose behavior is impossible to bound in general. Introduce a concept of *I/O safety*, and introduce a new `IoSafe` trait to support it. This builds on, and provides a new explanation for, the `from_raw_fd` function being unsafe. [Rendered](https://github.com/sunfishcode/rfcs/blob/main/text/0000-io-safety.md)

Soni · August 28, 2021, 11:47pm

So uh, we do plan on deprecating File yeah? Particularly for /dev/fd?

notriddle · August 29, 2021, 1:24am

Of course not. Deprecating File would cause warnings in huge numbers of Rust programs, even though those Rust programs weren't doing anything weird. If deprecating File was involved, I don't think it would be considered. It would cause far too much churn.

What's being deprecated is RawFd (and its Windows-based cousin, RawHandle), which are probably not used as widely as File is.

I suggest reading the original RFC. It's not long, or dense with jargon.

Soni · August 29, 2021, 1:43am

We use RawFd on the assumption that /dev/fd/$fd is safe. The RFC doesn't say anything about this and we didn't see anyone bring up /dev/fd before.

Specifically, we use it to bridge between an API that deals with files and an API that deals with URLs. We just make sure the API that deals with files keeps the File open at all times, and then push file:///dev/fd/$fd URLs to the API that deals with URLs. But anyway, just wondering what the RFC means in the context of /dev/fd.
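
For readers unfamiliar with the pattern, the bridge described above could be sketched as below. This is a guess at its general shape, not the poster's actual code: `fd_url`, `open_url`, and `demo` are hypothetical names, and `open_url` is a stand-in for whatever URL-based API sits on the other side. It assumes a Unix-like system that provides `/dev/fd`.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::os::unix::io::AsRawFd;

/// Builds the URL string. Nothing ties the result to the lifetime of `file`,
/// so keeping the file open is entirely the caller's responsibility.
pub fn fd_url(file: &File) -> String {
    format!("file:///dev/fd/{}", file.as_raw_fd())
}

/// Hypothetical stand-in for the URL-based API: strips the scheme and
/// opens whatever path remains.
pub fn open_url(url: &str) -> io::Result<File> {
    let path = url
        .strip_prefix("file://")
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "not a file:// URL"))?;
    File::open(path)
}

/// Round trip: open a file, describe it as a URL, and open it again by URL.
pub fn demo() -> io::Result<usize> {
    let file = File::open("/dev/null")?; // must stay open while the URL is in use
    let url = fd_url(&file);
    let mut reopened = open_url(&url)?;
    let mut buf = Vec::new();
    reopened.read_to_end(&mut buf)
}
```

The `String` returned by `fd_url` carries no lifetime, so the compiler cannot notice if it outlives `file`.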

quinedot · August 29, 2021, 9:31am

It came up early on in the PR and as far as I can tell was deemed out of scope (with a couple "would be nice to"s and a couple "totally unfeasible to actually do so"s).

Soni · August 29, 2021, 2:39pm

We see a lot of discussion about /proc/self/mem, but not a lot about /dev/fd, which is used extensively by e.g. bash and is also useful to pass open files around within the process.

Note that /dev/fd is equivalent to dup() and is available across a wide range of unixlikes, whereas /proc/self/mem is linux-only.

sunfishcode · August 29, 2021, 9:05pm

I/O safety, like memory safety, is concerned with behavior within a program. /dev/fd is much like /proc/self/mem in that it's effectively a way for a program to reach outside itself, and cause something to reach back in. There are many ways that programs can reach outside and cause things to reach back in and break language invariants (and this applies to many languages, not just Rust).

In general, preventing programs from reaching outside and causing things to reach back in requires sandboxing. There are many sandboxing systems out there, with many tradeoffs; one option that can be adopted incrementally in Rust programs is cap-std; using a Dir, one can open files under a specific directory, and not outside it.

In your /dev/fd use case, there are a few hazards which would warrant unsafe:

- If the code forming file:///dev/fd/{} url strings doesn't properly own or borrow the fd it substitutes for the {}, the strings effectively represent forged fd values.
- If one of those file:///dev/fd/{} url strings can ever be passed to another process, with a different fd space, it would be a forged fd value within the other process.
- If one of those file:///dev/fd/{} url strings can ever outlive the fd's resource, it's a dangling fd value.

A safer way to bridge between an API that deals with files and an API that deals with URLs is to use file descriptors as the common type, rather than strings. Instead of converting file descriptors into URL strings, open URLs and obtain file descriptors, and pass file descriptors around. That way, you can use OwnedFd or other owning types, which manage the lifetimes automatically.

An example of such a type is the io-streams crate's StreamReader, which can wrap a file descriptor from a File or a TcpStream (or other things), and implements Read.
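
As a rough illustration of using file descriptors as the common type, the sketch below passes an `OwnedFd` between two hypothetical halves of a program, `produce` and `consume`. It assumes a Unix target and a toolchain where the standard library's `OwnedFd` is stable (Rust 1.63 or later), which was not yet the case when this thread was written.

```rust
use std::io::{self, Read, Write};
use std::os::unix::io::OwnedFd;
use std::os::unix::net::UnixStream;

/// Hypothetical producer side: hands out an owned descriptor, not a string.
pub fn produce() -> io::Result<OwnedFd> {
    let (mut tx, rx) = UnixStream::pair()?;
    tx.write_all(b"hello")?;
    // `tx` is closed when this function returns, so the reader sees EOF.
    Ok(OwnedFd::from(rx))
}

/// Hypothetical consumer side: it takes ownership, so the descriptor can
/// neither dangle nor be closed behind its back by safe code.
pub fn consume(fd: OwnedFd) -> io::Result<String> {
    let mut stream = UnixStream::from(fd);
    let mut text = String::new();
    stream.read_to_string(&mut text)?;
    Ok(text)
}
```

Calling `consume(produce()?)` yields the string written by the producer, and the descriptor is closed exactly once, when `stream` is dropped.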

Soni · August 29, 2021, 10:20pm

Hmm, so the optimal solution would be to encourage OS devs to somehow make fds UUIDs, such that forged fds don't/cannot cause problems?

quinedot · August 29, 2021, 10:39pm

I was referring to

nth(0) File::open("/proc/self/fd/5")
nth(1) "we have to basically ignore its existence [...] Ideally we'd be able to ask the Linux kernel to just turn off these files"
nth(2) "[blocking fds] would break legit uses of the fd directory such as bash's process substitution feature"
nth(3) "while marking those as unsafe in response might be humorous, the joke would be short lived, and I am not entirely sure the laughs would last much longer if we asked the OS to block a bunch of file descriptors"

sunfishcode · August 30, 2021, 1:01am

Making fds long enough to be secure UUIDs would require more memory and more computation when using them. The optimal approach would be one where application code never, or almost never, looks at the bits of a fd value directly, so fd values can remain small and simple without being hazardous. I/O safety in Rust is a step in that direction.

Soni · August 30, 2021, 12:19pm

Hmm, we see. Can we sell Rust on panicking on EBADF?

mathstuf · August 30, 2021, 1:07pm

Please no. EBADF is "just another error" not cause to rip everything down and drop you at a shell. If you want that kind of stuff to happen, check for it and panic!() yourself. And make sure that errors in your dependencies can be torn apart to find that EBADF cause if needed.
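
Checking for EBADF yourself, including when it is buried inside a dependency's error type, might look like the sketch below. `UploadError` and `caused_by_ebadf` are hypothetical, and the constant `9` is an assumption that holds on Linux and most Unix-likes; real code would take it from the `libc` crate.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Assumption: EBADF is errno 9, as on Linux and most Unix-likes.
pub const EBADF: i32 = 9;

/// Hypothetical error type of a dependency that wraps an I/O error.
#[derive(Debug)]
pub struct UploadError {
    pub cause: io::Error,
}

impl fmt::Display for UploadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "upload failed")
    }
}

impl Error for UploadError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// Walks the `source()` chain looking for an `io::Error` carrying EBADF.
pub fn caused_by_ebadf(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            if io_err.raw_os_error() == Some(EBADF) {
                return true;
            }
        }
        current = e.source();
    }
    false
}
```

A caller that really wants to treat EBADF as fatal can then write `if caused_by_ebadf(&err) { panic!("bad fd: {err}") }` at its own boundary, without the standard library imposing that policy on everyone.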

Externally-maintained resources are something that need to be considered carefully in Rust, but there's not much Rust (the language) can do about it. I don't think the stdlib should try too hard either since these kinds of guarantees are very fickle and can differ wildly between platforms. I've had to add "bogus" lifetimes to ensure that temporary directories outlive wholly-owned Rust structs that depend upon filesystem state tracked by other structures. I fixed it for the "normal" case, but I can't prevent some external process coming in and removing the directory out from underneath me anyways.
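
The "bogus" lifetime trick can be sketched as follows. `TempDir` and `Workspace` are hypothetical types written for this illustration, not the poster's code: `Workspace` owns all of its data, yet carries a lifetime so that the borrow checker keeps the directory alive for as long as the workspace exists.

```rust
use std::fs;
use std::io;
use std::marker::PhantomData;
use std::path::{Path, PathBuf};

/// Hypothetical temporary directory that removes itself on drop.
pub struct TempDir {
    path: PathBuf,
}

impl TempDir {
    pub fn new(name: &str) -> io::Result<Self> {
        let path = std::env::temp_dir().join(format!("{}-{}", name, std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(TempDir { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempDir {
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Wholly owned data plus a lifetime that exists only to tie this value
/// to the directory its files live in.
pub struct Workspace<'dir> {
    scratch: PathBuf,
    _dir: PhantomData<&'dir TempDir>,
}

impl<'dir> Workspace<'dir> {
    pub fn new(dir: &'dir TempDir) -> io::Result<Self> {
        let scratch = dir.path().join("scratch.txt");
        fs::write(&scratch, b"state")?;
        Ok(Workspace { scratch, _dir: PhantomData })
    }

    pub fn read(&self) -> io::Result<String> {
        fs::read_to_string(&self.scratch)
    }
}

pub fn demo() -> io::Result<String> {
    let dir = TempDir::new("io-safety-demo")?;
    let ws = Workspace::new(&dir)?;
    // drop(dir); // would not compile: `dir` is still borrowed by `ws`
    ws.read()
}
```

As the post says, this only covers the in-program case; another process can still delete the directory at any time.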

For example, the filesystem is nothing but one large, mutable, global bucket of state. One of Rust's favorite things to deal with. POSIX gives decent, if not great, indicators when things go wrong (the worst is probably SIGBUS when a mapped file is truncated, but at that point, you're probably dealing with adversaries anyways). But when you have external code floating about, a panic on "routine" (but plausible) things like having a file closed on you is not suitable IMO. Nothing unsafe happened (as far as the Rust language is concerned; your program may have other issues), just unexpected.

Another example is ensuring data written to a File hits the disk. You've got multiple layers of things that need to happen: syncing the file, syncing the file's data, syncing the filesystem it lives on, then cleaning up whatever journal you used to track your progress in that pipeline. If your filesystem is something like NFS, overlayfs, etc., there may be other things to do as well to make sure the backing store is actually on disk too. This is far better suited to a crate so that things like filesystem-specific behaviors can be tracked without fiddling around with the standard library and trying to keep up with the wild zoo of ioctl calls and the like.
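
The first couple of those layers can be sketched with nothing but the standard library. `write_durably` is a hypothetical helper for Unix-like systems; it deliberately ignores journals and filesystem-specific behavior, and it is an illustration of the layering rather than a complete durability recipe.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Writes `data` to `path` and asks the OS to flush both the file and the
/// directory entry that names it. Unix-only sketch.
pub fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(data)?;
    // `sync_all` flushes data and metadata; `sync_data` may skip metadata
    // that is not needed to read the data back.
    file.sync_all()?;

    // Flush the containing directory too, so the file name itself survives
    // a crash. Opening a directory with `File::open` works on Unix only.
    let parent = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(parent)?.sync_all()?;
    Ok(())
}
```

Even this small helper already embeds platform assumptions, which is part of the argument for keeping such logic in a crate.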

I like that the Rust stdlib gives the parts for these things rather than trying to give fully-formed gadgets that work "most of the time". Making these gadgets is what crates are for and where they belong because the external ecosystem moves at rates that are "never" going to line up with Rust's release cycles making them very hard to keep up-to-date.

sunfishcode · August 30, 2021, 1:34pm

The current I/O safety feature does not panic on EBADF.

notriddle · November 28, 2021, 1:35pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
