Can the standard library shrink Option<File>?

It seems like we should be able to add a niche optimization to the FileDesc type internal. With this optimization, an Option<File> will become the same size as a regular File.

diff --git a/src/libstd/sys/unix/fd.rs b/src/libstd/sys/unix/fd.rs
index c481ca8961f..090d9f7804a 100644
--- a/src/libstd/sys/unix/fd.rs
+++ b/src/libstd/sys/unix/fd.rs
@@ -10,6 +10,7 @@ use crate::sys_common::AsInner;
 use libc::{c_int, c_void, ssize_t};
 
 #[derive(Debug)]
+#[rustc_layout_scalar_valid_range_start(0)]
 pub struct FileDesc {
     fd: c_int,
 }
@@ -28,7 +29,8 @@ fn max_len() -> usize {
 
 impl FileDesc {
     pub fn new(fd: c_int) -> FileDesc {
-        FileDesc { fd }
+        assert!(fd >= 0);
+        unsafe { FileDesc { fd } }
     }
 
     pub fn raw(&self) -> c_int {

Can a valid file descriptor be negative?

On Linux, the answer is pretty obviously no. Linux file descriptors are stored in an array of structs that has its capacity bounded to INT_MAX, so any negative int would either be considered nonsense (if treated as negative) or be higher than INT_MAX (if it was bit-reinterpreted as an unsigned value). Also, notice how dup2 returns a negative error code while also returning a positive file descriptor directly, as if they couldn't possibly overlap.

The Single UNIX Specification explicitly says that open can't return a negative file descriptor, and says in the page for dup2 that you should get EBADF if you try to claim a negative file descriptor.

Unfortunately, their description of file descriptor allocation never explicitly says that the negative range is out of bounds, but open references this algorithm while simultaneously claiming that it never gives a negative result.

Also, Wikipedia says it can't be negative, but they don't give a source on that particular claim (urgh!).

Does Rust's existing API allow it?

Here's what FromRawFd says about it.

Constructs a new instance of Self from the given raw file descriptor.

This function consumes ownership of the specified file descriptor. The returned object will take responsibility for closing it when the object goes out of scope.

This function is also unsafe as the primitives currently returned have the contract that they are the sole owner of the file descriptor they are wrapping. Usage of this function could accidentally allow violating this contract which can cause memory unsafety in code that relies on it being true.

It certainly cannot "claim exclusive ownership" of a resource that does not exist, so presumably an invalid file descriptor must not be supplied to this function. Also, it's unsafe, so triggering memory problems should not be out of the question here.

But while it says that it "can cause memory unsafety," (1) that's not the same thing as being insta-UB (2) it doesn't actually say when, presumably because it might be platform-dependent.

Is this one of those cases where "it's not worth the possibility that someone, somewhere, might be relying on this," even though it's not clearly allowed here? Or am I just twisting the language to try to support my pet project, and stuffing a number that isn't a file descriptor into a File is fine as long as you forget it before it tries to call close?

9 Likes

If Option<File> and File are made to have the same size on nix then shouldn't Rust commit to making that a part of the API's contract for all platforms (current and future)? People are very good at relying on these kinds of Option optimizations so I'd worry a nix only one might lead to issues down the road.

1 Like

I don't think so.

For one thing, it's not obvious that it would be possible on every platform that Rust might target. What happens on a target where a file handle is allowed to cover every possible value?

For another, I'm not sure it would actually help. The primary case where you'd want this is FFI, and that will ultimately be platform-specific anyway, because it's dependent on what the File actually contains.

3 Likes

I don't think we can safely say that all negative values are invalid file descriptors, but we don't need to. We can say that values from -1 to -4095 can never be valid file descriptors, because if a system call returned those, they would be interpreted as a negative errno value rather than a valid return.

(By the same argument, Linux uses -100 as the value for AT_FDCWD, the value passed to the *at variants of system to indicate the current directory.)

File::FromRawFd could assert that the file descriptor is not -1 to -4095.

Can we declare -1 to -4095 a niche?

1 Like

A similar niche would also be useful inside the Linux kernel, which has a concept of an error/pointer combination type: in addition to NULL, 4095 other values are reserved and cannot be valid pointers, so functions can return a value that's effectively Result<SomeType *, errno> (but without the type safety). Could we make it possible to declare this explicitly, so that C and Rust could interoperably represent the concept of a pointer-sized Result containing a pointer or errno, and Rust could do so with type safety and the ? operator?

1 Like

IIRC, niche optimizations support an arbitrary (but single) range of niche (invalid) values to stuff data into. However, rustc is only capable at the moment of specifying #[rustc_scalar_valid_range_start], which makes any value less than the argument an invalid value acceptable for niching.

rustc isn't possible of even simplified versions of that niche optimization yet, but nothing prevents it from being able to do so in the future. The difficulty is that you have to be able to take a reference to any contained value, so it's not enough for errno to fit in the niche, it also has to actually physically be entirely contained in the niche.

This can, of course, be imitated by a trio of types which #[repr(transparent)] and do the niche packing/checking/transmuting manually (and with impl Try, you can get ? support as well).

1 Like

IIRC, niche optimizations support an arbitrary (but single) range of niche (invalid) values to stuff data into. However, rustc is only capable at the moment of specifying #[rustc_scalar_valid_range_start], which makes any value less than the argument an invalid value acceptable for niching.

Does the internal architecture of rustc support setting an arbitrary range, and we just need an attribute to convey that range to rustc? Because if so, that seems like a fairly simple language extension to add.

rustc isn't possible of even simplified versions of that niche optimization yet, but nothing prevents it from being able to do so in the future. The difficulty is that you have to be able to take a reference to any contained value, so it's not enough for errno to fit in the niche, it also has to actually physically be entirely contained in the niche.

Technically it could be contained within the niche, insofar as the in-memory representation would be the same. If we guarantee that a NegativeErrno has no more than 4095 different values, and that a valid pointer can never have one of those values, then the bytes of a NegativeErrno value would be entirely contained within the bytes of the niched pointer, and a reference would work.

This can, of course, be imitated by a trio of types which #[repr(transparent)] and do the niche packing/checking/transmuting manually (and with impl Try, you can get ? support as well).

True. That type could also be a generic that supports arbitrary pointers or references, so that the pointer remains type-safe.

It would be nice, though, if we could arrange for that type to actually be Result. Lots of things know how to work with a Result (or with functions that return a Result), and Result has so many nice helpers that we wouldn't want to duplicate.

This isn't true though? There's an _end just like there's a _start.

Inside rustc itself we use the last 256 or so values of newtyped indices, as a niche.

cc @oli-obk

2 Likes

Things have been improved since I last brushed by it, then. If I'm not mistaken, it's still not possible to specify an internal hole as a niche yet, but obviously I don't know for sure.

Surely thatā€™s too conservativeā€”itā€™s standard practice in user mode to check if the return value of open, dup, etc. is negative, not to compare it against -1. Even if the Linux kernel wanted to support negative fds, it couldnā€™t return them to user mode without severe breakage.

2 Likes

The system call interface and the library interface are two different things. The system call interface only uses -1 to -4095 as errors, and those get translated to errno values in userspace. The corresponding library calls return -1 on errors, and technically userspace should just check for that.

(File descriptors should have always been unsigned values with a small reserved range, rather than signed values, but that would be hard to change now.)

Returning a negative file descriptor from something like open would break many bits of userspace, but that's unlikely to ever happen anyway, as open returns the lowest available file descriptor.

However, that doesn't mean that other calls couldn't potentially work with so-large-they're-negative file descriptors. I've seen and participated in discussions that proposed letting userspace explicitly use negative file descriptors for some designated purpose.

It sounds like we could potentially specify the range from -4095 to -1 as a niche for File values, and that should solve the issue of Option<File> and other simple niche usage.

Somewhat drifting off-topic, but do you know how it happens to interact with a lowered mmap_min_addr?

Unrelated as far as I know. That only controls valid userspace pointers, not valid kernel pointers. If -1 or -4095 were a pointer, it'd be at the top of memory, not the bottom, and the top of memory belongs to the kernel.

Yes, but (generally) only the library interface matters for Rust's implementation of std. The system call interface is the business of libc. As a practical difference, it seems that we would want to find a niche that applies across all Unix-style targets, not just Linux.

Sure, Windows has this trick: user-mode handles are all positive (when interpreted as a signed integer), but there are so-called pseudo handles implemented as well-known negative values that are recognized by the kernel to mean "current thread", "current process", or similar. Windows doesn't have any pseudo handles that apply to files today, but that could change.

But in such cases, presumably the fd would not be one that you'd pass to close. It wouldn't be one that would reasonably be owned by File.

1 Like

Well, if you want a niche that works everywhere, I'd suggest just making -1 the niche, which is enough to make Option work.

The remainder of the discussion was more about an optimization I'd like to see for the use of Rust inside the Linux kernel.

In the case I was talking about, they would have been real file descriptors that would need opening and closing.

1 Like

Okay, sure. Let's do that, then.

6 Likes

Thanks! I posted a few comments, but overall that looks great.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.