Pre-RFC: Rust and Large File Support (LFS)

I was recently working on large file support in another project, and thought I should see how Rust fares. In fact LFS in Rust appears almost non-existent. A few platforms do use foo64 link names, but in general it appears to not really be considered. So, I’d like to fix this, but want to ping opinions here before I draft an RFC. That is, if an RFC is even necessary – this could be considered just bugfixing.

For the uninitiated, Large File Support (LFS) in C is basically about making sure off_t and ino_t are 64-bit, along with related compound types like struct stat and the libc functions that use them. This is only a concern for 32-bit platforms; 64-bit code should always be fine. It’s also an issue solved long ago: Linux has had LFS since kernel 2.4 and glibc 2.2.

Some platforms like FreeBSD went ahead and converted to 64-bit off_t etc. all at once. On 32-bit Linux with glibc, the default is still the old 32-bit off_t, and you can opt in with -D_FILE_OFFSET_BITS=64 to get the 64-bit types and functions swapped in-place. I’m not sure what OSX and Windows have. In an autoconf project, you can use AC_SYS_LARGEFILE to set the necessary #defines automatically. It’s a little sad you still have to deal with this at all, but I guess backwards compat with 32-bit off_t programs was considered important enough, and anyway that ship has sailed.

I think Rust should just do the right thing by default, and always use the proper 64-bit types and interfaces. There may be some concerns about changing exposed types post-1.0, like a lot of what MetadataExt returns, but I think there’s strong justification for a breaking change here. The potential for breaking is why I think this might need an RFC. There’s also the libc crate to consider, but again I think the best answer is to expose only 64-bit file interfaces there. (Not to explicitly name fstat64 etc. – just call it fstat but link to fstat64 like what _FILE_OFFSET_BITS would get you.)

Now here’s a concrete example:

fn main() {
    for arg in std::env::args().skip(1) {
        match std::fs::metadata(&arg) {
            Err(e) => println!("Can't stat '{}': {}", arg, e),
            Ok(m) => println!("path:'{}' len:{}", arg, m.len()),
        }
    }
}

Linux x86_64 is fine:

$ truncate -s 1G foo
$ truncate -s 2G bar
$ rustc -Vv
rustc 1.5.0-nightly (9d3e79ad3 2015-10-10)
binary: rustc
commit-hash: 9d3e79ad3728f6723f8e49908696e66a68d7db95
commit-date: 2015-10-10
host: x86_64-unknown-linux-gnu
release: 1.5.0-nightly
$ rustc metadata.rs && ./metadata foo bar
path:'foo' len:1073741824
path:'bar' len:2147483648

But on i686 Linux with the same test files:

$ rustc -Vv
rustc 1.5.0-nightly (9d3e79ad3 2015-10-10)
binary: rustc
commit-hash: 9d3e79ad3728f6723f8e49908696e66a68d7db95
commit-date: 2015-10-10
host: i686-unknown-linux-gnu
release: 1.5.0-nightly
$ rustc metadata.rs && ./metadata foo bar
path:'foo' len:1073741824
Can't stat 'bar': Value too large for defined data type (os error 75)
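
For reference, os error 75 on Linux is EOVERFLOW: the size of bar does not fit in the signed 32-bit off_t that the non-LFS stat has to fill in. A minimal sketch of that arithmetic follows; `fits_in_off32` is a hypothetical helper named only for illustration, not anything in std or libc.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: does a file length fit in a signed 32-bit `off_t`,
/// as used by the non-LFS `stat` on 32-bit Linux with glibc?
fn fits_in_off32(len: u64) -> bool {
    i32::try_from(len).is_ok()
}

fn main() {
    let one_gib: u64 = 1 << 30; // size of `foo` in the test above
    let two_gib: u64 = 1 << 31; // size of `bar` in the test above
    println!("1G fits: {}", fits_in_off32(one_gib));
    println!("2G fits: {}", fits_in_off32(two_gib));
}
```

The 2G file is exactly one byte past the largest value a signed 32-bit offset can hold, which is why foo works and bar does not.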

I did find RFC 1291 for promoting libc from the nursery, which would need coordination with LFS goals. At a brief scan, LFS is not addressed there; it is only mentioned briefly in one reply, which notes that off_t does vary in bit width on Linux. I still need to read through it in depth though.

cc @alexcrichton

I think I’d be totally fine tweaking definitions of some typedefs for these platforms or linking to different functions. From a std::fs perspective we should be covered. I also doubt this would have much breakage through libc and the rest of the ecosystem, so I’d be totally down for a patch!

I’m not an expert, but it looks like Rust is already doing The Right Thing on Windows. By that, I mean it’s using the WinNT API instead of the C API. :stuck_out_tongue:

See also https://github.com/rust-lang/rust/issues/28978 , which might have some influence on how exactly you decide to change std::os::unix::fs::MetadataExt.

OK, if widening these is not controversial, then I'll just get to it, thanks!

Well that's good for rust and libstd, at least. :slight_smile: It still should be sorted out for the libc crate though, right? It looks like stat and fstat are already linked to 64-bit variants for Windows, but off_t itself is still i32, and for instance lseek maps to _lseek with a c_long offset (could be _lseeki64), so it's not all the way there.

FWIW, OSX also appears to be wholly LFS. I don't know the history -- maybe all BSDs bit that LFS compatibility bullet a long time ago.

I guess I should revise my target goal: all platforms that have LFS in their C API (or similar, e.g. WinNT) should use it by default in Rust too. But Android surprises me -- I'll comment over on that issue. Still, it's possible someone will want a Rust port on a weird embedded system where LFS is irrelevant, so this needn't become a mandate.

It is using the win32 subsystem. It is not directly interfacing with NT. Fortunately when the win32 APIs were designed, Microsoft made sure everything could handle 64-bit files. There are a few deprecated APIs lingering from the win16 era, but Rust doesn't use them.

Note that rust doesn't use stat or fstat or any of those C/POSIX functions on Windows, and I really hope anyone using libc on Windows doesn't use them either.

Well, Windows does implement a lot of the POSIX functions, with 64-bit LFS-like versions where needed too. (Not exactly the same way Linux does LFS renames, but close). So if the libc crate is going to provide these on Windows, then I think it should use LFS versions as much as possible. Now, which standards libc should provide is a question for RFC 1291.

But even sticking to pure C standards, C89/C99/etc., would not be LFS safe -- e.g. fseek and ftell use a long offset, which has 32 bits even on 64-bit Windows. (yay LLP64!) They have _fseeki64 and _ftelli64 for that. At least fpos_t is always 64-bit for fgetpos and fsetpos though.
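
To make the LLP64 point concrete, here is a rough sketch that checks whether an offset can be represented in a C long on the current target. `long_can_hold` is a hypothetical helper for illustration; the only real API assumed is std::os::raw::c_long.

```rust
use std::convert::TryFrom;
use std::mem::size_of;
use std::os::raw::c_long;

/// Hypothetical helper: can this offset be passed to a C function that
/// takes a `long`, such as `fseek`?
fn long_can_hold(offset: i64) -> bool {
    c_long::try_from(offset).is_ok()
}

fn main() {
    let three_gib: i64 = 3 << 30;
    // c_long is 32 bits on Windows (LLP64) and on 32-bit Unix targets,
    // and 64 bits on 64-bit Unix targets (LP64).
    println!("c_long is {} bits on this target", size_of::<c_long>() * 8);
    println!("3 GiB offset fits in long: {}", long_can_hold(three_gib));
}
```

The output depends on the target, so none is shown here. Wherever long is 32 bits, a 3 GiB offset cannot be handed to fseek at all, and something like _fseeki64 is needed instead.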

BTW, even though technically the Windows crt off_t is always long (32-bit), we might take precedent from mingw-w64 for defining a 64-bit off_t anyway so Rust is always LFS safe. Issue 28978 raised the possibility of mismatching off_t for other C API bindings, but I think that’s already a sketchy thing to rely on. There’s a balance here between exposing to the exact letter what libc says vs. a more ideal state of LFS.

But I care most about the Linux side anyway, where off_t truly can be either 32- or 64-bit depending on just _FILE_OFFSET_BITS, so choosing the larger for Rust is easily justified.

Someone in the wild even got bitten by this:

https://github.com/ogham/exa/issues/86#issuecomment-159594466

Ugh, thanks for the nudge.

The new libc has explicit off64_t, stat64, etc., rather than mapping these implicitly the way that _FILE_OFFSET_BITS=64 would, so libstd will need to explicitly choose the better calls.

It is unfortunate that this issue persists in Rust. The standard libraries of many other languages were bitten by this long ago and have since been fixed. Rust has been designed in sensible ways for many things, but this makes me wonder how many 32-bit problems are lying around, hidden.

Here’s a non-exhaustive list (inspired by Linux syscalls) of things that could be wrong:

  • I/O
      • ✓ Seek: ok
      • pwrite/pread/sendfile: not directly exposed by Rust
  • Time
      • ✗ not in std, and while RFC 1288 specifies 64-bit second types, the implementation uses 32-bit time_t on some systems
      • time values in stat are 32-bit on 32-bit systems
  • File-related
      • ✗ stat: off_t not ok as evidenced here
      • truncate/statfs/fadvise/fcntl/getdents: not directly exposed by Rust

This one is unsolved in Linux, no? There's nothing Rust can do when even stat64 still has 32-bit time_t.

I see, you’re right about that. The latest activity on this was in May. But we might as well define our interfaces such that when system calls support 64-bit, Rust programs only need to be recompiled with a new libstd for them to be Y2038-safe.
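
For anyone unfamiliar with the Y2038 limit being discussed, a small sketch of where a signed 32-bit time_t runs out. `split_epoch_secs` is a hypothetical helper written just for this illustration.

```rust
/// Hypothetical helper: split seconds since the Unix epoch into
/// (whole days, hour, minute, second) in UTC.
fn split_epoch_secs(secs: i64) -> (i64, i64, i64, i64) {
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400);
    (days, rem / 3_600, (rem % 3_600) / 60, rem % 60)
}

fn main() {
    // Largest timestamp a signed 32-bit time_t can hold.
    let last = i32::MAX;
    let (days, h, m, s) = split_epoch_secs(i64::from(last));
    println!(
        "32-bit time_t ends {} days after the epoch at {:02}:{:02}:{:02} UTC",
        days, h, m, s
    );
    // One more second overflows 32 bits but not 64.
    println!("i32 next second: {:?}", last.checked_add(1));
    println!("i64 next second: {:?}", i64::from(last).checked_add(1));
}
```

Day 24855 after 1970-01-01 is 19 January 2038, so the last representable moment is 03:14:07 UTC on that date. A 64-bit seconds type in the Rust interface sidesteps this regardless of what the underlying system call can currently return.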

Would you propose that even raw::stat should be fully 64-bit abstracted? I was just thinking to map it to stat64 for the time being. (no pun intended)

Yes, since the size of the struct might change when switching to 64-bit time_t. This can cause incompatibilities when de-/serializing.

Although I guess this would result in incompatibilities with the libc functions…

Progress in glibc: https://lwn.net/Articles/664800/

Just for posterity – LFS should be good in Rust 1.8 for both Linux and Android! :champagne:

Thanks especially to @alexcrichton for his hard work on libc refactoring and RFC 1415. I mostly just sniped him with opinions and then cleaned up a few remaining bindings. :slight_smile:

As for @jethrogb’s point about time_t, this still has to wait for OS support to really be solved, but at least Rust’s aliases are now widened to 64-bit and deprecated per that RFC. The rest of Rust’s time interfaces should be 64-bit already.

