Prefer usizes in std

I got worried when using cursor.position(). On 32-bit hosts it'll be a bit slower, since it returns a u64. If pos were a usize, it'd be much better. I don't want to duplicate Cursor anyway…
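For context, this is the kind of conversion the u64 position forces whenever you want to index the underlying buffer. A minimal sketch, assuming a `Cursor<Vec<u8>>`; the helper name `unread` is hypothetical and not part of std:

```rust
use std::convert::TryFrom;
use std::io::Cursor;

/// Hypothetical helper: returns the not-yet-consumed part of the buffer.
/// `position()` is a u64, so it must be converted before it can index a
/// slice; on a 32-bit host that conversion can fail for large values.
fn unread(cursor: &Cursor<Vec<u8>>) -> &[u8] {
    let buf = cursor.get_ref();
    // Clamp to the buffer length, because a Cursor's position may lie
    // past the end of its data.
    let pos = usize::try_from(cursor.position())
        .unwrap_or(usize::MAX)
        .min(buf.len());
    &buf[pos..]
}
```

Usage: after `cursor.set_position(6)` on a cursor over `b"hello world"`, `unread(&cursor)` yields `b"world"`.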

IMO Rust would have been best off adopting C-style naming, except for using uint and int for arch-sized ints.

This would mean a 32-bit host couldn’t properly deal with files over 4GB. That kind of problem is why you’ll now see off_t (pointer-sized) and off64_t (64-bit) on Linux, along with a bunch of alternate functions like stat and stat64. It’s a mess, which we avoid in Rust by using 64-bit everywhere.
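The arithmetic behind the 4GB limit, as a small sketch: 4 GiB is 2^32 bytes, one more than the largest u32, so a 32-bit usize offset cannot reach that byte, while a u64 `SeekFrom` offset can on every target. `FOUR_GIB`, `fits_in_32_bits` and `just_past_four_gib` are illustrative names, not std API:

```rust
use std::convert::TryFrom;
use std::io::SeekFrom;

/// 4 GiB as a byte count: 2^32, one more than u32::MAX.
const FOUR_GIB: u64 = 1 << 32;

/// True if `offset` fits in 32 bits, i.e. if a usize position on a
/// 32-bit host could represent it.
fn fits_in_32_bits(offset: u64) -> bool {
    u32::try_from(offset).is_ok()
}

/// Seek offsets are u64 on every target, so a position just past
/// 4 GiB is expressible regardless of the host's pointer width.
fn just_past_four_gib() -> SeekFrom {
    SeekFrom::Start(FOUR_GIB + 1)
}
```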

I was going to say the same thing, but then I realized Cursor is just a wrapper around [u8], so you’re limited to usize anyway.

Except that, as the first line of its documentation says, the point of Cursor is that it “provides it with a Seek implementation”, and Seek deals in u64s. If you only want to work on a slice, you don’t need io::Cursor.
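For the slice-only case, a sketch of what that looks like: `&[u8]` implements `Read` directly and advances itself as bytes are consumed, so there is no position field and no u64 involved. The function name `read_header` and the 4-byte header layout are made up for illustration:

```rust
use std::io::{self, Read};

/// Reads a fixed 4-byte header from a byte slice. `&[u8]` implements
/// `Read` by shrinking the slice itself, so no separate position is kept.
fn read_header(mut input: &[u8]) -> io::Result<([u8; 4], &[u8])> {
    let mut header = [0u8; 4];
    // Fails with UnexpectedEof if fewer than 4 bytes are available.
    input.read_exact(&mut header)?;
    // `input` now starts just past the header.
    Ok((header, input))
}
```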

On this point: in fact, they originally were named int and uint, then renamed to isize and usize in RFC 544.

To be clear, I agree it should use u64 since that’s the most natural when working with most io-related interfaces/APIs and would likely reduce the amount of casting. My point was that using usize wouldn’t “mean a 32-bit host couldn’t properly deal with files over 4GB” since this particular interface is a wrapper around [u8] (which is already limited to usize).
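A sketch of both points at once: the position a Cursor reports is a u64 and can even be set past the end of the usize-limited buffer, but reading from there simply returns 0 bytes. `seek_past_end` is a hypothetical helper; the behavior shown relies on std's Cursor implementation, which accepts any `SeekFrom::Start` offset:

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Seeks a Cursor over `buf` to `offset` and attempts a read.
/// Returns (position reported by seek, number of bytes read).
fn seek_past_end(buf: &[u8], offset: u64) -> io::Result<(u64, usize)> {
    let mut cursor = Cursor::new(buf);
    // Cursor accepts any u64 here, even one beyond buf.len().
    let pos = cursor.seek(SeekFrom::Start(offset))?;
    let mut scratch = [0u8; 8];
    // Past the end there is nothing left, so this is Ok(0), not an error.
    let n = cursor.read(&mut scratch)?;
    Ok((pos, n))
}
```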

As mentioned elsewhere in this thread, file sizes need to be u64 on all platforms. usize is only for memory.

This thread seems to be going in circles. On the one hand, @mjbshaw is stating that Cursor is a wrapper around a slice, and that slices have a length defined in terms of usize. The responses seem to be that we want to work with files that may be bigger than what a usize can hold. This, however, does not address the original question of how a type defined in terms of a usize length can hold a u64-length file.

len() mostly returns usize, so using more than 2^32 bytes (on a 32-bit host) doesn’t make sense.

I had an idea to propose making the offset type of Cursor a generic type parameter, so that usize could be used with finitely sized buffers. But now I think it’s better to use a specialized API for windowed random access to buffers, like the Buf trait from bytes. Notably, impl IntoBuf for Bytes was recently changed so that it no longer wraps the container in a Cursor: https://github.com/tokio-rs/bytes/pull/261 … and if you follow the bread crumbs there, you can see @alexcrichton expressing a bit of regret for adding Cursor in the first place.
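For comparison, a sketch of the usize-positioned alternative being weighed here. `SliceCursor` is a hypothetical type, not part of std and not the bytes crate's Buf API: it tracks a usize position, implements only `Read`, and deliberately leaves out `Seek`, whose offsets are u64. Unlike std's Cursor, its `set_position` clamps to the buffer length:

```rust
use std::io::{self, Read};

/// Hypothetical cursor over an in-memory buffer whose position is a
/// usize, the same type slices are indexed with.
struct SliceCursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> SliceCursor<'a> {
    fn new(buf: &'a [u8]) -> Self {
        SliceCursor { buf, pos: 0 }
    }

    fn position(&self) -> usize {
        self.pos
    }

    /// Moves to an absolute position, clamped to the buffer length.
    fn set_position(&mut self, pos: usize) {
        self.pos = pos.min(self.buf.len());
    }

    /// The bytes that have not been read yet.
    fn remaining(&self) -> &'a [u8] {
        let buf: &'a [u8] = self.buf;
        &buf[self.pos..]
    }
}

impl<'a> Read for SliceCursor<'a> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let src = self.remaining();
        // Copy as much as fits; no casts needed since everything is usize.
        let n = src.len().min(out.len());
        out[..n].copy_from_slice(&src[..n]);
        self.pos += n;
        Ok(n)
    }
}
```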

I can picture possible use cases with sparse files or memory mapped files… though curiously, I did not find anything that impls Seek in the memmap crate.

Edit: No wait, silly me, that’s what I get for not having my coffee. Seek taking u64 is useful (and correct) even just for regular files! You can easily have a file that’s larger than 4GB on a 32-bit system. So long as you only interact with it through the Read, Write and Seek traits, there is no need to have the entire file loaded into memory.
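A minimal sketch of that pattern using only std: the function is generic over `Read + Seek`, so it works the same on a File as on an in-memory Cursor, and only the requested window is held in memory, so the u64 offset may exceed what usize can address. `read_window` is an illustrative name:

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads up to `len` bytes starting at byte `offset` of any seekable
/// reader. Only the requested window is buffered, so `offset` may be
/// larger than usize::MAX on a 32-bit host (e.g. a File over 4 GiB).
fn read_window<R: Read + Seek>(reader: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    reader.seek(SeekFrom::Start(offset))?;
    let mut window = Vec::with_capacity(len);
    // `take` caps the read at `len` bytes; a shorter result means EOF.
    reader.by_ref().take(len as u64).read_to_end(&mut window)?;
    Ok(window)
}
```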

Cursor can't hold a u64-length file.

But other than position() and set_position(pos), most of Cursor's methods aren't inherent. They're methods on the Seek trait, which is supposed to work with files. I assume the reason why those two methods don't use usize is to reduce casting.
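To illustrate: the inherent `position()`/`set_position()` pair and the `Seek` methods both speak u64, so switching between them needs no casts. A small sketch; `compare_positions` is a hypothetical helper, and `stream_position` is the `Seek` method stabilized in Rust 1.51:

```rust
use std::io::{self, Cursor, Seek, SeekFrom};

/// Moves two cursors to `target`, one via the inherent method and one
/// via the Seek trait, and returns both observed positions.
fn compare_positions(data: &[u8], target: u64) -> io::Result<(u64, u64)> {
    let mut a = Cursor::new(data);
    a.set_position(target); // inherent method, takes a u64
    let inherent = a.position();

    let mut b = Cursor::new(data);
    b.seek(SeekFrom::Start(target))?; // Seek trait method, also u64
    let via_trait = b.stream_position()?;

    Ok((inherent, via_trait))
}
```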
