[Pre-RFC/Idea] Extend `io::Seek` with convenience methods (with e.g. `stream_len()`)


#1

Edit: Updated proposal here. PR here.


Hello everyone!

Recently I tried to obtain the length of a stream (Read + Seek) and was surprised that there isn’t a nice method for it. I had to manually write this:

let length = stream.seek(SeekFrom::End(0))?;
stream.seek(SeekFrom::Start(0))?;

Sure, it’s not terrible, but stream.len() or something similar would be a lot nicer and easier to read.


Therefore I propose to add the following provided methods to std::io::Seek:

  • fn stream_len(&mut self) -> io::Result<u64>: seeks to the end (to get the stream length) and then returns to the original position. Code:
    let old_pos = self.seek(SeekFrom::Current(0))?;
    let len = self.seek(SeekFrom::End(0))?;
    self.seek(SeekFrom::Start(old_pos))?;
    Ok(len)
    
  • fn seek_start(&mut self) -> io::Result<()>
  • fn seek_end(&mut self) -> io::Result<u64>

(Alternative names for the last two: seek_to_{start,end}.)
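
For illustration, here is a minimal sketch of what the last two methods would do, written as free functions instead of trait methods so it compiles on its own. The function bodies are an assumption based on the signatures above, not a final implementation.

```rust
use std::io::{self, Cursor, Seek, SeekFrom};

/// Hypothetical free-function version of the proposed `seek_start`.
fn seek_start<S: Seek + ?Sized>(stream: &mut S) -> io::Result<()> {
    // The new position is always 0, so there is nothing useful to return.
    stream.seek(SeekFrom::Start(0)).map(|_| ())
}

/// Hypothetical free-function version of the proposed `seek_end`.
fn seek_end<S: Seek + ?Sized>(stream: &mut S) -> io::Result<u64> {
    // The position after seeking to the end equals the stream length.
    stream.seek(SeekFrom::End(0))
}

fn main() -> io::Result<()> {
    let mut c = Cursor::new(vec![0u8; 6]);
    assert_eq!(seek_end(&mut c)?, 6);
    seek_start(&mut c)?;
    assert_eq!(c.position(), 0);
    Ok(())
}
```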

It’s mostly convenience; in particular, it avoids the need to import std::io::SeekFrom. Additionally, stream_len() could prevent a common bug: forgetting to seek back to the beginning after getting the stream length by seeking to the end.
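
A small sketch of that bug, using an in-memory Cursor: after seeking to the end to measure the stream, a subsequent read returns nothing unless the position is reset. The helper `measure_then_read` and its `rewind` flag are made up for this demonstration.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Measures the stream by seeking to the end, then reads it to the end.
/// With `rewind == false` this reproduces the "forgot to seek back" bug.
fn measure_then_read<S: Read + Seek>(stream: &mut S, rewind: bool) -> io::Result<(u64, usize)> {
    let len = stream.seek(SeekFrom::End(0))?;
    if rewind {
        stream.seek(SeekFrom::Start(0))?;
    }
    let mut buf = Vec::new();
    let bytes_read = stream.read_to_end(&mut buf)?;
    Ok((len, bytes_read))
}

fn main() -> io::Result<()> {
    let mut c = Cursor::new(vec![1u8, 2, 3, 4]);
    // Bug: the length is reported correctly, but the read starts at the end.
    assert_eq!(measure_then_read(&mut c, false)?, (4, 0));
    // Seeking back to the start first yields all four bytes.
    assert_eq!(measure_then_read(&mut c, true)?, (4, 4));
    Ok(())
}
```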


Obviously, this is nothing groundbreaking. But I think it would be a nice little addition (in particular stream_len() – I don’t care about the other two methods as much).

What do you think about this? Any related ideas? And if most of you agree that this is a good idea: should I write an official RFC or just open a PR on the main repo?


#2

I have a worry. In particular, stream_len looks like a getter, but it has the side effect of rewinding to the beginning. What if I was somewhere in the middle? This could probably be fixed… but the length can also change at any time (if it’s a file)…


#3

Ah, dang, I knew I forgot something! I will fix my first post. Of course, it is supposed to get the current position and return to that position after seeking to the end! Thanks for catching that mistake so early.

EDIT: fixed it in the original post now.


#4

This can be experimented with in a library crate, by defining a SeekExt trait that has a blanket impl for T: std::io::Seek.
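
A minimal sketch of that approach, assuming the trait carries only stream_len with the body from the first post. The blanket impl makes the method available on every Seek type without touching std.

```rust
use std::io::{self, Cursor, Seek, SeekFrom};

/// Hypothetical extension trait, named `SeekExt` as suggested above.
trait SeekExt: Seek {
    /// Returns the stream length; on success the seek position is restored.
    fn stream_len(&mut self) -> io::Result<u64> {
        let old_pos = self.seek(SeekFrom::Current(0))?;
        let len = self.seek(SeekFrom::End(0))?;
        self.seek(SeekFrom::Start(old_pos))?;
        Ok(len)
    }
}

// Blanket impl: every type that implements `Seek` gets the provided method.
impl<T: Seek + ?Sized> SeekExt for T {}

fn main() -> io::Result<()> {
    let mut c = Cursor::new(vec![0u8; 6]);
    c.seek(SeekFrom::Start(4))?;
    // Called in fully qualified form so the example does not depend on
    // method resolution if `Seek` itself ever has a method of the same name.
    assert_eq!(SeekExt::stream_len(&mut c)?, 6);
    // The position from before the call is preserved.
    assert_eq!(c.position(), 4);
    Ok(())
}
```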


#5

Yip, true. And I guess it’s a good idea to quickly create such a crate (will probably do soon-ish).

But I doubt it would be used a lot. I already wrote two RFCs which were accepted and which added rather tiny convenience methods. People would rather not use a crate (adding a crate takes time) if they can just write the slightly longer form. I’ve talked about this in more detail here.


#6

It’s true that tiny convenience methods are better added to the crate that defines the core trait, but there has been some pushback against populating fundamental traits with too many utility methods. See, for example, the evolution of Future from 0.1 to 0.3: all utility methods now live in trait FutureExt, and the convenience aspect is addressed by putting both traits into futures::prelude. However, one reason to add default-implemented methods in the core trait is the possibility of optimized specialization.
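
To make the last point concrete, here is a sketch that reads "optimized specialization" as an implementor overriding the provided method (an assumption about what is meant). The trait `StreamLen` stands in for the core trait, and `CountingStream` is a made-up type that counts its seek calls to show the override never seeks.

```rust
use std::io::{self, Cursor, Seek, SeekFrom};

/// Hypothetical stand-in for a core trait with a provided method.
trait StreamLen: Seek {
    /// Generic default: needs three seek operations.
    fn stream_len(&mut self) -> io::Result<u64> {
        let old_pos = self.seek(SeekFrom::Current(0))?;
        let len = self.seek(SeekFrom::End(0))?;
        self.seek(SeekFrom::Start(old_pos))?;
        Ok(len)
    }
}

/// Hypothetical in-memory stream that counts how often it is asked to seek.
struct CountingStream {
    inner: Cursor<Vec<u8>>,
    seeks: u32,
}

impl Seek for CountingStream {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.seeks += 1;
        self.inner.seek(pos)
    }
}

impl StreamLen for CountingStream {
    // Override: the buffer length is known, so no seek is needed at all.
    fn stream_len(&mut self) -> io::Result<u64> {
        Ok(self.inner.get_ref().len() as u64)
    }
}

// A type that simply keeps the generic default.
impl StreamLen for Cursor<Vec<u8>> {}

fn main() -> io::Result<()> {
    let mut s = CountingStream { inner: Cursor::new(vec![0u8; 6]), seeks: 0 };
    s.inner.set_position(2);
    assert_eq!(StreamLen::stream_len(&mut s)?, 6);
    assert_eq!(s.seeks, 0); // the override never seeks
    assert_eq!(s.inner.position(), 2); // and never moves the position

    let mut c = Cursor::new(vec![0u8; 6]);
    assert_eq!(StreamLen::stream_len(&mut c)?, 6); // default implementation
    Ok(())
}
```

An extension trait with a blanket impl cannot offer this, because the blanket impl already covers every Seek type and leaves no room for a per-type override.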

Your proposed stream_len method, however, stretches the limits of trivial. Seeking a file object modifies the state of the object and may result in system calls. If any of the seek operations fails and cannot be recovered from (which your proposed implementation does not try to do), the object is left in a different seek position, a possibility that is not obvious from the “gettish” name and signature of the method.
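
The failure case can be simulated with a sketch like the following. `FlakySeek` is a made-up wrapper that lets a fixed number of seeks succeed and then fails, and `stream_len` is the proposed body written as a free function.

```rust
use std::io::{self, Cursor, Seek, SeekFrom};

/// Hypothetical wrapper: the first `budget` seeks succeed, later ones fail.
struct FlakySeek<S> {
    inner: S,
    budget: u32,
}

impl<S: Seek> Seek for FlakySeek<S> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        if self.budget == 0 {
            return Err(io::Error::new(io::ErrorKind::Other, "simulated seek failure"));
        }
        self.budget -= 1;
        self.inner.seek(pos)
    }
}

/// The proposed implementation, as a free function for this demonstration.
fn stream_len<S: Seek>(stream: &mut S) -> io::Result<u64> {
    let old_pos = stream.seek(SeekFrom::Current(0))?;
    let len = stream.seek(SeekFrom::End(0))?;
    stream.seek(SeekFrom::Start(old_pos))?;
    Ok(len)
}

fn main() {
    let mut inner = Cursor::new(vec![0u8; 6]);
    inner.set_position(2);
    // Two seeks succeed; the third one (restoring the position) fails.
    let mut s = FlakySeek { inner, budget: 2 };
    assert!(stream_len(&mut s).is_err());
    // The stream started at 2 but is now stuck at the end.
    assert_eq!(s.inner.position(), 6);
}
```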


#7

Good point about “state of the object” in case anything fails. Somehow I am ashamed of how little thought I apparently put into this idea, since I didn’t think of these details. I will try to implement all of this in a crate soon. Hopefully I will think of all the problems and details when actually coding it. Afterwards we can take a look at the final API to have a better foundation to argue on.


#8

I found some time to quickly throw together the crate seek-ext. I also thought about the mentioned problems and changed my proposal a bit. I now propose to add these two methods:

  • fn stream_len(&mut self) -> io::Result<u64>
  • fn current_position(&mut self) -> io::Result<u64>

Here is the full implementation including documentation (if you prefer, you can read it in the rendered form):

/// Returns the length (in bytes) of this stream.
///
/// This method is implemented using three seek operations. If this method
/// returns successfully, the seek position is unchanged (i.e. the position
/// before calling this method is the same as afterwards). However, if this
/// method returns an error, the seek position is undefined.
///
/// If you need to obtain the length of *many* streams and you don't care
/// about the seek position afterwards, you can reduce the number of seek
/// operations by simply calling `seek(SeekFrom::End(0))` and using its
/// return value (it is also the stream length).
///
///
/// # Example
///
/// ```
/// use std::io::{Cursor, Seek, SeekFrom};
/// use seek_ext::SeekExt;
///
/// # fn main() -> Result<(), std::io::Error> {
/// let mut c = Cursor::new(vec![0; 6]);
/// let pos_before = c.seek(SeekFrom::Current(4))?;
///
/// assert_eq!(c.stream_len()?, 6);
/// assert_eq!(c.current_position()?, pos_before);
/// # Ok(())
/// # }
/// ```
fn stream_len(&mut self) -> Result<u64> {
    let old_pos = self.current_position()?;
    let len = self.seek(SeekFrom::End(0))?;
    self.seek(SeekFrom::Start(old_pos))?;
    Ok(len)
}

/// Returns the current seek position from the start of the stream.
///
/// This is equivalent to `self.seek(SeekFrom::Current(0))`.
///
///
/// # Example
///
/// ```
/// use std::io::{Cursor, Seek, SeekFrom};
/// use seek_ext::SeekExt;
///
/// # fn main() -> Result<(), std::io::Error> {
/// let mut c = Cursor::new(vec![0; 6]);
///
/// c.seek(SeekFrom::Current(4))?;
/// assert_eq!(c.current_position()?, 4);
///
/// c.seek(SeekFrom::Current(-3))?;
/// assert_eq!(c.current_position()?, 1);
/// # Ok(())
/// # }
/// ```
fn current_position(&mut self) -> Result<u64> {
    self.seek(SeekFrom::Current(0))
}

In particular, as a response to the problems mentioned in this thread:

  • "stream_len looks like a getter": from the name alone, yes. But luckily we have nice types in Rust and the &mut self is a clear indicator that the state of the object might be changed. Additionally, the documentation clearly states what side effect can/should be expected.

  • “If any of the seek() operations inside of stream_len fails, the stream position is undefined”: yes, so be it. As you can see, I included this information in the documentation which just states this fact. I don’t think it’s a problem, because (a) in almost all cases, IO errors will be passed up the stack and the seekable object won’t ever be inspected again, and (b) the user can recover from it by simply resetting the seek position.
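
Point (b) could look roughly like this on the caller's side. This is only a sketch: `FailNth` is a made-up wrapper that fails exactly one seek call to simulate a transient error, and `len_with_recovery` is a hypothetical helper, not part of seek-ext.

```rust
use std::io::{self, Cursor, Seek, SeekFrom};

/// Hypothetical wrapper that fails only the `fail_on`-th seek call (1-based).
struct FailNth<S> {
    inner: S,
    calls: u32,
    fail_on: u32,
}

impl<S: Seek> Seek for FailNth<S> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.calls += 1;
        if self.calls == self.fail_on {
            return Err(io::Error::new(io::ErrorKind::Other, "simulated seek failure"));
        }
        self.inner.seek(pos)
    }
}

/// Same body as the `stream_len` shown above, as a free function.
fn stream_len<S: Seek>(stream: &mut S) -> io::Result<u64> {
    let old_pos = stream.seek(SeekFrom::Current(0))?;
    let len = stream.seek(SeekFrom::End(0))?;
    stream.seek(SeekFrom::Start(old_pos))?;
    Ok(len)
}

/// Remembers the position first and resets it if `stream_len` fails.
/// Returns `Ok(None)` when the length is unavailable but recovery worked.
fn len_with_recovery<S: Seek>(stream: &mut S) -> io::Result<Option<u64>> {
    let saved = stream.seek(SeekFrom::Current(0))?;
    match stream_len(stream) {
        Ok(len) => Ok(Some(len)),
        Err(_) => {
            // The position is unspecified here, so put it back explicitly.
            stream.seek(SeekFrom::Start(saved))?;
            Ok(None)
        }
    }
}

fn main() -> io::Result<()> {
    let mut inner = Cursor::new(vec![0u8; 6]);
    inner.set_position(2);
    // Call 1 saves the position, calls 2-4 belong to `stream_len`;
    // call 4 (its restoring seek) fails, call 5 is the recovery seek.
    let mut s = FailNth { inner, calls: 0, fail_on: 4 };
    assert_eq!(len_with_recovery(&mut s)?, None);
    assert_eq!(s.inner.position(), 2);
    Ok(())
}
```

Recovery of this kind only works if a later seek succeeds, which is why the sketch still returns an error when the resetting seek itself fails.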