Make FromStr composable


#1

This is primarily motivated by a toy RPN calculator I’m writing, but I think this has pretty general application.

At a certain point in my lexer, I need to parse a number out of the input stream. Ideally, I’d be able to use the FromStr::<f64> machinery, which has all the logic (including overflow detection!) already built.

Unfortunately, the current semantics of FromStr prevent this use case:

  • The entire input stream is considered to be consumed. Trailing characters are considered an error.
  • Even if trailing characters were not an error, the caller has no idea how much of the input was consumed on a successful parse.
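To make the two points above concrete, here is a minimal sketch of the whole-string behaviour as it looks in present-day Rust, where from_str returns a Result rather than an Option. The helper names parse_whole and first_token are made up for illustration and are not part of any library.

```rust
// Whole-string parsing: `str::parse` calls `FromStr::from_str`, which
// rejects any input that is not entirely a number.
fn parse_whole(input: &str) -> Option<f64> {
    input.parse::<f64>().ok()
}

// Workaround a lexer needs today: find the end of the number itself
// (here, naively, by splitting on whitespace) before calling `parse`.
fn first_token(input: &str) -> &str {
    input.split_whitespace().next().unwrap_or("")
}
```

Calling parse_whole("1.5 2 +") gives None because of the trailing characters, while parse_whole(first_token("1.5 2 +")) succeeds. The workaround only holds when tokens happen to be whitespace-delimited; an input like "1.5+2" still forces the lexer to duplicate the number grammar just to find where the number ends.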

To remedy this, I propose a more composable main trait method, and the existing behaviour can be maintained as a shim. There’s probably a better name than from_str_with_unread but I’ll leave that bikeshed for another time:

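trait FromStr: Sized {
    // Parse a value from the front of `s` and return it together with
    // the unconsumed remainder of the input.
    fn from_str_with_unread<'a>(s: &'a str) -> Option<(Self, &'a str)>;

    // Existing behaviour as a shim: succeed only if all input was consumed.
    fn from_str(s: &str) -> Option<Self> {
        match Self::from_str_with_unread(s) {
            Some((x, "")) => Some(x),
            _ => None,
        }
    }
}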

The basic idea behind from_str_with_unread() is to return the unconsumed subslice of the string passed in, so the caller can use it to do further parsing. FromStrRadix would also be similarly modified.
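As a rough sketch of how this could be used, assuming present-day Rust syntax: the trait is renamed FromStrWithUnread here (a hypothetical name, chosen only to avoid clashing with std::str::FromStr), the strict shim is called from_str_strict, and the u32 impl and the parse_pair helper are illustrative rather than anything from the standard library.

```rust
/// Hypothetical stand-in for the proposed trait.
trait FromStrWithUnread: Sized {
    /// Parse a value from the front of `s`; return it with the unread rest.
    fn from_str_with_unread(s: &str) -> Option<(Self, &str)>;

    /// Whole-string behaviour as a shim: trailing input is an error.
    fn from_str_strict(s: &str) -> Option<Self> {
        match Self::from_str_with_unread(s) {
            Some((value, "")) => Some(value),
            _ => None,
        }
    }
}

impl FromStrWithUnread for u32 {
    fn from_str_with_unread(s: &str) -> Option<(u32, &str)> {
        // Length of the leading run of ASCII digits (one byte per digit).
        let end = s.bytes().take_while(|b| b.is_ascii_digit()).count();
        if end == 0 {
            return None;
        }
        let (digits, rest) = s.split_at(end);
        // Delegate to std so overflow detection is not hand-rolled.
        digits.parse::<u32>().ok().map(|n| (n, rest))
    }
}

/// Composition: parse "a,b" by feeding the unread slice to the next step.
fn parse_pair(s: &str) -> Option<(u32, u32)> {
    let (a, rest) = u32::from_str_with_unread(s)?;
    let rest = rest.strip_prefix(',')?;
    let b = u32::from_str_strict(rest)?;
    Some((a, b))
}
```

The point of the sketch is parse_pair: each step hands its unread slice to the next one, which is exactly what the whole-string semantics cannot express.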


#2

Some previous discussion:


#3

There were also rumblings about a generic From trait that would replace FromStr completely, but I think that’s fallen out of fashion now: http://discuss.rust-lang.org/t/pre-rfc-remove-fromerror-trait-add-from-trait/783/17


#4

Is the intention of FromStr to support a full parser, or just for quick and dirty itoa translations? If it’s the former, I think we should add a parsing library instead of doing this. If it’s the latter, what we have now is fine.


#5

@cgaebel I think the goal is explicitly not to support a full parser; those would be better served by rust-peg or other parser generators.

A more apt comparison would be strtod() instead of itoa(). It even follows a similar API, allowing the caller to pass in an optional char ** that will be updated to point to the first unconsumed character.

As it stands, the current semantics simply don’t give the caller enough power. On #16176, mahkoh says “There are several places in the stdlib that already roll their own parsers because of this” (though I haven’t independently verified this).

For floating-point conversions in particular, hand-rolling your own parser is both difficult and perilous. Take a look at the Bugs in strtod() section on this page, especially this PHP bug that sends the parser into an infinite loop because of errors in its string-to-floating-point conversion.


#6

What about a library function that FromStr just uses in its implementation that is also publicly available?

fn parse_float<I: Iterator<u8>>(s: I) -> Option<(f64, uint)>

which returns an optional “parsed float + bytes consumed”. Alternative, higher-level abstractions can be implemented on top of this pretty easily.
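A minimal sketch of what such a function might look like in present-day Rust syntax (Iterator<Item = u8> and usize instead of the older spellings). This is only an illustration under simplifying assumptions: it recognises an optional sign, digits, and an optional fraction, ignores exponents, inf and nan, and hands the collected prefix to the existing f64 parser for the actual conversion.

```rust
fn parse_float<I: Iterator<Item = u8>>(bytes: I) -> Option<(f64, usize)> {
    let mut bytes = bytes.peekable();
    let mut text = String::new();
    let mut digits = 0;

    // Optional leading sign.
    if let Some(&b) = bytes.peek() {
        if b == b'-' || b == b'+' {
            text.push(b as char);
            bytes.next();
        }
    }
    // Integer part.
    while let Some(&b) = bytes.peek() {
        if !b.is_ascii_digit() {
            break;
        }
        text.push(b as char);
        bytes.next();
        digits += 1;
    }
    // Optional fractional part.
    if bytes.peek() == Some(&b'.') {
        text.push('.');
        bytes.next();
        while let Some(&b) = bytes.peek() {
            if !b.is_ascii_digit() {
                break;
            }
            text.push(b as char);
            bytes.next();
            digits += 1;
        }
    }
    if digits == 0 {
        return None;
    }
    // Only ASCII was collected, so `text.len()` is the byte count consumed.
    let value = text.parse::<f64>().ok()?;
    Some((value, text.len()))
}
```

A caller holding a &str can then slice the input at the returned offset to continue lexing, e.g. parse_float(input.bytes()) followed by &input[consumed..].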


#7

“What about a library function that FromStr just uses in its implementation that is also publicly available?”

Right, so this works for the f64 case, and perhaps also for the f32 case. But then there’s also no equivalent to strtol() and strtoul() for integers. While integers are easier to parse, checking overflow is still not trivial, and definitely not something I want to hand-roll every time.
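For reference, this is roughly what the hand-rolled, strtoul()-style version looks like for a single type, as a sketch in present-day Rust. The name parse_u32_prefix is made up for illustration; the overflow handling relies on the standard checked_mul and checked_add methods.

```rust
/// Parse a decimal u32 from the front of `s`, returning the value and the
/// unconsumed remainder. Returns None on overflow or if there are no digits.
fn parse_u32_prefix(s: &str) -> Option<(u32, &str)> {
    let mut value: u32 = 0;
    let mut consumed = 0;
    for b in s.bytes() {
        if !b.is_ascii_digit() {
            break;
        }
        let digit = u32::from(b - b'0');
        // Each step can overflow, so both operations must be checked.
        value = value.checked_mul(10)?.checked_add(digit)?;
        consumed += 1;
    }
    if consumed == 0 {
        None
    } else {
        // Only ASCII digits were consumed, so this is a valid char boundary.
        Some((value, &s[consumed..]))
    }
}
```

It is short, but it would have to be repeated (with different bounds, and with sign handling for the signed types) for every integer width, which is the duplication being objected to here.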

Providing functions for generic arguments based on a destination type is the exact use case for traits. Given that the existing behaviour can very easily be maintained and doesn’t exactly impose a higher implementation burden, I would be more in favour of changing the FromStr trait than providing specialized free functions.