Array/vectors parsing


#1

This is valid D language code:

void main() {
    import std.conv: to;
    auto a1 = "[1, 2, 3]".to!(uint[3]);
    auto a2 = "[1, 2, 3]".to!(uint[]);
}

It shows that the standard “to” parsing function can convert a string to both an in-place, stack-allocated fixed-size array and a heap-allocated dynamic array (it throws an exception if there’s a parsing error or the lengths don’t match).

In Haskell you can do about the same (to a list):

a2 :: [Int]
a2 = read "[1, 2, 3]"

Currently to do the same in Rust you need more complex code (you could write it in some other ways):

fn main() {
    let mut a1 = [0u32; 3];
    let mut last = None;
    for (i, n) in "[1, 2, 3]"
                  .trim_matches(|c| c == '[' || c == ']')
                  .split(',')
                  .enumerate() {
        last = Some(i);
        a1[i] = n.trim().parse::<u32>().unwrap();
    }
    assert_eq!(last, Some(a1.len() - 1));

    let a2: Vec<u32> =
        "[1, 2, 3]"
        .trim_matches(|c| c == '[' || c == ']')
        .split(',')
        .map(|n| n.trim().parse().unwrap())
        .collect();
}

I’d like something similar in Rust std:

fn main() {
    let a1: [u32; 3] = "[1, 2, 3]".parse().unwrap();
    let a2: Vec<u32> = "[1, 2, 3]".parse().unwrap();
}

#2

I like this idea, but it can be a more complex topic. For example, should it be generic over the inner type, like impl<T: FromStr> FromStr for Vec<T>? Should we consider tuples? How can we parse Vec<String>?
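To make the questions above concrete, here is a rough sketch of what a generic impl could look like. Since code outside std can't implement FromStr for Vec<T> directly, it uses a made-up wrapper type (Bracketed) and a made-up error type (ParseListError); neither exists in std. It naively splits on commas, which is exactly where Vec<String> gets ambiguous:

```rust
use std::str::FromStr;

// Hypothetical wrapper: the orphan rules prevent implementing FromStr
// for Vec<T> outside of std, so this sketch wraps the Vec instead.
#[derive(Debug, PartialEq)]
struct Bracketed<T>(Vec<T>);

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseListError<E> {
    MissingBrackets,
    Element(E),
}

impl<T: FromStr> FromStr for Bracketed<T> {
    type Err = ParseListError<T::Err>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Require exactly one pair of surrounding brackets.
        let inner = match s
            .trim()
            .strip_prefix('[')
            .and_then(|rest| rest.strip_suffix(']'))
        {
            Some(inner) => inner,
            None => return Err(ParseListError::MissingBrackets),
        };
        // "[]" is an empty list, not a list with one empty element.
        if inner.trim().is_empty() {
            return Ok(Bracketed(Vec::new()));
        }
        // Naive comma split: fine for numbers, ambiguous for strings.
        inner
            .split(',')
            .map(|item| item.trim().parse::<T>().map_err(ParseListError::Element))
            .collect::<Result<Vec<T>, _>>()
            .map(Bracketed)
    }
}

fn main() {
    let nums: Bracketed<u32> = "[1, 2, 3]".parse().unwrap();
    assert_eq!(nums.0, vec![1, 2, 3]);

    // Is this one string "hello, world" or two strings? The sketch says two.
    let words: Bracketed<String> = "[hello, world]".parse().unwrap();
    assert_eq!(words.0.len(), 2);
}
```

Handling quoted or escaped elements would need a real grammar, which is part of why the generic version is harder than it first looks.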

Anyway, take my :+1:


#3

Why do you want this in std instead of in a crate?


#4

Right, that’s a valid question. It isn’t a common need (perhaps it’s common enough just for me).


#5

Because it’s a feature that integrates naturally with std’s parse(). And it’s not a feature that’s going to change in the next few years. It’s a good feature for script-like Rust code, where you read arrays from a text file.


#6

I agree with twmb that this isn’t really something that should be in std. There is a clear way to convert strings into integers. There is not a clear single way to encode arrays.

You can parse your syntax as JSON pretty easily:

extern crate serde_json;

fn main() {
    let a1: [u32; 3] = serde_json::from_str("[1, 2, 3]").unwrap();
    let a2: Vec<u32> = serde_json::from_str("[1, 2, 3]").unwrap();
}

#7

Not a fan of the idea. This would basically hard-wire a half-baked serialization format which only supports arrays (?) right into the standard library. If you want to store arrays in a serialized format, you should probably pick a proper serializer; for example, your string [1, 2, 3] is valid JSON. There are other formats too; one of my favorites is ron.


#8

To the extent possible, FromStr should be the inverse of Debug (Haskell calls these Read and Show). Therefore, it is only natural that you should be able to parse arrays, vectors, linked lists, hashmaps, etc.


#9

Why? Debug is not a serialization format, or even a fixed format. If anything, it should be the inverse of Display, but I don’t think even that’s something that needs to be strictly followed. FromStr should only be implemented for types that have an obvious (to people who think in base 10 :wink:), singular, non-ambiguous textual representation. Almost all structured data is not that.


#10

That’s an accepted rule in some other languages (including Haskell but also Python off the top of my head), but as far as I know there’s no such consensus in Rust. And it’s far from obvious that the reasoning for it carries over. Besides the language-agnostic reasons against such a rule, current Rust in particular has:

  • a high quality serialization and deserialization library with derive support (serde) that covers this use case
  • but no widespread FromStr implementations and no standard derive(FromStr), i.e., no precedent and no tools for following such a rule
  • the distinction between Debug and Display (why should FromStr be the inverse of Debug rather than Display?)
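A small illustration of the last point, using only std formatting. The helper name both_forms is made up for this example. The two traits already disagree on something as simple as a string, and Vec has a Debug form but no Display form at all:

```rust
// Hypothetical helper: returns (Display output, Debug output) for a &str.
fn both_forms(s: &str) -> (String, String) {
    (format!("{}", s), format!("{:?}", s))
}

fn main() {
    let (display, debug) = both_forms("1, 2");
    // Display prints the text as-is; Debug adds quotes (and escapes).
    assert_eq!(display, "1, 2");
    assert_eq!(debug, "\"1, 2\"");

    // Vec<i32> implements Debug but not Display, so there is no
    // Display output that a FromStr impl could be the inverse of.
    assert_eq!(format!("{:?}", vec![1, 2, 3]), "[1, 2, 3]");
}
```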

#11

Note that if you just change the format to “1 2 3 4”, you can parse it using only core, as .split_whitespace().map(i32::from_str), which does a pretty good job of handling the “easy format to read in whatever” cases seen in things like programming contests.
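Spelled out as a complete program, that could look roughly like this. The function name parse_spaced is just for illustration, and collecting into a Result means the first bad token becomes the error:

```rust
use std::num::ParseIntError;
use std::str::FromStr;

// Hypothetical helper: parse whitespace-separated integers.
// Collecting into Result stops at the first token that fails to parse.
fn parse_spaced(s: &str) -> Result<Vec<i32>, ParseIntError> {
    s.split_whitespace().map(i32::from_str).collect()
}

fn main() {
    assert_eq!(parse_spaced("1 2 3 4"), Ok(vec![1, 2, 3, 4]));
    assert!(parse_spaced("1 two 3").is_err());
}
```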

:+1:

Debug is explicitly not for this kind of thing, and often leaks internals in ways that make it actively bad for that. Take Instant, for example, which very intentionally doesn’t give access to its internal numbers – unless you use Debug.


#12

I wouldn’t even bother putting this in a crate. It’s such a small and specific function, and I’ve never needed to use it myself.

Just put it in a function and be done with it.
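Something along these lines would probably do, as a sketch only. The names parse_vec and parse_array are made up, it is hard-coded to u32 to keep it short, and it returns None on any error instead of a proper error type:

```rust
// Hypothetical helper: parse "[1, 2, 3]" into a Vec<u32>.
// Returns None on missing brackets or on any element that fails to parse.
fn parse_vec(s: &str) -> Option<Vec<u32>> {
    let inner = s.trim().strip_prefix('[')?.strip_suffix(']')?;
    if inner.trim().is_empty() {
        return Some(Vec::new());
    }
    inner.split(',').map(|n| n.trim().parse().ok()).collect()
}

// Hypothetical helper: same format, but into a fixed-size array.
// Returns None if the number of elements is not exactly N.
fn parse_array<const N: usize>(s: &str) -> Option<[u32; N]> {
    let items = parse_vec(s)?;
    if items.len() != N {
        return None;
    }
    let mut out = [0u32; N];
    out.copy_from_slice(&items); // lengths checked above, so this cannot panic
    Some(out)
}

fn main() {
    let a1: [u32; 3] = parse_array("[1, 2, 3]").unwrap();
    let a2: Vec<u32> = parse_vec("[1, 2, 3]").unwrap();
    assert_eq!(a1, [1, 2, 3]);
    assert_eq!(a2, vec![1, 2, 3]);

    // A length mismatch is reported instead of panicking.
    assert_eq!(parse_array::<2>("[1, 2, 3]"), None);
}
```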