Generalising strings and growable collections


#1

Currently, these are the types used for strings and arrays, in a table format:

|          | Array  | String |
|----------|--------|--------|
| Growable | Vec<T> | String |
| Slice    | &[T]   | &str   |

This seems OK until you try to, for example, add in a rope array-like type, and a rope string type based on a rope array of bytes:

|          | Array  | String | Rope array   | Rope string  |
|----------|--------|--------|--------------|--------------|
| Growable | Vec<T> | String | Rope<T>      | RopeStr      |
| Slice    | &[T]   | &str   | RopeSlice<T> | RopeStrSlice |

See the problem? I’ve had to create four new types just to represent a single data structure. Sure, I can implement the string types as wrappers around the rope types, but that also requires duplicating methods like push, insert &c… I also see no way of removing code duplication between the slices and the growable versions. To me this seems ridiculous, and makes implementing a usable rope type very tedious. I propose the following scheme instead:

|          | Array     | String    | Rope array    | Rope string         |
|----------|-----------|-----------|---------------|---------------------|
| Growable | Grow<[T]> | Grow<Str> | Grow<Rope<T>> | Grow<Str<Rope<u8>>> |
| Slice    | &[T]      | &Str      | &Rope<T>      | &Str<Rope<u8>>      |

Basically what this does is add a new pointer type, Grow<T>, that represents a value that can change in size (like today’s Vec, String &c.). It also makes str a library type, Str, that takes a single type parameter (defaulting to [u8]) representing the structure that internally stores the string. Such a structure would have to implement a trait that provides methods such as pushing characters, iterating over characters &c. but not necessarily things like iterating over bytes and so on. This could make things like Str<[char]> (a UTF-32 string with constant-time char indexing) possible as well.

This scheme means that all I have to do to add in my rope type is create a new unsized type, Rope<T>, and implement the StrContainer (or whatever it’s called) trait for Rope<u8>. Instantly I have a growable rope array type (Grow<Rope<T>>), a rope slice type (&Rope<T>), a growable rope string type (Grow<Str<Rope<u8>>>) and a rope string slice type (&Str<Rope<u8>>). (These names are pretty long, so I’d probably create aliases.)

One great thing about this change is that it requires no changes to the language (AFAICT) other than removing &str. It also makes the proposed Deref<str>/Deref<[T]> impls for String/Vec make a lot more sense, as it actually is dereferencing a pointer.

So, any thoughts? Is there a simpler way of implementing a rope array/array slice/string/string slice, without having to rethink the very nature of Rust strings and growable collections? ; )


#2

I don’t understand how Grow<Rope<T>> can work; since the meaning of “growable” in “growable collection” is highly dependent on the exact collection, it seems strange that an external wrapper could magically make something growable; maybe you could provide a demonstration?

(I don’t understand what it means to have an unsized Rope<T> type, either.)


#3

I’ve got an implementation of a rope which is internally represented roughly like this:

enum RopeTree<T> {
    Branch(uint, uint, Box<RopeTree<T>>, Box<RopeTree<T>>),
    Leaf(Vec<T>),
}

As a DST, it would look like this:

enum RopeTree<T> {
    // TBH, I’m not sure what type these pointers should be
    Branch(uint, uint, Grow<RopeTree<T>>, Grow<RopeTree<T>>),
    Leaf([T]),
}

That would make Grow<RopeTree<T>> equivalent to the original definition, and &RopeTree<T> would provide a view into a rope.

it seems strange that an external wrapper could magically make something growable; maybe you could provide a demonstration?

I was imagining that Grow<T> would just contain a pointer to a T and a capacity, and would provide some kind of method for changing the capacity of the allocation. For example, a push method on a rope would be implemented for &mut Grow<Self>, and would use said method to allocate more space for the [T]s at the leaves.


#4

I fully support moving str to the library, but Str as you define it is not friendly to DST coercions.
E.g. you can’t convert a reference to string of fixed size &Str<[u8, ..N]> to string slice &Str<[u8]>, like you can do, e.g. with Box and friends (Box<[u8, ..N]> -> Box<[u8]>).
String slice should be a “smart pointer” to dynamically sized type and not a plain reference to it, It would provide more flexibility and shorter notation without “&”. See the “Unresolved questions” section in my yesterday’s RFC.

Edit: Well, now I’m not sure the “smart pointer” solution will work either, but at least I mentioned the problem.


#5

you can’t convert a reference to string of fixed size &Str<[u8, …N]> to string slice &Str<[u8]>, like you can do, e.g. with Box and friends (Box<[u8, …N]> -> Box<[u8]>).

Yes, you can:

#![feature(default_type_params)]

struct Str<Sized? T = [u8]>(T);

fn main() {
    let x: Str<[u8, ..2]> = Str([1, 2]);
    let y: &Str<[u8]> = &x;
}

#6

Exellent, one problem less! I wasn’t sure I fully understood DST, that’s why I placed it in “unresolved questions”.


#7

It doesn’t provide a very useful view, since it can only point at a whole node, e.g. if r is rope with two parts like "Foo", "Bar", what does r.slice(1, 5) return? For a normal string "FooBar".slice(1, 5) == "ooBa", but you can’t make a &RopeTree<str> that points to the correct data out of another &RopeTree<str> (since slice would presumably take &self).

All the &RopeTree vs. Grow<RopeTree> provides is views to nodes of a rope, but, for a user of the data structure, the nodes are (almost always) not semantically important. It doesn’t matter some string data "hello world" was created in one contiguous slab (i.e. just a single Leaf), or if it was incrementally built to be made up of "hello", " ", "world".

The usefulness of &[] and &DataStructure<...> is fairly limited, since there’s not that many data structures that have semantically meaningful contiguous memory (essentially only Vec and String) and without contiguous memory, &[] isn’t so useful, and, often, the basic unit of a data structure is not relevant (like with a Rope, or a BTreeMap or most other types) so the “view”/“slice” &DataStructure<...> isn’t useful. Iterators (which can be regarded as generalised slices) are significantly more flexible.

My point is: I don’t think this Grow can be general enough to be useful. (Maybe if it was restricted to just Vec and String, but any ‘interesting’ data structure is almost certainly going to be doing too much magic to fit into this box.)


#8

Whoops, I thought that I’d thought that bit through. I somehow thought that the [T]s at the ends could be used to give a start and length, but although they would give a start and length it wouldn’t be adjustable.

I guess that makes Grow kinda useless, but perhaps there’s still some use in giving str and String a type parameter, removing some code duplication.


#9

One day, when the type system is a bit better, I hope to pull out most of the Vec logic into some kind of AbstractVec structure. It would be generic over anything that provides some basic minimal API like as_slice, as_mut_slice, capacity, and grow. Then we can implement pure-heap, pure-stack, and hybrid Vecs uniformly.

AbstractString could similarly be written to be generic over something with the right minimal API. Now, the concrete Vec/String types would have to be aliased/newtyped from the Abstract variants, but all the “real” code would be deduplicated. I don’t consider having to write out a bunch of method-forwarding boiler-plate to be a big maintenance/safety issue. Just tedious as hell. I would love to have an impl/method forwarding syntax for such a thing, though.

However, Ropes (which ever version you choose) have a sufficiently different memory layout that I think that this strategy wouldn’t work. As huon notes, there is no contiguous memory for the abstract container to work on. I suppose you could yield a slice of slices or something, but now “cheap” methods are becoming very expensive. I think for ropes you genuinely just want to pass around &(mut)Ropes around. It could have a slices iterator for when you want to drop down to that level, but mostly you wouldn’t want to.


#10

As a hack until a good abstract to define Grow is defined, could we use associated types to define growable types add hoc? It’s not as nice, but it does permit generic client code which is probably the biggest benefit anyways.