Tweaks to array, byte and string literals


#1

Hi.

I have a small suggestion.

Short summary

Change the types of byte string and string literals. Introduce strings of fixed size.

Proposed changes

  • Keep the types of array literals as [T, ..N].
  • Change the types of byte string literals from &[u8] to [u8, ..N].
  • Change the types of string literals from &str to str[..N].

Introduce the missing family of types - strings of fixed size: str[..N]. str[..N] is essentially a [u8, ..N] with UTF-8 invariants and, eventually, additional string methods/traits. It fills the gap in the vector/string chart:

Vec<T>   | String
[T, ..N] | ???
&[T]     | &str
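To make "a [u8, ..N] with UTF-8 invariants" concrete, here is a minimal sketch written in current Rust syntax ([u8; N] and const generics) rather than the syntax used in this thread. The name FixedStr and its methods are hypothetical and only illustrate the idea; this is not an existing library type.

```rust
use std::str;

/// Hypothetical stand-in for the proposed `str[..N]`:
/// exactly N bytes that are known to be valid UTF-8.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FixedStr<const N: usize>([u8; N]);

impl<const N: usize> FixedStr<N> {
    /// Checks the UTF-8 invariant once, at construction.
    pub fn from_bytes(bytes: [u8; N]) -> Result<Self, str::Utf8Error> {
        str::from_utf8(&bytes)?;
        Ok(FixedStr(bytes))
    }

    /// Borrows the contents as an ordinary string slice.
    pub fn as_str(&self) -> &str {
        // The invariant was checked in `from_bytes`, so this cannot fail.
        str::from_utf8(&self.0).expect("invariant: contents are valid UTF-8")
    }

    /// The length in bytes is part of the type, so it is a compile-time constant.
    pub const fn len(&self) -> usize {
        N
    }
}
```

With this sketch, FixedStr::from_bytes(*b"lang/") yields a FixedStr<5> that has value semantics and converts to &str through as_str().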

Keep the autocoercion of array literals (not arrays of fixed size in general) to slices. Add the autocoercion of changed byte string literals to slices. Add the autocoercion of changed string literals to slices. Non-literal arrays and strings do not autocoerce to slices, in accordance with the general agreements on explicitness.

Motivation

The main motivation today is forward compatibility. Currently, given the lack of non-type generic parameters and compile-time function evaluation (CTE), strings of fixed size are not very useful. But once CTE is introduced, the need for compile-time string operations will rise quickly. Even with non-type generic parameters alone, strings of fixed size can be used at runtime for “heapless” string operations, which is useful in constrained environments. Before 1.0, str[..N] can be implemented as minimally as possible, just enough to allow changing the types of string literals.

The secondary motivation is consistency between literal types and better design in general. No literal loses its size information at compile time any more. The changed types [T, ..N] and str[..N] can be easily converted to slices, but the reverse operation requires CTE. All the literals (and the variables they are assigned to) have the usual value semantics and do not refer to some external storage inaccessible by normal means.

Examples of uses not possible with current types, but possible with proposed types:

// Today: initialize mutable array with literal
let mut arr: [u8, ..3] = b"abc";
arr[0] = b'd';

// Future, with CTE: compile time string concatenation
static LANG_DIR: str[..5 /*Should, probably, be inferred*/ ] = "lang/";
static EN_FILE: str[.._] = LANG_DIR + "en"; // str[..N] implements Add
static FR_FILE: str[.._] = LANG_DIR + "fr";
// Or, without CTE: runtime heapless string concatenation
let DE_FILE = LANG_DIR + "de";
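The runtime "heapless" concatenation can be approximated today. The following is only a sketch in current Rust syntax, working on byte arrays instead of the proposed str[..N]; the function name concat is made up for illustration. As far as I know, stable Rust cannot express the result length as A + B in the type, so the output length is a separate parameter that is checked at runtime.

```rust
/// Hypothetical helper: joins two fixed-size byte arrays into a third one
/// without touching the heap. The caller names the output length OUT,
/// and the function checks that it equals A + B.
pub fn concat<const A: usize, const B: usize, const OUT: usize>(
    left: &[u8; A],
    right: &[u8; B],
) -> [u8; OUT] {
    assert_eq!(A + B, OUT, "output length must equal the sum of the input lengths");
    let mut out = [0u8; OUT];
    out[..A].copy_from_slice(left);
    out[A..].copy_from_slice(right);
    out
}
```

Usage would look like `let de_file: [u8; 7] = concat(b"lang/", b"de");`. With the proposed Add implementation on str[..N], the length would be computed by the compiler instead of being spelled out.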

The autocoercion of literals is kept for ergonomic reasons. Writing something like:

fn f(arg: &str) {}
f("Hello"[]);
f(&"Hello");

for all literals would be just unacceptable.

Backward compatibility

All the current static strings keep compiling.

static GOODBYE: &'static str = "Goodbye"; // Autocoercion

All the code using autocoercion for array literals keeps compiling.

fn g(arg: &[int]) {}
g([1i, 2, 3]); // Autocoercion

In general, the surface changes are minimal.

Drawbacks

Minor breakage.

Example:

fn main() {
    let s = "Hello";
    fn f(arg: &str) {}
    f(s); // Will require explicit slicing f(s[]) or implicit DST coercion from reference f(&s)
}

Alternatives

Keep the status quo or apply the changes partially.

Drawbacks:

// Today: can't use byte string literals in some cases
let mut arr: [u8, ..3] = [b'a', b'b', b'c'];
arr[0] = b'd';

// Future: str[..N] is added, CTE is added, but the literal types remain old
// Have to use conversion methods
let mut arr: [u8, ..3] = b"abc".to_fixed();
arr[0] = b'd';

static LANG_DIR: str[.._] = "lang/".to_fixed();
static EN_FILE: str[.._] = LANG_DIR + "en".to_fixed();
static FR_FILE: str[.._] = LANG_DIR + "fr".to_fixed();

// Bad future: str[..N] is not added
// Heapless/compile-time string operations aren't possible, or performed with "magic" like extended concat! or recursive macros.

Precedents

C and C++ string literals are char arrays of fixed size. There is also a C++ library proposal for strings of fixed size (link); the paper contains some discussion and motivation as well.

Afterword

I’d be glad to receive any feedback, in particular on how easy or difficult the minimal implementation of str[..N] would be.


#2

I can’t see a mention of a very important problem: string (and byte string) literals have to be kept in static memory. My solution to that would be to make literals lvalues of their type, which would mean that coercions to a slice are no-ops and putting the value on the stack copies from a constant global.
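A small sketch of that lvalue behaviour, written in current Rust syntax. The static ABC stands in for a literal living in static memory, and all names here are hypothetical.

```rust
// Stand-in for a literal that lives in static memory (hypothetical name).
static ABC: [u8; 3] = *b"abc";

/// Coercing a reference to the fixed-size array into a slice only attaches
/// a length; the data pointer stays the same, so nothing is copied.
pub fn slice_coercion_keeps_address() -> bool {
    let fixed: &'static [u8; 3] = &ABC;
    let slice: &'static [u8] = fixed;
    fixed.as_ptr() == slice.as_ptr() && slice.len() == 3
}

/// Binding the value to a local copies the bytes out of the static;
/// mutating the copy leaves the static untouched.
pub fn stack_copy_is_independent() -> bool {
    let mut local: [u8; 3] = ABC;
    local[0] = b'd';
    local == *b"dbc" && ABC == *b"abc"
}
```

Both functions return true: the slice view is free, while the by-value use is a real copy.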

The other problem, which doesn’t have an immediate solution, is that we were hoping to move str to the library post-DST, as a newtype of [u8].

Other than that, the implementation should be straight-forward.


#3

Thanks for the reminder. In C++ string literals are lvalues with static storage duration, so I just took it for granted, but this is an important detail and should be mentioned. In this regard array literals are different, since they can’t have static lifetime in the general case.

But the alternative is also possible: assume for a minute, that string/byte literals have the same lifetimes as their corresponding array literals. Then

  • (+) One special case fewer in the specification
  • (+) Everything with static lifetime is explicitly marked as static
  • (-) Ergonomics

Have to write

fn foo() {
    fn take_static(arg: &'static str) {}
    take_static("Hello"); // Autocoercion
}

as

static HELLO: &'static str = "Hello"; // Autocoercion
fn foo() {
    fn take_static(arg: &'static str) {}
    take_static(HELLO);
}

However, functions (or struct fields) taking/returning &'static str and not just &str are probably rare.

  • (-) (?) Performance cost. An implementation will still keep literals in static memory and will have to copy them to the stack, but the copy is unobservable and can be eliminated.

That’s great news! Keeping the core language small is always a good idea. But for strings of fixed size it would require non-type generic parameters. Something like:

// Can't reuse name "str" here (?)
struct str_bikeshed<N: uint>([u8, ..N]);

#4

If the “static” aspect is more important than consistency between literal types, then I have an alternative proposal:

  • Keep the types of array literals as [T, ..N]. (Non-static) (Or even change them to &'a [T, ..N], then the consistency will be achieved too.)
  • Change the types of byte string literals from &'static [u8] to &'static [u8, ..N]. (Static)
  • Change the types of string literals from &'static str to &'static str[..N]. (Static)

Introduce the missing family of types - strings of fixed size: str[..N].

No additional autocoercions, and coercions [T, ..N] -> &[T] can be removed too, even for literals. DST coercions (&[T, ..N] -> &[T], &str[..N] -> &str) do all the job for ergonomics.

The main point stays - literals are not dynamically sized, they are statically sized and should not lose their size at compile time.

Pros:

  • No special rules about string literals being lvalues. They are rvalues and point to static memory by definition.
  • No accidental copies from static memory to stack. Deliberate copies are still possible.

Examples:

// Today: initialize mutable array with literal
let mut arr: [u8, ..3] = *b"abc";
arr[0] = b'd';

// Future, with CTE: compile time string concatenation
static LANG_DIR: str[.._] = *"lang/";
static EN_FILE: str[.._] = LANG_DIR + *"en"; // str[..N] implements Add
static FR_FILE: str[.._] = LANG_DIR + *"fr";
// Or, without CTE: runtime heapless string concatenation
let DE_FILE = LANG_DIR + *"de";

// Today, backward compatibility
fn f(arg: &str) {}
f("Hello"); // DST coercion

static GOODBYE: &'static str = "Goodbye"; // DST coercion

fn main() {
    let s = "Hello";
    fn f(arg: &str) {}
    f(s); // No breakage, DST coercion
}

// (With the change to array literals [T, ..N] -> &'a [T, ..N])
fn g(arg: &[int]) {}
g([1i, 2, 3]); // DST coercion &[int, ..3] -> &[int]
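For byte strings, this variant can be tried out in current Rust, where a byte string literal has the type &'static [u8; N]. The sketch below uses current syntax ([u8; 3] instead of [u8, ..3]); the function names total and demo are made up for illustration, and string literals are left out because they are still plain &'static str there.

```rust
/// Takes a byte slice; callers may pass a reference to a fixed-size array.
pub fn total(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

pub fn demo() -> ([u8; 3], u32, u32) {
    // The literal is a reference to a fixed-size array in static memory.
    let literal: &'static [u8; 3] = b"abc";
    // Deliberate copy to the stack, then mutate the copy.
    let mut arr: [u8; 3] = *literal;
    arr[0] = b'd';
    // Unsized coercion &[u8; 3] -> &[u8] happens at the call site,
    // with no explicit slicing.
    let from_literal = total(literal);
    // A reference to an array literal coerces the same way.
    let from_array = total(&[1, 2, 3]);
    (arr, from_literal, from_array)
}
```

The size stays in the type until a slice is actually requested, and the only copy is the one written explicitly with `*`.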

In fact, now I like this proposal even better than the original one : )