[pre-RFC] Allowing string literals to be either &'static str or String, similar to numeric literals

Thanks for the sleuthing! I hadn’t thought about those particular cases; definitely a good warning against making this change.

Also great point on the truncation family of methods! Talking about COW got me into a mental model where mutation requires reallocation.

I started a topic on something like this at [pre-RFC] custom string literals. The RFC I posted was very rough, but if you’re interested in custom string literals I’d like your feedback.

Actually, that's not true. owning_ref was updated to support mutable references, which implicitly added the requirement that implementers of StableAddress + DerefMut are stable under deref_mut().

I tried to document the requirements here as best I could.

Anyway, I am extremely opposed to changing the implementations of core types like String and Vec. Tons of code has been written under the assumption that they work the way they do. Not only will changing things break code and make the language more confusing, it will also (rightfully) promote the impression that Rust is unstable and immature.

As for the original suggestion, I think special casing String is the wrong way to go about it. In cases where performance isn't an issue, I want everything to default to easy_strings::EZString, not String. I think user defined string literals are the best solution.

P.S: IMO, defaulting ints to i32 was a bad idea - I've already encountered bugs due to values unexpectedly becoming i32s.
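For anyone who hasn't run into the i32 fallback, here is a minimal sketch of what it does. The function names are made up for illustration; the point is only that an integer literal with no annotation and no other constraint silently becomes an i32.

```rust
use std::mem::size_of_val;

/// Size in bytes of an integer literal that nothing constrains:
/// inference falls back to `i32`.
fn unconstrained_literal_size() -> usize {
    let n = 70_000; // no annotation, no other use that pins the type
    size_of_val(&n)
}

/// Size in bytes of the same literal once an annotation pins it to `i64`.
fn constrained_literal_size() -> usize {
    let n: i64 = 70_000;
    size_of_val(&n)
}
```

Calling the two functions gives 4 and 8 bytes respectively, so the width of the value depends on whether anything else in the code happened to constrain it.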

@Storyyeller, @Gankra: Ah right, I forgot about the recent-ish DerefMut patches to StableAddress, my bad.

@Gankra: Regardless of anything else - why would pop() and truncate() need to allocate? Couldn’t they just slice the static string?
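To illustrate the question, here is a rough sketch of a string type that shrinks static data by re-slicing instead of allocating. `MaybeStatic` and its methods are hypothetical and built on `Cow<'static, str>`; this is not how `String` works today, just one way the idea could look.

```rust
use std::borrow::Cow;

/// Hypothetical string type: either borrows static data or owns a heap buffer.
struct MaybeStatic(Cow<'static, str>);

impl MaybeStatic {
    fn from_static(s: &'static str) -> Self {
        MaybeStatic(Cow::Borrowed(s))
    }

    /// Shortens the string. For static data this only re-slices.
    /// Panics if `new_len` is not on a char boundary, like `String::truncate`.
    fn truncate(&mut self, new_len: usize) {
        match &mut self.0 {
            Cow::Borrowed(s) => {
                let cur: &'static str = *s;
                if new_len < cur.len() {
                    *s = &cur[..new_len];
                }
            }
            Cow::Owned(s) => s.truncate(new_len),
        }
    }

    /// Removes and returns the last char. For static data this only re-slices.
    fn pop(&mut self) -> Option<char> {
        match &mut self.0 {
            Cow::Borrowed(s) => {
                let cur: &'static str = *s;
                let ch = cur.chars().next_back()?;
                *s = &cur[..cur.len() - ch.len_utf8()];
                Some(ch)
            }
            Cow::Owned(s) => s.pop(),
        }
    }

    /// True while the data still points into static memory.
    fn is_static(&self) -> bool {
        matches!(self.0, Cow::Borrowed(_))
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}
```

After `truncate` and `pop` the value is still borrowed from static memory; only operations that grow or overwrite the contents would need to copy to the heap.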

What does "stable under deref_mut" mean? The docs say:

More specifically, implementors must ensure that the result of calling deref() is valid for the lifetime of the object, not just the lifetime of the borrow, and that the deref is valid even if the object is moved. Also, it must be valid even after invoking arbitrary &self methods or doing anything transitively accessible from &Self. If Self also implements DerefMut, the same restrictions apply to deref_mut() and it must remain valid if anything transitively accessible from the result of deref_mut() is mutated/called. Additionally, multiple calls to deref, (and deref_mut if implemented) must return the same address.

I feel like I must be totally misunderstanding this paragraph, because I don't see how Vec<T> could meet them given that pushing to it can reallocate. What am I missing?

EDIT: It seems that specifically you can't have mutated the object, but you can deref it to a mutable reference? I guess that's what this means:

Basically, it must be valid to convert the result of deref() to a pointer, and later dereference that pointer, as long as the original object is still live, even if it has been moved or &self methods have been called on it.

(Not to be pedantic but you can mutate objects without calling &mut self methods through reassigning visible fields, but I think I understand what you mean.)
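A small sketch of how I read the requirement, using only std (the function names are made up): the address returned by deref()/deref_mut() has to survive moves of the container and writes through the returned reference. Calls like push() are `&mut self` methods on the container itself, which is a different kind of access.

```rust
use std::ops::DerefMut;

/// The heap buffer keeps its address when the Vec value itself is moved.
fn address_survives_move() -> bool {
    let v = vec![1u8, 2, 3];
    let before = v.as_ptr();
    let moved = Box::new(v); // moves the (ptr, len, cap) triple, not the buffer
    before == moved.as_ptr()
}

/// Repeated deref_mut() calls, and writes through the result, keep the address.
fn deref_mut_is_repeatable() -> bool {
    let mut v = vec![1u8, 2, 3];
    let first = v.deref_mut().as_mut_ptr();
    v.deref_mut()[0] = 9; // mutate through the deref_mut() result
    let second = v.deref_mut().as_mut_ptr();
    // v.push(4) could reallocate, but that is a `&mut self` method on the
    // Vec itself rather than something reachable from deref_mut().
    first == second
}
```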

Yes, but OwningRef/Value take ownership of the container and never expose a mutable reference to it. The only mutable access on it is deref_mut() (and technically, drop)

Anyway, if you have ideas for how to clarify the documentation, feel free to suggest them.

Was any thought given to simply switching the type of string literals from &'static str (resp. &'static [u8]) to &'static String (resp. &'static Vec<u8>)?

This doesn’t allow let v: Vec<String> = vec!["a", "b"];, as it’s instead let v: Vec<&String> = vec!["a", "b"]; however:

  • it allows punting on introducing the distinction between str and String,
  • it does not require any magic in String or Vec: &String and &Vec do not allow mutation, so the pointed-to memory can be stored in ROM as usual (at the cost of one extra word),
  • it does not require inference specialization, &'static String can naturally Deref into &'static str.
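A rough emulation of what this would feel like in today's Rust. The `lit` helper is hypothetical and leaks a heap String to get a `&'static String`; under the proposal the data would sit in static memory instead, so treat this purely as a sketch of the typing.

```rust
/// Hypothetical stand-in for the proposed literal type.
/// Leaks a heap allocation to obtain `&'static String`.
fn lit(s: &str) -> &'static String {
    Box::leak(Box::new(s.to_owned()))
}

fn takes_str(s: &str) -> usize {
    s.len()
}

fn demo() -> (usize, Vec<&'static String>) {
    let a = lit("a");
    // Ordinary deref coercion: &'static String -> &'static str.
    let as_str: &'static str = a;
    let n = takes_str(as_str);
    // The element type is &String, not String.
    let v: Vec<&'static String> = vec![a, lit("b")];
    (n, v)
}
```

APIs that take `&str` keep working through deref coercion, while a vec of such literals is a `Vec<&String>` rather than a `Vec<String>`.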

Did I miss any discussion? Would it not be good enough?

That’s obviously not backwards compatible, so it’s not really an option.

I am not entirely sure how to formulate this or if it is at all feasible, but I want to put out an idea.

Currently, allocators return mutable references, therefore they need to point to the heap. However, could we design custom allocators to potentially return immutable references? Then we could have an “allocator” into program memory, an “allocator” into the stack.

Could we solve this &str vs &String discrepancy with custom allocators?

let s = "Hello World".as_string();
// s: &'static String<immutable_static_program_memory_allocator<'static>>
// s.vec: &'static Vec<u8, immutable_static_program_memory_allocator<'static>>

I think we should revisit this next year. Currently I’m in favor of performing this optimization/coercion with these modifications:

  • Only apply it to &'static str & String, not &'static [T] and Vec<T>.
  • String::capacity returns max(len, capacity), so string.capacity() - string.len() will never underflow.
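A tiny model of the second point. `StringModel` is hypothetical, and it assumes a literal-backed String would store a capacity of 0 to mark that it does not own its buffer; the post above does not spell out the representation.

```rust
/// Hypothetical model of the length/capacity bookkeeping only.
/// Assumption: a stored capacity of 0 marks a literal-backed string.
struct StringModel {
    len: usize,
    stored_capacity: usize,
}

impl StringModel {
    /// Reported capacity is never smaller than the length.
    fn capacity(&self) -> usize {
        self.len.max(self.stored_capacity)
    }

    /// The common `capacity() - len()` computation cannot underflow.
    fn spare(&self) -> usize {
        self.capacity() - self.len
    }
}
```

For a literal-backed value with length 11 this reports capacity 11 and zero spare room; a heap-backed value with length 3 and capacity 8 still reports 5 bytes of spare room.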

Suppose "x" produced an immutable String when needed. Would that avoid any change to String/Vec, except a trick to avoid free()-ing static memory?

There is no such thing as an immutable String. If users have ownership they can mutate it.

Even with that kind of desugaring, immutability still is not an intrinsic property of values in rust, only of name bindings… What would prevent the user from doing this?

fn foo(mut s: String) {
    // Use s mutably
}

fn main() {
    foo("hello");
}
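The same thing in code that compiles today (the function name is made up): an owner can always move a value into a `mut` binding, so a String received through an immutable binding is still mutable in practice.

```rust
/// Receives a String through an immutable binding, then rebinds it mutably.
fn rebind_and_mutate(s: String) -> String {
    let mut owned = s; // moving into a `mut` binding is always allowed
    owned.push('!');
    owned
}
```

So `rebind_and_mutate(String::from("hello"))` hands back "hello!", and nothing about the original binding could have prevented it.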

I don’t like implicit coercion proposals in this thread. If the main problem is annoyance of .to_string(), then in my opinion using explicit prefixes like s"foo" is a much better solution and it’s in line with already existing syntax b"foo" and r"foo".
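For comparison, this is what the existing prefixes give you, next to the spelling an s"foo" prefix would presumably replace. The function names are made up, and the s"..." prefix itself is only a proposal.

```rust
/// `b"..."` yields a reference to a fixed-size byte array.
fn byte_literal() -> &'static [u8; 3] {
    b"foo"
}

/// `r"..."` yields a `&'static str` with escapes left untouched.
fn raw_literal() -> &'static str {
    r"C:\temp\n"
}

/// What a hypothetical `s"foo"` would presumably stand for today.
fn owned_today() -> String {
    String::from("foo")
}
```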

On a slightly different note: I would also like very much to have something like h"00 01 0a ff" to define binary arrays using hex notation. (spaces are optional and just for readability)
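Until something like that exists, a runtime helper can approximate it. `parse_hex` below is a hypothetical stand-in, not an existing API, and unlike a real literal it reports malformed input at run time instead of compile time.

```rust
/// Hypothetical runtime stand-in for the proposed `h"..."` literal.
/// Whitespace is ignored. Returns `None` for a non-hex character or an
/// odd number of hex digits.
fn parse_hex(src: &str) -> Option<Vec<u8>> {
    let digits: Vec<u32> = src
        .chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| c.to_digit(16))
        .collect::<Option<Vec<u32>>>()?;
    if digits.len() % 2 != 0 {
        return None;
    }
    // Each pair of hex digits forms one byte, high nibble first.
    Some(
        digits
            .chunks(2)
            .map(|pair| (pair[0] * 16 + pair[1]) as u8)
            .collect(),
    )
}
```

With this, `parse_hex("00 01 0a ff")` and `parse_hex("00010aff")` produce the same four bytes.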

Could "x" then possibly desugar into &'static String, where the actual String would be in static data? Or maybe even into *static_var, where static_var has type &'static String?

That would make it doubly indirect and not solve the problem of people trying to pass string literals to APIs that take String. Moreover it would not be beneficial: the branch on null that we have to do is almost certainly not a performance problem, and if it is we just won’t do this.

I suspect that would error due to trying to move out of the static.

Of course, we could try .clone() instead of * but then that clone would either need to (a) point into the static memory (generating the same issue as before) or (b) allocate on the heap (entirely defeating the optimization).
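Both halves of this can be checked in today's Rust. A minimal sketch with made-up names: it uses an empty static String because `String::new` is a const fn, and a leaked String as a stand-in for static data with contents.

```rust
/// `String::new` is a `const fn`, so an empty String can live in a static.
static EMPTY: String = String::new();

fn clone_from_static() -> String {
    // let s: String = EMPTY; // rejected: cannot move out of a static item
    EMPTY.clone()
}

/// Cloning a non-empty String copies the bytes into a fresh heap buffer.
fn clone_allocates_fresh_buffer() -> bool {
    // Stand-in for a String that lives for the whole program.
    let original: &'static String = Box::leak(Box::new(String::from("hello")));
    let copy = original.clone();
    copy == *original && copy.as_ptr() != original.as_ptr()
}
```

The clone compares equal but points at a different buffer, which is option (b) above: correct, but it pays for the allocation the optimization was meant to avoid.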

Note that we already have some non-trivial gotchas related to the i32 fallback, where the fallback is applied at a step where it shouldn’t be, blocks actual inference, and in general behaves badly.

There’s a selection of a few I personally hit fairly often:

I’m extremely strongly against adding any new “fallback” schemes before we figure out how to properly implement these “fallbacks” in the first place.

The String::literally and perhaps a string literal that constructs such a String “literally” seem like a nice optimisation tho.

It would allow you to write things like

fn foo() -> String {
    "Bar"
}

I think literals should have type inference.
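For reference, the closest thing available today is an explicit conversion whose target type is picked by inference. A small sketch; `literal_as` is a made-up helper:

```rust
/// Today's spelling: the conversion is explicit, the target type is inferred
/// from the return type.
fn foo() -> String {
    "Bar".into()
}

/// The same literal converted to whatever type the caller asks for,
/// as long as that type implements `From<&'static str>`.
fn literal_as<T: From<&'static str>>() -> T {
    "Bar".into()
}
```

`literal_as::<String>()` and `literal_as::<Box<str>>()` both work, so the inference machinery is already there; what the proposal would remove is the explicit `.into()`.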

I’m not a fan of the fallbacks. Ideally, it should be some abstract type where it goes to a fallback very late when you need to start generating code. The way it’s done seems like an implementation issue.