Pass `&` references by value

This is probably nonsense, but might invoke an interesting conversation. Or perhaps it’s not nonsense?

Niko Matsakis, in his excellent presentation, said that Rust is about control without compromising safety. So you can embed the contents of a field in the contents of a struct, instead of holding a pointer to the heap. According to the same philosophy, when you transfer x to a function it is done by copying its bytes and not by passing a pointer to it, since this extra level of indirection will usually cost more than the duplication of the bytes.

However, if x has a destructor, you can’t pass it to a function by value unless you’re willing to stop using x afterwards. For example, many file-related functions expect a &Path, which is basically a pointer to a pointer and a length. I’m pretty sure most programs would run faster if such a function just received a pointer to the data and a length. This is possible with two special types: if you have a String you can pass s.as_slice(), and the same goes for a vector. But I don’t know of any way to do this with any other type.
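To make the "pointer to a pointer and a length" point concrete, here is a small sketch in present-day Rust (an assumption, since the thread predates Rust 1.0 and its API names differ). It compares the size of a reference to an owned String with the size of a string slice; the helper names thin_ref_size and fat_ref_size are made up for illustration.

```rust
use std::mem::size_of;

/// Size in bytes of a reference to an owned `String` (a single thin pointer).
fn thin_ref_size() -> usize {
    size_of::<&String>()
}

/// Size in bytes of a string slice reference (data pointer plus length).
fn fat_ref_size() -> usize {
    size_of::<&str>()
}

fn main() {
    let s = String::from("hello");
    let by_ref: &String = &s; // points at the String header, which points at the bytes
    let by_slice: &str = s.as_str(); // points directly at the bytes and carries the length
    assert_eq!(by_ref.len(), by_slice.len());
    println!("&String: {} bytes, &str: {} bytes", thin_ref_size(), fat_ref_size());
}
```

Reading the bytes through by_ref takes two pointer hops, while by_slice takes one, which is the indirection the post is concerned about.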

It just seems to me that performance-wise, most programs would benefit if instead of passing &T as a pointer, it would be passed by copying the bytes of T. This would work as long as T doesn’t include any Unsafe fields (directly or through other fields). For those types, and for cases where performance-wise you’d prefer a pointer over a copy, you’d use, say, Ref<'a, T>, which would be a new name for today’s &'a T.

Because this would make calling &T “a reference” a bit strange, I’ll call it “a view”.

You can add another feature, which would make the special types &str and &[T] unneeded. If, in a struct definition, a field is prefixed by the keyword onlyinmut, it would not be copied to &T and would not be accessible from it. This would mean that the definition of Vec would become:

pub struct Vec<T> {
    ptr: *mut T,
    len: uint,
    onlyinmut capacity: uint,
}

Functions which currently receive a &[T] would receive a &Vec<T>, which would be exactly equivalent to today’s &[T]. And when calling them, you’d just use &v instead of today’s v.as_slice(). This would make Rust’s type system simpler (as there would be no &str and &[T]), and would make string handling simpler: no need to have &str, String, Str, and &String (which is legal but shouldn’t be used). There would just be String and &String, like with any other type.
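For comparison, present-day Rust (an assumption about the reader's toolchain, not the 2014 compiler discussed here) already accepts &v at the call site of a function that takes a slice, through deref coercion, although the slice types themselves still exist. A minimal sketch with made-up function names:

```rust
/// Sums a slice; the parameter is a (pointer, length) pair.
fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

/// Counts the bytes of a string slice.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    let v = vec![1, 2, 3];
    let s = String::from("abc");
    // `&v` is a `&Vec<i32>`; the compiler coerces it to `&[i32]` at the call.
    println!("{}", sum(&v));
    // `&s` is a `&String`; it is coerced to `&str` in the same way.
    println!("{}", byte_len(&s));
}
```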

(It might make sense to force onlyinmut fields to be private, so the difference in representation between T and &T would remain an implementation detail, an optimization, and won’t bother users of the type)

If the meaning of &T changes that way, it would make perfect sense to not require explicit referencing of function arguments - that is, you’d be able to write open(path) instead of open(&path). It would make sense since the performance characteristics of passing a view would be just those of passing a value. The only difference is that you’d still be allowed to use path afterwards.

This would make iteration simpler. For example, it would make the following code work:

for path in paths.iter() {
    let tmp_path = path.plus_suffix(".tmp");
    rename(path, tmp_path).unwrap();
}

(Currently you have to prefix tmp_path with a & for this to work, which I find a bit ugly.) You’d also be able to just use .iter() in most cases, whereas today you’d usually want to use .move_iter() if it’s possible.
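As a point of reference, this is roughly how the loop looks when written against present-day Rust, with the explicit & that the post finds ugly. It is only a sketch: plus_suffix and describe_rename are hypothetical helpers, and describe_rename stands in for rename so the example does not touch the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: appends `suffix` to the end of the whole path.
fn plus_suffix(path: &Path, suffix: &str) -> PathBuf {
    let mut name = path.as_os_str().to_os_string();
    name.push(suffix);
    PathBuf::from(name)
}

/// Stand-in for `rename`: describes the move instead of performing it.
fn describe_rename(from: &Path, to: &Path) -> String {
    format!("{} -> {}", from.display(), to.display())
}

fn main() {
    let paths = vec![PathBuf::from("a.txt"), PathBuf::from("b.txt")];
    for path in paths.iter() {
        let tmp_path = plus_suffix(path, ".tmp");
        // `path` is already a reference, but the owned `tmp_path` needs an explicit `&`.
        println!("{}", describe_rename(path, &tmp_path));
    }
}
```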

To summarise, this would make views more similar to simple values, which would make Rust more performant, more consistent, and easier to use.

What do you think? Where is my crucial mistake?

Cheers, Noam

Consider passing an array of mesh data to a render function; you’re now making the call CPU-bound by copying the entire block of memory (potentially huge) on every function call, purely for semantic reasons.

Also, how about &&T, or *T? What do you do in these cases?

I’m not convinced.

I suppose you could argue that you relegate pass-by-pointer to the *T type, and all &T’s are passed by value, but then again, many &T’s (eg. &[T]) contain a *T; this is specifically the case for &str (which therefore contains unsafe pointer fields, as you’ve indicated).

So ultimately I don’t think you would get any meaningful benefit from this.

Just to clarify, I believe that @noamraph is trying to pass the Vec struct (which does contain a *T pointer, but is not considered unsafe) by value, not the contents of that *T pointer. This is instead of passing a pointer to the struct, which then has another pointer to the actual array. As for the &&T pointer, it seems like you could just pass the &T by value with no ill effects.

Hm, maybe I don’t understand what’s being proposed here (entirely possible), but as I see it:

What you’ve said is true, but only true for Slice.

I.e. if a function takes &[T], it is literally expecting a fat pointer in memory, (*T, uint). So under this proposal, if you have a Vec you can copy the inner contents of the Vec instead of calling Vec.as_slice(). Fair enough, totally true.

However, if you’re making that a general rule, what about other types?

What if struct Foo { … } is more than just a pointer and a uint? What if it’s a large complex structure, or has a fixed length array in it? What does passing &Foo mean now?

If you follow the same logic, and Foo is struct { *T, *T, *T, uint, uint, uint } you have to copy all of these inner values and pass them instead of one uint64 pointer right?

Or would this logic only apply to Slice?

I believe he wants to make the same rules of direct passing apply to & references – that is small objects get passed by value, larger ones are automatically put behind a pointer.

That’s exactly what I mean.

It might make sense if those were the rules, but I believe that the current rules are that everything is passed by value, no matter its size, and if a type has a destructor it is “moved”, that is, you can’t use it again from the calling function (and the called function is responsible for calling the destructor).
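The move rule described above can be observed directly. The sketch below uses present-day Rust and a made-up type Noisy whose destructor bumps a counter; it is an illustration of the rule, not code from the thread.

```rust
use std::cell::Cell;

/// A type with a destructor that records how many times it has run.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Takes ownership: the destructor runs when this function returns.
fn consume(_n: Noisy) {}

/// Borrows: the caller keeps the value, so no destructor runs here.
fn inspect(_n: &Noisy) {}

/// Returns the drop count seen after borrowing and after moving.
fn drop_counts() -> (u32, u32) {
    let drops = Cell::new(0);
    let n = Noisy { drops: &drops };
    inspect(&n);
    let after_borrow = drops.get();
    consume(n); // `n` is moved; using it after this line would not compile
    let after_move = drops.get();
    (after_borrow, after_move)
}

fn main() {
    println!("{:?}", drop_counts());
}
```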

Indeed, according to this idea, you’d copy all of Foo's inner values instead of passing a pointer to foo. This would mean more work when calling the function, but less work inside the function, as it would be able to access the members directly without an indirection. I think that in most cases this would be a performance gain. If a type is large and isn’t accessed much, then you’d use Ref<T>.

Perhaps it would make the most sense to construct a Ref<T> by ref t, just like you construct a Box<T> by box t. However, the ref keyword might already be taken.

I'll have to bow out I guess.

according to this idea, you'd copy all of Foo's inner values instead of passing a pointer to foo

Sounds exactly like you're dereferencing the pointer and copying the (potentially huge) memory block it points to.

...but apparently you're not. So disregard me I guess~

Perhaps I wasn’t clear enough. Anyway, I meant that you’d copy all the size_of::<Foo>() bytes of foo into the function. If some of the bytes are pointers, only the pointers would be copied, not the data they point to. So you won’t copy any huge memory block.
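A quick way to see the difference between the bytes of a value and the data it points to, sketched in present-day Rust with a made-up helper header_bytes: the Vec value itself stays three words wide no matter how much it holds on the heap.

```rust
use std::mem::size_of_val;

/// Number of bytes in the `Vec` value itself, ignoring its heap buffer.
fn header_bytes(v: &Vec<u8>) -> usize {
    size_of_val(v)
}

fn main() {
    let small: Vec<u8> = Vec::new();
    let large: Vec<u8> = vec![0; 1_000_000];
    // Both lines print the same number: pointer, capacity and length only.
    println!("{}", header_bytes(&small));
    println!("{}", header_bytes(&large));
}
```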

This would be my one concern with indiscriminately copying Foo to the function. Making choices about whether to pass a (possibly large) fixed size array sounds like a hard problem that we should leave the programmer to decide. If they can't use &Foo as a reference type, then what would be the alternative to force the struct to be passed by reference?

I agree that the programmer should be able to decide whether he wants to pass by value or by reference. Currently, if a type has a destructor, the programmer is forced to pass by reference, even if it doesn’t make sense, except for two special types: a string and a vector.

I propose that &foo would create a view of foo by copying its bytes, and ref foo would create a view by passing a pointer. I’m suggesting the shorter syntax for passing by value because I think that in most cases it would result in better performance, and because I think it’s more consistent with the rest of Rust.

It sounds like you want a shallow copy of objects where any owned ptrs they contain are converted to borrowed ptrs.

Vaguely related: is the plan for Vec<T> to be redefined with a Box<[T]> once we get that?

Programmers can (sort of) opt in to pass-by-value without move semantics, by implementing and using the Clone trait. This isn’t the same level of support/sugar that your proposal would offer, but it does give some control to the programmer when it’s really needed.
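A minimal sketch of that opt-in, in present-day Rust with a made-up function total_len: the caller clones explicitly, the callee consumes the clone, and the original stays usable.

```rust
/// Takes its argument by value and consumes it.
fn total_len(names: Vec<String>) -> usize {
    names.iter().map(|n| n.len()).sum()
}

fn main() {
    let names = vec![String::from("ab"), String::from("cde")];
    // The clone is a deep copy, so `names` itself is not moved.
    let n = total_len(names.clone());
    println!("{} characters in {} names", n, names.len());
}
```

Note that this copies the heap data as well, which is more work than the shallow byte copy the proposal has in mind.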

This doesn’t work for things with a destructor: Rust insists that pass-by-value things are destructed at the end of the function.

This sounds like it’s trying to reintroduce passing modes. I don’t even know what the sigils stood for at the time, I just remember it being deemed too confusing. It might be nice to be able to say “borrowed by value” or “borrowed as word sized, indirecting if necessary”, but I figure it’s a lot of complexity to add just to avoid indirecting through a stack pointer sometimes, and anything short of that means introducing “smart” behaviour that can’t be opted out of.

You already have that smart behaviour which cannot be opted out of when passing things ‘by-value’ or returning things.

I want to explain my idea again, in the hope that I’ll manage to make it clearer this time. I also have a refinement of the idea, which makes it applicable without any additional syntax, and immediately allows throwing away the special (and confusing) types &str and &[T], while making the language more performant.

First, the explanation. Every type Foo has its size_of::<Foo>() bytes. Unless some of them are of type Unsafe (usually they aren’t), whenever a view &foo is alive, the compiler makes sure that those bytes are not mutated. If they are not mutated, it doesn’t matter whether you have a pointer to those bytes or a copy of those bytes. This means that the question of whether the type &Foo is a pointer or a copy of Foo's bytes is just an implementation detail. Take for example the code

let v = &foo;
let x = v.x;

Currently the compiler translates it to

Foo* v = &foo;  // v is a pointer to foo
int x = (*v).x; // to access v.x you need to follow the pointer

If &Foo were implemented by copy, the code would be translated to:

Foo v = foo;  // to create v you copy foo's bytes
int x = v.x;  // you can now access v.x immediately

As you can see, it is only a matter of code generation. This would be entirely transparent to a Rust program, unless it checked the value of size_of::<&Foo>().
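The two translations can be imitated by hand in present-day Rust for a type that is allowed to be copied. This is only an illustration of the equivalence claimed above, with made-up names, and not the proposed code generation itself: while the value is not mutated, reading through a pointer and reading from a copy give the same answer.

```rust
#[derive(Clone, Copy)]
struct Foo {
    x: i32,
    y: i32,
}

/// Reads the fields through a pointer, as `&Foo` works today.
fn sum_via_pointer(v: &Foo) -> i32 {
    v.x + v.y
}

/// Reads the fields from a private copy of the bytes.
fn sum_via_copy(v: Foo) -> i32 {
    v.x + v.y
}

fn main() {
    let foo = Foo { x: 1, y: 2 };
    // `foo` is still usable after the second call because `Foo` is `Copy`.
    println!("{} {}", sum_via_pointer(&foo), sum_via_copy(foo));
    println!("{}", foo.x);
}
```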

 

Now, based on this, let me propose a variant of the original idea, which doesn’t require any new syntax.

  • Types could be decorated, to affect the implementation of their view.
  • If a type isn’t decorated, its view is a copy of its bytes.
  • A type could be decorated by #[ref_view], which would make the compiler implement its view as a pointer. Types which include Unsafe members must be decorated with #[ref_view].
  • Other types could be decorated to specify fields which are not copied into the view. For example, Vec would be defined like this:
#[not_in_view(capacity)]
pub struct Vec<T> {
    ptr: *mut T,
    len: uint,
    capacity: uint,
}
  • Fields listed as not_in_view must be private, so code outside the type’s module won’t be bothered by the discrepancy between the type and its view.
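What such a view of Vec might look like if written by hand can be sketched in present-day Rust. Everything here is hypothetical: the name VecView, its fields and its methods are assumptions made for illustration, and #[not_in_view] itself does not exist. The view holds the pointer and the length but not the capacity, so it is two words wide, like a slice.

```rust
use std::marker::PhantomData;

/// Hypothetical hand-written "view" of a Vec: pointer and length, no capacity.
struct VecView<'a, T> {
    ptr: *const T,
    len: usize,
    // Ties the view to a shared borrow of the original Vec.
    _borrow: PhantomData<&'a [T]>,
}

impl<'a, T> VecView<'a, T> {
    /// Copies the two fields that are visible in the view.
    fn new(v: &'a Vec<T>) -> Self {
        VecView { ptr: v.as_ptr(), len: v.len(), _borrow: PhantomData }
    }

    /// Bounds-checked element access with a single pointer hop.
    fn get(&self, i: usize) -> Option<&'a T> {
        if i < self.len {
            // SAFETY: `i` is in bounds, and the borrow `'a` keeps the buffer
            // alive and unmodified for as long as the view exists.
            Some(unsafe { &*self.ptr.add(i) })
        } else {
            None
        }
    }
}

fn main() {
    let v = vec![10, 20, 30];
    let view = VecView::new(&v);
    println!("{:?} {:?}", view.get(1), view.get(3));
}
```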

This has a lot of benefits:

  • No extra syntax to the language, just some more work for the machine code generator.
  • No need for the special types &str, &[T], which are confusing a lot of people. There would be just String and &String, Vec and &Vec.
  • No need for dynamically sized types (I confess that I didn’t manage to really understand those, so maybe I’m wrong here.)
  • Virtually all existing code would continue to run, and I believe most of it would become faster. If some code becomes slower, you can always add #[ref_view] where it’s needed to get back the original performance.

Did I manage to explain myself? What do you think of this idea?

@noamraph It’s an interesting idea, but I see this as two largely unrelated proposals:

  • Add an optimisation to & pointers to copy for small Copy types
  • Remove &[T] and &str and just use &Vec and &String instead. (This is optimised with #[not_in_view] so that the capacity doesn’t need to be included in references, but AFAICT this isn’t strictly necessary.)

The first proposal makes sense (and is probably a decent idea IMO) but the second one isn’t really practical, unfortunately—when allocation is not available, Vec cannot be used. Also, DST is still needed—trait objects with custom pointers (like Rc<Show>) need DST to work, and that isn’t fixed by your proposal.
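Both objections can be illustrated in present-day Rust (where Show has since become Display; the function names below are made up). A slice parameter accepts data that never went through an allocator, and a reference-counted trait object is an unsized type behind a custom pointer.

```rust
use std::fmt::Display;
use std::rc::Rc;

/// Works on any contiguous data, heap-allocated or not.
fn largest(xs: &[u32]) -> Option<u32> {
    xs.iter().copied().max()
}

/// Formats a value held behind a reference-counted trait object.
fn render(item: Rc<dyn Display>) -> String {
    item.to_string()
}

fn main() {
    // A fixed-size array on the stack: there is no Vec to take a view of.
    let on_stack: [u32; 3] = [3, 9, 4];
    println!("{:?}", largest(&on_stack));

    // `dyn Display` has no statically known size, yet it can live inside an Rc.
    let shown: Rc<dyn Display> = Rc::new(42);
    println!("{}", render(shown));
}
```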

Those types don’t need Copy, having no Unsafe elements in them suffices.