View types based on pattern matching

This is a quite daring proposal, but Niko said he wanted daring. The problem at hand is accessing some struct in a split_at_mut way, giving the possibility of executing functions that work over disjoint sets set of mutable field accesses. @nikomatsakis have a blog post explaining it.

I am aware there have been many similar attempts, which have either been forgotten or unable to progress. Consequently, I have tried to be thorough on its analysis, trying to provide all details firsthand. In many ways this proposal is a continuation of what Niko's proposes in his blog, and some examples have been adapted from his.


Summary

Add a type view::<BaseType> Pattern that moderates the access to the data within the base type BaseType in the way Pattern describes. For example, the value data : view::<(i32,u8,u8)> (mut x,y,_) can be used with &mut data.x and &data.y. That is, the third element has been hidden and the second has been restricted to shared access.

Motivation

There is a whole post in Niko's blog about this problem.

If for example we have some tuple let mut data : (i32,u8,u8) = ..; with the proposed views we could hold simultaneously a view &mut (mut x,y,_) and a view &mut (_,y,mut z). Currently if two functions use different parts of a struct we have to split the data into references to each of its parts or otherwise refactorize our code.

A feature to address this problem has been attempted multiples times, see for example 1, 2, 3, 4 and myself few weeks after learning Rust. Additionally, it feels natural to look for a counterpart of slice::split_at to be available in structs.

An important distinction is the desired functionality to work with your own private types and the one to be offered to your clients. For the former it should be possible to allow inference of these views. For the latter the views should be carefully selected and they must avoid leaking internal details.

Guide-level explanation

A type view pat with a fully-qualified name view::<T> pat is logically a supertype of T, and contains an actual T, in which only the data indicated by the pattern can be accessed. Thus, the memory layout of the view is the same as the base type. For example, they have the same size and alignment. In its most common usage, the base type would be a reference.

Example

We show first an example that fails to compile, yet inlining the functions would work.

struct Combined{//<-- We will use this type in other examples.
	shared: SharedPart,//<-- ignored in this example
	a: Vec<i32>,
	b: Vec<i32>,
}
impl Combined{
	fn push_into_a(&mut self, value:i32){
		boring_operations(value);//<-- something we do not want to write each time we push.
		self.a.push(value);
	}
	fn work(&mut self){
		for &elem in self.b.iter()//<-- shared borrow of self.
		{
			self.push_into_a(elem);//<-- self cannot be borrowed mutably since it is already borrowed.
			// yet `self.a.push(elem);` would work seamless.
		}
	}
}

With this proposal we could write the following.

impl Combined{
	fn push_into_a<S>(self:S, value:i32)
		// We bound `S` to be any view with
		// at least exclusive access to the field `a`.
		where S: view &mut Combined { mut a }
	{
		boring_operations(value);//<-- something we do not want to write each time we push.
		self.a.push(value);
	}
	fn work<S>(self:S)
		// We bound `S` to be any view with at least
		// exclusive access to the field `a`
		// and shared access to field `b`.
		where S: view &mut Combined {mut a, b}
	{
		for &elem in self.b.iter()
		{
			//we have not made any reborrow of `a` so we may make it here.
			self.push_into_a(elem);
		}
	}
}

The bound S: view &mut Combined { mut a } in method push_into_a is how we indicate it does not write into b and is hence compatible with the borrows when we call it from inside work. The bound used in work is unnecessary in this case—could have been just &mut self—and would only became relevant if some other function would write into self.shared or self.b.

General aspects of views

The names used in the pattern would be the names to access the field. For example we can make a rename at the struct level.

impl Combined{
	fn silly(self:&mut Combined{
		shared: first,
		a: second,
		b: third,
	}) { body(first,second,third); }
}

This silly method could be called directly combined.silly() as the rename would be a simple coercion. This name usage has more sense with unnamed structs.

fn f<T:Clone,U>( data: view::<&mut(Vec<T>,T,U)> &mut (mut list,elem,_) ){
	list.push(elem.clone());
}

The general rule for a coercion to be allowed is whether it could be emulated by doing a match, reducing visibility, and renaming variables.

For structs, with field:_ we remove visibility of a given field. With .. we remove visibility of remaining fields. With ..V, where V is a view of the same type, we declare to use the visibility of V for fields we have not already specified. These are expected to be rarely used.

impl Combined{
	fn hide_a<C>(x:C) -> view Combined{a:_,..C}
		where C: view Combined{}
	{
		x
	}
}

I think we should assume an implicit .. in structs, it seems too verbose otherwise.

For any type T, the new type view::<&T> &_ is a view of T that cannot access anything inside it, yet it guarantees there exists some T somewhere with that lifetime. And the contents of T may change during that lifetime from other references.

For any struct T we have &mut T: view::<T> T{ any fields }. Removing a field from a view or the mut attribute gives a supertype.

Reborrowing a view follows similar rules to reborrowing &_ and &mut _. It is possible to make a view B borrowed from a view A if for each mutable field in B there is not any other reborrow and for each shared field there is not a mutable reborrow.

To offer a view to a downstream crate a type alias should be used.

pub type UserCombined = view &mut Combined{shared,mut a};

This allows to name the type, and therefore, we allow to call our functions prepared for that view. However, to use the names we still require the original fields to be visible.

Building a view

Views are not explicitly built, the base type is just coerced into the required view. If desired, use explicit ascription to force a cast.

fn foo(arg:&Combined){
	//Force a view of some struct.
	let data : view &Combined{a,b} = arg;
	...
	//Force a view of a tuple.
	let pair = (0i32,1i32);
	let casted_pair : view &(x,_) = &pair;
}

Receiving and returning a view

For full transparency, setter-like methods may want to receive and return a view. In this case is more clear we want super-types instead of just coercions. We would like this to work with either &mut Combined or any of its views that includes the field being mutated.

impl Combined{
	fn set_a<S>(self:S, value:Vec<i32>) -> S
		where S: view &mut Combined { mut a }
	{
		self.a = value;
		self
	}
}

Full inference with impl &mut Self

Alternatively, we may want the compiler to fully infer the smallest view possible. The following code uses impl T to mean the function is implemented for an undeclared amount of views, which can be checked at compile time by inspecting the function body. This should be used with much care since changes in the function body would imply a semver change not reflected in the function signature.

impl Combined{
	// Making this public would cause headaches.
	fn push_into_a(self:impl &mut Self, value:i32)
	{
		boring_operations(value);
		self.a.push(value);
	}
	// We should decide if want the impl here for future developments or
	// remove it for ease of making it part of the public API.
	fn work(self:impl &mut Self)
	{
		for &elem in self.b.iter()
		{
			self.push_into_a(elem);
		}
	}
}

Non-reference types

Outside of reference types it seems less useful. Here it is mostly a performance improvement.

struct Small{
	first: i32,
	second: i32,
}
struct Medium{
	first: i32,
	second: Small,
	third: i32,
	fourth: Small,
}
// Some heavy nested structure.
struct Big{
	first: Small,
	second: i32,
	third: Medium,
	fourth: Medium,
}
// For some functions we actually use just three fields from the inside.
// It is used like it was `Part{x,y,z}`.
type Part = view Big{
	first: Small{
		first: x,
		..//<-- do we want these dots?
	},
	second: z,
	fourth: Medium{
		second: Small{
			second: y,
			..
		},
		..
	},
	..
};
impl Big{
	fn f(self) -> Part{
		self//<-- even if their fields' names differ we have automatic coercions.
	}
	fn g(self:Part) -> (i32,i32){
		(self.x,self.y)//<-- drop of `Big` would occur here.
	}
}

In this example Part would be an owned view of Big. At low-level, the g method would just receive a self pointer instead of a pointer to each component. However, this is only a performance benefit if we do not copy Part, since it would imply a copy of Big. It is also necessary to keep the integrity of the whole Big. For example, using std::mem::swap with &mut Part should not mess up anything required in a possible Big::drop.

Deconstructing a view

If we have a view we may deconstruct it using the original field names (we cannot bind names we have hidden in the view).

let foo : view &Combined{a:x,b:y} = ...;
match foo{
	&Combined{
		a,
		b,
		//shared, //<-- we cannot bind this since it is not part of the view.
	} => f(a,b),
}

And also with the new names. Either explicitly indicating the view:

let foo : view &Combined{a:x,b:y} = ...;
match foo{
	// Yes, I am aware this is awful...
	view &Combined{a:x,b:y} {
		x,
		y,
	} => f(x,y),
}

Or having previously named it in a bound.

fn f<V:view &Combined{a:x,b:y}>(foo:V)
{
	match foo{
		V{
			x,
			y,
		} => f(x,y),
	}
}

Or with a type alias.

type FooView<'t> = view &'t Combined{a:x,b:y};
let foo : FooView  = ...;
match foo{
	FooView{
		x,
		y,
	} => f(x,y),
}

Extension for refutable patterns

Up to now we have considered views over irrefutable patterns. Allowing for refutable ones add a lot of power and craziness and should be considered with much care.

struct DumbPack{
	first: Option<Thing>,
	second: Option<Thing>,
}
type InformedPack<'a> = view &'a mut DumbPack{
	first: Some(mut first),
	second: Some(mut second),
};
impl DubmPack{
	fn inform(&mut self) -> Option<InformedPack>{
		match self{
			&mut DumbPack { first:Some(_), second:Some(_) } => Some(self),
			_ => None,
		}
	}
	//work knowing we have all options with values.
	fn work<'a>(self:InformedPack<'a>)
	{
		//Now this would be irrefutable.
		let Some(first) = self.first.as_mut();
	}
}

In this example InformedPack is a view of DumbPack asserting all its Option<_> have Some(_) value. This view has to be created inside the arm of a pattern matching implying that assertion. In some way, this seems to equip any type with some kind of provenance. I imagine @RalfJung will have something to say about this.

Another example is to remove some Variant of an integer to make it fit inside an Option without increasing its size. This combines refutable patterns with non-reference views.

type U8WithNiche = view::<u8> 0..=254;//<-- we remove the 255u8 variant
//Now Option<U8WithNiche> could be a single byte.

fn build_option_u8(x:u8) -> Option<U8WithNiche>
{
	//At low level this function is a nop.
	match x
	{
		//In each arm of the match the expression being matched (`x` here) would have to be coerceable as the corresponding view type.
		0..=254 => Some(x),//<-- Here `x` is coerced to `view 0..=254`.
		_ => None,
	}
}

If we include several variants in a view type we have some resemblances to anonymous enum types.

fn f(comp: view &std::path::Component::(CurDir|ParentDir|Normal(path)) )
	-> Option<&OsStr>
{
	use std::path::Component::*;
	match comp{
		Normal (path) => Some(path),
		CurDir | ParentDir => None,
		//No need to match the Component variants Prefix and RootDir.
	}
}

Reference-level explanation

References to a view

One must be aware of the differences between view &mut Pat, &mut view Pat, and view &mut Pat. From a &mut view Pat we can build a view &mut Pat, but not viceversa. A view &mut _ cannot access anything, a &mut view _ can std::mem::swap the content with another &mut view _. Or worse, since we may copy the empty view view &mut _, allowing the transposing cast would give UB.

// Good: `&mut view Pat` into `view &mut Pat`.
let mut thing : view (x,_) = (0,0);
let mv_thing: &mut view (x,_) = &mut thing;
let vm_thing: view &mut (x,_) = mv_thing;
// Bad: `view &mut Pat` into `&mut view Pat`.
let mut thing = (0,0);
let first: view &mut (x,_) = &mut thing;//<-- a view for the first element
let second: view &mut (_,y) = &mut thing;//<-- a view for the second element, obviously disjoint.
let first_empty : view &mut _ = first;//<-- we can always hide an elment.
let second_empty : view &mut _ = second;//<-- and again
let evil_first : &mut view _ = first_empty;//<-- we are still blind, right? NOPE, this is invalid.
let evil_second : &mut view _ = first_second;//<-- and invalid cast again
//Now we have a pair of `&mut T` on the same location!
mem::swap(evil_first,evil_second);//<-- UB

Is it not clear if doing this with a ZST (Zero-Sized-Type) would cause problems by replicating a &mut (). @RalfJung advises against these awful things but confirms that the Stacked Borrows model allows to do anything with ZSTs.

Casting a view &_ into &view _ is similarly invalid, as there may be another view &mut{mut something} incompatible with reading through &view _.

// Bad: `view &Pat` into `&view Pat`.
let mut thing = (0,0);
let vm_thing : view &mut (mut x,mut y) = &mut thing;
let empty_thing : view &_ = vm_thing;
//We have reborrowed anything from `vm_view` so we may still use it.
let evil : &view _ = empty_thing;//<-- The INVALID step, transpose the reference out of the field.
//evil is a reference to whole block of data, although we do not see into it, and is like padding from this view.
//I am not sure if we may read the 'padding'  from evil without using unsafe??
//Using the [safe-transmute](https://docs.rs/safe-transmute/latest/safe_transmute/to_bytes/fn.transmute_to_bytes.html) crate.
let bytes:&[u8] = safe_transmute::to_bytes::transmute_to_bytes(evil);
vm_thing.x = whatever();//<-- We mutate a part of that data.

For those same reasons we cannot use a &mut foo when we have a live reborrow view &_ of it. A &foo can be still used even if we make views of it.

Furthermore &mut view (x,_) is not a subtype of &mut view _, since &mut_ is invariant. Such casts would not be UB but would allow to arbitrarily bypass the view rules.

let data : &mut view(x,_) = ...;
let mut auxiliar : view (mut x,_) = (0,0);
let downgraded_auxiliar : &mut view(x,_) = &mut auxiliar;//INVALID operation.
std::mem::swap(data,downgraded_auxiliar);
//Our data is now on the auxiliar variable
drop(downgraded_axuiliar);
auxiliar.x = 42;
let downgraded_auxiliar : &mut view(x,_) = &mut auxiliar;//INVALID operation.
std::mem::swap(data,downgraded_auxiliar);
//Our data is again in its place,
//but we have modified a field that was supposed not to be modified.

Alpha equivalence

The types view &Foo{data:a} and view &Foo{data:b} should be the same type, even if their fields are accessed differently (as foo.a and foo.b respectively). Or at the very least, they should be freely coerceable into each other. The same goes for view &(i32,i32) (x,_) and view &(i32,i32) (y,_).

It must be clear that view &(i32,i32) (x,_) and view &(i32,i32) (_,x) are different types.

General Traits

Views cannot inherit traits from their base types. Specifically, view::<&mut T> Pat cannot implement Deref or DerefMut.

Are there traits we want to be implemented in views?

The Drop trait

The view is always of an actual type it is carrying. This means that when the view is dropped the viewed data also must be dropped. In turn this means the trait Drop is exceptionally inherited by the views. Nevertheless, the common usage is expected to go through references, which do not implement Drop.

repr(packed)

Views can make packed structs easier to work with. Since making references to their inner fields is UB, is not possible to use the alternative of making a function argument for each of the used fields.

Variance

Views can be covariant over hidden fields and fields with shared access, and invariant over fields with exclusive access. This is, we may cast view::<&mut (Cat,Cat,Cat)> &mut(mut a,b,_) into mut::<&mut (Cat,Animal,Animal)> &mut(mut a,b,_), but no further.

What is a semver change

  1. For the cases of explicit views, increasing the access of a field may cause a downstream crate to fail so it must mean a semver change.
  2. For a function with a where clause it is the same.
  3. The impl &Type case is more problematic. Changes in the function body that change the inferred view must be a semver change.
  4. For a exposed view, changing a receptor's name is a semver change. Note we may change the names of our internal fields and keep the external name in the pattern defining the view.

Drawbacks

While this proposal is a pretty much opt-in feature, we still have to interact with other crates that may opt-in. So some cognitive overhead is inevitable. Additionally, if your downstream clients want to opt-in you may need to adapt some code.

Even if the usability is very transparent, the implementation of this proposal may require too much work.

Rationale and alternatives

  • Explicitly declaring each view and their impls, like I suggested. This is so verbose that it would only be used in very special cases.
  • Fields in traits could be used to build views. However, this require to be explicit when defining the trait and when implementing it for some type, making difficult to allow for automatic inference. It may cause some churn because of dynamic trait objects.
  • Just apply partial borrows as necessary, like in this rfc or this IRLO discussion. The discussion derived as to how mark those usages in the function signature.
  • Mark some functions as to use non-local borrowcheck analysis IRLO thread.
  • More exotic attempts, like this.
  • Views as just collections of fields, without ascribing some base fields. This could be easy to use, and with lots of automation. However it has issues on assuming a memory layout (would they be Sized?) and on the order of the fields. It could not be used with nested structures.
  • Use Pattern classes? like Scala extractors? I have not seen this idea further elaborated, but it seems interesting.
  • Partially Initialized types, like this, was proposed as alternative to MaybeUninit, but it can be considered a view over maybe-uninitialized data.
  • Not doing anything. Many people have shown interest in different alternatives. Not doing anything is to ignore them.

Prior art

  • The name view was taken from SQL language.
  • Both slice and str have the methods split_at and split_at_mut. It seems natural to try similar methods for other types.

Unresolved questions

  • Any generic trait implementations should be extended to apply to views? Some CoerceUnsized perhaps?
  • When making an alias, should we allow to make some fields public? E.g., pub type UserCombined<'t> = view &'t mut Combined{pub shared,pub mut a}; with all fields private in the original struct.
  • Should the compiler recommend adding a view bound when it detects uses of self while borrowed?

Future possibilities

Return views from traits

It seems that views could be used to have some kind of fields in traits, a process of building views from those traits was already considered. There is some kind of reversal, but we need some additional bound for it to be useful.

Consider the bound V: view T{:a} to mean any view of the type T in which the name a is exposing some undetermined field.

trait Foo{
	type View : view &Self{:a};
	//I think this needs a `where Self::View::a : i32` to make it enough definite.
	fn get_view(&self) -> Self::View;
}
impl Foo for Example{
	type View = view &Example{thing:a};
	fn get_view(&self) -> &Example{thing:a} { self }
}
fn f(foo: impl Foo)
{
	let v = foo.get_view();
	// v.a behaves much like a field of the trait.
	work(&v.a);
}

Views of uninitialized data

With view::<T> Pat we guarantee T is fully initialized. However, in previous conversations was common for some people (I remember @Soni) to argue in favour of using them for partially initialized types. In principle we could consider a view::<MaybeUninit<T>> Pat, but we would need to allow some kind of pattern matching into MaybeUninit. I am liking this idea, perhaps defining an alias #T for MaybeUninit<T>. The Drop rule would go as this:

let thing: Example = ...;
let thing:#Example = thing;//<-- drop the contents of Example. Do not deallocate the memory.

Then view::<#Example> #Example{first} would mean an owned block of memory in which only first is known to be initialized. If we have initialized all the fields then we would be able to create to struct from its parts, we should be able to work in-place.

let thing : #Example = MaybeUninit::uninit();
let thing : view #Example{first,second,third}  = fill(thing)//<-- fill all the fields.
let thing : Example = thing;//<-- we have all the fields. So we could build Example by fields. Ensure to make it inplace, (i.e., a NOP)

A whole example of safely initializating an struct would be

struct Example<T>{ first:T, second:T }
impl Example<T>{
	fn start_initialization() -> #Self
	{
		MaybeUninit::uninit()
	}
	fn initialize_first<S>(mut self:S, value:T) -> view #Self{first,..S}
		where S: view #Self{}
	{
		self.first = value;
		self
	}
	fn initialize_second<S>(mut self:S, value:T) -> view #Self{second,..S}
		where S: view #Self{}
	{
		self.second = value;
		self
	}
	fn finish_initialization(self:#Self{first,second})->Self
	{
		self
	}
}

With this idea we could match inside a #T.

//Assuming Example::first : i32
let thing : #Example = ...;
let mut thing : view #Example {} = thing;
{
	let #Example{ref mut first	} = thing;
	// Here we have first:#i32
	first=7;//<-- change the type of `thing` to `view #Example {first}`
}

Although this needs additional rules.

let cond : #bool = MaybeUninit::uninit();
match cond{
	true => (),//<-- error, #bool does not have variants.
}

There are some other possibilities like view Foo{uninit x} or uninit Foo{x}.

Views of partially initialized slices

To do the same with slices, I find missing a pattern notation for a slice starting with some [T;N]. I can imagine a very far future Rust with

// Take a slice initialized up to some point and initialize another element.
fn f<T,N>( mut arr: view &mut#[ start @ ..[T;N], ..] , value:T) -> Either< &mut [T] , view &mut#[ start @ [T;N+1], ..] >
{
	match arr {
		[ start @ [T;N] ] => Left(start),
		//Assuming we have no problem in identifying
		// &mut [ ..[T;N], ..[next, ..]  ]
		// with &mut [ ..[..[T;N], next] , .. ]
		// and a bunch of other things.
		[ ..[ ..[T;N], next ], ..] => {
			// arr is coerced to the type `view &mut [ whatever @ ..[ ..[T;N], next ], .. ]`, with `next:#T`.
			arr.next = value;//<-- this changes the type of `arr` into `view &mut [ whatever @ ..[ ..[T;N], next], .. ]`, with `next:T`.
			//which can be coerced into `view &mut [ start @ ..[T;N+1], .. ]`.
			Right(arr),
		},
	}
}

References

  1. Niko's blog post "View types for Rust"
  2. Allow fields in traits that map to lvalues in impl'ing type #1546
  3. Niko's take on using traits with field to build views
  4. My IRLO thread Having mtability in several views of a struct https
  5. RFC on partial borrowing
  6. IRLO pre-RFC: “field” as an item and “borrows”
  7. withoutboats' comment about a pattern-classes idea
  8. Leonardo's IRLO thread on Partial self borrowing
  9. Soni's IRLO thread on Partially Initialized Types
  10. crlf0710's IRLO on opt-in for non-local borrowchecking analysis
3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.