Discussion: A different approach to implement a borrow checker

I'm working on a kind of borrow checker that should be more flexible than current rust's one and allow more use cases. I don't propose making changes to rust here but ask for thoughts and opinions. I know this forum is for rust language but I think rust developers here have more knowledge about borrow checkers. Also I'm not that good in English so excuse me for language errors.

This is the article I've written:

This was a kind of references validator I implemented for debugging in c++ to detect use after free bugs. Instead of assigning a lifetime to each object and doing complex analysis to find out how long does this object remain valid, the compiler checks which reference is pointing to which object and references are validated on use.

Keep in mind that static analysis done at compile time can't cover all valid cases and will have many false positives so the high overhead c++ implementation can't be fully done at compile time.

Also note that the concepts and methods described here are not specific to a particular programming language but I will use a kind of rust syntax since it is the only programming language I know that has a borrow checker at compile time.

Let's start with examples

let i1 = 0;
let ri1 = &i1;
println!("{}", *ri1); // ri1 is valid because i1 is still valid
let r;
{
	let i = 0;
	r = &i;
	println!("{}", *r); // r is valid because i is valid
	// i is deallocated
}
println!("{}", *r); // error: r is invalid because i is invalid
let i2 = 0;
r = &i2;
println!("{}", *r); // ok: r is valid because i2 is valid
struct I32St {
	val: i32,
}

fn consume_st(_: I32St) {}

let s = I32St{ val: 0 };
let r = &s;
println!("{}", r.val); // ok: r is valid because s is valid
consume_st(s); // ok: r now is invalid but it is only checked on use
println!("{}", r.val); // error: r is invalid because s was moved (destructive move)
let s2 = I32St{ va1: 1 };
r = &s2; // r now refers to s2
println!("{}", r.val); // ok: r is valid because s2 is valid

Conclusion from previous examples: The compiler must know which reference points to which object and only check the validity of references when they are actually used (derefrenced, passed to another function, ...)

Now what if we want to pass references to functions how will the compiler determine what happened inside the function body? There are currently two approaches:

1- The compiler will see the function body and continue analysing as if it was a block of code in the caller function. This is used for lambdas since the compiler can see their body at the call site

2- The function signature will tell the compiler what does it wnat to do with its parameters and where does it get its return value

Let's say we have two references each pointing to a variable of the same type and we pass them by reference (reference to reference) to a function that will exchange them.

In rust one could write this code:

fn exchange_refs<'a, 'b>(i1: &mut &'a i32, i2: &mut &'b i32) {
    *i1 = *i2;
}

fn main() {
    let i1 = 1;
    let i2 = 2;
    let mut ri1 = &i1;
    let mut ri2 = &i2;
    exchange_refs(&mut ri1, &mut ri2);
}

But the compiler complains that: lifetime may not live long enough, Explaining: assignment requires that 'b must outlive 'a. Then hinting: consider adding the following bound: 'b: 'a

Let's follow the compiler:

fn exchange_refs<'a, 'b: 'a>(i1: &mut &'a i32, i2: &mut &'b i32) {
    *i1 = *i2;
}

fn main() {
    let i1 = 1;
    let i2 = 2;
    let mut ri1 = &i1;
    let mut ri2 = &i2;
    exchange_refs(&mut ri1, &mut ri2);
}

The code compiles fine now because the compiler after analysis that i1 outlives i2.

Instead of using rust's current behavior this is used instead:

  • annotations 'a and 'b are not called lifetimes, instead they are used to name two dependencies
  • The bound 'b: 'a is not interpreted : ab outlives 'a. Instead: the function moves (or may move) dependency 'a to 'b
  • Inside the function body it's enough to assume that all references passed by parameters are pointing to variables that are immediately dropped one the function returns
  • The caller will validate the references before passing them to the function then after the function returns the compiler will assume that ri1 now points to the value ri2 was pointing to (i2)
fn exchange_refs<'a, 'b: 'a>(i1: &mut &'a i32, i2: &mut &'b i32) {
    *i1 = *i2;
}

fn main() {
    let i1 = 1;
    let i2 = 2;
    let mut ri1 = &i1;
    let mut ri2 = &i2;
	// assert that &mut ri1 and &mut ri2 are valid
    exchange_refs(&mut ri1, &mut ri2);
	// now assume ri1 points to i2 and continue
}
// the function tells the caller that its return value refers (or may refer) to either 'a or 'b
fn max_of<'a, 'b>(i1: &'a i32, i2: &'b i32) -> &'a + 'b i32 {
	// returned reference must refer to either 'a or 'b
	if *i1 > *i2 {
		return i1;
	} else {
		return i2;
	}
}

fn main() {
	let mut max_n = max_of(&0, &1); // ok: references are valid inside the function body and max_n is a reference referring to both
	println!("{}", *max_n); // ok: max_n points to valid static values 0 amd 1
	{
		let i1 = 0;
		let i2 = 1;
		max_n = max_of(&i1, &i2);
		println!("{}", *max_n); // ok: max_n points to valid i1 amd i2
		// i1 and i2 goes out of scope
	}
	println!("{}", *max_n); // error: max_n points to invalid i1 amd i2
	{
		let i = 0;
		max_n = max_of(&i, &1);
		println!("{}", *max_n); // ok: max_n points to valid i and static 1
	}
	println!("{}", *max_n); // error: max_n points to invalid i1 and static 1
}

This model is simpler than rust's one and easier to implement and understand.

There is another thing that needs dependencies annotations, it's structs ! Structs can contain references as fields and the compiler needs to name the dependencies of these references across function boundaries to know who depends on whom.

// the compiler knows that RefHolder may depend on i32 variable
struct RefHolder<'a> {
	r: &'a i32,
}

impl<'a> RefHolder<'a> {
	// after the call the caller will assume that r is now a dependency of returned RefHolder since
	// it's annotated with 'a
	fn new(r: &'a i32) -> Self {
		Self { r: r }
	}
	// after the call the caller will assume that both r and r2 are dependencies of self
	fn add_ref(&mut self, r2: &'a i32) {
		self.r = r;
	}
}
// after the call the caller will assume that RefHolder pointed to by rh also depends now on value pointed to by r
fn add_ref_to_ref_holder<'a, 'b: 'a>(rh: &mut RefHolder<'a>, r: &'b i32) {
	rh.r = r;
}

fn main() {
	let i1 = 0;
	{
		let i2 = 1;
		let mut rh = RefHolder::new(&i1); // rh depends on i1
		rh.add_ref(&i2); // rh depends now on both i1 and i2
	}
	let mut rh = RefHolder::new(&i1); // rh depends on i1
	let i2 = 1;
	add_ref_to_ref_holder(&mut rh, &i2); // rh depends now on both i1 and i2
}

The compiler must make sure that two fields of a struct of unrelated types don't share the same dependency annotation

trait SomeTrait {  }

impl SomeTrait for i32 {}

struct SomeStruct<'a, 'b> {
	r1: &'a i32, // ok: 'a names a dependency of type i32
	r2: &'a dyn SomeTrait, // ok: r2 may also refer to i32
	r3: &'a i64, // error: 'a can't refer to both i32 and i64
	r4: &'b f32, // ok: 'b names a dependency of type f32
}

This will help us while implementing multiple mutable aliases later.

One may ask why the compiler may assume that a reference refers to multiple values while a reference can only point to one value at a time? The problem is that if a function says it links 'a to 'b does not mean that it must do that exactly. It might only assign a reference under a certain contition so the compiler can't be sure that the reference now points only to 'b. There may be a syntax for the function to inform the compiler that is guaranteed that 'a will always be 'b after the function returns for example 'a = 'b and this will be enforced inside the function.

Doos this add anything to the current rust's borrow checker? Yes, One advantage is that it is easier to implement and understand. Another is that it may be exteneded to support multiple mutable aliases as we will see later.

In next articles I'll explore how to safely have multiple mutable references to the same object, how threads should be implemented and how to teach this kind of borrow checker to allow custom use cases that diverges from the general use case.

This requires inspecting the body of exchange_refs, right? As exchange_refs could choose to not change ri1 and thus keep it pointing to i1. That did make changes to function implementations breaking changes. It did also blow up compile time as we can't borrowck all functions independently.

So you did have to add extra lifetimes in that case? I don't see any advantage to that.

You can already have mutable aliases using &Cell<T> and &RefCell<T>. Making &mut T aliasable would be unsound not just because it will trivially allow data races, but also because it will allow iterator invalidation while still iterating, which is UB.

I only find your proposal confusing, not easier to understand.

In any case how would you suggest updating rust for your proposal in a backward compatible way? At least this part is a breaking change:

man did you read what I wrote?

I find your writing rather confusing but I believe that you are not working on a borrow checker but on a lower-level verification more akin to @RalfJung's Stacked Borrows model. That is, the set of rules that could define when is okay to dereference a pointer within a Rust's unsafe block. You may find experimental support for that and related matters in Miri.

Missed that part. My bad. I was focused on trying to understand the borrow checker scheme itself.

I updated the article and uploaded it on github here borrow-checker/README.md at main · cppdev123/borrow-checker · GitHub

I think this has nothing to do with stacked borrows

Your proposal seems to be mainly driven by specific examples showing what the user would expect. Could you also show what would the compiler reasoning be in the general case, and also give some reasoning of why this would be sound, still in the general case? I see you wrote at the end:

There are many aspects I didn't address, refer to or didn't know of or otherwise forgot to mention so I'm not surprised if any of the methods introduced here are wrong or make nonsense. Any feedback will be welcome!

But it feels very hard to argue about your methods given that there's no formal definition of them, we would be arguing about our personal interpretations.

1 Like