Idea: make assert!() a keyword - REJECTED

ckaran · September 15, 2020, 2:21pm

Continuing the discussion from Paying developers for pull requests:

As it turns out, I've been thinking about how to do this for a while now, chewing over different ideas. Some possible directions:

Develop a software verification language, analogous to hardware verification languages in chip design.
Strongly encourage the use of types everywhere, so that we can take advantage of the Curry-Howard correspondence and type theory in general.
Make assert!() (and related macros) into full keywords in the language.

The first point would mean that one needs to learn rust, then maybe how proc macros work, and finally learn the verification language. That is an incredible amount of work, and will likely lead to the weirdness that was Objective-C++.

The second point leads to exponential growth in the number of types, even when you use traits and trait bounds to the maximum you possibly can. E.g., an integer can be valid over some range, if it is interpreted as kilograms, etc., etc. etc. That isn't really sustainable in the long run.

The third point is the interesting one. Suppose that assert!() were a keyword in rust. You could express conditions like the following:

fn foo(i: isize) -> u8 {
    assert!(i < 256);
    assert!(i >= 0);

    i as u8
}

fn big(i: isize) -> isize {
    assert!(i >= 256);

    i
}

pub fn bar(i: isize) -> u8 {
    foo(i)
}

pub fn baz() -> u8 {
    foo(23)
}

pub fn broken(i: isize) -> isize {
    let temp: u8 = foo(i);  
    let temp: isize = u8 as isize;
    big(temp)
}

The compiler would have the ability to do all of the following:

If some pieces of information are unknown, then the current behavior for assert!() would continue to be used, generating code that is executed at runtime. So the compiled version of bar() might have foo() inlined into it, assert!() statements and all.
If all pieces of information are known at compile time, then something like Chalk could verify that the claims are true statically. Thus, for baz() the internals could be simplified to 23.
Verification code could be pushed out to the public interface. That is, the assert!() code in foo() need not actually be in that block; we know that foo() is a private function, and therefore not visible outside of the crate. We could move the test into the public functions bar() and baz() instead. Since baz() can be simplified to 23, we could remove the verification code entirely.
Verification statements could be combined to see if any inputs can satisfy the constraints. broken() cannot be satisfied under any conditions, and this can be verified at compile time.
Testing and verification can be reduced significantly. Right now, the only way I can verify that baz() works correctly (or that broken() is indeed broken) is by creating a unit test for it and seeing if any assertions are triggered. For functions that have complex arguments, or for methods of objects with a large amount of state, this can be very difficult to test; either the programmer needs to know all of the triggering conditions, or has to use a good fuzzer to try to find combinations of inputs that trigger failures. Building in a proof checker like this means that you don't have to worry about that.
It would make the type system itself more powerful, and would make optimizations possible that aren't possible today. For example, consider the following code:
```
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct Foo {
    field: f64
}

impl Foo {
    pub new(field: f64) -> Foo {
        assert!(!field.is_nan());
        Foo{field}
    }
}
```
in the far future, Chalk might be able to determine statically that the only way to set Foo::field is via Foo::new(), and that means that field can never be NaN. That means that the derive of Eq, and Ord would be valid. I don't mean to imply that this will be easy to implement, just that it would be possible.

Finally, making assert!() a keyword is the least intrusive method I can think of for the end user of rust. You don't need to learn a new language, you don't have to add in other code separate from your codebase that you need to maintain, and you don't have combinatorial explosion from an excess of types.

So, thoughts?

FINAL OUTCOME OF DISCUSSIONS

This is what I consider to be a final outcome of the discussions. THE PROPOSAL IS REJECTED.

After thorough discussions, two things were finally noted. First, because assert!() is a part of libcore and libstd, attempts to redefine it would lead to compile errors due to the ambiguity as to which version of assert!() was in use. That ensures that misguided (or malicious) attempts at replacing assert!() will be detected at compile time, ensuring that there is only one 'real' definition of assert!() already.

Second, the macros in libcore/libstd are in @CAD97's words, special. For at least some of the macros, the compiler is already aware of them, which is why they work. assert!() can be treated in exactly the same manner already.

Combining these facts together, assert!() is already close enough to being a keyword that formalizing it as such is unnecessary, which is why the proposal was rejected.

mjbshaw · September 15, 2020, 2:35pm

Why does any of this require assert! to be a keyword? Why isn't usual static analysis sufficient?

ckaran · September 15, 2020, 2:44pm

Is all of what I mentioned earlier done in static analysis? AFAIK, it isn't, but that might just be my ignorance showing.

mjbshaw · September 15, 2020, 2:53pm

It's not currently done, but I can't see any technical reason it couldn't be done. It's just a lot of work. I don't think elevating assert! to a keyword changes that.

bjorn3 · September 15, 2020, 2:58pm

A static verifier could prove a failed assertion instead of prove the absence of a failed assertion.

An optimizer may be able to prove that it is safe to remove the assertion in certain cases. Running any kind of static verifier during compilation however is way too costly.

See point 1.

Proof verifiers are not as easy to use as you think. From what I have heard you often need to manually write part of your proof, as it gets to complex to automatically generate.

As I already mentioned earlier in this post any kind of static verifier of proof checker is way too slow to run during compilation.

There is no need to make it a keyword. For example prusti works for everything that panics. Not just some magic assert keyword.

ckaran · September 15, 2020, 3:38pm

I agree with the first statement, but only partially agree with the second statement. We already have debug and release modes in rustc, where one of the goals of the debug mode is to compile as quickly as possible. Release mode is slower, but produces better code. This could be another mode/compiler flag that is normally off, and which is only turned on when the developer really needs it. I personally wouldn't care of my normal compile times increased significantly if I had stronger assurance of correct code.

I think that in some cases you could prove the assertion itself. That will require some work though.

I have some experience with them, and you're right, they are proof verifiers and not proof writers. That said, I suspect that we can get a long way with assert statements alone.

But is this sufficient for the compiler to be able to do optimization?

wesleywiser · September 15, 2020, 4:23pm

If you aren't changing the language semantics, then there's no reason this needs to be done in the compiler or the language. You can simply run prusti or whatever tool you choose after your build.

But is this sufficient for the compiler to be able to do optimization?

In general optimizations are "best effort". If you look at your code example on the playground and ask for the LLVM IR in Release mode, you'll see

; playground::baz
; Function Attrs: norecurse nounwind nonlazybind readnone uwtable
define i8 @_ZN10playground3baz17hf18dd909003dc7f0E() unnamed_addr #0 {
start:
  ret i8 23
}

which means foo was inlinined into baz and the asserts were removed since they can never fire leaving just the return of the constant value 23.

In general, Rust tries really hard not to introspect the bodies of functions to learn things about them beyond what their function signature says.

scottmcm · September 15, 2020, 5:53pm

I think keyword is probably overkill, but attribute might be interesting.

There's two things I'm thinking about there:

If it's an attribute, lints could read it to provide generic lints about constant expressions passed to the function.

For example, clippy has a specific iterator_step_by_zero lint. But step_by has an assert!(step != 0)†, so if that were exposed to the lint infrastructure it could be one lint for "always asserts".

And who knows, maybe that would also make it easier for a more complicated static analysis tool to try to prove/disprove them as well, for more than just constants.

† Though not in exactly the place we'd want for what I'm talking about here, but that's a side point and it could be moved.

If it's an attribute, it could also do more complicated transformations to make it more likely that the asserts will be checked in the caller.

This could do something along the lines of outlining, for more complicated methods. Imaging a complicated, non-generic function:

pub fn foo(x: i32) -> i32 { 
    assert_eq!(x & 1, 0);
    ... lots of complicated work here ...
}

We want to compile the complicated stuff only once, but we'd also like to elide the asserts by evaluating them in the caller. So imagine that we transformed that code into something like this:

#[inline]
pub fn foo(x: usize) -> String { 
    assert!(x < isize::MAX as usize);
    unsafe { inner(x) }

    #[forbid(unsafe-op-in-unsafe-fn)]
    unsafe fn inner(x: i32) -> i32 {
        unsafe { assume(x < isize::MAX as usize) };
        ... lots of complicated work here ...
    }
}

That way the conditions can get inlined and hopefully folded away, while still allowing separate compilation of the complicated bit and letting LLVM to optimize based on the conditions.

(There some clear weaknesses in here, like assume often being not a good idea, but hopefully it gets the idea across.)

kornel · September 15, 2020, 6:05pm

The keyword/macro here seems to be the easy part. It could probably work just as well with if !cond {return} or anything conditional and diverging.

In a basic sense this already works in LLVM for optimization purposes. LLVM will notice asserts (and other code paths that panic or return) and use their assumptions to e.g. remove obviously unnecessary bounds checks.

However, if it this becomes more than just invisible optimization, and is going to be propagated across functions, and affect language semantics like Ord compatibility, then it starts to look like dependent types. That would be a nice feature, but it'd be a huge language change, way bigger than just parsing of assert.

ckaran · September 15, 2020, 6:11pm

Works for me! Are there any locations where assert!() can be used that an attribute can't be? I'm not just thinking about the arguments to a function, but also the results of calculations within a function. (This is an honest question, I just don't know enough to know).

ckaran · September 15, 2020, 7:05pm

What part of the language would change if dependent types were introduced? I skimmed the link you gave, and it appears that the language would remain the same, but the compiler might generate more/different code, maybe creating new objects with automatically generated marker traits of some kind.

Xanewok · September 15, 2020, 8:46pm

It'd be interesting to optionally introduce Hoare's logic in some way or another. For some early ideas, see GitHub - nrc/libhoare: Design by contract style assertions for Rust (that's from before proper attribute proc macros)

ckaran · September 15, 2020, 8:52pm

The pre/post/invariant idea was part of what I was thinking could be subsumed by assert!().

Tom-Phinney · September 15, 2020, 9:07pm

assert!() seems inadequate to cover all the related uses. See Alastair Reid's summary of Rust verification tools for an overview, and in particular the multiple versions of pre!()/post!()/invariant!() listed in the contracts crate, and the much larger variety of assertions listed in the MIRAI annotations crate.

kornel · September 15, 2020, 9:41pm

The type system. In a very fundamental way. Currently it's always certain that f32 values are compatible with f32 values. They're the same type. You know everything about this type just by its name.

With dependent types it's no longer possible to tell whether f32 value can be passed to a function or assigned to a variable that accepts f32. Now you have to scan all the code paths and all the function bodies that may possibly interact with the two instances you evaluate to learn their actual hidden type inferred from code.

Consider the Ord-compatible Foo:

impl Foo {
    pub new(field: f64) -> Foo {
        assert!(!field.is_nan());
        Foo{field}
    }
}

Now if the compiler sees:

impl Foo {
    pub fn set(&mut self, value: f64) {
        self.field = value;
    }
}

Is this code valid? What should happen?

It may say "Error! Can't assign f64::with_NaN to self.field which has the type f64::without_NaN". It could be interpreted as if the field type was different than regular f64.

Or it could accept this definition, but interpret it as equivalent of pub fn set<F>(value: F) where F: f64 + Ord, and allow foo.set(1.0), but err on foo.set(1./0.), or require assert!(!x.is_nan()); foo.set(x).

In both cases you have new features in the type system - either new types or new where clauses that don't use existing type syntax, can't be named other than by writing assert!. I'm not judging whether that would be good or bad, but it is certainly a big departure from how types work in Rust currently.

ckaran · September 15, 2020, 9:53pm

Got it, thank you for the clear explanation.

kornel:

Consider the Ord -compatible Foo :

impl Foo {
    pub new(field: f64) -> Foo {
        assert!(!field.is_nan());
        Foo{field}
    }
}

Now if the compiler sees:

impl Foo {
    pub fn set(&mut self, value: f64) {
        self.field = value;
    }
}

Good point... My initial thought was that you'd construct the union of the set of values that self.field can take on, (i.e., new()'s set is missing f64::NaN, but set()'s set has it), and make that the new type, but now that I think about it I can see that won't work.

This is a tougher nut to crack than I first thought!

ratmice · September 15, 2020, 10:01pm

Besides agreeing with Tom, and the fact that theres lots of tools for adding these attributes to your own source, It would be nice to have these in plain-old libstd,

It has some obvious upsides, but if given actual compatibility guarantees these {c,w}ould be problematic in future evolution of the std library. It would be nice of there was a effort to toe the line, Allowing these attributes as something-like-unstable feature, but not necessarily guaranteeing the future compatability for the conditions (i.e. it would be fine to remove one if an algorithm changes.)

Anyhow -- If we want to go through all the trouble of language blessing these attributes, It is good to have plan in place so that we could actually make use of these in std. If we can't make use of them in std for future compatability concerns we don't really gain much besides the status quote external tools are already in.

mankinskin · September 16, 2020, 12:19am

Regarding dependent types, I am not sure about the current RFC state, but why not just have something like this:

type Ordf32 = f32 where |x: f32| !x.is_nan();

Meaning you can just apply predicates to values of a type and create a new type like that. If the predicates are const functions, the compiler can type check them at compile time, otherwise it could be checked at run-time.

phlopsi · September 16, 2020, 8:15am

If you need to assert something like the above, I'd consider giving your code the str treatment, i.e. create a type that guarantess, that the assertions never fail and accept that as an argument, instead. This does increase code complexity, but at the same time improves code optimization, because you only have to check once and the compiler performs these checks closer to where they're actually needed.

If you pass around your value through several functions before asserting the content, the compiler may fail to remove the assertions. If you had been passing around the checked type, instead, chances are higher, that the compiler can figure out, right at the beginning, that the assertions can be optimized away. Asserting is kinda like checking for null in Java. The difference between Rust and Java is, that you can actually prevent the mess of having to assert everywhere, because Rust's type system works for and not against you.

ckaran · September 16, 2020, 2:29pm

I like this formulation (the compiler could have a stage where it generates this type of code automatically), but I think what @kornel was getting at was deciding which transformation is the correct one can be difficult. In his earlier post, he gave the following example:

kornel:

impl Foo {
    pub new(field: f64) -> Foo {
        assert!(!field.is_nan());
        Foo{field}
    }
}

impl Foo {
    pub fn set(&mut self, value: f64) {
        self.field = value;
    }
}

The issue is deciding what the new dependent type is going to be; according to the assert!() in new(), the compiler should create and substitute in Ordf64 for Foo::field's type. According to set(), Foo::field should remain an f64.

In my mind, this is the kind of error that the compiler should detect. It is impossible to satisfy both requirements at the same time. The problem is how in the world do we do that?!?

A possible solution[1] is to use @mankinskin's idea to generate new types automatically whenever a variable is written[2]. The new type contains the conditions for all the asserts, just as @mankinskin showed. The algorithm for generating the types will need to be deterministic, and produce types that can be compared to one another for equality[3]. In the next pass, if a variable has more than one type, then those types are compared to see if they are compatible. Initially, I see 'compatible' as 'equal'. I think that proving the set of values each type is able to represent is sufficient for this particular use case, but I'm not sure. Can anyone see a case where where this might not be true?

In the future, it may be possible to prove definitions are not equal, but are compatible. However, as far as I know, that would result in a relaxation of the rules, permitting compilation of code that wasn't previously permitted. I think that maintains rust's forward compatibility guarantees, which is why I'm suggesting focusing on equality first.

[1] This is purely a spur of the moment idea, and probably has a lot of downsides that I haven't thought of yet. Take it as a starting point, not a complete thought.

[2] I don't think we need to consider when a variable is read, but I'm not sure. It may be that we need to create a new type for each and every access.

[3] No, I don't know how to do this, this is just the starting point for the idea, I'm relying on other, smarter people to figure out how to do this part!

Topic		Replies	Views
Reviving RFC 1662: assert_lt, assert_le, assert_gt, assert_ge libs	6	1854	September 26, 2020
Idea: a path toward reducing the need for `unsafe` by making invariant machine checkable language design	11	1441	April 15, 2023
Keywords to reserve for Rust2018 epoch	37	5898	March 25, 2019
Design lang language design	14	864	January 31, 2021
Idea: keyword gate annotation #![keywords(...)] language design	13	1183	March 25, 2019

Idea: make assert!() a keyword - REJECTED

FINAL OUTCOME OF DISCUSSIONS

Related topics