Idea: make assert!() a keyword

Types are nothing but sets of possible values. Dependent types are types that are defined in terms of other values, i.e. they use values of some type in their own definition. Correct me if I am wrong, but that is my current understanding.

This means that a dependent type could be checked just like any other type by the compiler, as long as its definition is a completely constant expression. Otherwise, if it depends on runtime values, the compiler would need to insert runtime assertions at the right places.
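As a concrete illustration in today's Rust (a sketch with made-up names, not the proposed feature): a constraint over a constant can already fail at compile time via a const fn, while a constraint over a runtime value needs a hand-inserted assertion.

```rust
// Hypothetical refinement type; `NonZero` and its methods are made up
// for illustration.
struct NonZero(i32);

impl NonZero {
    // Runtime value: the compiler cannot know `x` in advance, so a
    // runtime assertion is inserted (here by hand, ideally by the compiler).
    fn new(x: i32) -> NonZero {
        assert!(x != 0);
        NonZero(x)
    }

    // Constant value: evaluating this in a const context runs the
    // assertion during compilation (stable since Rust 1.57).
    const fn new_const(x: i32) -> NonZero {
        assert!(x != 0);
        NonZero(x)
    }
}

// Passing 0 here would be a compile error, not a runtime panic.
const FIVE: NonZero = NonZero::new_const(5);

fn main() {
    let n = NonZero::new(2);
    println!("{} {}", FIVE.0, n.0);
}
```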

The problem with Foo::set is that this assertion is missing. To ensure that the data of Foo is indeed of type Ordf32, any data written to it needs to be checked to be Ordf32, which does not happen in Foo::set. These kinds of assertions should be inserted by the compiler, implicitly and optimized; however, it should be obvious when a type is dynamically checked like this, as the check might be expensive when it cannot be optimized away.
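For the running Foo example, the missing dynamic check might look like the following hand-written sketch (here over f32 with a no-NaN invariant; the compiler-inserted version is hypothetical):

```rust
struct Foo {
    field: f32, // invariant: never NaN (standing in for Ordf32)
}

impl Foo {
    fn set(&mut self, value: f32) {
        // The assertion Foo::set was missing; ideally the compiler would
        // insert it implicitly and optimize it away where provable.
        assert!(!value.is_nan());
        self.field = value;
    }
}

fn main() {
    let mut foo = Foo { field: 1.0 };
    foo.set(2.5);
    println!("{}", foo.field);
}
```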

Maybe I am missing something, but I think all we need for dependent types (and thus more verifiable code) is syntax to tell the compiler "ensure that values of this type always satisfy these conditions": first at runtime, then at compile time using const functions.

This is basically what a type system already does, I imagine, so this is really just a change to make the type system more programmable.

1 Like

Actually, that is the heart of the problem. As the example was written, the programmer claimed that Foo::field is an f64. The compiler has now detected that in actuality it is either Ordf64 or f64. The compiler doesn't know the programmer's intent (maybe the programmer was updating the code to allow f64::NaN, but forgot to change both the derives and Foo::new()). The compiler can't know programmer intent, only what the programmer actually wrote. So it cannot arbitrarily decide that Foo::field is actually Ordf64 and declare that Foo::set() is wrong. All it can do is detect that they are incompatible, and throw it back as an error.

2 Likes

How much change does this induce in Rust's HIR and MIR? (Naïvely, I presume that the dependent type would be encoded as a base type plus a restriction assertion function.)

For reference, and as an exemplary approach: how does Ada inject such checks into its (presumed) IR and eventually into LLVM IR? Could that serve as a guide for Rust?

Addendum: I suspect that restriction assertion functions will need to be associative and commutative so that they enable somewhat-arbitrary composition of restrictions, enabling values of dependent types to be converted from less-restrictive to more-restrictive without concern for the order in which the successive restrictions are applied. (Such freedom of ordering the restrictions enables more compiler optimization.)
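If restrictions are modelled as pure boolean predicates, composing them with logical AND is already associative and commutative, which is what makes order-independent application plausible. A minimal sketch (all names made up):

```rust
// Compose two restriction predicates into one. Since `&&` over pure
// predicates is associative and commutative, the order of composition
// does not change which values are accepted.
fn compose<T>(
    p: impl Fn(&T) -> bool,
    q: impl Fn(&T) -> bool,
) -> impl Fn(&T) -> bool {
    move |t| p(t) && q(t)
}

fn main() {
    let nonzero = |x: &i32| *x != 0;
    let small = |x: &i32| x.abs() < 100;
    // Both orderings accept and reject exactly the same values.
    let a = compose(nonzero, small);
    let b = compose(small, nonzero);
    for x in [-1000, -1, 0, 1, 99, 100] {
        assert_eq!(a(&x), b(&x));
    }
    println!("ok");
}
```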

3 Likes

I mean, this could be done in today's Rust already:

struct Ordf32 {
    inner: f32,
}
impl From<f32> for Ordf32 {
    fn from(x: f32) -> Self {
        assert!(!x.is_nan());
        Self { inner: x }
    }
}

However, as soon as you implement more methods for Ordf32, it becomes easy to break the assertion without noticing:

impl Ordf32 {
    fn div_by(&mut self, x: f32) {
        self.inner /= x; // produces NaN e.g. when self.inner and x are both 0.0, or when x is NaN
    }
}

so it would be useful to define types which guarantee that their values always satisfy the constraints.

However, I am unsure how to do this correctly, especially without simply crashing whenever the constraints are violated. Const constraints could probably be checked everywhere at compile time, but for runtime checks it would be more useful to return a Result instead of just panicking.
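A Result-returning variant of the earlier Ordf32 example could use TryFrom; a sketch (the ConstraintError type is an assumption made up for illustration):

```rust
use std::convert::TryFrom;

struct Ordf32 {
    inner: f32,
}

// Hypothetical error type for a violated constraint.
#[derive(Debug, PartialEq)]
struct ConstraintError;

impl TryFrom<f32> for Ordf32 {
    type Error = ConstraintError;

    fn try_from(x: f32) -> Result<Self, Self::Error> {
        if x.is_nan() {
            Err(ConstraintError) // recoverable, instead of panicking
        } else {
            Ok(Self { inner: x })
        }
    }
}

fn main() {
    let ok = Ordf32::try_from(1.0).unwrap();
    println!("{}", ok.inner);
    assert!(Ordf32::try_from(f32::NAN).is_err());
}
```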

I think it would be best to turn this into a macro first, before making a compiler feature of this.


In my not-so-humble opinion, crashing is the correct behavior. assert!() doesn't return a result, it just crashes immediately. This would be no different. The only thing I'd like to see is a better backtrace/error reporting setup so that when a crash occurs it can be debugged easily.

Normally I'd agree with you, but I honestly don't see how this could be done as a macro. Could you please explain how you would handle our running example below?

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct Foo {
    field: f64
}

impl Foo {
    pub fn new(field: f64) -> Foo {
        assert!(!field.is_nan());
        Foo{field}
    }
}

impl Foo {
    pub fn set(&mut self, value: f64) {
        self.field = value;
    }
}
1 Like

Given that right now even

pub type VecOfCopy<T: Copy> = Vec<T>;

doesn't work, there's no "just" about it :upside_down_face:

3 Likes

This looks very good to me, from the user's point of view at least… I'm definitely not capable of telling how possible or difficult it would be to implement in the compiler.

Anyway, I think that requiring the where clause to be a const function, so the compiler can statically verify the constraints, is good. Generating various runtime checks for such constraints when the compiler can't verify them statically opens a can of worms IMO. I appreciate the Rust way of "if it compiles, it runs" a lot; it gives me a great and very different alternative to my bread and butter, which is Java (perhaps my concern with runtime checks is now more understandable :smirk:).

1 Like

But that would be nice too! :heart_eyes:

1 Like

The issue is that there will always be cases where you can't know at compile time if something is true or not, which forces you to use runtime checks. The advantage of the proposal is that the compiler can decide if it needs to put the code in or not, rather than the end user. That can reduce the overall amount of work done at runtime.

1 Like

It also means that the (speed or size) cost of code to check the assertions at runtime likely will decrease as compilers get better and as ISAs (instruction set architectures) evolve to better support such checking.

I don't agree with this. Panics are unhandleable errors. The only place I find them useful is in small test programs, which are allowed to crash; in larger programs, where reliability is crucial, I want to be able to catch errors and handle them programmatically.

The way I could imagine this to work is something like this

#[constraint(!field.is_nan())]
struct Foo {
    field: f64
}

Which implements the type like this:

struct Foo {
    field: Constrainedf64
}
struct Constrainedf64(f64);

impl From<f64> for Constrainedf64 {
    fn from(field: f64) -> Self {
        assert!(!field.is_nan());
        Self(field)
    }
}
impl From<Constrainedf64> for f64 {
    fn from(x: Constrainedf64) -> f64 {
        x.0
    }
}

And makes it usable like this

impl Foo {
    pub fn new(field: f64) -> Foo {
        Foo{field: field.into()} // calls constraints
    }
}
impl Foo {
    pub fn set(&mut self, value: f64) {
        self.field = value.into(); // calls constraints
    }
}

The main point here is that the macro replaces the type definition with protected types, making it possible to manage how they are created and changed.

This could be implemented using TryFrom or something, and it might work for const functions as well, but I have never used those before, so I am not sure.

I understand where you're coming from, but the issue is that I see assert!() as a mathematical truth; once you break that truth, everything after will result in Undefined Behavior.

Put another way, when you have recoverable error handling, the compiler is relying on the programmer to do the right thing after they have already demonstrated that they don't know what they're doing! With assert!() panicking as soon as a truth is violated, we get strong guarantees about the behavior of the program, which means the compiler can be much more aggressive in how it optimizes code in the future. E.g., a common trick in the C world is that if you know your pointers are always 4-byte aligned, you can use the lowest 2 bits as flags. With a panicking assert, the compiler could do that automatically. With a recoverable assert, the compiler can't be sure that is possible, and needs to be more conservative in its approach.
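The low-bit trick can be sketched manually in today's Rust; the automatic, compiler-driven version is hypothetical. A `u32` allocation is 4-byte aligned, so the low two bits of its address are always zero:

```rust
// Pack a 2-bit flag into the low bits of a 4-byte-aligned address.
fn tag(addr: usize, flag: usize) -> usize {
    debug_assert!(addr % 4 == 0, "address must be 4-byte aligned");
    addr | (flag & 0b11)
}

// Split a tagged address back into (address, flag).
fn untag(tagged: usize) -> (usize, usize) {
    (tagged & !0b11, tagged & 0b11)
}

fn main() {
    let addr = Box::into_raw(Box::new(7u32)) as usize;
    let tagged = tag(addr, 0b01);
    let (orig, flag) = untag(tagged);
    assert_eq!(orig, addr);
    assert_eq!(flag, 0b01);
    // Reclaim the allocation so it is freed properly.
    unsafe { drop(Box::from_raw(orig as *mut u32)) };
    println!("ok");
}
```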

1 Like

I would like to have both. For constant constraints there will be a compiler error, which catches any violations before the program runs. When the constraints cannot be evaluated by the compiler, say because they need to read a file (let's imagine we have a type that can only contain PDF files), then I think it would be useful to recover from a violation, say because the user chose the wrong file, without the program always crashing. The constraint would still be enforced, and as with any Result we can write code that runs only on the successful value, which definitely satisfies the constraints, and which can be optimized just as if the Result had never been there. Maybe we need different kinds of macros:

#[assert_constraint(...)]
struct Foo {...}
// applies an assertion before constructing the type and implements From

#[try_constraint(...)]
struct Foo {...}
// applies a function returning an Option<ConstraintError> and implements TryFrom

#[const_constraint(...)]
struct Foo {...}
// applies a constant predicate and implements From; triggers compile_error! if the const constraints are not satisfied

That way we could write something like this:

fn check_pdf(file: &File) -> Option<ConstraintError> { ... }

#[try_constraint(check_pdf(file))]
struct PdfFile {
    file: File,
}

or maybe

fn is_pdf(file: &File) -> bool { ... }

#[try_constraint(is_pdf(file))]
struct PdfFile {
    file: File,
}
match PdfFile::try_from(File::open("./my_file.pdf").expect("Failed to open file")) {
    Ok(file) => ...,     // file: PdfFile
    Err(error) => ...,   // error: ConstraintError
}

There is no magic here, it is just a runtime check which could easily be built in today. The benefit is simply the shared interface and the concise syntax.

1 Like

I'm not averse to a try variant that always returns a Result. However, I feel that the panic variant should be the default, which I suspect most Rustaceans will use most of the time, switching to the Result variant only after the basic algorithm has been validated.

1 Like

So I experimented a bit, and I could not really come to a useful solution. First, const functions are not yet advanced enough to allow for constrained constant values, so I stuck to runtime checks first. But even there, there are problems with ergonomics, for example:

#[constraint(self.x != self.y)]
pub struct Unsquare {
    x: u32,
    y: u32,
}

This is one of the tests I wanted to get working. The type should be usable just as usual, but creating any instance of it which does not satisfy the constraint (x != y) should cause a panic. However, there is no way to prevent a library user from implementing functions that create an Unsquare but do not enforce the constraint, or at least I could not think of one.

I thought of having the macro replace the internal types by hidden types, which handle the checked construction:

// after macro expansion
pub struct Unsquare {
    x: ConstrainedU32,
    y: ConstrainedU32,
}

But this does not allow for checks such as x != y, because those require multiple fields to be compared. So Unsquare itself needs some way to check whether the constraint is satisfied.

I guess you could have the macro expand to a managed type altogether, so that Unsquare just wraps the constrained version:

// after macro expansion
pub struct Unsquare(ConstrainedUnsquare);
struct ConstrainedUnsquare {
    x: u32,
    y: u32,
}
impl ConstrainedUnsquare {
    pub fn checked(self) -> Self {
        assert!(self.x != self.y);
        self
    }
}
impl Deref for Unsquare {
    type Target = ConstrainedUnsquare;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
impl DerefMut for Unsquare {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

But this just doesn't keep the user from changing the internal data to anything without checking it. Also, this doesn't work well with any impls the user defined for their type. For example:

impl Unsquare {
    pub fn new(x: u32, y: u32) -> Self {
        Self {x, y}
    }
}

doesn't work anymore, and it is not checked either. It should become something like:

impl Unsquare {
    pub fn new(x: u32, y: u32) -> Self {
        Self(ConstrainedUnsquare{x, y}.checked())
    }
}

But that would need to be applied to every possible impl the user created, and there is no way to say "check after any kind of write to this type".

Long story short, I don't know how this could be done in an ergonomic way that guarantees that the data keeps the constraints.

So I guess the best you can do currently is implement the checks manually and enforce your rules yourself.

2 Likes

That works for me. Purely as a matter of personal preference, I'd prefer just assert_constraint() and try_constraint(), with the compiler trying to prove what it can at compile time (throwing a compile error whenever it can prove a constraint is violated), and providing the appropriate runtime checks if it can't prove the constraint. This makes it easier to improve the compiler over time (initially, this could result in runtime assertions everywhere, with changes to hard compile errors over time).

Bikeshedding here... this feels like a recoverable error, so Result<(), ConstraintError>, or even Result<(), Box<dyn std::error::Error>> feels more appropriate.

That was what I was afraid of. So this really needs to be implemented and enforced by the compiler for it to be useful. Do you have a feeling for how close the RFC is to being accepted? Or how much effort might be involved in implementing it?

2 Likes

Sounds like you're looking for a privacy boundary, and thus a module. For example, maybe that example turns into something like

mod internals {
    pub struct Unsquare { x: u32, y: u32 }
    impl Unsquare { ... lots of methods here that can use the fields directly ... }
}
pub use internals::Unsquare;
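Filling in that sketch (the method bodies here are my assumptions, not part of the suggestion above), the privacy boundary is what forces every construction and mutation through the checking code:

```rust
mod internals {
    pub struct Unsquare {
        x: u32,
        y: u32,
    }

    impl Unsquare {
        // The only way to construct an Unsquare from outside this module.
        pub fn new(x: u32, y: u32) -> Self {
            assert!(x != y);
            Self { x, y }
        }

        // Mutation also has to go through the check.
        pub fn set(&mut self, x: u32, y: u32) {
            assert!(x != y);
            self.x = x;
            self.y = y;
        }

        pub fn x(&self) -> u32 {
            self.x
        }
    }
}
pub use internals::Unsquare;

fn main() {
    let mut u = Unsquare::new(1, 2);
    u.set(3, 4);
    // `Unsquare { x: 1, y: 1 }` out here would be a compile error:
    // the fields are private, so the invariant cannot be bypassed.
    println!("{}", u.x());
}
```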
4 Likes

Actually, I still got a somewhat usable pattern here, which could be built into a macro:

// #[constraint(self.x != self.y)]
#[derive(Clone, Copy)] // Copy is needed so the `..*c` struct updates compile
pub struct Vec2 {
    pub x: u32,
    pub y: u32,
}
// generated by macro...
impl Vec2 {
    fn constrained(self) -> Result<ConstrainedVec2, String> {
        if self.x != self.y {
            Ok(ConstrainedVec2(self))
        } else {
            Err("constraint violated".into())
        }
    }
}

pub struct ConstrainedVec2(Vec2);

impl ConstrainedVec2 {
    #[must_use]
    pub fn set(&mut self, x: Vec2) -> Result<(), String> {
        x.constrained().map(|x| *self = x)
    }
}
impl std::convert::TryFrom<Vec2> for ConstrainedVec2 {
    type Error = String;
    fn try_from(x: Vec2) -> Result<Self, Self::Error> {
        x.constrained()
    }
}
impl std::ops::Deref for ConstrainedVec2 {
    type Target = Vec2;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

Using this would look like this:

#[constraint(self.x != self.y)]
#[derive(Clone, Copy)]
pub struct Vec2 {
    pub x: u32,
    pub y: u32,
}
// ConstrainedVec2 is now available

let mut c = ConstrainedVec2::try_from(Vec2{x: 1, y: 2}).unwrap(); // fine
c.set(Vec2{x: 3, ..*c}).unwrap(); // fine

ConstrainedVec2::try_from(Vec2 { x: 2, y: 2 }).unwrap(); // panic

let mut c = ConstrainedVec2::try_from(Vec2 {x: 1, y: 2}).unwrap(); // fine
c.set(Vec2{ x: 2, ..*c}).unwrap(); // panic

I mean, it works, but it is not very handy. Instead of the macro, it would be nice to have compiler-supported syntax to constrain types like this. I liked the syntax of where clauses after type, but that would change a lot about the type keyword, I guess (which is meant for type aliases). Maybe we could have where clauses after struct definitions?

struct UnsquareVec2 {
    x: u32,
    y: u32,
} where |self| self.x != self.y;

struct OddU32(u32) where |self| self.0%2 != 0;

I have never worked on a compiler before, so I cannot really say. I guess we would have to work out the exact syntax and semantics and then go from there, with the help of some more experienced rustc developers.

The semantics are basically just to implicitly check these kinds of types whenever their memory changes, right?

2 Likes

OK, I want to double-check with everyone: are we now talking about using attributes instead of asserts? That is, would we (as programmers) be expected to convert our assertion-laden code to using the constraint attribute, or will the compiler be able to do this automatically for us?

I'm personally more comfortable with the idea of assert_constraint!() and try_constraint!() rather than attributes because they are often easier for me to read.

EDIT: I'm back, had to take care of my son.

The reason I'm concerned about attributes vs. macros is ergonomics. Consider the following code:

pub struct Decoded {
    // Lots of stuff here that makes this a complicated type
}

pub fn decoder(input: &[u8]) -> Result<Decoded, DecodedError> {
    assert_constraint!(input.len() >= 5);
    match input[0] {
        0 => {
            assert_constraint!(input.len() == 8);
            assert_constraint!(input[1] == 5);
            // Something 
        }
        1 => {
            assert_constraint!(input.len() == 16);
            assert_constraint!(input[1] == 4);
            assert_constraint!(input[2] == 4);
            // Something 
        }
        _ => {
            // Pretend the decoder keeps on going for a while and
            // eventually produces Ok(Decoded { .. }) or an Err.
        }
    }
}

This kind of decoder would be a real monster to write in attribute form, and would likely be even worse to have to maintain because the code that validates would be well separated from the code that does decoding. Worse, it would duplicate effort because even though the constraint attributes validated the inputs, you still need to go through all the branches in order to correctly decode what's coming in. If I'd wanted to, I could make the constraints also depend on the current state of self, just to make things even more complicated. This is why my vote is for things that look like macros rather than attributes.

1 Like

Since we're not proposing dependent types, which introduce considerably more complexity, I prefer the assert_constraint!() and try_constraint!() forms, in part for the reasons pointed out in the prior post.

For prior discussion by the compiler team of const dependent types, which is only a const subset of run-time-checked dependent types, see RFC: const-dependent type system. #1657.