An experience struggling to write a complex proc macro

Upfront: This is mostly a long ramble about a variety of gotchas I ran into while implementing a proc macro. Although I wouldn't mind if someone helped suggest workarounds, I'm posting this on internals because my main thought is "surely Rust could have made some of these things work better somehow?" And alas, I don't even have any concrete proposals, I'm just putting this out here in case others can make something of it.

It's not that important for me to get feedback on this; you don't need to read it unless you want to.

What I'm trying to do

I'm writing live-prop-test, a crate very similar to contracts. It lets you write functions and specify precondition and postcondition expressions on them:

#[live_prop_test(
  precondition = "*input > 0",
  postcondition = "*input > old(*input)",
)]
fn double_positive_number(input: &mut i32) {
  input *= 2
}

I have this part working! But of course, a natural extension is to apply such contracts to trait methods:

#[live_prop_test]
trait ContractedTrait {
  #[live_prop_test(
    precondition = "*input > 0",
    postcondition = "*input > old(*input)",
  )]
  fn double_positive_number(&self, input: &mut i32);
}

// later, in some other file...

#[live_prop_test(use_trait_tests)]
impl ContractedTrait for SomeStruct {
  // tests from the above trait are applied automatically
  fn double_positive_number(&self, input: &mut i32) {
    input *= 2;
  }
}

contracts also has a feature for this, but my goals are slightly different. contracts does this by rewriting the names of the trait methods, such that it's impossible to implement the trait without applying contracts. live-prop-test prefers to be less strict – you can write an impl without #[live_prop_test(use_trait_tests)], and its methods just won't be tested. So I can't just reuse the approach contracts took.

Since the trait and the corresponding impl may be in different files, they can't communicate with each other on the "proc macro runtime" level. So we probably have to make them communicate within the type system.

The way I would like to solve this

When you put #[live_prop_test] on a trait, it generates extra methods with default implementations, expanding to something like this (this is significantly simplified from my actual design, but good enough for this explanation):

trait ContractedTrait {
  #[live_prop_test(
    postcondition = "*input > old(*input)"
  )]
  fn double_positive_number(&self, input: &mut i32);
  
  fn __live_prop_test_setup__double_positive_number(&self, input: &mut i32)->(i32,) {
    // precondition
    assert!(*input > 0);
    // value of `old` expression
    (*input)
  }
  fn __live_prop_test_finish__double_positive_number(&self, input: &mut i32, olds: (i32,)) {
    // postcondition, with `old` value substituted in
    assert!(*input > olds.0);
  }
}

// later, in some other file...

impl ContractedTrait for SomeStruct {
  fn double_positive_number(&self, input: &mut i32) {
    let olds = <SomeStruct as ContractedTrait>::__live_prop_test_setup__double_positive_number(self, input);
    input *= 2;
    <SomeStruct as ContractedTrait>::__live_prop_test_finish__double_positive_number(self, input, olds);
  }
}

This expansion is fine for this particular case. One problem arises when we pass owned values rather than references. In that case, we'd have "use of moved value" errors.

So what do we do?

First approach: Use references

In practice, tests shouldn't be mutating the values. So, we could pass references into the testing functions, instead of values.

Here, the trouble comes from type checking in the precondition/postcondition expressions. If we replace the owned values with references, the types will be different, creating a stumbling block for the user. From the user's perspective, the name of an argument should always refer to the same type. It's extra work if they have to remember that the name refers to an owned value in the body, but a reference in the condition expressions.

When the proc macro is able to substitute the generated code directly into the function, we are free to have an owned value that is only used by reference:

fn example(value: OwnedNonCopyType) {
  // precondition
  assert!(value == SOME_CONSTANT);
  // `value` is later moved
}

...which is legal because it implicitly borrows value rather than moving it, but it still relies on value BEING an owned value, because if we replace value with &value, we get an error.

So, what I'd love to do is this:

fn __generated_test_setup_function_which_must_receive_reference(value: &OwnedNonCopyType) {
  [something like `let value = *value`, a statement which binds the name `value` to mean the owned value behind the reference, but does NOT move it];
  assert!(value == SOME_CONSTANT);
}

But as far as I know, there's no way to do this. (To be clear, I definitely don't think we should make a feature to do this, because it would normally just add confusion. This is just one symptom of Rust's lack of explicitness about when a value is moved versus just referred to by name… which can never be fixed because it would break backwards compatibility on basic syntax.)

Theoretically, my proc macro could also transform the condition expressions, replacing all instances of value with *value. However, I think it would take a lot of analysis to decide which instances of the token value are actually referring to that specific local variable rather than something else. And it would make error messages worse. So it's worth investigating whether there's a different approach.

Second approach: Move the values back and forth

Well, if we need the owned values in the testing functions, we could always move them into the testing functions, then move them back out.

trait ContractedTrait {
  fn method(&self, input: OwnedNonCopyType) -> OtherOwnedNonCopyType;
  
  fn __live_prop_test_setup__method(&self, input: OwnedNonCopyType)->(OwnedNonCopyType, OldValues) {
    ...
  }
  fn __live_prop_test_finish__method(&self, return_value: OtherOwnedNonCopyType, OldValues)->OtherOwnedNonCopyType {
    ...
  }
}

Wait, this has another problem, which was already present in my original hope. I can't actually infer the type of OldValues from the expressions provided by the user. Again, this was fine when all the tests were written within the same function:

fn example(input: &mut SomeStruct) {
  let old_value = input.field;
  ...original function body...
  assert(something about old_value);
}

...because type inference can apply there. But it can't apply across function boundaries.

But there's still a workaround: Instead of having separate setup and finish methods, we could have a single method that takes the original function as a callback, making the generated code be:

trait ContractedTrait {
  fn method(&mut self, input: OwnedNonCopyType);
  
  fn __live_prop_test__method(&mut self, input: InputType, original: impl FnOnce(InputType)->ReturnType)->ReturnType {
    ...precondition statements...
    ...old value statements...
    let result = (original)(self, input);
    ...postcondition statements...
    result
  }
}

// later, in some other file...

impl ContractedTrait for SomeStruct {
  fn method(&mut self, input: InputType)->ReturnType {
    <SomeStruct as ContractedTrait>::__live_prop_test__method(
      self,
      input,
      |self: &mut Self, input: InputType| -> ReturnType {
         ...original function body...
      }
    )
  }
}

No wait, we can't make a closure parameter be named self, because it's a keyword. And we can't use the original function body if the parameter is named something else.

Okay, maybe there's a different way to make it so we can use type inference for OldValues. How about the setup function returns a closure that does the finish function, using impl FnOnce?

trait ContractedTrait {
  fn method(&self, input: InputType) -> ReturnType;
  
  fn __live_prop_test_setup__method(&self, input: InputType)->(InputType, impl FnOnce(ReturnType)) {
    ...
  }
}

But wait, what if ReturnType is an impl Trait? This isn't allowed in trait methods at the moment, but it may be in the future, and also, as I develop my library further, there might be non-trait-method cases that still require calling into other testing code rather than embedding it directly. Then the way I wrote the FnOnce isn't valid either – we get "error[E0666]: nested impl Trait is not allowed". Strictly speaking, what we want to write is impl FnOnce(...the anonymous return type of method()...), but we can't do that because it's unnameable. RFC 2071 might help with this, but it hasn't been implemented yet.

Maybe we could rewrite it to use generic parameters?

  fn method(&self, input: InputType) -> impl ReturnTypeTrait;
  
  fn __live_prop_test_setup__method<R: ReturnTypeTrait>(&self, input: InputType)->(InputType, impl FnOnce(R)) {
    ...
  }

Wait, maybe this would actually work. Add one generic parameter per existential in the return type, and specify them as _ when you actually call __live_prop_test_setup__method. I only thought of that idea while writing this post, so meanwhile, I tried some other things.

But before we go on, let me point out another awkwardness here. When we get to the generated impl:

impl ContractedTrait for SomeStruct {
  fn method(&mut self, input: InputType)->ReturnType {
    let (input, finish) = <SomeStruct as ContractedTrait>::__live_prop_test_setup__method(self, input);
    let result = (||->ReturnType {...original function body...})();
    (finish)(result);
    result
  }
}

If the user wrote a tricky lifetime error in the body of the function, then the fact that we rebind input makes the error significantly more complicated; I have a simple example where it puts the main error message inside the generated code. This is clearly better than "it doesn't work at all", but seems pretty disappointing when it feels like we shouldn't have to do the rebinding in the first place. (Although, pragmatically, a weird error inside the body of the function – rather than inside the contracts – is the least bad case, because the user can just remove the proc macro attribute to see the regular error.)

Third approach: Three layers of methods

In the approach where we tried to take the original function as a closure, we could instead make it yet-another default-implemented trait method, making the generated code be:

trait ContractedTrait {
  fn method(&mut self, input: InputType)->ReturnType;
  
  fn __live_prop_test__tests_for__method(&mut self, input: InputType)->ReturnType {
    ...precondition statements...
    ...old value statements...
    let result = Self::__live_prop_test__original__method(self, input);
    ...postcondition statements...
  }
  fn __live_prop_test__original__method(&mut self, input: InputType)->ReturnType {
    // really this method is only used when it's overridden, but it needs a default impl for the sake of impls that don't use #[live_prop_test]
    panic!("Reached unimplemented original method")
  }
}

// later, in some other file...

impl ContractedTrait for SomeStruct {
  fn method(&mut self, input: InputType)->ReturnType {
    <SomeStruct as ContractedTrait>::__live_prop_test__tests_for__method(
      self,
      input,
    )
  }
  fn __live_prop_test__original__method(&mut self, input: InputType)->ReturnType {
    ...original function body...
  }
}

This seems like the cleanest approach so far. It's essentially the approach contracts takes (although contracts only needs 2 layers of methods because of its different design), and it should compile in almost all cases.

But there's another gotcha: What about non-live-prop-test attributes on the function? We now have both function and __live_prop_test__original__function. Should both of them get the original attributes? Or just the outer one? Or just the inner one?

  • If the user wrote #[cfg(unix)] on a function, then wrote another copy of the function with #[cfg(not(unix))], then those attributes MUST go on both the inner and outer versions, or you'll get a compile error.
  • If the user wrote #[rocket::get("/")] or #[partial_application_rs::part_app] on a function, then it must only go on the outer function, and must not go on the inner function.
  • If the user wrote #[function_name::named] on a method, it must go on the inner function, and doesn't care whether it goes on the outer function.
  • If the user wrote #[tracing::instrument], #[log_derive::logfn], or #[cached::proc_macro::cached] on a method, then it would prefer to go on only one of the copies of the function, but it's not obvious which. If I didn't have 2 copies of the function, then it would implicitly be like it went on the outer function.

For the ones that must go on the inner function, it looks like they also work if the inner function is literally inside the body of the outer function. So if we can do it that way, we only need to put the attributes on the outer function. (#[function_name::named] even prefers it this way.)

But can we do it that way? For the trait-method case, the trait testing function was relying on calling another function from the trait – or, wait, maybe now we can get away with passing in the function pointer/ZST as a FnOnce argument to the testing function... after converting any instances of impl Trait into generic parameters so that we can actually express the FnOnce bound.

Either way, in the general (non-trait) case, it's also not possible to declare the inner function inside the outer function, because the argument and return types might refer to generic parameters of the containing impl, and the proc macro does not necessarily know the containing impl! One possible workaround is to require the containing impl to also be tagged with my proc macro, so that the macro can be aware of the containing impl and thus able to create an analogous impl inside the outer function. That's a bit of extra complication for the user, but it's very straightforward complication (they get an error telling them they need to apply the macro to the impl, and then they apply it).

I might ultimately go with this approach. But just in case, I'm still considering other approaches:

Fourth approach: Inject the expressions using a macro_rules! macro

At the site of the trait declaration, rather than defining testing methods, I could define a macro_rules! macro that expands to the testing statements. Then, at the site of the trait impl, I could invoke that macro, thus directly injecting the precondition/postcondition expressions into the tested function. This would dodge all of the issues about the type system and duplicating the function.

I have actually implemented this system! But it turned out to have just as many gotchas as the other approaches:

  • At the impl site, there's no guarantee that the macro has been imported along with the trait. I must require the user to give the path to the trait explicitly.
  • Macros from other crates still can't be imported by relative paths, so you can't have two traits with the same name in the same crate – or I would have to require the user to specify disambiguating names. And the proc macro cannot be aware of the module path it's in, so I can't do any tricks like embedding the module path in the macro name.
  • Generated macros from the same crate can't be imported by absolute paths, per macro_expanded_macro_export_accessed_by_absolute_paths. Thus, I would have to force the user to tag all intervening modules with #[macro_use], which might also apply to other macros that they didn't want to spill across modules.

And that's just covering the issues that will affect the user, not the various complications about implementing it (like having to circumvent macro hygiene).

Conclusion

I don't have much of a conclusion, other than "waaaah, when you have to really examine Rust syntax structure, it's a lot less simple and consistent than you might think". And given the large variety of options available with different gotchas, it just bugs me that I have to choose between them, rather than one of them having not had any gotchas.

Anyway, thank you for listening to my bumbling :slight_smile:

4 Likes

Binding a name to an owned value but not moving it is impossible, as it is contradictory. It doesn't quite make sense. You either own the value or you don't – you can't simultaneously own and not own the value. Such a binding would need to behave by decaying into a reference upon every possible use in order not to move.

Which leads me to think that what you probably meant in that code snippet was to dereference the pointer, only to have its adress taken again (implicitly) by the PartialEq impl, since *lhs == rhs desugars to PartialEq::eq(&*lhs, &rhs).

All in all, this doesn't seem to have to do much with procedural macros or the lack of features in Rust. Similar problems can usually be solved by adding a layer of indirection, hiding it behind a trait, and adding a blanket impl for reference types (and maybe Box, etc.), then just plainly not changing the user's code at all.

That's a lengthy description of your adventures with procedural macros :smile:, and I do think it is interesting to have this laid out as it is, since context-less syntactic-only transformations do have their limits, especially when having special-case things such as self.

I think the OP was talking about a way to have sugar similar to C++'s references, or Rust closures that only capture by reference:

let s = String::from("...");
{
    let clone = Clone::clone(&s);
    stuff(clone);
} // <- this blocks has not moved `s`
// or, similarly,
let f = || {
    let clone = Clone::clone(&s); // refers to the name `s: String`
    stuff(clone);
}; // <- actually only captures `&s`.

That kind of syntactic code cannot be moved and factored out into an external function verbatim, since implicit closure captures are the only place in Rust where this kind of mechanism is possible:

let f = mk_f(&s);
// where
fn mk_f (s: &'_ String) -> impl Fn() + '_
{
    // Wops, this is now Copy-ing a `&String` instead of `Clone`-ing a `String`.
    //                                 vv         vvvvv -> error or different generic function call
    move || { let clone = Clone::clone(&s); stuff(clone); } 
      // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      //       copy-pasted verbatim by a macro
}

So there is indeed an annoyance in not being able to have complete freedom to move things around.

That being said, I do agree that such specific / special-case / niche annoyance, does not seem, to me, to be enough to warrant adding some kind of complex and/or magical feature in the language that would allow to do this.


Procedural macros are a tool that can help users implement their own magic into the language, which does have limits and may involve dirty hacks, but that's the price to pay to play the role of gods compiler/language designers :smile:

So albeit annoying, procedural macros do have to tank that complexity, imho.


So, back to the OP's issues, let's see how we can circumvent or tackle some of the issues having been raised.

Regarding ownership vs. borrows

The Rust book itself mentions that it would be possible, since (exclusive) ownership implies (exclusive access which implies) mutable access, to write mutation functions as follows:

fn emphasize (mut s: String) -> String
{
    s.push_str("!!"); // <- this has not taken ownership of s ...
    s // <- so the `s` binding still has ownership of it to be able to return it
}

So, given that there is no such "sugar to have a name have the type of the owned value and yet only have access to borrows of it, as mentioned just above, the logical alternative is to use this approach.

fn __live_prop_test_setup__double_positive_number (
    <self receiver>,
    $( , $arg_name: $ArgTy )*
) -> ( $($ArgTy ,)* )
{
    assert!(<body>);
    ( $($arg_name ,)* ) // (You could even bundle back `self`, if the initial method was using an owned `self`.
}

So, what may look like a hack traditional / idiomatic Rust code wise, is actually the most sensible solution macro / meta-programming wise.

At that point we have reached:

And indeed the workaround is to keep the code as close as possible to the mechanics involved in the initial code, so the closure workaround, as the next logical step, makes total sense (to me) too:

And then the problem is:

No wait, we can't make a closure parameter be named self , because it's a keyword. And we can't use the original function body if the parameter is named something else.

This is the point where things have become overly annoying, I reckon. Which is kind of the reason there are so few procedural macros that support interacting with traits :upside_down_face:

There is a nice write-up about this from David Tolnay:

So I'd be curious about that solution.

Regarding the mechanism to replace all self occurences by self_, it is suprisingly easy, thanks to the VisitMut-and-the-like utilities of ::syn:

use ::syn::{*, visit_mut::{self, VisitMut}};

struct ReplaceSelfWithSelf_;
impl VisitMut for ReplaceSelfWithSelf_ {
    fn visit_expr_path_mut (&mut self, expr: &mut ExprPath)
    {
        match *expr {
            | ExprPath { qself: None, ref mut path, .. }
                if path.is_ident("self")
            => {
                *path = parse_quote![ self_ ];
            },
            | _ => {},
        }
        // and recurse (not needed here (nothing useful to recurse to))
        // visit_mut::visit_expr_path_mut(self, expr);
    }

    fn visit_item_mut (&mut self, _: &mut Item)
    {
        // Do nothing: that is, do not recurse.
    }
}
ReplaceSelfWithSelf_.visit_block_mut(&mut block);

The other option is to use a helper method. Either through a helper trait, like I did with require_unsafe_in_body, or, incidentally, like you've done by using a "third" method:

That is, you haven't needed a helper trait since you have chosen to use the trait itself as the helper trait!

So, in a way, even though reaching that situation may have been a panful journey, as a proc-macro author myself, the final solution that you have figured out makes total sense, and despite it looking hacky, think that many library functions do hacky things under the rug, the difference with (procedural) macros is that people can "expose the ugly" by using cargo expand & co. :grin:

But now people can easily add {pre,post}-conditions to their methods and implementations, magically, and it will Just Work. Congratulations, you've added new features to the language for users of your library, without having had to request for the language itself to implement that feature! :raised_hands:

2 Likes

Thanks for the detailed feedback! The dtolnay post helped me catch yet another gotcha (my existing code had the same error in the "returning &mut field" case, I just hadn't tested for it :joy:).

I'm happy to report that I have now implemented this feature, and have a lot of working tests! I ultimately chose the trade-off where the user is forced to tag the containing impl, rather than the trade-off where the original function body has the token self replaced with a placeholder. Desugaring argument-position impl Trait into generic parameters was indeed necessary, but turned out to be a straightforward application of VisitMut.