Upfront: This is mostly a long ramble about a variety of gotchas I ran into while implementing a proc macro. Although I wouldn't mind if someone helped suggest workarounds, I'm posting this on internals because my main thought is "surely Rust could have made some of these things work better somehow?" And alas, I don't even have any concrete proposals, I'm just putting this out here in case others can make something of it.
It's not that important for me to get feedback on this; you don't need to read it unless you want to.
What I'm trying to do
I'm writing live-prop-test
, a crate very similar to contracts
. It lets you write functions and specify precondition and postcondition expressions on them:
#[live_prop_test(
precondition = "*input > 0",
postcondition = "*input > old(*input)",
)]
fn double_positive_number(input: &mut i32) {
input *= 2
}
I have this part working! But of course, a natural extension is to apply such contracts to trait methods:
#[live_prop_test]
trait ContractedTrait {
#[live_prop_test(
precondition = "*input > 0",
postcondition = "*input > old(*input)",
)]
fn double_positive_number(&self, input: &mut i32);
}
// later, in some other file...
#[live_prop_test(use_trait_tests)]
impl ContractedTrait for SomeStruct {
// tests from the above trait are applied automatically
fn double_positive_number(&self, input: &mut i32) {
input *= 2;
}
}
contracts
also has a feature for this, but my goals are slightly different. contracts
does this by rewriting the names of the trait methods, such that it's impossible to implement the trait without applying contracts. live-prop-test
prefers to be less strict β you can write an impl without #[live_prop_test(use_trait_tests)]
, and its methods just won't be tested. So I can't just reuse the approach contracts
took.
Since the trait and the corresponding impl may be in different files, they can't communicate with each other on the "proc macro runtime" level. So we probably have to make them communicate within the type system.
The way I would like to solve this
When you put #[live_prop_test]
on a trait, it generates extra methods with default implementations, expanding to something like this (this is significantly simplified from my actual design, but good enough for this explanation):
trait ContractedTrait {
#[live_prop_test(
postcondition = "*input > old(*input)"
)]
fn double_positive_number(&self, input: &mut i32);
fn __live_prop_test_setup__double_positive_number(&self, input: &mut i32)->(i32,) {
// precondition
assert!(*input > 0);
// value of `old` expression
(*input)
}
fn __live_prop_test_finish__double_positive_number(&self, input: &mut i32, olds: (i32,)) {
// postcondition, with `old` value substituted in
assert!(*input > olds.0);
}
}
// later, in some other file...
impl ContractedTrait for SomeStruct {
fn double_positive_number(&self, input: &mut i32) {
let olds = <SomeStruct as ContractedTrait>::__live_prop_test_setup__double_positive_number(self, input);
input *= 2;
<SomeStruct as ContractedTrait>::__live_prop_test_finish__double_positive_number(self, input, olds);
}
}
This expansion is fine for this particular case. One problem arises when we pass owned values rather than references. In that case, we'd have "use of moved value" errors.
So what do we do?
First approach: Use references
In practice, tests shouldn't be mutating the values. So, we could pass references into the testing functions, instead of values.
Here, the trouble comes from type checking in the precondition/postcondition expressions. If we replace the owned values with references, the types will be different, creating a stumbling block for the user. From the user's perspective, the name of an argument should always refer to the same type. It's extra work if they have to remember that the name refers to an owned value in the body, but a reference in the condition expressions.
When the proc macro is able to substitute the generated code directly into the function, we are free to have an owned value that is only used by reference:
fn example(value: OwnedNonCopyType) {
// precondition
assert!(value == SOME_CONSTANT);
// `value` is later moved
}
...which is legal because it implicitly borrows value
rather than moving it, but it still relies on value
BEING an owned value, because if we replace value
with &value
, we get an error.
So, what I'd love to do is this:
fn __generated_test_setup_function_which_must_receive_reference(value: &OwnedNonCopyType) {
[something like `let value = *value`, a statement which binds the name `value` to mean the owned value behind the reference, but does NOT move it];
assert!(value == SOME_CONSTANT);
}
But as far as I know, there's no way to do this. (To be clear, I definitely don't think we should make a feature to do this, because it would normally just add confusion. This is just one symptom of Rust's lack of explicitness about when a value is moved versus just referred to by name⦠which can never be fixed because it would break backwards compatibility on basic syntax.)
Theoretically, my proc macro could also transform the condition expressions, replacing all instances of value
with *value
. However, I think it would take a lot of analysis to decide which instances of the token value
are actually referring to that specific local variable rather than something else. And it would make error messages worse. So it's worth investigating whether there's a different approach.
Second approach: Move the values back and forth
Well, if we need the owned values in the testing functions, we could always move them into the testing functions, then move them back out.
trait ContractedTrait {
fn method(&self, input: OwnedNonCopyType) -> OtherOwnedNonCopyType;
fn __live_prop_test_setup__method(&self, input: OwnedNonCopyType)->(OwnedNonCopyType, OldValues) {
...
}
fn __live_prop_test_finish__method(&self, return_value: OtherOwnedNonCopyType, OldValues)->OtherOwnedNonCopyType {
...
}
}
Wait, this has another problem, which was already present in my original hope. I can't actually infer the type of OldValues
from the expressions provided by the user. Again, this was fine when all the tests were written within the same function:
fn example(input: &mut SomeStruct) {
let old_value = input.field;
...original function body...
assert(something about old_value);
}
...because type inference can apply there. But it can't apply across function boundaries.
But there's still a workaround: Instead of having separate setup and finish methods, we could have a single method that takes the original function as a callback, making the generated code be:
trait ContractedTrait {
fn method(&mut self, input: OwnedNonCopyType);
fn __live_prop_test__method(&mut self, input: InputType, original: impl FnOnce(InputType)->ReturnType)->ReturnType {
...precondition statements...
...old value statements...
let result = (original)(self, input);
...postcondition statements...
result
}
}
// later, in some other file...
impl ContractedTrait for SomeStruct {
fn method(&mut self, input: InputType)->ReturnType {
<SomeStruct as ContractedTrait>::__live_prop_test__method(
self,
input,
|self: &mut Self, input: InputType| -> ReturnType {
...original function body...
}
)
}
}
No wait, we can't make a closure parameter be named self
, because it's a keyword. And we can't use the original function body if the parameter is named something else.
Okay, maybe there's a different way to make it so we can use type inference for OldValues. How about the setup function returns a closure that does the finish function, using impl FnOnce
?
trait ContractedTrait {
fn method(&self, input: InputType) -> ReturnType;
fn __live_prop_test_setup__method(&self, input: InputType)->(InputType, impl FnOnce(ReturnType)) {
...
}
}
But wait, what if ReturnType is an impl Trait
? This isn't allowed in trait methods at the moment, but it may be in the future, and also, as I develop my library further, there might be non-trait-method cases that still require calling into other testing code rather than embedding it directly. Then the way I wrote the FnOnce isn't valid either β we get "error[E0666]: nested impl Trait
is not allowed". Strictly speaking, what we want to write is impl FnOnce(...the anonymous return type of method()...)
, but we can't do that because it's unnameable. RFC 2071 might help with this, but it hasn't been implemented yet.
Maybe we could rewrite it to use generic parameters?
fn method(&self, input: InputType) -> impl ReturnTypeTrait;
fn __live_prop_test_setup__method<R: ReturnTypeTrait>(&self, input: InputType)->(InputType, impl FnOnce(R)) {
...
}
Wait, maybe this would actually work. Add one generic parameter per existential in the return type, and specify them as _
when you actually call __live_prop_test_setup__method
. I only thought of that idea while writing this post, so meanwhile, I tried some other things.
But before we go on, let me point out another awkwardness here. When we get to the generated impl:
impl ContractedTrait for SomeStruct {
fn method(&mut self, input: InputType)->ReturnType {
let (input, finish) = <SomeStruct as ContractedTrait>::__live_prop_test_setup__method(self, input);
let result = (||->ReturnType {...original function body...})();
(finish)(result);
result
}
}
If the user wrote a tricky lifetime error in the body of the function, then the fact that we rebind input
makes the error significantly more complicated; I have a simple example where it puts the main error message inside the generated code. This is clearly better than "it doesn't work at all", but seems pretty disappointing when it feels like we shouldn't have to do the rebinding in the first place. (Although, pragmatically, a weird error inside the body of the function β rather than inside the contracts β is the least bad case, because the user can just remove the proc macro attribute to see the regular error.)
Third approach: Three layers of methods
In the approach where we tried to take the original function as a closure, we could instead make it yet-another default-implemented trait method, making the generated code be:
trait ContractedTrait {
fn method(&mut self, input: InputType)->ReturnType;
fn __live_prop_test__tests_for__method(&mut self, input: InputType)->ReturnType {
...precondition statements...
...old value statements...
let result = Self::__live_prop_test__original__method(self, input);
...postcondition statements...
}
fn __live_prop_test__original__method(&mut self, input: InputType)->ReturnType {
// really this method is only used when it's overridden, but it needs a default impl for the sake of impls that don't use #[live_prop_test]
panic!("Reached unimplemented original method")
}
}
// later, in some other file...
impl ContractedTrait for SomeStruct {
fn method(&mut self, input: InputType)->ReturnType {
<SomeStruct as ContractedTrait>::__live_prop_test__tests_for__method(
self,
input,
)
}
fn __live_prop_test__original__method(&mut self, input: InputType)->ReturnType {
...original function body...
}
}
This seems like the cleanest approach so far. It's essentially the approach contracts
takes (although contracts
only needs 2 layers of methods because of its different design), and it should compile in almost all cases.
But there's another gotcha: What about non-live-prop-test
attributes on the function? We now have both function
and __live_prop_test__original__function
. Should both of them get the original attributes? Or just the outer one? Or just the inner one?
- If the user wrote
#[cfg(unix)]
on a function, then wrote another copy of the function with#[cfg(not(unix))]
, then those attributes MUST go on both the inner and outer versions, or you'll get a compile error. - If the user wrote
#[rocket::get("/")]
or#[partial_application_rs::part_app]
on a function, then it must only go on the outer function, and must not go on the inner function. - If the user wrote
#[function_name::named]
on a method, it must go on the inner function, and doesn't care whether it goes on the outer function. - If the user wrote
#[tracing::instrument]
,#[log_derive::logfn]
, or#[cached::proc_macro::cached]
on a method, then it would prefer to go on only one of the copies of the function, but it's not obvious which. If I didn't have 2 copies of the function, then it would implicitly be like it went on the outer function.
For the ones that must go on the inner function, it looks like they also work if the inner function is literally inside the body of the outer function. So if we can do it that way, we only need to put the attributes on the outer function. (#[function_name::named]
even prefers it this way.)
But can we do it that way? For the trait-method case, the trait testing function was relying on calling another function from the trait β or, wait, maybe now we can get away with passing in the function pointer/ZST as a FnOnce
argument to the testing function... after converting any instances of impl Trait
into generic parameters so that we can actually express the FnOnce
bound.
Either way, in the general (non-trait) case, it's also not possible to declare the inner function inside the outer function, because the argument and return types might refer to generic parameters of the containing impl, and the proc macro does not necessarily know the containing impl! One possible workaround is to require the containing impl to also be tagged with my proc macro, so that the macro can be aware of the containing impl and thus able to create an analogous impl inside the outer function. That's a bit of extra complication for the user, but it's very straightforward complication (they get an error telling them they need to apply the macro to the impl, and then they apply it).
I might ultimately go with this approach. But just in case, I'm still considering other approaches:
Fourth approach: Inject the expressions using a macro_rules!
macro
At the site of the trait declaration, rather than defining testing methods, I could define a macro_rules!
macro that expands to the testing statements. Then, at the site of the trait impl, I could invoke that macro, thus directly injecting the precondition/postcondition expressions into the tested function. This would dodge all of the issues about the type system and duplicating the function.
I have actually implemented this system! But it turned out to have just as many gotchas as the other approaches:
- At the impl site, there's no guarantee that the macro has been imported along with the trait. I must require the user to give the path to the trait explicitly.
- Macros from other crates still can't be imported by relative paths, so you can't have two traits with the same name in the same crate β or I would have to require the user to specify disambiguating names. And the proc macro cannot be aware of the module path it's in, so I can't do any tricks like embedding the module path in the macro name.
- Generated macros from the same crate can't be imported by absolute paths, per
macro_expanded_macro_export_accessed_by_absolute_paths
. Thus, I would have to force the user to tag all intervening modules with#[macro_use]
, which might also apply to other macros that they didn't want to spill across modules.
And that's just covering the issues that will affect the user, not the various complications about implementing it (like having to circumvent macro hygiene).
Conclusion
I don't have much of a conclusion, other than "waaaah, when you have to really examine Rust syntax structure, it's a lot less simple and consistent than you might think". And given the large variety of options available with different gotchas, it just bugs me that I have to choose between them, rather than one of them having not had any gotchas.
Anyway, thank you for listening to my bumbling