I've never created an RFC or proposal before, so some mentoring would be appreciated!
# Summary
The purpose of this proposal is to provide an ergonomic yet explicit way to get access to a set of variables deep in the call stack, or for a lot of sibling calls which need the same value. You may even call this feature, "*compile time, strongly typed, dependency injection*", or "*context variables*". This particular proposal is about an implementation based on the addition of an AST transform to give Rust some syntax sugar to achieve this.
# Motivation
There are a lot of times you want to propagate a function parameter deep into the call stack or to a lot of child calls at the same level in the stack.
- Recursive functions with a result accumulator (example below)
- Request-scoped structs in a web application, e.g. Request, User, Metrics, Tracer (example below)
  - i.e. structs which are used by basically every method
- Possibly in Future's [Context as an explicit param to poll](https://github.com/rust-lang-nursery/futures-rfcs/pull/2)
The goal of this proposal is to provide Rust as it is today, with some ergonomic wins. There should be minimal/zero drawbacks to this syntax sugar. Ideally the syntax would be extremely intuitive, if not, less than a page of documentation should be needed.
## Ergonomic Goals
There are two places where this feature can affect readability: a **function's signature** and a **function's implementation**.
##### Function signatures:
These change marginally, as such I don't think there is an ergonomic win or loss while reading function signatures.
##### Function implementations:
- Perspective of the writer:
  - Goal: None
  - Currently: This proposal happens to improve ergonomics
- Perspective of the reader:
  - Goal: Improve ergonomics
  - Currently:
    - This proposal makes it harder to read a code sample for *the first time in that domain*.
    - This proposal makes it easier to re-read code samples in familiar domains.
      - Detailing the intent of the function rather than how the function is implemented.
      - Eliminating variables which do not improve the reader's understanding of the purpose of the function.
      - i.e. This feature reduces noise, and improves signal, for all future re-reads.
A common starting point: I think we can all agree that `for x in ...` (vs. its legacy counterpart) carries far more signal (i.e. is more readable), even though *how* we accomplish the iteration is no longer visible. This is because the *purpose was to iterate*, not to increment registers and dereference arrays.
Similarly, in some problem domains, while reading over the implementation of a particular function, **certain variables are noise, not signal**. Reading parameters which are **mechanically passed** over and over doesn't help me grok the **purpose of the function**.
Additionally, suppose you asked me, "In a web service, what's the purpose of the request variable if you can't see where it is used?"
1. I would likely be able to guess without looking at any part of the implementation.
   - I know from years of experience what a request variable does.
2. If it were relatively new to me (i.e. first ever read), it is very obvious where to find that answer: the function's signature/docs.
   - Having read the docs, the fact that it is used to call dependent services and construct a response becomes a given.
## Current solutions
- Utilize global variables
  - Current usage: e.g. `env::var`, or `stdout`.
  - Limitation: Read only / synchronization overheads
- Utilize thread local storage (TLS)
  - Current usage:
    - e.g. futures: 0.1 (I think removed in 0.2); tokio's implicit core.
    - This solution has been extended in other languages, e.g. Java with request-scoped dependency injection, aspect-oriented programming, and the two together as aop-scoped-proxies.
  - Drawbacks:
    - Request scopes were built to ensure appropriate cleanup.
    - With manual TLS, there is no mechanism which enforces cleanup.
      - I've personally seen `User` objects stored in TLS and then not cleaned up before the next request.
        - This results in "leaking" private data to a different user!
      - "This is a bad engineer", "better code reviews": yes, I agree!
        - That said, aren't we all using Rust because **we believe the best engineers and the best process are still fallible?** That as much as we can, we should explore how to get our tools to help fix today's problems?
  - Appropriate use:
    - e.g. Logging: where a thread should do lock-free batching of log messages before performing disk I/O.
  - Inappropriate use: As a crutch for not wanting to pass around a few arguments.
    - If this is a problem, let's give Rust an alternative, so we avoid abusing TLS.
- Utilize `Context` objects which encapsulate all the state needed
  - This forces all functions to take the whole context. It would be nice to write idiomatic functions, which only take the data they need.
  - A single `Context` object needs to know about application-specific types. Therefore it cannot improve the ergonomics of functions defined in a different crate (e.g. framework, plugin or middleware).
    - Workaround: You would need one context per crate, and nest them
  - Obfuscates parameters, i.e. neither the caller's nor the callee's function signature defines the values used.
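The TLS cleanup hazard described in the list above can be made concrete with a minimal sketch in today's Rust. The `User` struct, the `CURRENT_USER` slot and the `handle_request` function are hypothetical names chosen for illustration; the two calls in `main` stand in for two requests served by the same reused worker thread.

```rust
use std::cell::RefCell;

// Hypothetical request-scoped value; the name mirrors the `User` mentioned above.
#[derive(Clone, Debug, PartialEq)]
struct User {
    name: String,
}

thread_local! {
    // One slot per worker thread; nothing ties its lifetime to a request.
    static CURRENT_USER: RefCell<Option<User>> = RefCell::new(None);
}

// Simulates a handler running on a reused worker thread.
// `auth` is `None` for an unauthenticated request.
fn handle_request(auth: Option<&str>) -> Option<User> {
    if let Some(name) = auth {
        CURRENT_USER.with(|slot| {
            *slot.borrow_mut() = Some(User { name: name.to_string() });
        });
    }
    // Bug: an unauthenticated request never clears the slot, so it
    // observes whatever the previous request left behind.
    CURRENT_USER.with(|slot| slot.borrow().clone())
}

fn main() {
    let first = handle_request(Some("alice"));
    let second = handle_request(None);
    // The second, unauthenticated request still sees alice.
    println!("first = {:?}, second = {:?}", first, second);
}
```

Nothing in the types or signatures flags the missing cleanup; the compiler accepts this code without complaint, which is the gap this proposal is trying to close.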
# Explanation
Note: Good syntax is very important, but for this iteration, I am going to use placeholder syntax. We can discuss syntax afterwards, once the general concepts of this proposal have been discussed and are reasonably well received.
Functions will have special syntax to define auto-propagated params and, with it, an auto-propagation scope. Auto-propagated params with identical names and types will be automatically propagated by the compiler to any sub-call within scope. Functions called within an auto-propagation scope can have any or all of their auto-propagated params overridden. When a function is called without an auto-propagation scope (usually at the top of the call stack), overrides can be used to set the initial auto-propagation scope.
The best way to explain this is with example sugar/de-sugar variants.
```rust
// Imagine the database client is a plugin to a web framework.
// It only knows about `database::Query`s and `io::Metrics`, not about `jwt::User` or `framework::Request`.
// A single struct would not be known to the database client implementor.

// Imagine that you're using JWT auth to extract user information.
// I'm going to ignore that this would probably be middleware for the moment,
// but the problem is the same.

// Framework/Library author implemented
// `database.call` = fn call(query: database::Query, context { metrics: io::Metrics })

// Service owner implemented
// fn query_builder1(context { user: jwt::User })
// fn query_builder2(context { user: jwt::User, request: framework::Request })
// fn build_response(result1: ..., result2: ..., context { user: jwt::User, request: framework::Request })

// With sugar
fn handle_request(context { request: framework::Request, metrics: io::Metrics }) {
    // let's override the context to have a `user` parameter
    let context user = jwt::User::from_bearer(request.auth_header());
    let result1 = database.call(query_builder1());
    let result2 = database.call(query_builder2());
    ...
    return build_response(result1, result2);
}

// Without sugar: the same function with every context value passed by hand
fn handle_request(request: framework::Request, metrics: io::Metrics) {
    let user = jwt::User::from_bearer(request.auth_header());
    let result1 = database.call(query_builder1(user), metrics);
    let result2 = database.call(query_builder2(user, request), metrics);
    ...
    return build_response(result1, result2, user, request);
}
```
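For readers who want something that compiles today, here is a rough, runnable rendering of the de-sugared `handle_request`. It is only a sketch under assumptions: the flat `Request`, `Metrics`, `User` and `Query` types stand in for `framework::Request`, `io::Metrics`, `jwt::User` and `database::Query`, `database_call` stands in for `database.call`, and all fields and function bodies are invented. Context values are passed by reference so they can be handed to several calls.

```rust
// Flat stand-ins for the types named in the example above.
// All fields and bodies are hypothetical.
struct Request {
    auth_header: String,
    path: String,
}

#[derive(Default)]
struct Metrics {
    db_calls: u32,
}

struct User {
    name: String,
}

struct Query(String);

impl User {
    fn from_bearer(header: &str) -> User {
        // Placeholder parsing; a real implementation would verify a token.
        User { name: header.trim_start_matches("Bearer ").to_string() }
    }
}

// Library side: only knows about `Query` and `Metrics`.
fn database_call(query: Query, metrics: &mut Metrics) -> String {
    metrics.db_calls += 1;
    format!("rows({})", query.0)
}

// Service side: each function names only the values it needs.
fn query_builder1(user: &User) -> Query {
    Query(format!("profile of {}", user.name))
}

fn query_builder2(user: &User, request: &Request) -> Query {
    Query(format!("{} for {}", request.path, user.name))
}

fn build_response(result1: String, result2: String, user: &User, request: &Request) -> String {
    format!("{} {}: {} | {}", user.name, request.path, result1, result2)
}

// The de-sugared handler: every context value is threaded by hand.
fn handle_request(request: &Request, metrics: &mut Metrics) -> String {
    let user = User::from_bearer(&request.auth_header);
    let result1 = database_call(query_builder1(&user), metrics);
    let result2 = database_call(query_builder2(&user, request), metrics);
    build_response(result1, result2, &user, request)
}

fn main() {
    let request = Request {
        auth_header: "Bearer alice".to_string(),
        path: "/orders".to_string(),
    };
    let mut metrics = Metrics::default();
    let response = handle_request(&request, &mut metrics);
    println!("{} ({} db calls)", response, metrics.db_calls);
}
```

In this version `user`, `request` and `metrics` appear at five call sites inside `handle_request`; the sugared form would leave only the query builders, the database calls and the response.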
```rust
// With Sugar
fn find_goal_nodes(
    root: &TreeNode,
    context {
        // This function expects these parameters, but understands
        // they are generally passed down deep in the call stack, without modification.
        goal: Predicate<...>,
        results: &mut Vec<&TreeNode>,
    }
) {
    if goal.evaluate(root) {
        results.push(root);
    }
    for child in root.children() {
        // All parameters which share the same name, are auto-propagated to the sub-function.
        // i.e. The parameters `goal`, and `results` are automatically propagated.
        find_goal_nodes(child);

        // Also possible to override some params, but still inherit others.
        // i.e. `goal` is being overridden, but `results` is still auto-propagated.
        // Note: This is just like shadowing any other variable, and the override to the context dies at the end of the scope.
        let context goal = goal.negate();
        find_goal_nodes(child);
        // Disclaimer: Frequently overriding an auto-propagated param, would be bad practice.
        // The line above overriding `goal`, would be a good "hint" that this feature is being abused on this param.
    }
}

fn main() {
    let root: TreeNode = ...;
    let context goal: Predicate<...> = ...;
    let context mut results: Vec<&TreeNode> = Vec::new();
    // Initial call
    find_goal_nodes(&root);
}
```
```rust
// De-Sugar the above example
fn find_goal_nodes(
    root: &TreeNode,
    goal: Predicate<...>,
    results: &mut Vec<&TreeNode>,
) {
    if goal.evaluate(root) {
        results.push(root);
    }
    for child in root.children() {
        find_goal_nodes(child, goal, results);
        find_goal_nodes(child, goal.negate(), results);
    }
}

fn main() {
    let root: TreeNode = ...;
    let goal: Predicate<...> = ...;
    let mut results: Vec<&TreeNode> = Vec::new();
    // `results` must be passed as a mutable borrow to match the signature
    find_goal_nodes(&root, goal, &mut results);
}
```
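As a sanity check that the de-sugared shape is expressible today, the following is a small runnable sketch of the same traversal. The `TreeNode` layout, the use of `&dyn Fn(&TreeNode) -> bool` in place of `Predicate<...>`, and the helpers `collect_values` and `sample_tree` are all assumptions made for illustration. The second, negated call is left out so the result stays easy to verify.

```rust
// Hypothetical tree type; the original leaves `TreeNode` unspecified.
struct TreeNode {
    value: i32,
    children: Vec<TreeNode>,
}

// `goal` is borrowed so it can be handed to every recursive call;
// `results` is reborrowed mutably at each level.
fn find_goal_nodes<'a>(
    root: &'a TreeNode,
    goal: &dyn Fn(&TreeNode) -> bool,
    results: &mut Vec<&'a TreeNode>,
) {
    if goal(root) {
        results.push(root);
    }
    for child in &root.children {
        // Both "context" values are repeated verbatim at every call.
        find_goal_nodes(child, goal, results);
    }
}

// Convenience wrapper: sets up the accumulator and returns matching values.
fn collect_values(root: &TreeNode, goal: &dyn Fn(&TreeNode) -> bool) -> Vec<i32> {
    let mut results = Vec::new();
    find_goal_nodes(root, goal, &mut results);
    results.iter().map(|node| node.value).collect()
}

// Builds the tree 1 -> [2 -> [4], 3].
fn sample_tree() -> TreeNode {
    TreeNode {
        value: 1,
        children: vec![
            TreeNode {
                value: 2,
                children: vec![TreeNode { value: 4, children: vec![] }],
            },
            TreeNode { value: 3, children: vec![] },
        ],
    }
}

fn main() {
    let root = sample_tree();
    let is_even = |node: &TreeNode| node.value % 2 == 0;
    // Pre-order traversal visits 1, 2, 4, 3, so the even nodes are 2 and 4.
    println!("{:?}", collect_values(&root, &is_even));
}
```

The recursive call is where the proposal would help: `goal` and `results` carry no new information there, yet they must be spelled out each time.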
The real ergonomic wins start showing in large code bases. As such no example can really do it justice.
# Open Questions
- Requiring that the names match makes the feature less flexible and useful (because everyone has to agree on using the same names). There are also questions around how to handle name/type collisions.
# Drawbacks
- At initial glance the syntax sugar seems like magic. However, a single page of documentation would highlight a de-sugared example, and it should be straightforward and intuitive for most readers. This is similar to a first encounter with `?` or `Into` or `self.___()`.
  - However, unlike those, there is no marker at the call site indicating implicitness.
    - We could add a marker:
      - `database.call(query_builder(_), _);`
      - `database.call(query_builder(...), ...);`
- New syntax for "special" parameters. Users could be overwhelmed.
  - Possibly, but it seems unlikely; the mental overhead is far less than that of generics or lifetimes.
- Excessive use of generics can be an issue regardless of parameters being implicit, but this issue gets more likely as we create more parameters for functions.
- This particular syntax (we can fix this when we discuss syntax):
  - Basically introduces named params.
  - The initial call is less ergonomic. But perhaps that is ok.
  - Doesn't improve ergonomics for many sibling calls at the top level.
# FAQ
- Will this block any future features?
  - The concept in general is just syntax sugar, and as such should not block any future features.
  - The syntax possibly could, but should be such that it doesn't block future enhancements to method signatures. We can discuss when we open up discussion about syntax.
# Alternatives
@burdges suggested that we could possibly use something similar to dependent types and type inference. I'm happy to explore this route if everyone prefers it. Although it likely will limit the context to have unique types.
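To show why a type-driven approach would tend to limit the context to unique types, here is one possible approximation in today's Rust. It is my own sketch, not necessarily what @burdges had in mind: the `Has<T>` trait, `AppContext` and the functions below are hypothetical names. Each function states which types it needs, and the context is looked up by type rather than by name.

```rust
// Hypothetical trait: a context "has" exactly one value of type `T`.
trait Has<T> {
    fn get(&self) -> &T;
}

struct Metrics {
    prefix: &'static str,
}

struct User {
    name: &'static str,
}

// Application-level context; library code never names this type.
struct AppContext {
    metrics: Metrics,
    user: User,
}

impl Has<Metrics> for AppContext {
    fn get(&self) -> &Metrics {
        &self.metrics
    }
}

impl Has<User> for AppContext {
    fn get(&self) -> &User {
        &self.user
    }
}

// "Library" function: asks only for `Metrics`.
fn database_call<C: Has<Metrics>>(ctx: &C, query: &str) -> String {
    let metrics = <C as Has<Metrics>>::get(ctx);
    format!("{}:{}", metrics.prefix, query)
}

// "Service" function: asks for a `User`, then forwards the same context.
fn load_profile<C: Has<User> + Has<Metrics>>(ctx: &C) -> String {
    let user = <C as Has<User>>::get(ctx);
    database_call(ctx, &format!("profile/{}", user.name))
}

fn main() {
    let ctx = AppContext {
        metrics: Metrics { prefix: "db" },
        user: User { name: "alice" },
    };
    println!("{}", load_profile(&ctx));
}
```

Because lookup is keyed on the type alone, a context cannot hold two distinct `User` values (say, the caller and an impersonated user) without wrapping one of them in a new type, which is the limitation noted above.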
@burdges has also suggested a possible `State` monad with the possible inclusion of `do` notation.
@mitsuhiko has a suggestion, which I understood as TLS with a tighter scope. Basically, instead of "thread local storage", it could be "call stack local storage" (my name). His proposal seems to suggest access through a global-like function, e.g. `env::var`. I would still like to put the function's dependencies into the function signature somehow, and ideally get some compile-time checks. But without adding the context to all functions' signatures, this compile-time check is impossible because of "no global inference".
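My reading of "call stack local storage" can be approximated today with a thread-local slot plus a guard that restores the previous value when its scope ends. This is only a sketch of my interpretation, not @mitsuhiko's actual design; `CURRENT_USER`, `UserScope`, `current_user`, `greet` and `demo` are hypothetical names.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical slot holding the user name for the current call stack.
    static CURRENT_USER: RefCell<Option<String>> = RefCell::new(None);
}

// Guard that restores the previous value when the scope ends.
struct UserScope {
    previous: Option<String>,
}

impl UserScope {
    fn enter(name: &str) -> UserScope {
        // `RefCell::replace` stores the new value and returns the old one.
        let previous = CURRENT_USER.with(|slot| slot.replace(Some(name.to_string())));
        UserScope { previous }
    }
}

impl Drop for UserScope {
    fn drop(&mut self) {
        let previous = self.previous.take();
        CURRENT_USER.with(|slot| {
            slot.replace(previous);
        });
    }
}

// Global-like accessor, in the spirit of `env::var`.
fn current_user() -> Option<String> {
    CURRENT_USER.with(|slot| slot.borrow().clone())
}

// The signature says nothing about needing a user, so a missing
// value can only be detected at runtime.
fn greet() -> String {
    match current_user() {
        Some(name) => format!("hello {}", name),
        None => "no user in scope".to_string(),
    }
}

fn demo() -> (String, String, String) {
    let before = greet();
    let inside = {
        let _scope = UserScope::enter("alice");
        greet()
    };
    let after = greet();
    (before, inside, after)
}

fn main() {
    println!("{:?}", demo());
}
```

The guard addresses the cleanup problem, but `greet` still compiles when called outside any scope, which is the missing compile-time check discussed above.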
# Conclusion
Adding a relatively simple AST transform would open Rust up to some pretty nice and easy ergonomic wins. The syntax would need to be discussed and could bring in more drawbacks, but at least for now the drawbacks seem minimal and within acceptable bounds.
I'd love to hear other people's thoughts:
- Do you like it? (+1s will do)
- What might your use case be?
- Drawbacks
  - Are any of the drawbacks show-stoppers for you?
  - Any drawbacks I didn't foresee?
- Compiler/Lang teams
  - Is this even feasible? I have no idea if this can actually be an AST transform.