Proposal: stabilizable, proc-macro based lints

abonander · July 14, 2021, 8:48pm

In the SQLx Discord last night, I was lamenting the loss of the legacy syntax extensions and thus the ability to just plug extra lints into the compiler. It seems like the "just add it to Clippy" solution is satisfactory for most of the community; however, I feel like there is still a lot of potential in supporting third-party lints.

Motivation(s)

SQLx

Users of SQLx often run into a pitfall that we can't easily fix with API design:

// check out a connection from the pool and begin a SQL transaction
let mut txn = pg_pool.begin().await?;

// locks the row to exclude concurrent updates
let something_to_update = sqlx::query!(
    "select * from foo where id = $1 for update",
    foo_id
)
.fetch_one(&mut txn)
.await?;

let new_bar = /* calculate from `something_to_update` */;

// perform our update
sqlx::query!(
    "update foo set bar = $1 where id = $2",
    new_bar,
    foo_id
)
.await?;

// tell the database to save the changes
txn.commit().await?;

However, what happens if someone forgets that last line? We can't leave the connection in a transaction indefinitely, because we want to release it back to the pool, so the transaction needs to be cancelled, or rolled back. The transaction type here will do that on-drop (by implicitly spawning a task since this is async).

We can't implicitly commit on-drop if there are changes because execution could have hit one of those early returns from the ? operator but we weren't able to make all the changes the user expected, so rolling back on-drop is the safer default.

We could fix this issue using a closure, which we do provide as an option, but that has its own usability issues: `Connection::transaction` is not usable · Issue #604 · launchbadge/sqlx · GitHub

And the lack of async traits means a nigh-unreadable signature and boxing the return type (since it uses async fn internally; even with async traits this might just end up a HRTB nightmare): Connection in sqlx - Rust

What we would really love is to be able to lint this, and warn when a transaction is dropped at the end of normal execution without being committed. #[must_use] isn't sufficient to catch this case as the transaction will be marked used by the .fetch_one(&mut tx) call.

There's a bunch more SQL(x)-specific lints that we'd love to be able to provide to users, but obviously they don't make sense being built into Clippy.

Serde

Most of our Rust usage at Launchbadge right now is implementing REST APIs that return JSON. To make it easier to interface with in languages like Dart (for Flutter mobile apps) that prefer camelCase names, we use #[serde(rename_all = "camelCase")] on all our API return types:

#[derive(serde::Deserialize)]
#[serde(rename_all = "camelCase")]
struct FooResponse {
    id: Uuid,
    name: String,
    created_at: time::OffsetDateTime,
    updated_at: Option<time::OffsetDateTime>,
}

#[actix_web::get("/v1/foo/{id}")]
async fn get_foo_by_id(id: Path<Uuid>) -> actix_web::Result<Option<Json<FooResponse>>> { ...}

However, to my knowledge, there is no way to globally (or by module) enforce usage of the #[serde(rename_all = "camelCase")] attribute, and it's embarrassingly easy to forget to add to a response type, even for seasoned developers.

Serde can concievably add something like a serde.toml that lets you configure this externally, but that's a non-local effect which could be really confusing when you're reading the code, and also we wouldn't want to necessarily apply this for the whole crate (since we use derived Serde impls elsewhere to interact with other web APIs that have different casing rules), so a lint makes the most sense to me.

It makes some sense that this could be integrated into Clippy since Serde is a de-facto standard crate in the Rust ecosystem, but it'd make even more sense for this to be in its own serde-lints crate along with lints that cover other potential pitfalls.

The Idea

I came to the realization that we basically have the building blocks for pluggable lints that we can stabilize, and we don't have to stabilize the entirety of rustc_hir to do it.

I'm conceiving of lints that fit into the existing proc-macro scheme and use the existing proc_macro API (or proc_macro2 and syn built on top of it):

sqlx-lints/lib.rs (a proc-macro = true crate):

// #[warn(sqlx_lints::uncommitted_transaction)]
#[proc_macro_lint]
fn uncommitted_transaction(input: TokenStream) {

}

Where the lint will parse the TokenStream and then use Span::lint() (which behaves like Span::error() but is modified by the configured lint level) to call out the patterns it's looking for.

Notice that the function returns (); because lints are idempotent and cannot modify the code, they can be run in parallel.

However, the sqlx_lints::uncommitted_transaction lint would probably want type information so it can know what each variable is. I'm thinking that it can ask for expanded/desugared input like so:

#[proc_macro_lint(typed, desugar(method_call)]

So the above SQLx example would be presented something like:

let mut txn: ::sqlx::Transaction<'static, ::sqlx::postgres::Postgres> =  ::sqlx::Pool::<::sqlx::postgres::Postgres>::begin(&pg_pool).await?;

// not gonna bother typing out the in-between calls

::sqlx::Transaction::commit(txn).await?;

The spans of the pretty-printed tokens here would link back to the original source so Span::lint() still works correctly. desugar() could take other options like deref_coercion or overloaded_ops if it cares about those sorts of things, but each option that we introduce could be stabilized separately.

The Serde example wouldn't need any of this, of course, it could just be passed the source verbatim.

Additionally, lints will probably want to be able to specify which AST nodes they care about, e.g.:

#[proc_macro_lint(visit(fn))]
fn sqlx_uncommitted_transaction(_: TokenStream) {}

#[proc_macro_lint(visit(struct, enum))]
fn serde_non_camel_case(_: TokenStream) {}

Which simplifies parsing as the lint only has to attempt to parse those specific AST nodes. I think I would actually say that the visit() meta item would be required as otherwise there's not really a sensible default except maybe the visiting the whole crate at once, which would be pretty resource-intensive. As with desugar(), other options for visit() can always be added and stabilized later.

Additionally, it would be really helpful if the lints crate could declare lint groups, which makes sense to be a macro to me:

proc_macro::lint_groups! {
    // `#[warn(sqlx::recommended)]`
    recommended = [uncommitted_transaction, <...>],
    pedantic = [too_many_bind_args, bad_style, <...>],
    // should we always have an implicit `all` group unless defined?
    all = [...recommended, ...pedantic]
}

Of course, this design has some fundamental flaws; it's probably going to be pretty slow since we're introducing even more pretty-printing and reparsing, although lints can be run in parallel where possible which would help with that.

It also doesn't cover every possible use-case for lints, as we still can only work with the current AST/HIR node being linted whereas lints in Clippy and the compiler can look up anything in the HIR tree.

However, it's something, which is better than what we have now, and quality-of-life improvements for lints can always be added to the proc_macro API later.

matklad · July 14, 2021, 9:15pm

For your own code, you can write a #[test] that walkdirs rs files and does arbitrary validation. We do a lot of that in rust-analyzer:

github.com

rust-analyzer/rust-analyzer/blob/ade91158646902801229d6fac350698c9d8fe95b/crates/rust-analyzer/tests/slow-tests/tidy.rs#L64

    
      
          Actual hash:   {:x}
          
          
Please adjust docs/dev/lsp-extensions.md.
          ",
                      expected_hash, actual_hash
                  )
              }
          }
          
          
#[test]
          fn files_are_tidy() {
              let files = sourcegen::list_files(&sourcegen::project_root().join("crates"));
          
          
    let mut tidy_docs = TidyDocs::default();
              let mut tidy_marks = TidyMarks::default();
              for path in files {
                  let extension = path.extension().unwrap_or_default().to_str().unwrap_or_default();
                  match extension {
                      "rs" => {
                          let text = read_file(&path).unwrap();
                          check_todo(&path, &text);

abonander · July 14, 2021, 9:16pm

Yeah, that would do okay for the Serde case, but the SQLx case would be a pretty naive search for a .begin().await? call and then a .commit().await? call, which could have weird false-positives if another library has a similar API. It'd generally be really easy to confuse the checks with that approach, I think.

dhm · July 14, 2021, 11:03pm

FWIW,

is one of the best if not the best solution at the moment for this; while it does require re-running compilations with the toolchain of the rust version the lint was written for (but which luckily is implicitly done by the dylint framework; the two only actual drawbacks being incompatibility with code having too high of a MSRV, and "lint time" (but note that one great thing about dylint is that if multiple different lints are written targeting the same Rust version, then the lint passes will be batched together, hence ensuring a somewhat tame lint time in practice for internally-defined lints)).

The complex thing is understanding the internals of rustc. I find that anything control-flow related such as your "lack of explicit drop" lint would be quite difficult to implement since it involves dealing with the MIR (since it's where actual flow graphs with tracked ownership happens), which has a very bare-bones and scarcely documented API.

FWIW, I personally plan on writing a general purpose #[deny(implicit_drops)] lint, so if I were successful you should be able to easily fork my lint and make it target your sqlx types specifically.

I even plan on writing a special convenience API for dealing with MIR stuff, should I become knowledgeable enough of these things, so as to feature a MirLintPass trait or something along those lines

A more classic proc-macro solution could be to have users annotate functions they'd want to be checked with a proc-macro that will act as a preprocessor.

It could, for instance, setup a special collection when entering the function (empty at that time), and feature a magical begin!(…) macro that would yield a txn but also register it within that collection. It could then unsugar ? operators to iterating over the collection-registered txns and rolling them all back (and, similarly, feature a bail! / yeet! macro that would do the same thing). That way, you would be able to feature something like panic! on implicit drop again, which is hacky, but would get the job done; and people could feature-gate the usage of these "runtime lints" so as to be able to disable them in performance-critical scenarios.

jhpratt · July 14, 2021, 11:41pm

I was actually just thinking about how to do library lints last night. My thought was to have some API that could be stabilized (providing type information in addition to the raw syntax). Lints could be compiled to wasm and the actual tool could be provided a list (unsure on the exact method) of .wasm dylibs.

I think this would work, but haven't bothered to write anything down let alone try to implement it. The main problem would presumably be getting an API that can be exposed and be capable of being transformed from the internal API.

felix.s · July 15, 2021, 8:57am

Why not design an API with which it is impossible to forget it?

impl Transaction {
    async fn commit(self) -> Result<(), TxError> { /* ... */ }
    async fn rollback(self) -> Result<(), TxError> { /* ... */ }
}

impl Pool {
    async fn new_transaction(&self) -> Result<Transaction, TxError> { /* ... */ }

    pub async fn in_transaction<T, E, F: Future<Output=Result<T, E>>>(
        &self, func: impl for <'a> FnOnce(&'a mut Transaction) -> F)
            -> Result<Result<T, E>, (TxError, Option<Result<T, E>>)>
    {
        let mut tx = self.new_transaction().await.map_err(|e| (e, None))?;
        let result = func(&mut tx).await;
        match match &result {
            Ok(_) => tx.commit().await,
            Err(_) => tx.rollback().await,
        } {
            Ok(()) => Ok(result),
            Err(e) => Err((e, Some(result))),
        }
    }
}

abonander · July 15, 2021, 9:23am

In the original post I mention that we do already provide an API like that and also briefly go into why that isn't the greatest option:

The signature as you wrote it won't even compile anyways. The linked issue goes into some details on the difficulty of implementing something like this, and I believe the original signature was close to what you wrote.

The core problem is that there's no way to tell the compiler that two different higher-ranked lifetime parameters actually represent the same lifetime:

async fn takes_async_closure<F, Fut>(closure: F) 
where
    // we'd need to be able to tell the compiler that `'a` is the same lifetime
    // in both places here, but still have it be a higher-ranked bound because
    // the actual concrete lifetime is defined by the called function, not the caller
    F: for<'a> FnMut(&'a mut Transaction) -> Fut,
    Fut: for<'a> Future + 'a
{
    // ...
}

We did end up resolving that issue, sort of, by requiring the returned future to be boxed, which eliminates the extra higher-ranked trait bound but at the cost of boxing and dynamic dispatch:

async fn takes_async_closure<F, Fut>(closure: F) 
where
    // this version is fine because it's only one HRTB
    F: for<'a> FnMut(&'a mut Transaction) -> Pin<Box<dyn Future +'a>>,
{
    // ...
}

programmerjake · July 15, 2021, 8:16pm

abonander:

async fn takes_async_closure<F, Fut>(closure: F) 
where
    // we'd need to be able to tell the compiler that `'a` is the same lifetime
    // in both places here, but still have it be a higher-ranked bound because
    // the actual concrete lifetime is defined by the called function, not the caller
    F: for<'a> FnMut(&'a mut Transaction) -> Fut,
    Fut: for<'a> Future + 'a
{
    // ...
}

Can't you just create a new trait to combine the requirements?

trait TakesAsyncClosureCallback<'a>: Sized {
    type Fut: Future + 'a;
    fn callback(&mut self, tx: &'a mut Transaction) -> Self::Fut;
}
impl<'a, This, Fut> TakesAsyncClosureCallback<'a> for This
where
    This: FnMut(&'a mut Transaction) -> Fut,
    Fut: Future + 'a,
{
    type Fut = Fut;
    fn callback(&mut self, tx: &'a mut Transaction) -> Self::Fut {
        self(tx)
    }
}
async fn takes_async_closure<F>(closure: F) 
where
    F: for<'a> TakesAsyncClosureCallback<'a>,
{
    // ...
}

RustyYato · July 15, 2021, 8:36pm

Here's a working playground example: Rust Playground

abonander · July 15, 2021, 8:37pm

That does work but it's also suboptimal because the compiler can no longer infer the argument type of the closure, you have to specify it:

takes_async_closure(|txn: &mut Transaction| async move {
    // ...
})

It gets even worse if you want to make the error type generic:

takes_async_closure(|txn: &mut Transaction| async move {
    // because the `?` desugars to an `.into()` call, 
    // the compiler can't infer the expected error type 
    // of the future from this expression
    // so you need to fully qualify the result type somewhere,
    // like in the return position
    // or have a `return Err(MyError)` somewhere
    sqlx::query!("blah").execute(&mut *tx).await?;


    Result::<(), MyError>::Ok(())
})
// the compiler could only infer the error type here if it's in return position
.await?;

This is a lot of annoying boilerplate and extra right-drift for a transaction, which is something you use quite commonly in REST APIs.

I'll be honest, it's really frustrating to post a proposal and then most of the replies are just suggesting alternative solutions to the problem cases, that oftentimes have already been considered, and are completely orthogonal to the proposal itself.

programmerjake · July 15, 2021, 9:19pm

Related: Custom lints for my crates - help - The Rust Programming Language Forum

programmerjake · July 15, 2021, 9:24pm

we'd probably want some way to query the compiler for the definition of a particular named item, since lints are quite likely to depend on how the type is defined and/or what traits a type implements

abonander · July 15, 2021, 9:26pm

Yeah, like I said at the top this isn't a complete solution, it's not going to cover all possible use-cases for lints but API improvements can be added incrementally.

abonander · July 15, 2021, 9:27pm

This actually doesn't work with a closure, by the way: Rust Playground

RustyYato · July 15, 2021, 10:09pm

Fwiw, I think that proc-macro style lints would be a nice way to handle linting for the normal case. It is also readily extensible, so we could provide type driven lints in later iterations.

InfernoDeity · July 15, 2021, 10:30pm

I'd like to see a stablized mechanism for providing lints. Things like dylint are going to be really fun, and potentially subject to the N+1 problem as competing rust impls are developed with vastly different internals.

mathstuf · July 16, 2021, 12:26am

So this is a common theme in many projects I've worked on. When coming in with a proposal, it is best to be up front about what you have tried and what didn't work in such cases. Yes, it's a lot more work, but getting these things out of the way avoids the feelings that come along with "why can't people see that I've tried everything and this is where I'm at?" in these kinds of discussions that can end up taking even more time overall. Sometimes. there are other ways that one did not think of, but not everyone can know what those might be, so they get suggested. Additionally, it avoids the XY problem (or people thinking you have an XY problem) and it might spark an idea by someone else as to how to tweak one of the failed attempts into a working one.

I think it's useful for everyone to remember that we're here to help each other and improve Rust to be able to solve different problems (and, ideally, ergonomically at the same time) in the process.

carbotaniuman · July 16, 2021, 4:47am

I mean we're just XYing right now. Debating whether sqlx could solve one issue without considering the other use cases of this (async drop is not a problem unique to sqlx) brings the discussion off tangent, and presumed bad faith on the part of the poster. A valid critique might be the feature doesn't pull it's weight, but given the author specifically addressed using a closure, I don't think it's valid the way it's presented here.

jhpratt · July 16, 2021, 5:38am

I'd just like to bring this back up. dylint was mentioned, but that seems a bit clunky under the hood.

Would others be interested in discussing my idea further? I think it would be an interesting test of how well suited the current rust-lang libraries are to various tasks and could result in a high-quality, flexible, performant linter.

pickfire · July 18, 2021, 2:16pm

Looks like bevy wants that as well. Feature request: custom library-specific lints · Issue #6872 · rust-lang/rust-clippy · GitHub

Topic		Replies	Views
Idea: unwinding and linear types in async context language design	2	231	March 8, 2025
Towards a Stable Compiler API and Custom Lints	28	2481	November 3, 2021
Update on the await syntax after May 23 meeting	53	4580	September 11, 2019
Compiler support for Compile-time reflection language design	12	1581	November 13, 2022
A clippy lint to warn on default match binding modes compiler	14	1558	October 11, 2019

Proposal: stabilizable, proc-macro based lints

Motivation(s)

SQLx

Serde

The Idea

Related topics