Pre-RFC: `#[must_bind]`

Confusing let _ = ...; with let _a = ...; seems to be a common beginner mistake that leads to a buggy program instead of a compile-time error. (example mention). It is natural to think that once I have bound a value to something, it will live until the end of the block and do its work.

#[must_use] does its job well for Result-returning functions, but when the main point of the return type is its Drop implementation, #[must_use] may not be enough.

I propose an enhanced version of must_use: the #[must_bind] attribute.

Such functions or types should issue a compile-time warning not only if their result is not bound to anything (a_must_bind_fn();), but also if it is bound to _ (let _ = a_must_bind_fn();). This, together with a helpful compiler message shown on violation, would protect against the situation mentioned in the linked article.

In order to really ignore #[must_bind] and immediately drop the value, drop(a_must_bind_fn()); should be suggested by the compiler warning message.

(This attribute name was suggested on IRC).

19 Likes

To give some additional context here (which I didn't know myself until I tried it out on the playground): let _ and let _name have different behavior here. let _ gets dropped immediately. let _name does live until the end of the block, and the _ just suppresses the warning about an unused binding. So, the warning can suggest changing _ to _some_name.
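
For anyone who wants to reproduce this, here is a minimal sketch of the difference. The Guard type and the drop_order function are made up for illustration and just record when each guard is dropped.

```rust
use std::cell::RefCell;

/// A stand-in RAII guard (hypothetical, not from any library) that records
/// the moment it is dropped in a shared log.
struct Guard<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the sequence of events when one guard is "bound" to `_`
/// and another is bound to `_named`.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // The guard is a temporary that nothing holds, so it is dropped
        // at the end of this statement.
        let _ = Guard { name: "underscore", log: &log };
        log.borrow_mut().push("after `let _`".to_string());

        // A real binding: the guard lives until the end of the block.
        let _named = Guard { name: "named", log: &log };
        log.borrow_mut().push("after `let _named`".to_string());
    }
    log.into_inner()
}

fn main() {
    for event in drop_order() {
        println!("{}", event);
    }
}
```

The guard bound to _ is logged as dropped before the line that follows it runs, while the _named guard is only dropped when the block ends.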

And yes, I completely agree that we should add this attribute, and that we should apply it to a few things in the standard library that have RAII semantics.

12 Likes

The standard design guideline is to proxy access through your RAII guard, rather than having your guard just alongside the guarded data.

That said, I'd still support a #[must_bind]/#[must_retain], since the "alongside" versions do exist (and often must exist to serve as the implementation primitive for the wrapping version).

5 Likes

Just to stave off another minor but existing misconception: let _ = .. is just a no-op and does not force anything to be dropped by itself.

I'll put in this example.

{
    let s = String::new();  // s owns the string.
    let _ = s; // No-op. `s` still owns the string.
} // s goes out of scope, so the string is dropped here.
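
A quick way to convince yourself of this is to keep using s afterwards. The sketch below (function names are made up) compiles because the _ pattern never moves the string out of s, whereas the named binding does move it.

```rust
/// `let _ = s;` does not move out of `s`, so `s` can still be returned.
fn still_owned() -> String {
    let s = String::from("still here");
    let _ = s; // No-op: `_` does not bind, so nothing is moved or dropped.
    s // Compiles: `s` still owns the string.
}

/// With a named binding the string is moved into the new variable.
fn moved_into_binding() -> String {
    let t = String::from("moved");
    let _t2 = t; // A real binding: ownership moves from `t` to `_t2`.
    // Returning `t` here would not compile (use of moved value).
    _t2
}

fn main() {
    println!("{}", still_owned());
    println!("{}", moved_into_binding());
}
```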

"let _" is "don't hold this". In some situations not holding something is what causes a value to drop. :slight_smile:

12 Likes

What in the standard library would have this attribute? The standard library's locking primitives don't allow access to values unless you hold the guard.

In the example post, the right API for a high level Semaphore primitive in Rust would be for it to be a Semaphore<T> which returns a guard that dereferences (non-mutably) to T. The command function then would be best implemented as a method of some type that is guarded by the semaphore, rather than a free function.

In cases where the raw version is exposed, it usually involves no RAII at all. For example, parking_lot's RawMutex/RawRwLock types do not return RAII guards. I'd like to see concrete examples where an intermediate "partial-RAII" API like this would really be the best choice.

4 Likes

With actual merit, or just someone's hobby project and/or they didn't know about RAII guards and/or didn't know how to write RAII guards? I ask because it seems like a real waste to not have rustc leverage its type system if it is in fact feasible for the problem at hand.

If it's the former case I agree, but in the latter case, I see no need to accommodate bad code in this way. In that case I think it's better to have rustc suggest refactoring the code to use RAII guards i.e. attacking the issue at its root rather than just slapping a patch on it.

Do you consider Drop-only use of RAII an anti-pattern?

1 Like

I assume you mean APIs like the Semaphore in the linked post. I'd like to see an example where they are recommendable as a public API. They provide strictly less checking than using a full-RAII guard, and don't guarantee at all that what the user intends (that an event is sequenced in the critical path) actually occurs.

It can make sense for certain kinds of private scope guards where the full implementation would be overkill and you can trivially, locally verify correctness (but need a dtor to handle potential panics). But I think those examples would not benefit from must_bind anyway, because they are hyperlocal and the author either already knows the binding rules or doesn't.

2 Likes

I personally do not see any real use for must_bind. However, playing devil's advocate, maybe an example for its use would be a timer that finds the duration between its construction and its drop? Of course, this example has issues because drops are not guaranteed to run, but aside from that problem, must_bind would help prevent the timer from being used incorrectly.
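
To make the timer idea concrete, here is a rough sketch. ScopeTimer and both functions are hypothetical names, and writing the result into a Cell is just one way to get the measurement out of the destructor.

```rust
use std::cell::Cell;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical scope timer: on drop, it stores the time elapsed since
/// construction into `out`.
struct ScopeTimer<'a> {
    start: Instant,
    out: &'a Cell<Duration>,
}

impl<'a> ScopeTimer<'a> {
    fn new(out: &'a Cell<Duration>) -> Self {
        ScopeTimer { start: Instant::now(), out }
    }
}

impl Drop for ScopeTimer<'_> {
    fn drop(&mut self) {
        self.out.set(self.start.elapsed());
    }
}

/// Intended use: the timer lives until the end of the inner block.
fn timed_correctly() -> Duration {
    let elapsed = Cell::new(Duration::ZERO);
    {
        let _timer = ScopeTimer::new(&elapsed);
        sleep(Duration::from_millis(20)); // the work being measured
    }
    elapsed.get()
}

/// The mistake: the timer is dropped before the work even starts.
fn timed_by_mistake() -> Duration {
    let elapsed = Cell::new(Duration::ZERO);
    {
        let _ = ScopeTimer::new(&elapsed); // dropped at the end of this statement
        sleep(Duration::from_millis(20)); // not measured at all
    }
    elapsed.get()
}

fn main() {
    println!("named binding: {:?}", timed_correctly());
    println!("underscore:    {:?}", timed_by_mistake());
}
```

Both versions compile without any warning today, which is the case a must_bind lint on ScopeTimer would be meant to catch.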

The article mentioned by @vi0 uses a tokio::sync::Semaphore. Its acquire() method acquires a permit and returns a guard. When the guard is dropped, the permit is released again.

This is similar to Mutex/MutexGuard, except that the data inside a mutex can only be accessed through the MutexGuard, so there's no risk of accidentally dropping the guard too early (the only risk is dropping it too late, which can cause a deadlock).

1 Like

There are quite a lot of uses of Mutex<()> out there:

https://grep.app/search?q=Mutex<()>

(Off-topic: I only just now learned about the existence of that code search site, by googling 'github search regex' for the nth time over the years, expecting to be disappointed. But I wasn't! It's only been around for a few months, and it seems pretty neat.)

10 Likes

MutexGuard. Mutex<()> sometimes needs to be used to control external resources that can't be wrapped in the mutex itself. I use it to control access to a directory within the process. Unix can't have OS-level directory locks.

tokio's Semaphore could be difficult to use if it was Semaphore<T>, since it may be used for things like throttling incoming requests across multiple server endpoints, which isn't access to a specific object, but more like controlling access to CPU and memory in general.

Distributed tracing has "span guards" to log how long operations took. This is usually tied to code flow, not resources. Plus it needs to be easy to add after the code is written, so again Span<T> wouldn't be convenient.

7 Likes

I learnt about this non-binding let _ = construct a few days ago (it cost me more than an hour of self-doubt) when developing a PR for tracing-subscriber concerned with the logging of “spans”. These constructs do not offer resources (there is one feature, but it is optional and rarely used), so the Entered handle is almost never used by subsequent code; its sole purpose is to delimit the scope during which the tracing span is active.

Discovering that there is a semantic difference between let _ = and let _a = was the first and so far the only “what the heck?” moment with Rust for me. Unless this wart is removed (by making let _ = binding for the rest of the scope), something like #[must_bind] on the Entered type looks like the only way to spare fellow Rustaceans my little Odyssey when using tracing spans (where let _ = span.enter() is always a bug). Or is there a better way to express such spans in the type system?

11 Likes

I wrote a Wiki page about the drop order of _ patterns. I didn't find any documentation about it, so I did a lot of experiments in the playground. I hope the information is correct; if you find mistakes, please fix them!

1 Like

But I'm wondering if some of these couldn't be better handled with something like a ZST with methods guarded by the mutex, instead of using a free function. It's not a big improvement since you can always go around the mutex pretty trivially, but it would be an alternative way to avoid this footgun that doesn't involve adding an attribute to the language and support for it in the compiler.
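
Roughly what I have in mind, as a sketch only: DirAccess and write_entry are made-up names, and the method just returns a string instead of touching a real directory.

```rust
use std::sync::Mutex;

/// Hypothetical zero-sized token standing in for "access to the directory".
/// The operations are methods on it instead of free functions.
struct DirAccess;

impl DirAccess {
    /// Placeholder for a real directory operation.
    fn write_entry(&mut self, name: &str) -> String {
        format!("wrote {}", name)
    }
}

fn demo() -> String {
    let dir = Mutex::new(DirAccess);

    // `let _ = dir.lock().unwrap();` would still compile, but it is harmless
    // here: without a live guard there is nothing to call `write_entry` on.
    let mut guard = dir.lock().unwrap();
    guard.write_entry("a.txt")
}

fn main() {
    println!("{}", demo());
}
```

In a real crate the token's constructor would have to be kept private for this to mean anything, and as said above it is still easy to go around.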

3 Likes

... that doesn't involve adding an attribute to the language and support for it in the compiler.

I don't expect it to be a complicated attribute.

It's possible, but it needs special attention to this issue and extra code. It seems relatively far from the most straightforward/lazy solution. Also, as a library author I can't stop people from using a () payload, so it depends on users' diligence each time, not just on good API design that makes users do the safest thing. But as a library author I could add must_bind to at least catch the known failure case.

How about making let _ = emit a warning? (and ideally an error, for code that is not macro-generated)

As far as I can tell it serves no useful purpose beyond orthogonality with let (foo, _, bar) = and similar constructs.

3 Likes

The only reason I'm even aware of let _ = is because it's already a standard way to silence a #[must_use] warning when you really do want to not use the returned value. Apparently this is true in Swift too, so we're not being especially weird or anything.

There's a reasonable argument that this isn't a good convention (I happen to think it is a good one), but either way making let _ = a warning at this point would be way more hassle than it's worth.

14 Likes

Note however that in Swift let _ = foo() is equivalent to let _a = foo() when _a is unused (except that the latter emits a warning suggesting a switch to _). Both 'drop' the result of foo immediately in optimized builds, and at the end of the scope in unoptimized builds.

2 Likes