Improving the usefulness of `#[must_use]`

Kixunil · February 13, 2025, 9:32pm

I have two ideas for improving #[must_use] that also benefit from combining them.

The first (and possibly logical to do first chronologically) is to have the warning trigger if you're returning it from a function that does not have #[must_use].

Example:

#[must_use]
fn foo() -> u32 {
    42
}

fn bar() -> u32 {
    foo()
}

This code silently drops #[must_use] and I think it's not great. If the result of foo was sufficiently important to warrant a warning shouldn't that warning be propagated if that value is returned from other function? I believe it should. Thus it'd make sense to trigger a warning saying that bar() returns a value produced by a #[must_use] function but itself doesn't propagate #[must_use]. This warning should be a new one that can be silenced with a proper allow attribute on the function.

Secondly, consider this code:

let mut i = 0;
i += 42;
let mut vec = Vec::new();
vec.push(42);

If you try to compile it you will get a (very appropriate) warning about unused write to i but you won't get that for Vec::push despite it being very similar - the vec is useless unless something reads its items. The side effect (allocation) is very uninteresting in all cases except maybe some allocation test code. I think I could find many more such functions in std alone. Basically anything that just pushes into a collection or modifies something without a side effect.

I suggest simply allowing #[must_use] on reference arguments, triggering a warning if the caller didn't later use that reference or whatever the reference was pointing to. This is obviously somewhat hairy since a reference could come from a function argument - and this is where a warning similar to the above would make sense too - warn about the argument not also having #[must_use]. If the reference was returned from a function it's weird but I'd say just ignore it and perhaps in the future find a way to link it to the true source (e.g. for Pin).

Notably, it appears as if this only makes sense for &mut and &T where T is not Freeze however there's at least one counter example I can think of: clone(). If you don't use the original you're just wasting performance.

A special case is things like Vec::pop. There you need to use the vec or the returned value (or both). This proposal cannot express that but it could be done with something like #[must_use(self | return)]. This seems a bit complicated so perhaps it could be done later.

I think even doing the basic version would be already quite helpful.

scottmcm · February 13, 2025, 9:44pm

I think the biggest thing we need is to also have a #[may_ignore].

Then we could have the compiler use heuristics to warn about unused return values fairly often, with the opt-out of letting people put may_ignore on functions where those warnings are inappropriate.

I think the current direction of spamming it on basically everything is silly, and an indication that we need something more than just adding it more places.

idanarye · February 14, 2025, 12:59am

I'm unable to understand what you mean by that. Can you elaborate and/or give an example?

binarycat · February 14, 2025, 3:54am

there's already an rfc for that

josh · February 14, 2025, 10:07am

For the second one (Vec::push), I completely agree. I think one simple solution for that would be an annotation on methods that semantically are only a write but not a read, such that if you only call such methods but never any other methods, the object should be considered "written but never read". I can imagine heuristics trying to guess whether to apply such an attribute, but for a start, let's have the explicit attribute and use it on methods like Vec::push and HashMap::insert.

I expect that an RFC proposing such an attribute would be fairly easy to accept, and I expect that the most annoying part of proposing and reviewing such an RFC will be naming the new attribute.

For the first, I would expect many false positives: #[must_use] on foo() means "you should do something with this", but returning it as bar()'s return value is doing something with it. Whether the caller of bar() cares about that return value is something that bar probably knows better than foo.

To the extent that the #[must_use] is something that "follows the value", I'd expect many such cases to be covered by types that have #[must_use] annotations.

Kixunil · February 14, 2025, 1:21pm

@scottmcm do you have in mind having must use by default and make it opt out instead?

@idanarye Consider this code:

impl<T> Vec<T> {
    fn push(#[must_use] &mut self, item: T) { /* ... */ }
}

fn push_two_items(vec: &mut Vec<i32>) {
    vec.push(42);
    vec.push(21000000);
}

Here the author forgot to add #[must_use] on the vec argument of push_two_items. As a result, the caller could forget to use vec afterwards. Having a warning about it would help propagate this up.

@josh yeah, even simple start would be nice, we can extend it later. But it seems to me that an attribute on parameters rather than methods would be more useful. For instance core::mem::swap should arguably mark both parameters (if you don't read them why swap them and if you only read one why not simply assign?).

Honestly, I think nobody can tell for sure unless we do crater run or some very clever grepping. I've seen code where the same reason why you need to use the returned value applied to which ever function was just calling it and returning the result. But it had something to do with std::io::Read so that case is already busted by Result being "broken". Anyway, I think that most cases do need propagation but I certainly respect it's just an opinion and we don't have hard data.

If there's a cheap way to gauge the numbers I'd like to look into it. I suppose crater run would take a lot of resources, right?

josh · February 14, 2025, 1:26pm

The crater run itself would be worthwhile and worth the computing resources; the more limiting resources would be that someone would need to look in detail at the many many many lints it would produce and attempt to characterize what fraction of them are false positives.

Regarding annotating parameters vs methods: I see what you're getting at now. The annotation I was suggesting would largely be equivalent to annotating the &mut self parameter, and annotating parameters in general does seem like it would be more general, as it would handle write-only parameters.

I think swap is a borderline case, though, because it does both read and write both parameters, so it does legitimately count as a read and shouldn't make a value count as unread. There's some analysis that might be useful there, but it seems like that'd get into the realm of more complex dataflow analysis.

For more obvious cases, an attribute indicating that a parameter is "not semantically read" would make sense. (It doesn't literally mean that the contents are never read, because for instance, Vec::push will read the capacity and length. But semantically it doesn't count as a read operation.)

The reason I'm carefully making that distinction is that for other purposes, people sometimes ask for a &mut where the value is literally never read from, for purposes such as uninitialized memory. We have better solutions for that (and may want even better ones in the future), but I wouldn't want whatever attribute we devise here to confuse people who are looking for literally "this is never read from and might be uninitialized".

#[semantically_write_only]?

Kixunil · February 14, 2025, 2:10pm

I would be willing to characterize a bunch of them from top crates, I obviously cannot promise all of them.

Yes, but

as you write later, Vec also reads stuff but that doesn't mean it shouldn't have the attribute
the implementation of swap is effectively: let tmp = a; a = b; b = tmp; That operation contains writes at the end. Just like x += 1 reads the value but writes to it at the end and still has the warning (correctly).
Consider all three cases when at least one of the argument is unused.

let mut a = foo();
let mut b = bar();

if a < b { swap(&mut a, &mut b); } // this kind of thing is somewhat common

// neither is used here

What's the point of the swap? Both values will be dropped, you can delete swap and then you find out you can also delete the variables. Or maybe you forgot to use the values.

let mut a = foo();
let mut b = bar();

if a < b { swap(&mut a, &mut b); }

use_value(a);
// b is unused

Could be written as:

let mut a = foo();
let mut b = bar();

use_value(if a < b { b } else { a });

The third case when a i unused is just a mirror image of this one, so I'm omitting it.

Which I think demonstrates that swap with an unused value is pointless.

Definitely!

Quite long IMO. Also it doesn't apply to niche use cases when not using an immutable reference is also weird. Like calling clone and not using the original. Supporting shared reference is needed because of interior mutability and if we support them then it doesn't seem to be useful to make the distinction between Freeze and !Freeze as it unlocks warning on clone but that also implies write is misleading term. That's why I picked #[must_use] for now. It's already known and behaves very similarly. I've read somewhere that references (borrows) are almost like a syntax sugar of taking the value and then returning it from the function. From that perspective it makes a lot of sense to use the same mechanism.

josh · February 14, 2025, 2:29pm

Re swap, I see what you mean: it should count as a read of the previous values of the variables, but it should count as a write of the current values of the variables.

Re must_use: the potential confusion there would be that #[must_use] &mut self could just as easily mean that you must use the value inside the method (e.g. "this method must use self, internally, warn if it doesn't").

But making use of that theme, how about must_use_output? It's effectively doing for an out parameter (or an in-out parameter) what must_use does for a return value. How does that sound?

And that also makes the swap case clearer: you must use the output of each parameter. (An example of a case where that might still produce a false positive: consider a loop where you swap in each iteration. In the last iteration, you might ignore a value you swapped out. That seems like a potentially reasonable false positive to have, but we'd need to consider that case when marking swap's parameters as must_use_output.)

Kixunil · February 14, 2025, 2:55pm

I'd prefer something shorter if possible but at least semantically it makes sense. But if we want to be super nitpicky, #[must_use] would be more clear on the return type, not on the function

Having += 1 in a loop already doesn't fire lints (presumably because the compiler cannot even determine that's happening), so I think it's fine. However something like this does come up in macros: a few times I wrote things like $(do_stuff($foo, index); index += 1; )* which warned at the last "iteration" (and also on unused mut if th input was empty). Honestly, that's reasonably rare and easy to #[allow], so I really don't mind these FPs. Making the lint smart to understand that a macro had to produce it would be nice to have but really, I'd rather people spend time on more productive things.

josh · February 14, 2025, 3:34pm

Interested in writing the (very short) RFC for must_use_output? I'd be happy to review and support it.

Kixunil · February 14, 2025, 4:08pm

Yes, I was thinking about doing it. I might be able to do it tomorrow, we'll see.

mathstuf · February 15, 2025, 6:31pm

Doesn't clippy already check for this?

kornel · February 16, 2025, 12:39am

Unused/unread variables are tracked only locally, and it's a simple check:

let a = result();
if false {
    let b = a; // that's a use
}

In this example a is "used", even though it's not really. The dataflow analysis behind it is very basic, and doesn't extend through b.

Having a warning for unused/unread vec.push() would be awesome, but it's a very different thing. If it's implemented in a similar way to unused variables, it would have to be limited to trivial cases of variables owning a Vec and dropping it in the same function. A more widely applicable check that could handle &mut Vec would need some inter-procedural data flow analysis.

josh · February 16, 2025, 10:39am

The proposed #[must_use_output] would be enough for simple cases like "you have a Vec field in this struct, and you push to it various times, but nothing in the program reads from it".

Kixunil · February 16, 2025, 12:14pm

Took a little longer than I expected but it's up now: Add `must-use-output` attribute by Kixunil · Pull Request #3773 · rust-lang/rfcs · GitHub

It does but if simply slapping an attribute on a method parameter can do the same thing I see no reason not to do it.

@kornel I'm not proposing a full-blown analysis, though it can be added later. Even something as simple as the must_use check is already valuable.

josh · February 16, 2025, 1:05pm

Thank you for working on this! I've nominated it for upcoming discussion.

scottmcm · February 20, 2025, 5:46pm

I don't have any exact behaviour in mind. I just know three things,

People keep asking for must_use on more and more functions -- there's something like 1822 instances in the standard library, based on a quick rg.
Most of those have nothing useful to say (unlike, for example, the one on split_off that can point you to use truncate instead, rather than give a generic message)
Yet somehow people apparently don't want to turn on unused_results lint that does tell you about every unused result.

So if people want it on most things, but apparently don't want it on everything, then it sound like the right answer is to infer it in more cases, but have a way to correct things if it's wrong.

zackw · February 20, 2025, 6:32pm

Speaking only for myself, I didn't know that lint existed.

I turned it on experimentally for the program I'm working on right now. It flagged zero genuine bugs; i.e. every case where a result was being discarded, that was the intent, and there isn't a better alternative API as far as I know. It did flag a few cases where let _ = ... could be worthwhile as an annotation, e.g.

// SAFETY: This function writes one `uid_t` quantity to each of its
// three pointer arguments; we have supplied references to `mut uid_t`
// variables.  It cannot fail and always returns zero.
unsafe {
    libc::getresuid(
        ptr::from_mut(&mut r),
        ptr::from_mut(&mut e),
        ptr::from_mut(&mut s),
    );
}

The comment already tells you that the return value is being ignored because it's always zero. Writing let _ = unsafe { ... }; would underline that. I don't think this is super important; I wouldn't turn on unused_results just for the sake of catching future cases like this one.

unused_results also flagged a whole lot of places where I'm using expect or expect_err in tests, where the thing being tested is simply "this operation should have succeeded/failed". Simplified example:

#[test]
fn bad_packet()
{
    let packet = construct_invalid_packet();
    super::decode_packet(packet)
        .expect_err("decoding should have failed");
}

Using let _ for these would obscure the actual goal. assert_matches! might be a suitable replacement in some cases but I'm not sure if it would suit all of them. In normal code (as opposed to tests), though, the unwrap family probably is high on the list of existing std functions that aren't marked must_use but it usually is a bug to discard their return value. I dunno. How would you write a test like this?

kpreid · February 20, 2025, 10:31pm

I think these problems can be summed up as: in order for unused_results to be practical to use outside of ultra-rigorous code, std and other Rust library API surfaces would have to be designed to work with it (e.g. by offering something for your expect_err situation), and they aren't, because hardly anyone uses it so hardly anyone thinks about it.

Topic		Replies	Views
Criteria for adding `[must_use]`?	7	1016	July 21, 2020
Use of #[must_use] in the standard library libs	18	2798	January 13, 2022
Pre-RFC: `#[must_bind]` language design	51	6091	March 24, 2021
#[must_use] for Result::Ok?	5	1724	April 14, 2022
Must_use and linear types language design	22	3240	October 30, 2020

Improving the usefulness of `#[must_use]`

Related topics