Must_use and linear types

For example, I've used a few times let _ = fs::remove_dir_all() to clean up a temporary directory. This is a case where I want the deletion to always run, but I really don't care about the result, since there's nothing meaningful I can do about failures, and there's no value to use on success.

I could get behind something like fs::remove_dir_all().ignore_result(), but there needs to be an out.

3 Likes

There are cases where, abstractly, one ought to use a value, but practically, there isn't anything you can do with it, and/or attempting to do the "correct" thing with it will only make the situation worse. The one I know about, off the top of my head, is write errors on stderr: suppose you go to print an error message and the OS reports that the disk holding the log file is full. Now what? If you try to report that error you'll get stuck in an infinite loop of failing to write out "hey the disk is full" error messages ...

(I haven't tried it but, looking at the code, it appears that the only thing stopping the stdlib implementation of panic! from going into an infinite loop when it can't write to stderr is the "if this is the third nested panic, give up and abort" backstop at https://github.com/rust-lang/rust/blob/master/src/libstd/panicking.rs#L503 .)

2 Likes

I'm wondering why #[must_use] doesn't just mean "MUST USE"? Why should there be any out at all?

What do you mean by "use". Does drop(must_use_ret()) considered as using? Is passing to user-defined function (that unfortunately forgots to use its argument) considered using it?

I mean (as I explained in my comment above) that either it means that a function/method must be called that takes ownership before the scope ends, or, depending on how we attempt to define, some other type of call (mutable borrow, shared borrow, or specific subset of methods/trait methods).

Yes, due to the need for local analysis that may only depend upon the function signatures and annotations/attributes; however, function arguments that go unused already cause warnings via compiler or other lints, so, I think that would be an acceptable implementation of "must use". I think the stronger, and better definition, would be to say what methods being called constitute "using it". That way the contract is clear.

That should maybe be a #[should_use] annotation or simply that is implied by the fact that there is a return value. If a function returns a value, it is somewhat implicit that you should (ought to) use it, but a "MUST USE" is stronger contract indicating it is truly and error for you not to use it.

I would say that is a NO.

Yes. Unless the #[must_use] indicates that only specific methods constitute "usage" in that context. (see my EDIT's on my original comment above where I elaborate).

Maybe you want linear types?

You either disable logging and start logging to stderr instead or just stop logging altogether or you panic/abort. Either your application requires logging to function or it doesn't. If it doesn't, then, simply disable logging and move along. If it does require it, then, the only thing sensible/reasonable to do is panic/abort (or return an error up the chain at the very least).

I don't see why simply ignoring the return value is useful. If the API says, "MUST USE", it should be "MUST USE". An API should say that if it doesn't make sense that the caller must use it.

For example, this hypothetical API could be written such that the return value, that simply indicates whether or not logging is successful, and, if not, why should NOT be "must use" and instead should leave it up to the caller to care or not care about whether logging as successful. If the caller cares, they use the return value. If the caller doesn't they don't. In either case, the API should not be "must use" (assuming my modified definition).

Yes, I mentioned that. I believe that is ultimately what "must use" should ideally be. Perhaps also with limited "Type State". Currently, mutable vs shared, is a kind of "Type State" that is part of the signature of return values, struct members, and function parameters involving references. Lifetimes can also be thought of as a kind of "Type State" (also in signatures, though, sometimes elided). Perhaps it would be good to have a "Linear Used/Unused" type-state that could be similarly indicated for a type. Maybe something like:

fn foo( a : 'a A, b : 'b ''unused B, c : 'c ''used C );

The above function signature indicates that it takes an a which has lifetime 'a' which is of type A and a b with lifetime'b, linear type-state ''unused, and type B and a c with lifetime c, linear type-state ''used and type C.

However, that seems like overkill because the type-state could simply be handled by having Type C1 when returned from some method have a requirement to call some method C1.consumer( self : Self, ... ) -> C2, where C2 represents the consumed/used version of C1.

Unfortunately you probably do not want actual linear types in Rust just for these use cases: https://gankra.github.io/blah/linear-rust/

The way #[must_use] is applied today, it is simply not worth enforcing to the incredibly dogmatic extent that you are describing. There are plenty of completely valid scenarios where explicitly throwing out the result is the best option.

The discussion here should not be about making #[must_use] (antagonistically!) airtight- that serves no one! Rather, the question is around how to make sure people (including beginners) accurately understand when a result is intentionally discarded. And more specifically, when this changes the lifespan of a guard object.

4 Likes

It's not clear to me why that is a given. I don't get the reasoning for calling it "MUST use" and then saying, "well, if you want to ignore that advice, OK'. That seems like a contradiction. Perhaps this annotation should be called #[should_use] and #[must_use] should be stronger?

It is not a given, you're just ignoring or dismissing all the examples people have already shared. The ability to intentionally discard a result was a requirement of the design from day one. If that seems contradictory with the word "must," then perhaps "should" would have been a better choice at that time, but there's no sense in changing it now.

5 Likes

Would not the "sense" be to make the meaning of the words align with what it actually means as used? This could be done by deprecating the existing annotation in the next Edition and introducing the new one, and then, in the following edition, redefining the meaning (for usage for linear or relevant types). This could also be supported via "rustfix" for the conversion. So, I don't see a technical reason that would prevent a name change, but there is a good case to be made that the meaning of the words used for things like this should align with how they are actually used. I think this is the same or similar reason that "allowed_list" is a better name than "white_list" because the meaning of the words used better aligns with the intended meaning.

1 Like

You probably want to at least print an error message saying that you couldn't clean up the directory, but yes, you don't want to terminate the program or bubble the error up with ?.

1 Like

That is entirely application specific. If you have an application, where logging is a legal requirement, you absolutely would want to terminate the application if logging is unable to succeed/continue.

For "Logging" in particular, I don't see why the API would ever return a value that should be "must use". If alternative logging mechanism is available, and the user of the library either allowed (or at least didn't decline) that alternative, then the API should simply automatically switch to the alternative mechanism and the return value should simply indicate the switch (if it is even needed at all which is debatable), but that return value would not be "must use". If the application absolutely must use logging (say it is required for legal compliance, like HIPAA) and the logging mechanism can no longer work at all, it should be a configurable parameter of the framework or a passed parameter or a different API call that chooses between: 1) return a value indicating success or failure with reason for failure (that is not "must use", though perhaps is "should use", 2) panic/abort, or 3) the function called has a "must use", but, that means it is "MUST USE", not "SHOULD USE".

I just don't see how something like a logging API (or any other I can think of for that matter) should have a "must use" that didn't actually mean "MUST USE". I also think that the notion of "should use" (which is what I believe is being modeled by the current "must use" annotation) is superfluous because, if an API defines a return value, it is implicit that you should, at the very least, use it in some fashion. If that isn't needed by the API, why return a value in the first place. Why not just return () if there is nothing relevant for the caller to use.

I guess what I'm saying, is any value returned from a function should be implicitly treated as "should use" and if the value being returned is truly superfluous, then, the API author should correct that bug by return () instead. If there truly is some need to have a function/API that returns values that it really doesn't matter if the caller uses them in any fashion or not (which I am very skeptical of), perhaps there should be a #[superfluous_return_value] annotation instead to quiet the warning for unused return value that should occur normally.

In the case where the return value really MUST be used for correct usage of the API, that would be the purpose of the #[must_use] annotation. In that case though, it would be good to have a way to specify clearly what constitutes "usage". Also, it wouldn't be a warning to not use it, but a compiler error. The kind of thing that would constitute defining clearly what is real "usage" would be one of:

  • Without qualification would mean simply that the return value must not be dropped before doing something else within the scope after the value is returned AND that it would be an error to pass ownership of the value to another method or function. This would be used for things like "Scope Guards".
  • A list of 1 or more inherent or trait methods one of which must be called on the return value that takes ownership of the value as "Self" (preferred) before the end of the scope. In this case, calling any other function or method that takes ownership of the value would be a compiler error. Also, with this type, the compiler would require that something is done within the scope between the return value and the calling of the method that consumes the value. This would be used to model "Relevant Types" that could not be placed in a container without the intervention and knowledge of the API author (including a Box).

An extension, perhaps, would be to provide a #[use_guaranteed] annotation that could apply to parameters that take ownership. In the case, the function/method is making the guarantee that could be used to take ownership in case 1, but not case 2, to allow the "scope" of use to be passed/handed off. A parameter of a function annoted with #[use_guaranteed] would have the same requirements to "use" the value as if it were returned from a function that used #[must_use] (without further qualification). I'm not sure I can think of a useful reason to have this though.

This is exactly how Swift works, and I think that's a very nice solution. The default is to warn, and there's @discardableResult for unimportant "FYI" return values.

However, it's probably too late for Rust to change the defaults now. It could cause a torrent of spurious warnings about all existing code.

It's less about the logging API itself but inside the logging implementation. Write::write returns Result (which is #[must_use]). Dropping failed writes is a valid semantics.

Saying #[must_use] should be "MUST use" is saying that every Result-returning operation MUST in some way use that result, even if "give up if it didn't work" is a perfectly valid approach.

So I assume you always check the return value of HashMap::insert? Never Vec::pop a value off of a stack without looking at it (such as a stack of Drop guards)?

Any "pure" function can and probably should be "should use". The "@discardableResult" case really is a minority. But it's wrong to suggest that HashMap::insert is superfluous to return the old value from the map, because the common case isn't going to do anything with it (other than drop it).

Rather than e.g. have HashMap::insert(&mut self, K, V) -> () and HashSet::replace(&mut self, K, V) -> Option<V>, it makes more sense to have just one API and just discard the result if you don't need it. You gain little for duplicating the API (and with a smart enough inlining optimizer, nothing).

5 Likes

Couldn't this be fixed at an edition boundary?

In addition, making must_use MUST USE can be trivially defeated, no matter how smart you try to make the code that determines what constitutes a use.

1 Like

I feel like the more the 2021 deadline approaches, the more everyone wants to fit everything into an edition change, but we should try to remember that edition-boundary changes aren't free.

I think the status quo works well enough for 90% of cases.

2 Likes