Rust should warn on impl Trait returning unit

Pzixel · December 23, 2024, 10:50pm

Here is an example bug from the codebase I'm working on atm and which was live in codebase for a couple of months:

    pub fn unique_id(&self) -> impl Hash + PartialEq + Eq {
        self.items.iter().map(|x| x.id_string.clone()).collect::<Vec<_>>();
    }

The intent behind this implementation was "We don't want to disclosure our internal Id type, the only contract we provide is that you can use it as a key, but that's it". Originally it was more on the line of self.some_key but then our new developer changed it according to new requirements and made a typo. As you can imagine, the original intent was to write:

    pub fn unique_id(&self) -> impl Hash + PartialEq + Eq {
        self.items.iter().map(|x| x.id_string.clone()).collect::<Vec<_>>()
    }

This is a small and barely noticeable difference but it has huge implications on everything downsteam.

Since this code falls into "Formally correct but almost certainly wrong" I have a strong opinion that we need a compiler warning for such cases. It's highly unlikely that user wrote impl SomeTrait and wanted a () as a result. This warning can be suppressed for cases where it's really intended (if they exist, I'm trying really hard to think of them but if it doesn't come from trait implementation I can't imagine why anyone will want to write impl Trait instead of just ()), but in majority of cases it just points out that there is probably a type and you added a semicolon where you didn't actually intend to.

steffahn · December 23, 2024, 11:10pm

I fully agree that this is a reasonable thing to lint against; the rare cases where this is deliberate should be encouraged to write

pub fn unique_id(&self) -> impl Hash + PartialEq + Eq {
    self.items.iter().map(|x| x.id_string.clone()).collect::<Vec<_>>();
    ()
}

That being said, your concrete code example should have also triggered unused_must_use, right?

warning: unused return value of `collect` that must be used
  --> src/lib.rs:11:9
   |
11 |         self.items.iter().map(|x| x.id_string.clone()).collect::<Vec<_>>();
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: if you really need to exhaust the iterator, consider `.for_each(drop)` instead
   = note: `#[warn(unused_must_use)]` on by default
help: use `let _ = ...` to ignore the resulting value
   |
11 |         let _ = self.items.iter().map(|x| x.id_string.clone()).collect::<Vec<_>>();
   |         +++++++

warning: `playground` (lib) generated 1 warning

The specifics of such a lint are a bit non-trivial unfortunately. E.g. one could question

whether returning () from a function that produces () should be allowed
- this is relevant because certain API designs might do in-place mutation with a function name that is ambiguous about whether it operating in-place or returning a new result;
  and realistically, such API design might rely on the fact that users expecting the wrong behavior would get a type error^[1] (or an unused_must_use^[2]).
what’s the intended interaction with control-flow expressions that have optional ; anyways
- e.g.
```
 pub fn unique_id(&self) -> impl Hash + PartialEq + Eq {
     if condition {
         foo;
     } else {
         bar;
     }
 }
```
  would probably clearly be problematic, too; but what if only one of foo or bar is without the ; (and evaluates to (), too)? So for example
```
 if condition {
     foo;
 } else {
     bar;
     ()
 }
```
  looking at which… I don’t think that’s in any way problematic anymore.
- there could also be differing levels of enforcement between rustc and clippy, e.g. clippy could still lint the last example above for bad style, and always require an explicit ; ()/() at the end
```
 if condition {
     foo;
 } else {
     bar;
 }
 ()
```

what if the () already made it into a local variable? E.g.

pub fn unique_id(&self) -> impl Hash + PartialEq + Eq {
    let calculated_id = {
        let excluded = "special";
        self.items
            .iter()
            .filter(|x| x.id_string == excluded)
            .map(|x| x.id_string.clone())
            .collect::<Vec<_>>();
    };
    self.other.do_something_else();
    calculated_id
}

should one possibly even aim to handle cases where the () is nested into something else, e.g.

pub fn unique_id(&self) -> (impl Hash + PartialEq + Eq, Kind) {
    let calculated_id = {
        let excluded = "special";
        self.items
            .iter()
            .filter(|x| x.id_string == excluded)
            .map(|x| x.id_string.clone())
            .collect::<Vec<_>>();
    };
    (calculated_id, Kind::StructuralId)
}

here, we still at least have the impl Hash + PartialEq + Eq being instantiated by (), but it could also become part of another thing, e.g.:

pub fn unique_id(&self) -> impl Hash + PartialEq + Eq {
    self.items
        .iter()
        .map(|x| {
            self.item_watcher.log_visited_item(&x.id_string);
            x.id_string.clone();
        })
        .collect::<Vec<_>>()
}

producing a Vec<()> accidentally due to a semicolon, so the impl Hash + PartialEq + Eq itself isn’t () directly.

if they expected a value and called an in-place operation ↩︎
if they expected an in-place operation but it returns a value instead ↩︎

elidupree · December 24, 2024, 12:01am

tbh I'd think we should also warn-by-default in all cases where someone explicitly assigns to a local variable and it's implicitly given the type ()!

(Maybe even "implicitly making a Vec<()>" too, although that one's more of a stretch.)

Pzixel · December 24, 2024, 12:06am

Interesting. I've checked out codebase and I found one difference, our developer wrote this instead:

impl Foo {
    pub fn unique_id(&self) -> impl Hash + PartialEq + Eq {
        self.items.iter().map(|x| x.id_string.clone()).collect_vec();
    }
}

And if you check it it won't warn this case: Rust Playground . I will create an issue/PR in the itertools repo, but I think this more highlight the issue rather than being the issue

I agree that such a lint need to thought through before implementing.

cuviper · December 24, 2024, 1:33am

clippy::unit_return_expecting_ord is similar, but specific to Ord.

kpreid · December 24, 2024, 4:16am

clippy already has a default lint for that: let_unit_value.

kpreid · December 24, 2024, 4:19am

How about this definition? Lint when:

the declared return type of a function is impl SomeTrait,
the concrete return type is (), and
the () was not produced by an explicit () expression at the end of the function.

This way, it doesn't need any control flow analysis. I think this is reasonable because if you do want to intentionally return (), the clearest way to do that is at the end; anything else is arguably obfuscating the fact that you're returning ().

CAD97 · December 24, 2024, 7:15pm

Clippy would lean towards answering no here, e.g. given f: fn() clippy doesn't like Ok(f()) and wants f(); Ok(()) instead. I agree this is generally^[1] good style.

My known exception to prove the rule is f: fn() -> Result<()> where sometimes I do want to write Ok(f()?) because I'm logically still returning the result of the f() computation, not doing f() and indicating that control flow completed successfully. And Ok(_?) is a much better indicator of that than _.map_err(Into::into) or even _?; Ok(()) imho. (try functions fix this) ↩︎

SUPERCILEX · December 27, 2024, 8:14pm

One idea could be to ban semicolons at the end of expressions whose return type is not () (including generics). Of you did want to return the empty tuple, you'd have to write it explicitly.

kpreid · December 30, 2024, 4:44pm

I wrote a Clippy issue for this:

github.com/rust-lang/rust-clippy

Lint against return position `impl Trait` returning `()`

opened 04:43PM - 30 Dec 24 UTC

kpreid

A-lint

### What it does If `Trait` is implemented for `()`, then a function returning …`impl Trait` can accidentally return `()` as that implementor simply by a misplaced semicolon. Clippy could have a lint against * any return position `impl Trait` * whose concrete type is `()`, * but where that `()` value did not come from a literal `()` expression in tail position or `return`. This lint would fit into the `suspicious` category. ### Advantage Users will be able to find such bugs quicker, without needing to first test the behavior of the returned opaquely-typed value and find that it is wrong. See also [prior discussion on IRLO](https://internals.rust-lang.org/t/rust-should-warn-on-impl-trait-returning-unit/22068/). ### Drawbacks An author might be intentionally passing through `()` by calling another function that currently returns `()` but will change in the future, and not want to satisfy the lint by adding `; ()` because that would disconnect the relationship between the two. ### Example The lint should fire on this code, because it returns `()` even though it looks like it returns something more meaningful: ```rust fn key() -> impl Eq { [1, 10, 2, 0].sort() } ``` The lint should not fire on this code: ```rust fn key() -> impl Eq { [1, 10, 2, 0].sort(); // pretend I wrote something more useful here () } ``` It should also lint *any part* of the return value that is `impl Trait`; for example, the `impl Trait` might be wrapped in `Result` (or a tuple, or a `Vec`, or many other things): ```rust fn try_get_key() -> Result<impl Eq, Error> { Ok([1, 10, 2, 0].sort()) // should fire } fn is_on_purpose_really() -> Result<impl Eq, Error> { Ok(()) // should not fire } ``` It is debatable whether it should fire on this code: ```rust fn hmm() -> impl Eq { let x = (); x } ``` but the `let_unit_value` lint fires in that case anyway.

zackw · January 3, 2025, 2:57pm

It's not a realistic change from the current language, but if I'd been involved at the beginning of Rust I would have made { a; b; } return the value of b. The semicolon after b would have been required, and you would have had to write { a; b; (); } if you wanted the block to return unit.

... I kinda wonder if we could lint our way there over the course of a few editions.

FZs · January 3, 2025, 5:35pm

That would be annoying in cases where two branches of a control flow structure have tail expressions of different types, but you don't want to return them.

fn do_something() -> i32;
fn do_something_else() -> &'static str;

fn main() {
    if condition {
        do_something();
    } else {
        do_something_else();
    };
    ();
}

would not compile.

I also like the fact that tail expressions have a nice marker that they are different (the lack of a semicolon).

As a beginner ~~coming from JavaScript and having used eval()~~, one could reasonably expect the following to return 0 if the condition is not met^[1]:

{
    0;
    if condition {
        1;
    }
}

...while currently you can't write this:

{
    0
    if condition {
        1
    }
}

which makes it clear that Rust returns the value of the syntactically last expression, not the last expression that was actually evaluated, which makes reasoning about easier.

That was my experience anyway, to me the status quo of semicolon-lacking tail-returns feel very natural. But I can see that this reason might be more subjective than objective.

I remembered that wrong, JS eval doesn't return 0, it returns undefined there ↩︎

zackw · January 5, 2025, 7:12pm

FZs:

That would be annoying in cases where two branches of a control flow structure have tail expressions of different types, but you don't want to return them.
fn do_something() -> i32;
fn do_something_else() -> &'static str;

fn main() {
    if condition {
        do_something();
    } else {
        do_something_else();
    };
    ();
}
would not compile.

Yeah, this hypothetical altered language would need some additional sugar surrounding type unification -- specifically, if the value of any sort of conditional expression is not used then the arms of that expression would not be required to have the same type. Probably other such cases. I'd also still allow omission of the semicolon after any expression that ends with a block, when that expression is not part of a larger expression. So in this case you would write

fn main() {
    if condition {
        do_something();
    } else {
        do_something_else();
    }
    ();
}

and that would compile.

I see where someone might get confused here but I think their confusion would be cleared up as soon as they saw the error message. (E0317: if may be missing an else clause - same as you'd get now for leaving the semicolon off the 1 but not off the 0).

CAD97 · January 5, 2025, 7:54pm

Such a design probably also wants to differentiate between fn f() { (produces "nothing") and fn f() -> () { (produces ()) — while producing a useful value is common, especially in more functional/pure style API design, subroutines run slowly for their side effect aren't uncommon, and writing (); to indicate that yes, this function doesn't return anything would get a bit annoying for functions that don't return anything (as opposed to those which return () deliberately).

While it doesn't generalize to compound types which contain (), this does suggest a fairly simple style rule for clippy to push users toward: functions which are declared without -> $ReturnType should end with an empty expression (or expression-with-block which itself ends with such an expression) (i.e. with a semicolon), and functions with a syntactical return type shouldn't be allowed to end with an implicit () (i.e. an empty expression as prior or a function call which is declared without -> $ReturnType). This will, as specified, make a lint-only distinction between the functions which produce "nothing" versus producing () deliberately as a value, based on syntax, while still keeping most semantic benefit of unifying the two.

Topic		Replies	Views
Help test impl Trait! announcements	36	10619	March 25, 2019
Rust book 2018 `impl Trait`part need fix documentation	4	882	March 25, 2019
[pre-pre-RFC] `impl trait` inside `impl`, treating trait impls as sort-of inherent items language design	10	1522	June 26, 2021
`#[fundamental]` impl Trait for Type language design	8	704	October 21, 2023
Should this code be accepted? language design	6	1033	October 23, 2020

Rust should warn on impl Trait returning unit

Related topics