Pre-RFC: Catch data structures that are written but never read

Consider the following code:

fn main() {
    let mut m = std::collections::HashMap::new();
    m.insert("hello", "world");
}

This code compiles and runs with zero warnings.

By contrast, something like this:

fn main() {
    let mut m = 0;
    m += 1;
}

gives multiple useful warnings:

warning: variable `m` is assigned to, but never used
 --> src/main.rs:2:13
  |
2 |     let mut m = 0;
  |             ^
  |
  = note: `#[warn(unused_variables)]` on by default
  = note: consider using `_m` instead

warning: value assigned to `m` is never read
 --> src/main.rs:3:5
  |
3 |     m += 1;
  |     ^
  |
  = note: `#[warn(unused_assignments)]` on by default
  = help: maybe it is overwritten before being read?

The difference is, in the first case the compiler just sees that HashMap::insert accepts a &mut self, but doesn't know that it has no side effects other than modifying the data structure.

I'd love to have an attribute (something like #[write_only]) that could be added to a method, which specifies that the method modifies the data structure but doesn't have any other side effect, and shouldn't count semantically as "reading" the data structure. Then, the compiler could emit warnings for code that only calls those methods but doesn't actually do anything with the resulting structure.

15 Likes

Imagine Rust had an effects system tracking all side-effects. Then detecting this kind of situation would be trivial. /s

^^^^^ To all those people saying that keep claiming that tracking and restricting side-effects would be useless.

2 Likes

Do we need that? Seems like a hints system a la must_use is all that's needed.

The ā€œ/sā€ means, I was being sarcastic :wink:. Adding an effects system to Rust is very much nontrivial ā€“ if not impossible ā€“ and also most likely not desirable/desired anyway.

It might also be useful to have a complementary lint or attribute (smth like #[read_write]) which would give you warnings for anything unused but not write_only, so you can enable it to find all of your (supposed) side-effecting mutations. This would be helpful for locating all of the useful places to consider using the #[write_only] attribute.

This feels like #[must_use], in that it'd very, very often be what's actually wanted. It'd be great to find a way to reduce that annotation burden to handle most cases automatically.

As a first shot in the dark, could it just be a #[this_is_a_normal_datastructure] that gets put on HashMap, giving all &mut self methods on it this behaviour?

3 Likes

Here's a function wherein the return of insert is not "used" but none-the-less has side effects.

use std::rc::Rc;
use std::hash::Hash;
use std::collections::HashMap;

fn count_unique<I, T>(iter: I) -> usize
where
    T: Hash + PartialEq + Eq,
    I: IntoIterator<Item=T>
{
    let counter = Rc::new(());

    let mut hm = HashMap::new();
    for item in iter {
        hm.insert(item, counter.clone());
    }

    Rc::strong_count(&counter) - 1
}

fn main() {
    let v = vec![3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
    println!("{}", count_unique(v));
}

(Playground)

3 Likes

That won't work, because iter_mut/get_mut/index_mut exist, and allow you to read from the data structure. (As well as mutate it)

3 Likes

I know IntelliJ can do this for Java because Java has a common interface for nearly every collection. It's possible to abstract over Rust collections and do the same thing here I think, and just have the annotations/hardcodint on a few trait methods.

2 Likes

The problem in this example is not the HashMap, it's the counter.clone() call which increments the strong count. The HashMap it's just there to prevent the cloned Rcs from getting dropped in the loop (which would decrement the strong count again).

Rust could detect that iter_mut / get_mut / index_mut return a value borrowing self, and check if that value is used anywhere.

I think that using the #[must_use] attribute for this is most fitting:

fn foo(#[must_use] bar: &mut i32) {
    *bar = 42;
}

And now analysis is more complex or you are adding manual annotations. If it's the latter we may as well embrace it and not use #[this_is_a_normal_datastructure]

Who doesn't like manual annotations? We don't infer return values.

They do get dropped when there is duplicates. The mechanism of the HashMap upon insertion is being used, even though none of the returns of calls to HashMap methods are. Unless Drop::drop counts as a use, in which case the proposal would only apply to T: !Drop.

I thought of a couple other examples: You can use retain (which takes a closure and has no return; the closure can modify its captures), and you could also use a custom allocator to have some effects (of rough granularity).

For Vec you can do something like:

pub fn all_under_ten<I: UnwindSafe + IntoIterator<Item=usize>>(iter: I) -> bool {
    catch_unwind(|| {
        let mut v = vec![(); 10];
        for i in iter {
            v[i] = ();
        }
    }).is_ok()
}

In this case, it's true you are using the return from the IndexMut trait. But does this mean the following is also "used"?

let mut v = vec![0, 1, 2];
v[0] = 8;

The overall point is, without an actual full-blown effect system, there are going to be holes, false positives. And that could be okay for a lint, if the juice is still worth the squeeze; you could #[allow(unused)] the oddities. But it's something to be aware of. It also means you can't use this for optimizations in the general case, as removing the "unused" data structures would change program behavior.

(I'm not familiar enough with the compiler to know how much squeeze it would take to get the juice.)

Lack of side effects sounds very similar to const fn to me. I wonder if we could already start by implementing such a lint in that case? A lot of builder pattern data structures can be implemented with const fn.

This could generalize for free if we ever have complex enough const eval (const heap, maybe const drop) to apply it to HashMap::insert.