Should Fn(Mut) arguments be always accepted by move/ownership?

Let's take a look at Iterator::filter. It doesn't store the predicate argument, but it needs to mutably called it, so it should take it by mutable reference, like this: predicate: &mut P, but instead, it takes it by movement of ownership!

EDIT: filter actually needs to take ownership here, because it's lazy and you can return the filtered iterator to elsewhere without consuming it immidietely, so you actually need to be able to store the predicate. A better example to illustrate my point would be Vec::retain, which is eager, and doesn't need to store the predicate, so it shouldn't need to take it by value.

This happens to work, because &mut FnMut implements FnMut with the same generics, but it seems non-idiomatic. So what could be the possible reasons?

Possible reason 1: Performance

I'm no expert on how compiler optimizations work, but let's imagine a simple call like this: .filter(|item| {*item >= 0}). Since the FnMut doesn't close over any variables, it is a normal lambda function (not a closure) with size 0, so moving it into the filter function will be a no-op.

But passing it as a &mut reference would require to pass a reference to an empty unit struct, which might get optimized to no-op by the compiler if it is smart enough, but it might also not (AFAIK).

Possible reason 2: Ergonomics

  1. You don't have to bother with the &mut borrow, writing .filter(|item| { ... }) is slightly more convenient than writing .filter(&mut |item| { ... }).
  2. You don't have to annotate the arguments, because the .filter(&mut |item| { ... }) syntax doesn't actually work. The compiler cannot infer the types, so you have to manually annotate it (e.g. .filter(&mut |item: *i32| { ... })) with the correct amount of *s.

Possible reason 3: Lifetime annotation

Sometimes you need a lambda that returns a value that is borrowed from one of it's arguments. And unlike in point 2.2, there are no manual annotations that will save you (that I know of).

This GitHub issue goes into more detail on that, including some proposed syntax like <'a>|text: &'a str| -> &'a str { ... } for example, as well as a "workaround hack" using a cast function:

fn cast<T>(x: T) -> T
    where T: for<'a> Fn(&'a str) -> &'a str
{
    x
}

Some examples

Notice that this "hack" is only needed when you want to declare you lambda and use it later. If you use it straight away, you don't need it. But if you want to use it strait await as a reference, points 2.2 and 3 combine and force to use this cast function: Example that takes Fn

Notice that the last line only works because process_strings function takes Fn, and not &Fn. Example that takes &Fn.

Back to idiomatic design

The idiomatic way (at least as I understand it) to take arguments is:

  • If you only read from it and don't store it, take a shared reference to it (e.g. &T).
  • If you need to write to it and don't store it, take an exclusive reference to it (e.g. &mut T).
  • If you store it, take ownership of it (e.g. T).

But that doesn't seem to apply to Fn / FnMut for the reasons listed above. For these 2 traits, the answer seems to be "Always take Fn / FnMut by ownership, even when you don't store them", which seems at odds with the rest of Rust's ownership model design.

But what about other / custom traits?

The real question is: how will this "creep" into other traits? For example, &T implements Display if T implements Display. That makes sense, but I have seen someone argue that you should take Display values by ownership instead of references (even when you are not storing them), because "It will work for references too".

Unfortunately, I cannot find the post anymore, but I feel that the design of Fn and FnMut actually encourages this kind of thinking.

Possible solutions

Adding a way to annotate lifetimes in lambdas/closures explicitly (e. g. <'a>|text: &'a str| -> &'a str { ... }) would be great, not only for this reason.

Making argument inference work when the lambda / closure is behind a reference (e. g. .filter(&mut |item| { ... })) would help with this problem, but not with much else probably.

With these 2 things, we could properly accept &Fn or &mut FnMut in functions by references without putting undue restrictions on the caller (at worst an additional & / &mut would be needed).

However, existing function would need to stay the same (the braking change wouldn't be worth it), so then you would have old functions taking Fn / FnMut but ownership, and new functions taking them by references, leading to an even bigger mess than we have now.

The only "real" solution seems to be to add a 4th line to the "idiomatic ways to take arguments" list:

  • If it is a Fn / FnMut, take it by ownership anyway.

There is a similar API guideline for std::io::{Read, Write}. It makes sense to me as well that you should take an impl Display by value rather than reference.

1 Like

There is a similar API guideline for std::io::{Read, Write}.

From reading that paragraph, the problem it is addressing is caused precisely because the functions for reading/writing take Read/Write by ownership and not by a mutable reference, as they should (as far as I understand).

If they take the values by ownership even thought they are not storing them, it is not only non-idiomatic, but also confusing for new users who will just lose ownership of their value unexpectedly, exactly as the paragraph describes. So why not just take those arguments by a mutable reference?

It makes sense to me as well that you should take an impl Display by value rather than reference.

But where does that end? Any "read-only" trait (accepts Self type only as a shared reference &self) should (ideally) have a blanked implementation for a shared reference to a struct implementing that trait. So does that mean you should always take arguments that are generic over these traits by ownership?

That seems really counter-intuitive. Same with "read-write" trait (accepts Self type as a shared reference &self or exclusive reference &mut self, but never by ownership self) that should have (again, ideally) a blanket implementation for &mut references. Do you always accept those by taking ownership even if you don't store them?

It would be really confusing to look at a function signature and understand what it does. Will this function store this value in one of it's mutable arguments? Or is taking an ownership of it because it just happens to work with references? Expressing user logic at the type level is one of the biggest strengths of Rust IMO, and just taking ownership when you don't need it just because it works with references anyway is not a direction I want to see libraries take, because it will be harder to understand what is going on.

Note that in the filter (and generally iterators) examples taking closures by ownership allows you to return ownership of that iterator somewhere.

6 Likes

It can't if the function is generic over the value (without internally boxing it). It would also have to note that in the lifetimes by taking something like impl FnMut() + 'static (which would block you from passing in an &mut impl FnMut() to a local).

I do not know what do you mean by boxing it. I made a little demo with 2 functions:

fn foo<T: Display>(data: &mut Vec<T>, display: T)
fn bar<T: Display>(data: &mut Vec<T>, display: T)

One of them stores the display argument in the data vector. Can you tell me which one? Rust Playground

Of course, you would never want to write bar like that, you would use 2 independent types instead. But consider this example instead. Again, from their signature, can you tell me whether foo or boo will store the item? And there is no boxing going on in either example.

I will try the + 'static trick to see if it get me anywhere, but it seems like a gimmick. If I want to express that the function will not store the value, why not just take a reference (that's what references are for) instead of this + 'static weirdness?

Those examples aren't very reminiscent of how this sort of anonymous FnMut or Display is normally used. If you look at Vec::retain or a non-polymorphic DebugStruct::field instead:

impl<T> Vec<T> {
  fn retain(&mut self, f: impl FnMut(&T) -> bool);
}

impl DebugStruct {
  fn field(&mut self, name: &str, value: impl Debug) -> &mut Self;
}

You can see that the other &mut's don't know what the anonymous type is, so they can't be storing it directly in their fields. If they wanted to they would need to Box the value into a Box<dyn FnMut(&T) -> bool> or Box<dyn Debug>, which also requires the value to outlive the function call needing either a lifetime to relate to the struct or 'static:

impl<'a> DebugStruct<'a> {
  fn field(&mut self, name: &str, value: impl Debug + 'a) -> &mut Self;
}

The two main advantages I see with this style is:

  • reducing optimizer work to eliminate references
  • simpler to read code
fn should_keep(item: &T) -> bool { ... }

vec.retain(&mut should_keep);
vec.retain(&mut |item| item.is_good());
f.debug_struct("BrotliEncoder").field("compress", &"no debug").finish()
// https://github.com/Nemo157/async-compression/blob/9d6c4158911c64a6a793944e77222d4120355013/src/codec/brotli/encoder.rs#L105

vec.retain(should_keep);
vec.retain(|item| item.is_good());
f.debug_struct("BrotliEncoder").field("compress", "no debug").finish()

In the first examples the compiler has to generate all the code for calling the two &mut <anonymous type>s, including passing the references through, while the second examples goes directly to calling on the ZST tokens for the types. The user also has to make a reference even though they don't want these values afterwards, they're single use temporaries where it's simpler to hand off ownership.

By taking impl Trait directly and relying on the forwarding impls you allow full flexibility to the user, if they have something you can only borrow then they can borrow it to you, if they have some transient or cheaply constructed value then they can just hand off ownership and don't have to worry about owning it themselves.

1 Like

Yes. But I have to look at other fields and their types and possibly for their lifetime annotations as well just to tell if the value is getting stored or not. Simply looking at one single argument (is it a reference or not?) is so much simpler than looking at all these things.

Yes, you managed to recap my points 1 and 2.1 from the original post, so we agree on that. Where we disagree is whether or not these benefits outweigh the downsides.

I will admit that performance isn't the biggest priority for me, but I understand that it can be to others. If the compiler really cannot optimize those references away, then yes, taking impl SomeTrait directly can be faster than taking &impl SomeTrait.

As to the second point, I do not think that adding a & or &mut to the argument is a burden, even when you are constructing the argument on the spot. If anything, it helps you (at least it helped me when I had to do it) to learn how references work. The value is allocated on the stack (or in the program's source code if it's a constant), and the reference to it is passed to the function.

But allowing both owned values and references to be passed just creates this "magic" feeling of it working both ways, which just obfuscates what is going on. I cannot tell you how many time I have been confused about what the actual type is in an iterator filter because of all the magic dereferencing going on.

Having to explicitly write * or & when needed can be annoying, but it is also more explicit, and I personally think that is better, because it allows the reader to more easily understand what is actually going on.

Finally, can you provide an answer to one question I had about the paragraph about Read and Write?

This is true for references too, the equivalent annotations for your example if they took references would be (full playground):

fn foo<'a, T: Display + 'a>(data: &mut Vec<&'a T>, display: &'a T)
fn bar<'a, T: Display + 'a>(data: &mut Vec<&'a T>, display: &'a T)

I don't think it's much of an issue in practice. You can write unexpected code with it, but generally it's obvious from the API naming/structure without looking into the details. Code that takes an impl Foo and internally converts that to a Box<dyn Foo> is rare and will probably document that (normally it would just take the Box<dyn Foo> directly).


I consider taking values that implement traits with forwarding impls directly to be idiomatic, and something that new users will have to learn about. Personally I consider the readability win in closure heavy code worth it, and it reduces the difference between functions like Iterator::map that return a wrapped type around the closure and functions like Vec::retain that call it directly.

Every time I try passing a closure to a function and get

    = note: expected mutable reference `&mut dyn for<'r, 's> FnMut(&'r str, &'s mut ProcessLinesActions)`
                         found closure `[closure@src/docbuilder/rustwide_builder.rs:558:28: 558:43]`

or even worse with this example it actually breaks type inference:

error[E0282]: type annotations needed
558 |             .process_lines(|line, _| {
    |                             ^^^^
help: consider giving this closure parameter an explicit type

I'm annoyed that I now have to remember this function is special and needs an &mut dyn FnMut instead of an impl FnMut like the majority of functions.

2 Likes

Sometimes I question the API guideline of taking impl Read instead of &mut impl Read (and similar for Fn(Mut) and etc), though I do make an effort to adhere to it. Consider:

use std::io;

pub fn read_file(mut r: impl io::Read) -> io::Result<Vec<Vec<Vec<u8>>>> {
    (0..10).map(|_| {
        (0..20).map(|_| {
            let mut buf = vec![0; 30];
            r.read_exact(&mut buf)?;
            Ok::<_, io::Error>(buf)
        }).collect::<io::Result<_>>()
    }).collect()
}

Suppose you want to refactor this by splitting out some functions. In order to call the functions multiple times in these loops, you're forced to take another borrow:

pub fn read_file(mut r: impl io::Read) -> io::Result<Vec<Vec<Vec<u8>>>> {
    (0..10).map(|_| read_section(&mut r)).collect()
}

fn read_section(mut r: impl io::Read) -> io::Result<Vec<Vec<u8>>> {
    (0..20).map(|_| read_line(&mut r)).collect()
}

fn read_line(mut r: impl io::Read) -> io::Result<Vec<u8>> {
    let mut buf = vec![0; 30];
    r.read_exact(&mut buf)?;
    Ok::<_, io::Error>(buf)
}

If read_file receives R here, then read_line gets monomorphized with &mut &mut R. In my own code, I find it easier to work with &mut R in the majority of the implementation, and simply write an impl Read wrapper at the public API boundary:

pub fn read_file(mut r: impl io::Read) -> io::Result<Vec<Vec<Vec<u8>>>> {
    _read_file(&mut r)
}

fn _read_file<R: io::Read + ?Sized>(r: &mut R) -> io::Result<Vec<Vec<Vec<u8>>>> {
    (0..10).map(|_| read_section(r)).collect()
}

fn read_section<R: io::Read + ?Sized>(r: &mut R) -> io::Result<Vec<Vec<u8>>> {
    (0..20).map(|_| read_line(r)).collect()
}

fn read_line<R: io::Read + ?Sized>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = vec![0; 30];
    r.read_exact(&mut buf)?;
    Ok::<_, io::Error>(buf)
}

In a similar vein, it's outright impossible for a recursive function to take impl FnMut and call itself multiple times, which is what I believe sometimes motivates unseasoned rust developers to define public API functions that take &mut impl FnMut. The same technique as above works for this as well:

pub struct Tree {
    pub value: i32,
    pub left: Option<Box<Tree>>,
    pub right: Option<Box<Tree>>,
}

impl Tree {
    /// Call a function on each value.
    pub fn walk(&self, mut visit: impl FnMut(i32)) {
        self._walk(&mut visit)
    }

    fn _walk(&self, visit: &mut impl FnMut(i32)) {
        if let Some(child) = &self.left {
            child._walk(visit)
        }

        visit(self.value);

        if let Some(child) = &self.right {
            child._walk(visit)
        }
    }
}
6 Likes

I would say it's ergonomics. Requiring someone to say

.retain(&mut |x| x > 0)

is just horrible.

The implementation can always borrow internally if it needs to, and people can still pass &muts to it if needed, as you pointed out.

FWIW, we have such an example in the docs: https://doc.rust-lang.org/nightly/std/ops/enum.ControlFlow.html#examples

2 Likes

There's a similar mismatch with foo<P: AsRef<Path>>(path: P) and friends being the norm in the ecosystem, instead of foo<P: AsRef<Path> + ?Sized>(path: &P). These functions can never call <Path as AsRef<Path>>::as_ref with a &Path and instead call <&Path as AsRef<Path>>::as_ref with a &&Path, for example.

They can take ownership of values instead, but the usual pattern is to just convert to a &Path and drop it, in which case this opens the door to unnecessary cloning [1]. On the flip side, if you don't need your owned value any more or it is Copy, you don't have to add a & at the callsite.

The standard library used to use the ?Sized pattern, but was changed to the form that is more common today. The reasons given were that

for libraries in general it's more ergonomic to not deal with ?Sized at all and simply require an argument P instead of &P

and that ?Sized bounds are "unsightly".

Personally I don't think we should balk at ?Sized bounds and feel it was a net loss.

But I also admit it doesn't really matter much in practice, and consider it a tempest in a teacup level peeve I happen to have. AsRef is all about ergonomics here, or you would just take a non-generic &Path.


  1. I think the compiler is better about not suggesting this now though ↩ī¸Ž

1 Like

I'll note that this can be a very handy technique for structs, too. It's often tidier to have

struct Foo<T> {
    a: T,
    b: usize,
    c: String,
}

that you happen to always use as Foo<&T> than to have

struct Foo<'a, T: ?Sized> {
    a: &'a T,
    b: usize,
    c: String,
}

and be forced to mention that lifetime all over the place.

4 Likes

It's worth noting another thing: if you take impl Fn(), then I can give you &dyn Fn(). If you take &impl Fn(), then I have to give &&dyn Fn() to use dynamic dispatch.

You can "solve" this by taking &(impl ?Sized + Fn()), but now you're adding a lot of noise to do something already covered by the reference blanket implementation.

It's a delicate line to balance. I agree that if you don't need the owned value, it's "more correct" to take by reference, and that most std APIs taking impl AsRef should have taken &impl AsRef instead.

But on the other hand, removing lifetimes from signatures (covering them inside the trait implementations) is one of the few ways we can simplify signatures and avoid lifetime overload in APIs. It's all too easy to end up with an opaque API with too many lifetimes to keep track of by chasing the perfect abstraction when a much easier to understand API with fewer lifetimes is possible.

API design is hard because it's a human problem. We're all learning and improving as we do it more, but there's no objectively correct answer, and the community consensus will shift over time as to what designs are ideal.

In cases like this, it's usually just better to match the conventions used by stable std APIs, because consistency of only having to learn one way of doing things is usually better than learning both the old way and the newer "better" way of doing things.

1 Like

I cannot speak for (other) unseasoned rust developers, but I for once would never think about taking &mut impl FnMut instead of impl FnMut because some other recursive function somewhere has to take &mut impl FnMut.

Unless you mean in those recursive functions specifically. In that case, There, having the "public" function take impl FnMut and then call the "real" private function with a &mut impl FnMut can actually cause the inner type to be &mut &mut FnMut if the user only has a &mut Fn to begin with.

And I thought taking FnMut directly (by ownership) was supposed to reduce then number of extra references, not increase them! You could, of course:

  1. Rely that the compiler will optimize it to a single &mut FnMut reference.
  2. Say that people pass an anonymous lambda/closure most of the time.

In the end, I think consistency should win. It's not even (at least for me) about the ergonomics of having to write an extra &mut in front of lambdas/closures, it's about not having to write it some times and having to do so other times.