Pre-RFC - `Cow::map` function

I couldn't find any prior discussion of this feature. Here's a playground that has this implemented with an extension trait.

Essentially, I'd like to add a Cow::map function that looks a little like this:

fn map<S: ToOwned + ?Sized>(self, f: impl for<'b> FnOnce(&'b T) -> Cow<'b, S>) -> Cow<'a, S>
where
    T::Owned: AsRef<T>,
{
    match self {
        Cow::Borrowed(borrowed) => f(borrowed),
        Cow::Owned(owned) => {
            let result = f(owned.as_ref());
            Cow::Owned(S::to_owned(&result))
        }
    }
}

(I haven't hashed out the exact trait bounds, but hopefully this is Pretty Close).

It would be used something like this:

REGEX_1.replace(s, "A").map(|s| REGEX_2.replace(s, "B"))

where Regex::replace returns a Cow<'_, str>.

In the case where self is already owned, this maps to another owned Cow. However, in the case where self is borrowed, this simply returns the result of f, allowing for efficiently calling many Cow-returning operations in a row.
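For readers who want to experiment without the playground, here is a minimal std-only sketch of such an extension trait. The names `CowExt`, `map_cow`, `tabs_to_spaces`, `strip_cr` and `clean` are made up for illustration, and the two helper functions only stand in for `Regex::replace` (an assumption that keeps the example free of dependencies).

```rust
use std::borrow::Cow;

/// Hypothetical extension trait mirroring the proposed `Cow::map`.
trait CowExt<'a, T: ToOwned + ?Sized + 'a> {
    fn map_cow<S, F>(self, f: F) -> Cow<'a, S>
    where
        S: ToOwned + ?Sized + 'a,
        T::Owned: AsRef<T>,
        F: for<'b> FnOnce(&'b T) -> Cow<'b, S>;
}

impl<'a, T: ToOwned + ?Sized + 'a> CowExt<'a, T> for Cow<'a, T> {
    fn map_cow<S, F>(self, f: F) -> Cow<'a, S>
    where
        S: ToOwned + ?Sized + 'a,
        T::Owned: AsRef<T>,
        F: for<'b> FnOnce(&'b T) -> Cow<'b, S>,
    {
        match self {
            // Borrowed input: the result of `f` can live as long as `'a`.
            Cow::Borrowed(borrowed) => f(borrowed),
            // Owned input: the result borrows a local, so it must be made owned.
            Cow::Owned(owned) => {
                let result = f(owned.as_ref());
                Cow::Owned(S::to_owned(&*result))
            }
        }
    }
}

// Stand-in for a `Cow`-returning operation such as `Regex::replace`.
fn tabs_to_spaces(s: &str) -> Cow<'_, str> {
    if s.contains('\t') {
        Cow::Owned(s.replace('\t', " "))
    } else {
        Cow::Borrowed(s)
    }
}

// Second stand-in: removes carriage returns, borrowing when there are none.
fn strip_cr(s: &str) -> Cow<'_, str> {
    if s.contains('\r') {
        Cow::Owned(s.replace('\r', ""))
    } else {
        Cow::Borrowed(s)
    }
}

// Chains both steps and still returns a borrow when nothing changed.
fn clean(s: &str) -> Cow<'_, str> {
    tabs_to_spaces(s).map_cow(strip_cr)
}
```

With this in scope, `clean("plain text")` should come back as `Cow::Borrowed` pointing at the input, while an input containing a tab or a carriage return yields `Cow::Owned`.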

Motivation

I had a string that I wanted to apply two regex substitutions to. If I was just doing a single replace call, it would be fine:

fn replace_once(s: &str) -> Cow<'_, str> {
    REGEX_1.replace(s, "A")
}

No issue here. The trouble comes when I want to add a second .replace naively:

fn replace_twice(s: &str) -> Cow<'_, str> {
    let temp = REGEX_1.replace(s, "A");
    REGEX_2.replace(&temp, "B")  // error, referencing temp, which is dropped
}

This doesn't compile, since the return value potentially references the owned string allocated by the first replace, which is dropped after the function returns.

A trivial solution is to make this function return a String, but that seemed unsatisfactory to me. In my specific use case, the substitution only happened very rarely, so ideally I could just return the reference to the string passed in.

Instead, I wrote this extension trait in the playground, and have found it useful for this case.

There is some prior art here: this is a fairly standard monadic operation, comparable to Result::map (Borrowed is comparable to Ok, Owned is comparable to Err) and Option::map. The code for a fully generic version is also somewhat tricky (it has an HRTB and quite a lengthy trait/function signature), which means many users will "just allocate here" when it's not really necessary. In my codebase, there were many similar snippets where the author did in fact just make the function return a String, which is unsatisfying.

One of the things I love about Rust is the way it provides safe and ergonomic abstractions that allow me to avoid allocations, without feeling like I'm jumping through hoops. IMO, this function fits with that philosophy.

There are, of course, reasons not to include such a function in std:

  • it increases maintenance burden
  • it's a somewhat niche operation
  • potentially other reasons I'm not aware of?

Though, since I'm writing this, my feeling ATM is that the benefits outweigh the cost. What are others' opinions? Does this seem like something that would be worth having in std? If so I'd be happy to work on a PR or RFC (I'm not sure if this is a significant enough change to need an RFC, my guess is no? I don't know what the "rules" are on this, just an impression I get by skimming the RFC book).

Thanks :grin:

FYI, you can abbreviate the HRTB using lifetime elision rules for Fn.. trait bounds so

impl for<'b> FnOnce(&'b T) -> Cow<'b, S>

becomes

impl FnOnce(&T) -> Cow<'_, S>

There is no point in adding something like that. Cow<'_, T> dereferences to &T, so you can just do f(&*cow) with the same effect.

It also sounds quite inefficient, since the owned variant is always transformed via a reference, which means that even if S::Owned could be constructed from T::Owned via a move, we would still need to do a clone.

So the proper signature of the method map should accept two functions: f: FnOnce(&T) -> &S and g: FnOnce(T::Owned) -> S::Owned. At this point you would be better served by writing a simple match expression. You would also have no issues with moves, async and early returns.
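As a rough sketch of that two-function shape (the names `map_with` and `text_to_bytes` are hypothetical, not an existing or proposed std API), the borrowed side maps a reference to a reference and the owned side moves the owned value, so no clone is needed:

```rust
use std::borrow::Cow;

/// Hypothetical helper: `f` handles the borrowed case, `g` the owned case.
fn map_with<'a, T, S, F, G>(cow: Cow<'a, T>, f: F, g: G) -> Cow<'a, S>
where
    T: ToOwned + ?Sized + 'a,
    S: ToOwned + ?Sized + 'a,
    F: FnOnce(&'a T) -> &'a S,
    G: FnOnce(T::Owned) -> S::Owned,
{
    match cow {
        Cow::Borrowed(borrowed) => Cow::Borrowed(f(borrowed)),
        // The owned value is moved into `g`, never cloned.
        Cow::Owned(owned) => Cow::Owned(g(owned)),
    }
}

// Example: view text as bytes, reusing the String's buffer when owned.
fn text_to_bytes(text: Cow<'_, str>) -> Cow<'_, [u8]> {
    map_with(text, str::as_bytes, String::into_bytes)
}
```

Here `String::into_bytes` hands over the existing buffer, which is exactly the move-based conversion the reference-only signature cannot express.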

By the way, why are you calling it map? map is usually an application of the functor, which means that its signature should be something like FnOnce(T) -> S (but this won't work for unsized types). FnOnce(&T) -> Cow<'_, S> certainly feels wrong, by analogy with Option and Result this should be called and_then or or_else, or something like that. Certainly not just map.

even if S::Owned could be constructed from T::Owned via a move, we would still need to do a clone.

I hadn't considered this. Initially I had only considered Cow<str>. What about requiring S: From<T>, that way there's an opportunity for a cheaper conversion? No idea if this is good API design, but it seems to potentially solve that issue?

By the way, why are you calling it map?

You're right, looks like I got the names mixed up :man_facepalming: and_then seems like a more consistent name.

so you can just do f(&*cow) with the same effect.

I'm not sure I understand this. :confused:

(Explicitly expressing no opinion on the proposed API.)

The current path for something like this would probably be

  1. Get some feedback on irlo
  2. File an API Change Proposal.

This is deref coercion. If a value of type &T is provided, but &U is expected, and T: Deref<Target=U>, then the compiler is free to change x: &T into x.deref(). Similarly, *x may be changed into *Deref::deref(&x).

Actually, it should be just f(&cow), the dereference was redundant.

You're well familiar with deref coercion even if you don't know it by name. It's what allows you to pass x: &String into a function fn f(_: &str) as just f(x). Since String: Deref<Target=str>, this desugars into f(x.deref()).
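A small illustration of the same coercion applied to Cow (the function names here are made up for the example):

```rust
use std::borrow::Cow;

// Any function that takes a plain `&str`.
fn word_count(s: &str) -> usize {
    s.split_whitespace().count()
}

// `&Cow<'_, str>` is coerced to `&str` because `Cow<str>: Deref<Target = str>`.
fn count_in_cow(cow: &Cow<'_, str>) -> usize {
    word_count(cow)
}

// The familiar case: `&String` is coerced to `&str` in the same way.
// (Taking `&String` is deliberate here, only to show the coercion.)
fn count_in_string(s: &String) -> usize {
    word_count(s)
}
```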

You don't need the AsRef bound as ToOwned already requires a reverse Borrow bound, and using Cow::Owned(result.into_owned()) will be more efficient in the case where f returns an Owned variant (playground).
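To make the difference concrete, here is a small sketch (function names are illustrative only) comparing the two ways of turning a `Cow<str>` result into a `String`:

```rust
use std::borrow::Cow;

// Goes through a reference, so it always allocates a fresh String,
// even when `result` already holds an owned one.
fn finish_with_to_owned(result: Cow<'_, str>) -> String {
    str::to_owned(&result)
}

// Moves the String out when `result` is `Cow::Owned`,
// and only allocates when it is `Cow::Borrowed`.
fn finish_with_into_owned(result: Cow<'_, str>) -> String {
    result.into_owned()
}
```

When the input is already `Cow::Owned`, the second version returns the very same heap buffer, whereas the first copies it.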

It does seem bad that if the first transform returns an owned variant, and the subsequent transforms return a borrowed one, you need to keep cloning for them all. Especially in this use case, where a borrowed variant means there was no modification done.

The map/and_then method as proposed allocates for each operation after any match is found. This gets worse as more map operations are added.

REGEX_1.replace(s, "A")
    .map(|s| REGEX_2.replace(s, "B"))
    .map(|s| REGEX_3.replace(s, "C"))

If REGEX_1 matches, it creates an owned string. Even when subsequent regexes do not match, each map clones the owned string.

The efficient form for this would return the most recent owned value or borrow the original value if no owned value is created:

fn replace_thrice(s: &str) -> Cow<'_, str> {
    let temp1 = REGEX_1.replace(s, "A");
    let temp2 = REGEX_2.replace(temp1.as_ref(), "B");
    let temp3 = REGEX_3.replace(temp2.as_ref(), "C");
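    // temp2 and temp3 borrow from locals, so they cannot be returned as-is;
    // move the owned String out instead.
    if let Cow::Owned(owned) = temp3 {
        Cow::Owned(owned)
    } else if let Cow::Owned(owned) = temp2 {
        Cow::Owned(owned)
    } else if let Cow::Owned(owned) = temp1 {
        Cow::Owned(owned)
    } else {
        Cow::Borrowed(s)
    }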
}

To me this suggests that the solution for this specific problem would be an alternative Regex::replace method that takes a Cow and returns it unchanged if no replacement is done.

Indeed, and your example code can be simplified to this:

fn replace_thrice(s: &str) -> Cow<'_, str> {
    let mut result = REGEX_1.replace(s, "A");
    if let Cow::Owned(temp) = REGEX_2.replace(result.as_ref(), "B") {
        result = Cow::Owned(temp);
    }
    if let Cow::Owned(temp) = REGEX_3.replace(result.as_ref(), "C") {
        result = Cow::Owned(temp);
    }
    result
}

So maybe the ideal API is a convenience function for the above. As you say, this convenience function should be provided by regex rather than by Cow (providing it in Cow could be confusing because it's only useful given the Regex::replace-specific assumption that if it returns Cow::Borrowed, it represents the original value unchanged).

So the API would be either

pub fn replace_cow<'t, R: Replacer>(&self, text: Cow<'t, str>, rep: R) -> Cow<'t, str> {
    if let Cow::Owned(result) = self.replace(text.as_ref(), rep) {
        Cow::Owned(result)
    } else {
        text
    }
}

or

pub fn replace_in_place<'t, R: Replacer>(&self, text: &mut Cow<'t, str>, rep: R) {
    if let Cow::Owned(result) = self.replace(text.as_ref(), rep) {
        *text = Cow::Owned(result);
    }
    }
}

which seems simple enough to propose for addition to regex.
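For anyone who wants to try the idea without touching regex, here is a std-only sketch of the same pattern as a free function. `apply_step`, `tabs_to_spaces`, `strip_cr` and `clean_in_steps` are hypothetical names, and the sketch relies on the same assumption as above: a step that returns `Cow::Borrowed` has left its input unchanged.

```rust
use std::borrow::Cow;

/// Runs `step` on `text`; keeps the new String if one was produced,
/// otherwise hands back `text` untouched (no clone either way).
fn apply_step<'t, F>(text: Cow<'t, str>, step: F) -> Cow<'t, str>
where
    F: FnOnce(&str) -> Cow<'_, str>,
{
    // Extract the owned result first so the borrow of `text` has ended
    // before `text` is moved below.
    let changed = match step(&*text) {
        Cow::Owned(new_text) => Some(new_text),
        Cow::Borrowed(_) => None, // assumed to mean "unchanged"
    };
    match changed {
        Some(new_text) => Cow::Owned(new_text),
        None => text,
    }
}

// Stand-ins for `Regex::replace`: borrow when there is nothing to replace.
fn tabs_to_spaces(s: &str) -> Cow<'_, str> {
    if s.contains('\t') {
        Cow::Owned(s.replace('\t', " "))
    } else {
        Cow::Borrowed(s)
    }
}

fn strip_cr(s: &str) -> Cow<'_, str> {
    if s.contains('\r') {
        Cow::Owned(s.replace('\r', ""))
    } else {
        Cow::Borrowed(s)
    }
}

// Chaining steps allocates only when a step actually changes the text.
fn clean_in_steps(s: &str) -> Cow<'_, str> {
    let text = apply_step(Cow::Borrowed(s), tabs_to_spaces);
    apply_step(text, strip_cr)
}
```

If the first step produces an owned string and the second step changes nothing, the owned string is passed through by move, which avoids the repeated cloning discussed earlier in the thread.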
