Clone by re-evaluation?

So an oddity I ran into just now is that nom has a method with the following signature

pub fn recognize<I, O, E, F>(parser: F) -> impl FnMut(I) -> IResult<I, I, E>

Which I want to use to produce a parser like so:

let digit_recogniser = recognize(many1(one_of("0123456789")));

However, in my parse tree, I need to recognise digits in several different places. No problem, I think, I'll just clone digit_recogniser... but I find I can't, because although I know that all of the inputs are 'static and cloneable, the impl Trait return value means the compiler loses track of that fact.

And I know that this expression is cloneable, because the following works just fine:

let make_digit_recogniser = || recognize(many1(one_of(digits)));
(make_digit_recogniser.clone(), make_digit_recogniser())

For most of my time writing Rust, I've been feeling that the borrow checker is merely enforcing rules that need to be adhered to anyway, but this is the first instance I've encountered where it's unambigiously limited - a similar construction in something like Haskell would work exactly as expected. I don't really know what a big-picture solution would look like, but surely there's something that can be done to make this more ergonomic?

I appreciate that these two code samples don't have identical semantics (any sort of construction logic is repeated, and the compiler doesn't prove these functions pure) and you might well conclude the issue that nom is not returning impl Clone in the first place. However, the function can only do that when the inputs themselves are Clone, and AFAIK there is no way to conditionally propagate that bound for an impl Trait. In that case, my question becomes, how do things need to change to allow the nom authors to write a method that produces a Clone output iff the inputs are Clone?

2 Likes

With the current features of impl Trait return types, there’s no good around requiring a separate combinator for clonability, e.g.

use nom::error::ParseError;
use nom::IResult;
use nom::Offset;
use nom::Parser;
use nom::Slice;
use std::ops::RangeTo;

pub fn recognize_cloneable<I: Clone + Offset + Slice<RangeTo<usize>>, O, E: ParseError<I>, F>(
    mut parser: F,
) -> impl FnMut(I) -> IResult<I, I, E> + Clone
where
    F: Parser<I, O, E> + Clone,
{
    move |input: I| {
        let i = input.clone();
        match parser.parse(i) {
            Ok((i, _)) => {
                let index = input.offset(&i);
                Ok((i, input.slice(..index)))
            }
            Err(e) => Err(e),
        }
    }
}

(you can easily define this yourself by the way, as shown above; to make the whole thing work, one would probably also need corresponding alternatives for many1 and one_of)

As far as I know, only auto traits gain, (also quite implicitly,) the ability to be (usably) conditionally implemented for a impl Fn… return type, depending on the trait implementations of other parameters they may have captured.

As far as I’m aware, a way to specify conditional trait bounds is possibly desirable anyways, as it would help with traits. While for concrete functions, impl Trait return types, e.g. impl Future (also often written more concisely with async fn) do get implemented automatically and implicitly; as soon as you want to put async fn into a trait, you would need some way to manually say “hey, implementors of this trait, please ensure your future is Send if the parameter is”… or perhaps even less so than necessarily requiring this for all implementors, one might want to ask if for specific implementors, as a stricter bound. (I will skip on going into this any deeper or coming up with code and pseudo-code… back to the actual example at hand…)

For impl trait, I could imagine something like a way to specify a conditional implementation: let’s say using a => operator (for logical implication). Then the signature could specify

pub fn recognize<I: Clone + Offset + Slice<RangeTo<usize>>, O, E: ParseError<I>, F>(
    mut parser: F,
) -> impl FnMut(I) -> IResult<I, I, E> + (F: Clone => Clone)
where
    F: Parser<I, O, E>

(yeah… this syntax maybe isn’t optimal…) which would give rise to the knowledge of the opaque return type of recognize, let’s call it recognize::<I, O, E, F>::Output, that it offers somewhat of an

impl<I, O, E, F> Clone for recognize::<I, O, E, F>::Output
    where F: Clone;

Reviewing syntax… (any I know syntax is not super important…) last time I was annoyed that for<'a> Trait<'a> bounds don’t have where clauses, I thought – on the topic of where to put the where clause – I thought maybe just inside of the for, like for<'a, where T: 'a> Trait<'a>. If so, then the implication syntax could be a special case of this syntax:

pub fn recognize<I: Clone + Offset + Slice<RangeTo<usize>>, O, E: ParseError<I>, F>(
    mut parser: F,
) -> impl FnMut(I) -> IResult<I, I, E> + for<where F: Clone> Clone
where
    F: Parser<I, O, E>

Two more notes…

The problem here has nothing to do with borrow checking in particular… it’s just checking trait implementations, and specifically also about opaque (impl Trait) return types.

Note that, as you can see from the way recognize is implemented (and it looks like it’s the same for many1 and one_of), that – at least for these functions in particular – the outer call does actually nothing, so calling recognize more often is not really a thing one needs to worry about.

This reminds me a bit of cases where an async fn foo(&self, arg: Foo) is supposed to be spawned, and you cannot just to spawn(x.foo(v)) because that wouldn’t meet the 'static bound, so instead you re-pack the future to influence its type’s properties a little bit (in this case, making it capture x by-value and becoming static) via spawn(async move { x.foo(v).await }.

I think a similar approach may work here, too; if you want a directly clonable Parser, then you just write

let digit_recogniser = |i| recognize(many1(one_of(digits)))(i);

and (as far as I can tell) this should not actually have any overhead. (I don’t know if perhaps other nom combinators would make this approach a bad idea – I’m not actually too familiar with the crate.)

Edit: Actually… nevermind, this approach only works at all if digits is Copy, or if you can afford doing digits.clone() for every input, by adding a .clone() call in there.

Sorry if I was unclear, I know there's ways to make nom do what I want, either with my own recognise combinator or whatever, I'm more bringing this up as an example of a general principle. Intuitively, the code I posted looks like it ought to work, so what I'm really talking about is what principles have to change to allow it to work as you'd expect.

Note that, as you can see from the way recognize is implemented (and it looks like it’s the same for many1 and one_of), that – at least for these functions in particular – the outer call does actually nothing, so calling recognize more often is not really a thing one needs to worry about.

I know I don't need to worry about it in practice, what I was thinking about was akin to the transform that promotes constants to 'static so you can say let x = &5 and the reference isn't dangling. It would have been nice if rust could do something similar where non-Copy/Clone values that are used after moving could be "cloned" by re-evaluating the expression that defines them, basically automatically substituting the expression into each use-site. It would work in this particular case, but being able to do it in general would require determining which expressions are pure, and AFAIK nothing in the language makes a determination like that ATM.

(I’m not 100% certain whether you are implying/suggesting this in the first place, but…)

I would’t like the idea of allowing this to happen automatically/implicitly to this case. After all, the idea of impl Trait is to hide implementation details, so even if the compiler could determine pureness and/or clonability automatically, if it isn’t explicitly specified in the signature, then future versions of nom are (in principle) allowed to introduce impureness, or a capture of some none-clonable value, respectively. (Leakage of auto traits through impl Trait goes against this principle a little, but at least that exception has a relatively a well-defined scope [assuming third-party auto traits (with this same power) never become a thing].)

2 Likes

Yeah, the impl Trait construct has some unfortunate restrictions. Arguably, that's the point of writing return position impl Trait in the first place - to hide irrelevant details and avoid overpromising in API. Unfortunately, even if you'd like to make a stronger promise, there is no way to do it.

There is no way to say "the return type is Clone/Debug/Send if and only if these arguments are". Arguably, that's a problem similar to autotrait bounds in async traits, where there is no good way to specify that the returned futures are Send/Sync.

Specifically in the case of Nom, the workaround would have been to avoid impl Trait syntax in public API, and instead use explicit structures, like the combinators on Iterator or Stream types. In that case, the library could properly conditionally derive all necessary traits on that structure, just like it works for iterators.

It does make the code quite verbose, unfortunately.

There often is, and it's pretty prevalent for certain paterns like iterator combinators: you return an explicit parameterized type with bound-based implementations. But it's not available on stable for the Fn traits, as implementing those is unstable.

(Even when available it's a lot of boilerplate.)

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.