Method-cascading and pipe-forward operators proposal


#40

Thank you. Your criticism was vague, but it inspired me to rethink the current design.
I understand your position; however, there are still some points I don’t agree with:

I think there will always be programmers who (like me) feel unproductive and uncomfortable reading or writing code like this:

    let open_options = OpenOptions {
        write: false,
        read: true,
        ..Default::default()
    };
    let file = open_options.open("location")?;
    let mut collection = file.to_some_collection();
    collection.sort();
    return collection;

My thoughts about it:

  • The syntax for OpenOptions is the worst part: it is cumbersome and additionally requires a Default implementation; you also create two OpenOptions instances.
  • Every binding here just holds temporary state and doesn’t serve any documenting purpose.
  • That code does exactly the same as its “fluent” alternative and nothing more.
  • It sacrifices ergonomics and gains nothing (except simplified debugging) from that.

Is it only your own preference?
Are there “technical” reasons to write code this way?
If that’s only because “fluent interfaces” are redundant - I agree (see my next response):

Me too.
But “fluent interfaces” are already in the core language, everywhere. They are a convenient way to do things.
And there are reasons to hate them:

  • There are two kinds of APIs, those that allow them and those that don’t, and it’s no fun when you have both in a codebase
  • It’s not clear which is better when you write your own API
  • Incompatibility with functions that don’t return self - that’s annoying
  • Returning self feels like a simple hack when a proper solution might exist
  • It’s additional boilerplate that defaces APIs and function signatures

And just because of that I’ve proposed this change.


About method piping: it was scary, and now it has also been reworked from scratch.


#41

But the problem is that they must be implemented around some data structure. My goal is to avoid that by providing common operators that may replace the need for macros in some cases.
When EDSLs are implemented this way, everybody has to learn a new language within the language; otherwise the macro is magic.

Macros also have a poor user experience: completion and highlighting lag on them, and I try to avoid them whenever possible.

Less code, less room for problems.


#42

But you realize that’s not how you’re supposed to use OpenOptions, right?
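
For reference, a minimal sketch of the intended usage: std::fs::OpenOptions has private fields, so it is configured through its builder methods rather than a struct literal. The helper name `open_read_only` is hypothetical and only there for illustration.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: opens an existing file for reading only.
fn open_read_only(path: &Path) -> io::Result<File> {
    // Each setter takes `&mut self` and returns `&mut Self`,
    // so the calls can be chained on the temporary from `new()`.
    OpenOptions::new()
        .read(true)
        .write(false)
        .open(path)
}
```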


#44

In this case it would probably be better to tell me where it was unclear and I could clarify what I meant.

I feel sorry for them.

Implementing Default is literally #[derive(Default)] 99.9% of the time. I don’t buy that it’s a burden. Creating two instances of the struct could be a valid concern; however, the “rest” (..) pattern is trivially caught by a compiler optimization. (Even if it’s not hard-coded into the compiler as a rule at the MIR stage, for example, I’d still expect LLVM to eliminate redundant allocations and copies/moves. Most Default-implementing types don’t require more than a single stack allocation.)
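
To make that concrete, here is a small sketch of the derive plus the “rest” syntax. The `Options` struct and its fields are hypothetical stand-ins, not the real std::fs::OpenOptions.

```rust
/// Hypothetical options struct standing in for the one discussed above.
#[derive(Debug, Default, PartialEq)]
struct Options {
    read: bool,
    write: bool,
    append: bool,
}

fn read_only() -> Options {
    // Fields that are not listed are taken from `Options::default()`.
    Options {
        read: true,
        ..Default::default()
    }
}
```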

The open_options binding could be thrown away altogether in the above (hypothetical) code, since a struct literal is a perfectly valid value to call a method on. The other binding does document that the collection is being mutated while it is being sorted. That’s exactly the kind of information I don’t want to lose or be hidden by magic syntax.
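
A sketch of what I mean, again with a hypothetical `Options` type and a made-up `load` method: the struct literal is used directly as the receiver, and the only named binding left is the one that is actually mutated.

```rust
/// Hypothetical options type; `skip` and `limit` are illustrative fields.
#[derive(Default)]
struct Options {
    skip: usize,
    limit: Option<usize>,
}

impl Options {
    /// Copies the selected window of `input` into a new vector.
    fn load(&self, input: &[i32]) -> Vec<i32> {
        input
            .iter()
            .copied()
            .skip(self.skip)
            .take(self.limit.unwrap_or(usize::MAX))
            .collect()
    }
}

fn sorted_input(input: &[i32]) -> Vec<i32> {
    // No binding for the options: the literal itself is the receiver.
    let mut collection = Options { limit: Some(3), ..Default::default() }.load(input);
    // The `mut` above documents that this call mutates `collection`.
    collection.sort();
    collection
}
```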

Excuse me? It was exactly my point that the “fluent” alternative does exactly the same as this code, and thus the “fluent” style is not necessary, as it doesn’t add any value.

What exactly does it sacrifice in terms of ergonomics?

If it feels simple that’s because it’s simple. Simpler and more understandable than a second, special kind of function call. It’s an entirely “proper” solution, which works perfectly well and doesn’t require any additional language features. There’s nothing hackish about it.


#45

The proposed syntax doesn’t really seem intuitive at all, and discarding the result seems like a dubious design.

Maybe something like this could work:

hash_map.{
    .insert(a, b);
    if .len() == 3 {
        .insert(c, d);
    }
}

But overall doesn’t seem worth it: you can just do “let x = hash_map” and then only repeat “x” for each invocation:

{
    let x = &mut hash_map;
    x.insert(a, b);
    if x.len() == 3 {
        x.insert(c, d);
    }
}

or with match:

match hash_map {
    ref mut x => {
        x.insert(a, b);
        if x.len() == 3 {
            x.insert(c, d);
        }
    }
}

Could maybe consider some syntax sugar for that, like this:

with x = hash_map {
    x.insert(a, b);
    if x.len() == 3 {
        x.insert(c, d);
    }
}

but that also doesn’t seem worth it.


#46

I find I do this sort of thing when needed in, for example, Java or C#. I don’t see how any of the proposals is much of an improvement on this.


#47

I came up with yet another idea that might actually be somewhat reasonable.

Copy the “let x in” syntax from functional programming languages:

let x = &mut hash_map in {
    x.insert(a, b);
    if x.len() == 3 {
        x.insert(c, d);
    }
}
foo(let x = &mut hash_map in {x.insert(a, b); x.len()})

“in” is already a keyword in Rust, and this syntax is already established in Haskell and other languages.

Also it could be seen as a variant of “if let” for irrefutable patterns.

It doesn’t differ much syntactically from just putting the let inside the block though so might not be worth having two ways of doing the same thing.
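
For comparison, this is roughly what the existing way looks like: a block expression with the let inside, passed straight to a function. The function `foo` and the concrete keys and values are placeholders I made up for the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `foo` in the example above.
fn foo(len: usize) -> usize {
    len
}

fn insert_and_count(hash_map: &mut HashMap<&'static str, i32>) -> usize {
    // The block is an expression: `x` is scoped to it and the
    // final `x.len()` becomes the argument passed to `foo`.
    foo({
        let x = &mut *hash_map;
        x.insert("a", 1);
        if x.len() == 1 {
            x.insert("c", 3);
        }
        x.len()
    })
}
```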


#48

Builders can also be derived in such cases and probably optimized by the compiler.
And code using them is ergonomically simpler. If OpenOptions is modified so that construction requires a builder (e.g. build() to perform some validation logic), you must change it everywhere; that doesn’t happen when you use a builder from the start.
Also, jumping to a builder implementation in an IDE is simpler than finding Default.

Right, it might be better.

But my point was that the “imperative” style is not necessary.

More characters/keywords to read/type, and the code is not fluent.

It’s the “proper” solution only because there is no better one. It doesn’t always work, and it additionally requires the programmer to write boilerplate that is simple enough to be replaced by an operator - that’s hackish.


#49

Citation needed.

I don’t follow this. If you use the rest-syntax with structs from the beginning and you don’t change to any other “pattern”, you can also make this argument.

There’s nothing imperative in constructing a value from two other values. In fact, it’s a pure operation. (Haskell programs do it all the time.)

Typing just doesn’t matter. And in terms of reading, it’s usually not the amount of code that matters, but the understandability. And 10-something characters less of magic is less understandable than a little bit more non-magic code.

There’s not a single piece of boilerplate in it that I could identify, just like a case when it “doesn’t work”.


#50

Exactly my thoughts.


#51

Second version

Completely different from the previous one. Features:

  • Revealing of hidden mutations inside of method call chains
  • Flexible syntax for method cascading
  • Redesigned pipe-forwarding
  • EDSLs

1. Explicit mutations on method call chains

The purpose is to make regular function calls consistent with the features introduced in the next sections, and to prevent temporary mutable bindings from staying accessible in the whole scope.

It requires calls to functions taking &mut self to be prefixed with ~.
Don’t confuse this with the previous ~: it’s different; it doesn’t change control flow but acts more like an annotation on the method call.

Syntax is:

  • mutable.~function() - to indicate possible mutation on mutable value

where mutable stands for a mut binding or a function call chain. This avoids writing it twice, and the same convention applies further in the proposal.

Using ~ on functions that mutate frees us from introducing temporary mut bindings inside a method call chain, because it already shows all the relevant information:

  • Which functions mutate, and which value they mutate
  • Where anonymous mutable bindings are introduced

I think ~ is a good sigil here: it’s short, easy to remember and type, consistent with the features introduced in the next sections, and it looks like a creased dash, which evokes mutation.
Yes, it’s another operator that programmers must learn. However, reading TRPL is mandatory for every Rustacean, so once the operator is defined and documented, nobody will be confused.

Code will look like this:

    let mut collection = get_collection(); // `mut` is still required here.
    collection.~sort();                    // `~` shows where we do mutation.
    return Type::new() // Anonymous mutable binding introduced here.
        .~mutate1()    // This function mutates return value of `new`.
        .~mutate2()    // And this mutates return value of `mutate1`.
        .transform();  // But this doesn't mutate anything.

Also, that’s a breaking change.
Whether it is worth it is up for discussion (but read the whole proposal before dismissing it).
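
For comparison, here is a sketch of how the second example has to be written in today’s Rust when the mutating methods return (). `Counter`, `increment`, `double` and `into_value` are hypothetical stand-ins for Type, mutate1, mutate2 and transform.

```rust
/// Hypothetical type whose mutating methods do not return `self`.
struct Counter {
    value: i32,
}

impl Counter {
    fn new() -> Self {
        Counter { value: 0 }
    }

    fn increment(&mut self) {
        self.value += 1;
    }

    fn double(&mut self) {
        self.value *= 2;
    }

    fn into_value(self) -> i32 {
        self.value
    }
}

fn chain_today() -> i32 {
    // A named `mut` binding is needed just to call the two mutating methods.
    let mut counter = Counter::new();
    counter.increment();
    counter.double();
    counter.into_value()
}
```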

2. Side-effects on method call chain

This is meant as an alternative to method cascading.
It allows applying a batch of functions that take self, &self or &mut self to a value without breaking the chain.

Syntax is:

  • value.>(function(,),) - to call one or more functions on value and ignore their results
  • mutable.~>(function(,),) - if one or more of the functions need the value to be bound as mut

where the trailing comma means that further functions/arguments may follow.

Let’s analyze all the parts:

  1. ~ has the same meaning as in the previous section: the function mutates. It also combines nicely with > into the ~> arrow.
  2. ~> and > neatly show the direction in which the value is passed.
  3. . is required to show that a method call and possible (de)referencing occur. It also fills space for proper alignment and keeps the chain syntax consistent (I loathe how e.g. |> breaks the chain visually).
  4. ( and ) show that we have a limited scope in which we can operate on the value.
  5. ( and ) are chosen over { and } so that the syntax is not confused with block expressions and no new symbols are added to the call chain syntax.
  6. Functions are separated by , in order to group method calls, so the reader does not have to search through a chain of .~>/.>

This implies that each function takes self by reference or that Clone is implemented for the value, since you can’t continue the chain or apply other side-effects if the subject value was moved somewhere.

How it looks in practice:

    let hmap = HashMap::new().~> ( // Anonymous mut binding introduced by `.~>`.
        insert("key1", val1),      // `insert` is called on `mut HashMap`.
        insert("key2", val2),      // Also is called on `mut HashMap`.
    );                             // HashMap is returned from the parentheses.
    let mut collection = get_collection(); // `mut` is required on binding.
    collection.~>(sort())  // `.~>` takes `mut collection` into `sort`.
        .iter()…           // We can continue chain on `mut collection`.
    return OpenOptions::new()
        .~>(write(true), read(true)) // Single-line representation.
        .open("location")?
        .to_some_collection()
        .~>(sort());                 // Single side-effect method called.
    let value = get_value()
        .> (action_1(),     // Value can't be taken as `&mut self` here.
            action_2())     // Proper aligning.

Don’t confuse it with the various “with”-style constructs, since it doesn’t allow running arbitrary expressions with an implicit context.
Only method chains, nothing more.
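
For comparison, something close to this can be approximated in today’s Rust with a small extension trait and a closure. This is only a sketch: the `Cascade` trait and its `cascade` method are hypothetical names, not part of std, and unlike the proposal the closure can contain arbitrary expressions.

```rust
use std::collections::HashMap;

/// Hypothetical extension trait (not part of std): runs side-effects
/// on a value through `&mut` and then hands the value back.
trait Cascade: Sized {
    fn cascade(mut self, f: impl FnOnce(&mut Self)) -> Self {
        f(&mut self);
        self
    }
}

// Blanket impl so that every sized type gets `.cascade(...)`.
impl<T> Cascade for T {}

fn build_map() -> HashMap<&'static str, i32> {
    HashMap::new().cascade(|m| {
        m.insert("key1", 1);
        m.insert("key2", 2);
    })
}

fn sorted(values: Vec<i32>) -> Vec<i32> {
    // `sort` returns `()`, but the chain still yields the vector.
    values.cascade(|v| v.sort())
}
```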

3. Calling external functions on value

This is meant as an alternative to pipe forwarding.
I will start with the reasons that show the need for it:

  1. Functional programmers use it extensively, modern languages have it in their arsenal, it helps, and not providing it in Rust is rather restrictive.
  2. Dense usage of meaningless bindings, let, mut and ; in code, and the necessity of an imperative style: this is why Rust is considered a harsh and verbose language, not an abundance of operators.
  3. Side-effects on a method call chain will be allowed to apply external functions to the subject value. This is very important to keep code modular and to make the EDSLs introduced in the next section extensible.
  4. Declarative programming style is less verbose and more descriptive, thus safer; that’s why Rust should adopt and promote it.

Syntax is:

  • value.function(,in,) - to pass value into function. Use &in to pass by reference
  • mutable.~function(,&mut in,) - to pass mutable into function by mut reference
  • value.> (,side_effect(,in,),) - to pass value into side_effect. Use &in to pass by reference
  • mutable.~>(,side_effect(,&mut in,),) - to pass mutable into side_effect by mut reference

where trailing commas mean that further arguments/functions may be added.

It doesn’t differ much from a regular function call.
But it’s not very important to know which kind of function we call, associated or external, so this level of visibility is sufficient (and it’s actually good when the function has one argument).

in is chosen because it’s short, descriptive, already a keyword, and has syntax highlighting.
Another placeholder could be used, e.g. it, this, that - the decision might change.

Examples:

    return collection.iter()
        .apply_mapping_combinators(in) // Iterator moved to `apply_mapping_…`
        .apply_logging_combinators(in) // Iterator moved to `apply_logging_…`
        .collect()
    return Type::builder()
        .~apply_common_settings(&mut in) // To mutate builder.
        .build()
    long_name_binding
        .borrowed(&in, &in, &in) // This is allowed for all types.
    long_name_binding
        .copied(in, in, in)      // This is allowed only if `Copy` is implemented.
    let text = String::new()              
        .~>(put_default_text(&mut in),
            ecranize_shell(&mut in))
        .> (debug(&in),
            send(&in));
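
For comparison, the first example can be approximated today with an extension trait that passes the value to a free function. A sketch only: the `Pipe` trait, `keep_even` and `double_all` are hypothetical names chosen for illustration.

```rust
/// Hypothetical extension trait (not part of std): passes `self`
/// by value into a free function and returns that function's result.
trait Pipe: Sized {
    fn pipe<R>(self, f: impl FnOnce(Self) -> R) -> R {
        f(self)
    }
}

impl<T> Pipe for T {}

/// External (non-method) combinators over any `i32` iterator.
fn keep_even(iter: impl Iterator<Item = i32>) -> impl Iterator<Item = i32> {
    iter.filter(|n| n % 2 == 0)
}

fn double_all(iter: impl Iterator<Item = i32>) -> impl Iterator<Item = i32> {
    iter.map(|n| n * 2)
}

fn process(input: &[i32]) -> Vec<i32> {
    input
        .iter()
        .copied()
        .pipe(keep_even)  // Iterator moved into `keep_even`.
        .pipe(double_all) // Iterator moved into `double_all`.
        .collect()
}
```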

Experimental

Support for macros:

    get_value()
        .start_chain()
        .>(println!("start: {}", &in)) // Macros treated as regular functions.
        .continue_chain()
        .println!("continue: {}", &in);

Support for constructors:

    get_value()
        .take_enum(Some(in))
        .take_struct(Struct { x: in, y: { in_is_not_visible_here } });

4. Split chain

This section is the most interesting in the whole proposal and has the best examples.
The idea is simply to allow calling further functions on the results of side-effects.

Use cases are:

  1. Error handling for external functions: mapping, unwrapping, using ?
  2. EDSLs or Kotlin-like type-safe builders: without returning self, lambdas, scoping issues, prepending ~ on each &mut self-taking function, implicitness, and additional boilerplate around

Examples:

    return PathBuf::from("base")
        .>(fs::create_dir_all(&in).unwrap())  // Here we actually split the chain.
        .~push("filename")                    // Don't forget - it mutates.
        .File::open(in);
    let content = String::new()
        .~>(file.read_to_string(&mut in)?); // `?` is applied to read_to_string
    return Tree::root(0).~> (    // `top-level` started.
        branch(1),               // Added `branch1` to `top-level`.
        branch(1).~> (           // Other `branch1` added to `top-level`.
            branch(2),           // Added `branch2` to `branch1`.
            branch(2),           // ...
            branch(2).~> (
                branch(3),
                branch(3),
            ),
            branch(2),
            add_other_branches(in), // External function is called on branch
        )
    );
    let matches = App::new("My Super Program").~> ( // Prototype from clap's README
        version("1.0"),
        author("Kevin K. <kbknapp@gmail.com>"),
        about("Does awesome things"),
        arg("config").~> (
            short("c"),
            value_name("FILE"),
            help("Sets a custom config file"),
            takes_value(true),
        ),
        arg("INPUT").~> (
            help("Sets the input file to use"),
            required(true),
            index(1),
        ),
        subcommand("test").~> (
            about("controls testing features"),
            author("Someone E. <someone_else@other.com>"),
            arg("debug").~> (
               short("d"),
               help("print debug information verbosely"),
            ),
        ),
    ).get_matches();
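
For comparison, here is a sketch of how the tree example can be built in today’s Rust with closures. The `Tree` type, its `branch`, `with` and `values` methods are hypothetical and only mirror the names used above.

```rust
/// Hypothetical tree type mirroring the example above.
struct Tree {
    value: u32,
    children: Vec<Tree>,
}

impl Tree {
    fn root(value: u32) -> Self {
        Tree { value, children: Vec::new() }
    }

    /// Adds a child and returns a mutable reference to it.
    fn branch(&mut self, value: u32) -> &mut Tree {
        self.children.push(Tree::root(value));
        self.children.last_mut().expect("a child was just pushed")
    }

    /// Runs `f` on this node so that nested branches can be added.
    fn with(&mut self, f: impl FnOnce(&mut Tree)) -> &mut Tree {
        f(&mut *self);
        self
    }

    /// Collects all values in depth-first pre-order.
    fn values(&self) -> Vec<u32> {
        let mut out = vec![self.value];
        for child in &self.children {
            out.extend(child.values());
        }
        out
    }
}

fn build() -> Tree {
    let mut root = Tree::root(0);
    root.with(|t| {
        t.branch(1);
        t.branch(1).with(|t| {
            t.branch(2);
            t.branch(2).with(|t| {
                t.branch(3);
                t.branch(3);
            });
        });
    });
    root
}
```

The nesting is comparable, but every level needs a closure parameter and the root needs a named mut binding, which is the boilerplate the proposal tries to remove.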

Summary

  1. Doesn’t look that bad
  2. Introduces two operators (edited: originally three); however, their usage is very intuitive
  3. Fixes some existing problems and reduces boilerplate
  4. Introduces a breaking change
  5. Promotes a different programming style

#52

It doesn’t address the issue of having to memorize, understand, distinguish between, and correctly use several different forms of method/function calls, though. I don’t really want to see at least three (or four if we count the trailing closure syntax) different forms for a function call. It’s more distracting than clarifying.


#53

I love Haskell syntax, but I’m not sure it’s right for Rust. I do really like the boxing-unboxing TypeClasses like functors, monads et al., and it would be fun to explore how they would apply to Rust, and whether new operators would help.

I also think that the changes to the compiler (the query pattern) would make it pretty easy to experiment with different syntax whilst still using the same typeck, borrowck etc. I think it would be really fun to try to create a ML-style syntax for Rust, not saying it would ever get endorsed by the rust lang devs tho :stuck_out_tongue:


#54

Side note: I love how anyone can come along and suggest changes to the language, and they will get taken seriously by the core team. Don’t get that with C#! :stuck_out_tongue:


#55

Side-side-note: It’s actually the lang team :wink:


#56

Ok, I found that some parts were really missing and updated the second proposal; I also added better examples.
Did that clarify anything?


#57

I appreciate the effort you’re putting into this. That said I don’t like the proposed operators and here’s why:

  • This calling style is most useful for places where the builder pattern is used. However, the builder pattern can already be implemented with the tools that the language offers today. And it looks great!
  • The resulting code is only marginally more compact.
  • It’s unnecessarily complicated:
    • Three (!) new operators: ~, ~> and ->. Too much magic punctuation can be a real pain.
    • let bindings can already do this. The creation of a temporary binding with a short name is easy and the code is only slightly longer.
    • I imagine that people who are not using Rust everyday and are not familiar with every part of the language will groan when they encounter these operators. They’ll immediately get a feeling for what the code is probably doing, but they also see that there is some weird syntax going on which they need to learn to fully understand the code.
    • Status quo is better: Currently there are just function calls and even they have some hidden details (Deref) that one has to be aware of. This system introduces more details. You can only keep so much in your head while coding. I think keeping things nice and simple is better for beginners and experts.

#58

You provided a great summary from the point of view of a person who is already involved in Rust and knows all the nuances.
I still feel that the proposal is important because I had a different experience.
Here is my answer to the 3 quotes:

Rust is a great language that is a pleasure to work with.
Perhaps a little misunderstanding or my idealistic way of thinking influenced that. But for me the status quo syntax was magic and rather confusing when learning.
And here’s the story of how I came to this proposal:

It begins with my first steps in Rust, when I had just learned that all mutations should be explicit and the compiler will take care of it. I said to myself:
“Awesome! But probably because of that Rust code is pretty verbose. I wonder how builder pattern usage looks?”
When I checked, I was surprised that it looked exactly the same as in other languages.
“Hmm… Probably some complex borrow-checking magic here… Or it’s cloned every time. It’s better to accept it as it is for now.”

And that was fine, until… I had written a method with a single pretty iterator chain ending in collect(); then I realized that it would help me with debugging if the Vec returned from that method were sorted.
So I decided to temporarily add sort() and was surprised: “Why doesn’t it return self? Now I must add let mut collection, collection.sort(), collection in different places and then remove all that again. Never expected that!”

I started googling “why Rust sort doesn’t return self” and found a GitHub issue which explains that it’s to make mutations in method chains explicit.
That was funny enough, and another question arose: “So, if a method returns self, then the mutation in it might be hidden?”
Well… I was convinced after experimenting, and felt deceived and disappointed.
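
A sketch of the kind of experiment I mean. The `Builder` type and its methods are hypothetical; the point is that the chained version mutates a temporary without the mut keyword appearing anywhere at the call site.

```rust
/// Hypothetical builder whose setter returns `&mut Self` for chaining.
struct Builder {
    parts: Vec<String>,
}

impl Builder {
    fn new() -> Self {
        Builder { parts: Vec::new() }
    }

    /// Mutates the builder and returns it so calls can be chained.
    fn part(&mut self, text: &str) -> &mut Self {
        self.parts.push(text.to_string());
        self
    }

    fn build(&self) -> String {
        self.parts.join(" ")
    }
}

fn chained() -> String {
    // No `mut` is written here, yet the temporary is mutated twice.
    Builder::new().part("hello").part("world").build()
}

fn explicit() -> String {
    // The same work with the mutation visible on the binding.
    let mut builder = Builder::new();
    builder.part("hello");
    builder.part("world");
    builder.build()
}
```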


I can deal with those thoughts, but might someone else experience the same?
With a better syntax, a newcomer would not be confused 3 times before understanding, and there would be some gratification from the possibilities.

As you see, the intention was not only to make it more compact but also to make it explicit and less confusing/restrictive.
It’s more like writing ? instead of try!. It is only marginally more compact, but in many ways a better experience, since you read/type actual code and not language items.

That’s true, and that’s definitely a reason not to add it, but also a reason to experiment further.

I’ve just realized that the number of new operators might be reduced to two: .-> was chosen because it looks “nice”, without other meaningful reasons.
It might be simplified to .> and that will be fine: when you need to ignore the result of a function you use .>, and when you need to ignore the result of a function that mutates you use .~>.
That’s more intuitive, and there will be no problem distinguishing between the two forms.

@Centril proposed a very similar syntax in this post and I was inspired by it.

I’ve edited my post, replacing .-> with .>


#59

True. Who knows, maybe there is a good syntax out there. However, currently I’m very skeptical about this feature. The example where you transformed clap’s API to use this syntax looks less clear than it does without. That’s unfortunately not reassuring.

About collecting into a sorted vector, here’s a recently opened thread by @matklad.

Also I’ve found this crate that makes implementing builder patterns easy.


#60

I simplified the syntax a bit.

Here, functions that take &mut self are just called with ~ instead of .

    let mut collection = get_collection(); // `mut` is still required here.
    collection~sort();                     // `~` shows where we do mutation.
    return Type::new()
        ~mutate1()     // This function mutates return value of `Type::new`.
        ~mutate2()     // And this mutates return value of `mutate1`.
        .transform();  // This doesn't mutate anything.

The side-effects syntax was also changed to .{ fn_call; ... } and ~{ mut_fn_call; ... }

    let hmap = HashMap::new()~ {
        insert("key1", val1);    // `insert` is called on `mut HashMap`.
        insert("key2", val2);    // This is also called on `mut HashMap`.
    };
    return OpenOptions::new()
        ~{read(true); write(true);} // Single-line representation
        .open("location")?;
    let mut collection = get_collection(); // `mut` is still required on binding.
    collection
        ~{sort();}
        .for_each(action);
    fn draw_scene(drawing: DrawApi) -> DrawApi {
        drawing. {
            set_color(Color::Red);
            draw_circle(Circle::defined());
            draw_line(0, 0, 100, 100);
            draw_rectangle(specific_rectangle());
        }
    }
    fn set_abc(&mut self, (a, b, c): Tuple) -> Result<(), Error> {
        self.inner.{ set_a(a)?; set_b(b)?; set_c(c)?; };
        Ok(())
    }

In the syntax for calling non-associated functions, in is changed to it. That will be more intuitive.

    return collection.iter()
        .apply_mapping_combinators(it) // Iterator moved to `apply_mapping_…`
        .apply_logging_combinators(it) // Iterator moved to `apply_logging_…`
        .collect();
    return x.sum(it).abs(it).sqrt(it).round(it, 2);
    return Type::builder()
        ~apply_common_settings(&mut it) // To mutate builder.
        .build();
    long_name_binding
        .borrowed(&it, &it, &it); // Multiple borrowing is allowed for all types.

    long_name_binding
        .copied(it, it, it); // Copying is allowed only if `Copy` is implemented.
    let message = new_message(). {
        send_a(&it);
        send_b(&it);
    };
    let text = String::new()~ {
        put_default_text(&mut it);
        put_additional_text(&mut it);
        debug_text(&it);
    };

And chain splitting will look like the following examples:

    return PathBuf::from("base")
        .{fs::create_dir_all(&it).unwrap();}
        ~push("filename")
        .File::open(it);
    let content = String::new()
        ~{file.read_to_string(&mut it)?;};
    return Tree::root(0)~ { // `top-level` started.
        branch(1);          // Added `branch1` to `top-level`.
        branch(1)~ {        // Other `branch1` added to `top-level`.
            branch(2);      // Added `branch2` to `branch1`.
            branch(2);      // ...
            branch(2)~ {
                branch(3);
                branch(3);
            };
            branch(2);
            add_other_branches(&mut it); // External function is applied
        };
    };
    let matches = App::new("My Super Program")~ {
        version("1.0");
        author("Kevin K. <kbknapp@gmail.com>");
        about("Does awesome things");
        arg("config")~ {
            short("c");
            value_name("FILE");
            help("Sets a custom config file");
            takes_value(true);
        };
        arg("INPUT")~ {
            help("Sets the input file to use");
            required(true);
            index(1);
        };
        add_other_args(&mut it);
        subcommand("test")~ {
            about("controls testing features");
            author("Someone E. <someone_else@other.com>");
            arg("debug")~ {
                short("d");
                help("print debug information verbosely");
            };
        };
    }.get_matches();