General Syntax for Functions w/ Multiple Output Types (`gen`, `try`, etc.)

Yokin · April 30, 2024, 9:45pm

This might be premature, but I've been thinking about a general output syntax for gen/try-style functions. Just wanting to put these ideas out there.

General Syntax

Broadly, the arrow syntax (-> T) would indicate type-wise output in general, rather than exclusively the return type. Output types can be introduced by a keyword to specify which output they refer to. As usual, the return type would be introduced without a keyword, defaulting to unit if unspecified.

For gen functions, the yielded output type would be introduced by the keyword yield.

gen fn all_ints() -> yield u64 {
	for i in 0.. {
		yield i;
	}
}
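For comparison, here is a rough stable-Rust sketch of what the `all_ints` example describes. It assumes a gen function lowers to something like a function returning `impl Iterator`, which this thread leaves open, so treat the signature as an illustration rather than the actual desugaring.

```rust
// Assumption: `gen fn all_ints() -> yield u64` behaves roughly like a
// function returning `impl Iterator<Item = u64>`; the real lowering is undecided.
fn all_ints() -> impl Iterator<Item = u64> {
    // An unbounded range yields 0, 1, 2, ... just like the `for` loop above.
    0u64..
}
```

Calling `all_ints().take(3)` would produce the first three integers, which is the same sequence the `yield i;` loop describes.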

For try functions, the error output type would be introduced by the keyword throw.

try fn assert_nonzero(input: i32) -> throw Error {
	if input == 0 {
		throw Error::Zero;
	}
}

To specify multiple output types, the output can instead be a semicolon-separated sequence of output types wrapped in curly brackets. The general return type is the last type in the sequence if it isn't introduced by a keyword. Bracketing is required because it avoids ambiguity with associated items in traits.

try fn validate(input: String) -> { throw Error; String } {
	if input == "bad" {
		throw Error::BadInput;
	}
	input
}

try gen fn function(input: i32) -> {
	throw Error;
	yield State;
	i32
} {
	assert_nonzero(input)?;
	yield State::Pending;
	input - 1
}

Yielded types must be specified for gen functions, and error types must be specified for try functions. If the function isn't introduced by the respective keyword, the respective output type can't be specified.

This helps to prevent confusion from syntax such as gen fn function() -> T, where otherwise the reader may assume that T is the yield type (when it's actually the return type).
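As a point of reference, the two try examples above can be approximated in stable Rust by spelling out `Result` by hand. This is only a sketch: the `Error` enum and its variants are assumed from the names used in the examples, and whether a try function would actually lower to `Result` is not settled here.

```rust
// Hypothetical error type; the variant names are taken from the examples above.
#[derive(Debug, PartialEq)]
enum Error {
    Zero,
    BadInput,
}

// Rough equivalent of `try fn assert_nonzero(input: i32) -> throw Error`:
// the unspecified return type defaults to unit, so the success type is `()`.
fn assert_nonzero(input: i32) -> Result<(), Error> {
    if input == 0 {
        return Err(Error::Zero); // stands in for `throw Error::Zero;`
    }
    Ok(()) // the proposed syntax would Ok-wrap this implicitly
}

// Rough equivalent of `try fn validate(input: String) -> { throw Error; String }`.
fn validate(input: String) -> Result<String, Error> {
    if input == "bad" {
        return Err(Error::BadInput); // stands in for `throw Error::BadInput;`
    }
    Ok(input)
}
```

The visible difference is that the hand-written version names the container type and wraps the tail expression in `Ok`, while the proposed signature lists only the thrown and returned types.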

Syntax (Functions - The Rust Reference)

...

FunctionReturnType :
   -> ( OutputType | OutputTypeList )

OutputType :
   Type | yield Type | throw Type

OutputTypeList :
   { OutputType ( ; OutputType )* ;? }

Advantages

- Requires no extra keywords (like plural forms - throws, yields).
- Easily extended to any similar future possibilities.
- Unambiguous to parse and read.
  - Types like fn(T) -> U and impl Fn(T) -> U require no parentheses.
  - Output types are clearly delineated, unlike: try fn validate() throws Error -> String ("throws error to string"?).
  - The open bracket draws attention to output broken onto multiple lines, which may be common.
  - Reads as a type-wise complement to the function body (e.g. { yield A; B } to { yield a; b }).

Disadvantages

- If bracketing is forgotten for the output of a required function in a trait, the latter output types would be evaluated as associated items.

  This would hopefully always error, but it could leave the door open for future associated items with type-ambiguous syntax to cause confusion.
  ```
  fn function() -> yield A; B; // `B` is evaluated as an associated item.
  ```
- Limits the ability to use { .. } as syntax for a type.

  Anonymous structs for example; although those could, and probably should, be introduced with a keyword (e.g. struct { .. }).
- Readers might confuse a function's bracketed output for something else.
  - It could be confused for a type, but this is unlikely since { .. } isn't used for any existing type.
  - It could be confused for the function's body, but this is unlikely since the output should generally be small and involve Pascal-case types.
- Using the same keyword is less "greppable" - looking for value-wise usage of throw or yield would turn up semi-false positives. Functions that throw or yield would still be searchable using gen and try.

Alternatives

- The intuitive syntax for gen and try functions is usually provided as gen fn f() yields Y -> R and try fn f() throws E -> R, using the keywords yields and throws.

  The intention of the syntax proposed in this post is to be a less ambiguous and more extensible form of the same idea.
- Could avoid bracketing by disambiguating output types from associated items by keyword, but this would be more fragile to parse and ambiguous to read: gen fn f() -> yield Y; R;.
- Could use a different bracketing style.
  - (yield A; B), [yield A; B] - May be confused for a type, particularly a tuple or array.
  - <yield A; B> - May be confused for generic parameters.
  - Some other token, like how closures use | for their parameter list.
- Could use a different separator between output types.
  - , - May read like a keyword applies to each of the following types (yield A, B).
  - | - May appear like an anonymous sum type syntax (yield A | B).
  - :, =>, >> - May read like the output type evaluates to the next type in some way (yield A => B).

  This would also hurt the ability to read the output "as a type-wise complement to the function body".

hyeonu · May 1, 2024, 3:11am

How about in this form?

try fn foo() -> String, throw Error {

IMO in these kinds of cases String should be specified first, as it's the main return type of the function. I'd read it as "a fallible function foo with no args that returns String, but also may throw Error".

Yokin · May 1, 2024, 5:26am

That works too - it seems unambiguous and readable. Also, the "alternative intuitive syntax" I mentioned for try functions is backwards, because I think it's usually provided as try fn foo() -> R throws E (error after return type as well). Both are probably backwards in a sense; I think it's more intuitive for yield to come before the return type and for throw to come after.

The only problem I can see is the where clause being possibly ambiguous with the output list, especially if a trailing comma is allowed. It would probably be fine, it might just confuse beginners at worst.

try fn validate<T>(input: T) -> T, throw T::Error, where T: Input, T::Error: Error {
	if input.is_bad() {
		throw Error::BadInput;
	}
	input
}

I'm not sure if others share the interpretation, but in Rust I've always read function return types as a "reduced" version of the body; just due to Rust being expression-based and having implicit returns. With that interpretation I think the "type-wise complement" syntax reads more intuitively, but I don't feel super partial either way as long as everything is unambiguous and documentation is easy to read consistently.

Nemo157 · May 1, 2024, 8:29am

For a try fn you don't have an "error" type to specify; for the external interface you need to specify the residual:

fn foo() -> impl Try<Output = String, Residual = Result<!, Error>>

fn bar() -> impl Try<Output = String, Residual = Option<!>>

Then, for the internal interface you need to specify which FromResidual implementations are required and what their behaviors are, and the only way I know to do that is by specifying an actual concrete Try type.

Yokin · May 1, 2024, 5:05pm

My understanding is that a -> T, throw E might effectively just be syntax sugar for -> impl Try<Output=T, Residual=Result<!, E>>, and if more control is desired you would be able to write the impl Try signature manually like with async functions. I think it's very much undecided though, so fair enough.

josh · May 1, 2024, 8:46pm

See Rethinking Rust's Function Declaration Syntax for a different take on this problem.

My personal take: I would be very hesitant to adopt a special syntax for multi-output functions, and would prefer to spell out the actual type being returned (e.g. Result<T, E>). Any proposed alternative syntax that doesn't include Result<T, E> would have to have a really clear reason why it's important to obscure that type from the user.

Yokin · May 1, 2024, 10:30pm

I think try fn requiring an explicit type like Result<T, E> would seem a little inconsistent if the function automatically Ok-wraps its output. IMO the -> T syntax should remain as the type of the value that the function returns in its body, like how async fn works; describing how the output can be used, but not exactly what it is. At best this would future-proof the output type to be swapped to something else, but I think that would require a Try::into_error method?

Either way, the ideas floated in that blog post are very interesting.

dlight · May 2, 2024, 1:32am

On the other hand, async fn obscured the return type anyway.

I think one way to justify this is that async fn is synthesizing a new type and as such you don't gain any expressivity by spelling impl Future<Output = T> (apart from being able to put bounds on it), but Result is a concrete, existing type and a try fn could potentially work with a different MyResult type.
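To make the async comparison concrete, here is a small sketch showing that an `async fn` and a hand-written function returning `impl Future<Output = T>` satisfy the same bound. The function names and the helper `assert_future_of_u32` are made up for illustration.

```rust
use std::future::Future;

// The sugared form: the signature shows `u32`, but callers receive a future.
async fn sugared() -> u32 {
    7
}

// The same interface with the synthesized future type spelled out.
fn desugared() -> impl Future<Output = u32> {
    async { 7 }
}

// Hypothetical helper: compiles only if the argument is a future of `u32`.
fn assert_future_of_u32<F: Future<Output = u32>>(_future: &F) {}
```

Both `assert_future_of_u32(&sugared())` and `assert_future_of_u32(&desugared())` type-check, which is the sense in which spelling out `impl Future` adds little beyond the ability to attach bounds.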

As such, the earlier fehler library implemented try fns by obscuring the return type, and the newer culpa library provides a syntax to let you write Result directly. I think culpa looks better here, and I think its #[try_fn] fn f() -> Result<..> could just be stabilized as try fn f() -> Result<..> .

But a hypothetical gen fn would create a new type just like async fn, and as such it makes sense to omit the full type (or if not, it would make sense to revise the async fn syntax in a later edition).

mathstuf · May 3, 2024, 3:53pm

Some questions that I have (considering edge-cases, not the meat of the proposal):

- How do these compose? For example, a generator that yields a Result<T, E>: can this use a throw form to specify it? I see OutputTypeList could have this as -> A, yield Y, throw E, but…what is this type in the end?
- The grammar allows A, throw E1, throw E2. Is this a post-parse error?
- With the comma syntax, the comma needs to be hidden from the turbofish if used in a generic context, in some way forcing bracketing when embedded. macro_rules may need to consider this? Also, I have a feeling that this runs into some of the (unstated) rules about grammar complexity and such.
- Nesting also needs to be considered; how do I say "I return a fn () -> String or throw E" versus "I return a fn () -> String, throw E" (basically forced bracketing even for the -> { throw E } case with either separator)?

Yokin · May 3, 2024, 5:29pm

I don't think reusing throw for yielding errors would work. I guess you could chain the keywords (yield throw T), but I don't think that would work in the body since throw T and yield T are both expressions that evaluate to unit. If desirable enough it would probably require a new keyword.
I was thinking this would be a post-parse error since it could be annoying to have a specific order (pub async unsafe gen try fn), but an enforced order would probably be more future-proof and better for documentation.

For nesting I don't think bracketing would ever be required for singular outputs, but that's a good point. If this syntax were ever desired for function pointers you could use parentheses, otherwise assuming a single output:

fn a() -> (fn() -> String, throw E) {
    todo!("return a `fn() -> String, throw E`")
}
fn b() -> fn() -> String, throw E {
    todo!("return a `fn() -> String` or throw `E`")
}

But on the other hand, general bracketing requires no assumption, so it is less surprising:

fn a() -> fn() -> { throw E; String } {
    todo!("return a `fn() -> { throw E; String }`")
}
fn b() -> { throw E; fn() -> String } {
    todo!("return a `fn() -> String` or throw `E`")
}
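The two nestings are easier to tell apart when written with today's explicit `Result`. The sketch below assumes `throw E` corresponds to a `Result<_, E>` return, which is only one of the lowerings discussed in this thread; `E`, `hello`, and `fallible_hello` are placeholder names.

```rust
// Placeholder error type for illustration.
#[derive(Debug, PartialEq)]
struct E;

fn hello() -> String {
    "hello".to_string()
}

fn fallible_hello() -> Result<String, E> {
    Ok("hello".to_string())
}

// "Return a `fn() -> String, throw E`": `a` itself cannot fail,
// but calling the function pointer it returns can.
fn a() -> fn() -> Result<String, E> {
    fallible_hello
}

// "Return a `fn() -> String` or throw `E`": `b` itself can fail,
// and on success it hands back an infallible function pointer.
fn b() -> Result<fn() -> String, E> {
    Ok(hello as fn() -> String)
}
```

With explicit types the failure point is visible in the position of `Result`: `a()()` yields a `Result`, while `b()` must be unwrapped before the returned pointer can be called.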

CAD97 · May 5, 2024, 9:44am

One potential reason for try fn to not specify a specific type (i.e. Result) is if it didn't return a specific type. Less circularly, a possible world has it where try fn f() -> { T, yeet E } desugars into fn f() -> impl Try<Output=T, Residual=Yeet<E>> [1], with the only practical way to call f being as f()?; at the point you want to reify the "effect" you "handle" it with a try block.

A notable limitation of try blocks as currently implemented is their type inference. IIUC the idea there was to have plain try {} blocks require all ?ed types have the same Residual::TryType (or however "stem preservation" ends up implemented). try fn producing opaque anonymous impl Try types at least has an impact on this, although I can't say if it'd be positive or negative at the moment. Returning the caller's choice would allow try fn to unify into any impl Try, although then the try effect wouldn't be RPIT sugar like the async and yield effects are… but maybe that could be desirable, given how Try works?

An actually quite interesting possibility could be to remove the error conversion from FromResidual<Result<!, _>> for Result and move it to only be on FromResidual<Yeet<_>> instead. I don't know how good the idea is, let alone if it's even practical, but it's at least interesting. The most common complaint I've seen about Try (after it just not being stable yet) is that the residual machinery is overcomplicated and it should just split into the output type and the type which is thrown into do yeet.

Ultimately, though, I'm mostly currently in agreement that Result doesn't need to be lifted to an effect and it's an implementation detail of the function whether it uses try, so I weakly favor allowing something more like fn f() -> Result<T, E> = try { … } instead of try fn, if try fn doesn't modify the signature.

There's a really interesting discussion available to be had about the difference between the combination of effects and the composition of effects. But the short version of it is that -> { (), yield T, throw E }, -> { (), yield { T, throw E } }, and -> { { (), yield T }, throw E } all communicate slightly different things. Rust doesn't currently have the vocabulary to talk about the first which combines the effects instead of composing them.

Failure is often the least interesting effect for combination, because you can generally make a decent enough case for composition order with resumption effects — if failure is transient and you can resume again to get a different result, then failure is composed inside the resumption effect, and if failure is terminal and you can't resume afterwards, then failure is composed around the resumption effect. But if you have multiple resumption effects to combine, it gets more difficult to say that they compose cleanly.
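The transient versus terminal distinction can be illustrated with plain iterators and `Result` in stable Rust. This is an analogy rather than a model of gen or try functions: the function names are invented, and integer parsing stands in for any fallible step.

```rust
use std::num::ParseIntError;

// Failure composed inside the resumption effect: every item is its own
// `Result`, so one bad item does not stop later items from being produced.
fn parse_each(inputs: &[&str]) -> Vec<Result<i32, ParseIntError>> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

// Failure composed around the resumption effect: collecting into
// `Result<Vec<_>, _>` stops at the first error and discards the sequence.
fn parse_all_or_fail(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}
```

Given the inputs `["1", "x", "3"]`, the first function still reports the `3` after the failed `x`, while the second returns only the error.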

The least wrong composition order is probably -> Ready(Some(Ok(T) | Err(E)) | None) | Pending, but this is mostly just an arbitrary choice since the different data compositions are isomorphic. This order is already stably observable in std, though, via impl Try for Poll<Option<Result<T, E>>>. I won't complain about this impl yet again except to say I still dislike it and ? should've branched away the outermost layer (Pending), not the innermost one (Err(E)).
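The stably observable behavior mentioned here can be seen in a short function. In this sketch, `step` is a made-up name and `String` is an arbitrary error type; the point is that `?` on a `Poll<Option<Result<T, E>>>` removes only the innermost `Err` layer.

```rust
use std::task::Poll;

// Hypothetical processing step over a stream-like poll result.
fn step(input: Poll<Option<Result<u32, String>>>) -> Poll<Option<Result<u32, String>>> {
    // `?` returns early only for `Ready(Some(Err(e)))`. `Pending` and
    // `Ready(None)` are not branched away; they stay inside `inner`.
    let inner: Poll<Option<u32>> = input?;
    // Re-wrap the successful value, incrementing it to show it was reached.
    inner.map(|item| item.map(|value| Ok(value + 1)))
}
```

So `step(Poll::Pending)` passes through the mapping code rather than returning at the `?`, which is the behavior being objected to: the operator branches on the innermost layer, not the outermost one.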

[1] A more direct reified translation of the return specification would actually be -> impl Try<Output=T> + FromResidual<Yeet<E>> instead, but as an opaque RPIT this is essentially unusable. It'd become a lot more interesting if the caller was allowed to choose the impl Try container, though…

InfernoDeity · May 7, 2024, 12:24pm

try is kinda weird here among async and gen in that try presumably doesn't just return an impl Try.

I think it would be very weird, though, if gen functions were inconsistent with async functions, since those do return impl <CorrespondingTrait>, and you omit specifying the RPIT, so it probably does need something here for Yield.

system · August 5, 2024, 12:24pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
