Unbaked Idea: DRY using "parameterized inline modules"

In a discussion about RFC #2377, Item-level-blocks, @fknorr floated the idea of selling that RFC as "anonymous modules" which:

[..] has precedent with C++'s concept of anonymous namespaces, where namespace {} introduces an unnameable scope with the parent automatically importing all definitions.

Syntactically, you would then write:

#[cfg(debug_assertions)]
mod {
    // ...
}

On its own, however, being able to write mod { .. } just so that you may apply #[cfg(debug_assertions)] to it did not seem to develop consensus.

However, I have found another motivation for this syntax: not repeating type parameters. This idea is inspired by the using (x : a, y : a, xs : Vect n a) notation in Idris.

Simply put, we call mod { .. } an inline module (IM), since all of its contents are inlined into the enclosing scope. We then extend this with the ability to quantify and bound variables:

mod<'a, T, U: Copy, const N: usize> {
    ...
}

Items enclosed in this parameterized inline module (PIM) are all prepended with the list of parameters in the mod<...>. A simple example:

mod<T: Default> {
    fn foo<U>() -> T { T::default() }
}

fn main() {
    assert_eq!(0, foo::<u32, bool>());
}
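
For comparison, here is a minimal sketch of what the example above would presumably expand to in today's Rust, assuming the "prepend the mod<...> parameters" rule described in this post. The second assertion is an extra illustration that is not in the original example.

```rust
// Hand-written equivalent of the PIM example: the module-level `T: Default`
// is placed in front of `foo`'s own parameter list `<U>`.
fn foo<T: Default, U>() -> T {
    T::default()
}

fn main() {
    // `T = u32` comes first (from the hypothetical `mod<T: Default>`), then `U = bool`.
    assert_eq!(0, foo::<u32, bool>());
    // Extra illustration: any `T: Default` works, and `U` stays unused.
    assert_eq!(String::new(), foo::<String, ()>());
}
```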

This is imagined as primarily useful for generics-heavy code that works within the same generic context over and over again. An example of this is diesel, which repeats DB: Backend over and over again...

Some things to note:

  1. structs, enums, unions and impls, inside mod<..> must all use all the quantified parameters inside mod<..>. Otherwise, there are unused or unconstrained parameters, which Rust does not allow.

  2. statics are not allowed inside PIMs because that would introduce generic statics, which can cause soundness issues: the same sequence of concrete types, applied to the generic static at different places, could end up with different addresses in memory. This is perhaps a fundamental limitation, but perhaps not.

  3. This is one stop short of ML modules. Introducing those would be done by allowing you to name and quantify at the same time: mod foo<T: Bound> { .. }. While being an interesting idea, this is not proposed at this time. It might however be that taste or expectation develops for those over time due to their inline cousins being added.

  4. Implied bounds alleviate some of the problems PIMs are trying to solve, but not fully.

  5. If people abuse this feature, by adding too many things in the PIM, understanding can get harder, but this is not automatic. PIMs, like other current or would-be language features, are tools at your disposal, so use them tactfully. The same goes with cyclomatic complexity in fn bodies, overly complex types, and so on.

  6. Prepending was chosen as the thing I thought was most obvious; but if there is consensus that Appending is more natural, I'm totally down with that. I don't have a strong feeling either way.

    Regarding the arbitrary nature of making a choice at all, I think it is fine to have such conventions, as long as they are applied consistently.

    An argument can be made that prepending is better for const dependent types (const generics) but that appending is better for turbofishing since the context could be assumed to be more inferrable.

  7. Is inline in PIM the right word? Is anonymous (PAM) more appropriate?
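
To illustrate note 1 in today's terms: a hypothetical mod<T, U> { struct Pair { left: T } } would presumably expand to struct Pair<T, U> { left: T }, which the compiler rejects because U is unused (error E0392). The sketch below shows the usual PhantomData workaround that such a struct would need. The names Pair, left and _right are made up for illustration.

```rust
use std::marker::PhantomData;

// `U` is not stored anywhere, so it has to be "used" through PhantomData;
// without the `_right` field this struct would fail with E0392.
struct Pair<T, U> {
    left: T,
    _right: PhantomData<U>,
}

fn main() {
    let p: Pair<u32, bool> = Pair { left: 7, _right: PhantomData };
    assert_eq!(p.left, 7);
}
```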

Right now, this is a very unbaked idea. But I thought I'd spark a discussion. Thoughts?

5 Likes

To me, it looks like quite a logical extension, so it would depend on the details (e.g. which things are forbidden inside), but at first glance it sounds good.

In that sense, would it make sense to ask about a more generic thing? Which things in Rust have curly braces, and which annotations can be placed on them? If mod is the only thing that can't have generic parameters, then it would make sense just for consistency. And does it really need to be mod { }, or would just a block (bare curly braces) be enough? These are mostly brainstorming ideas, just throwing them on the table to see where they could lead.

About the name... if the items inside "spill" to the parent namespace with all their modifiers (e.g. mod { pub struct Stuff } is visible from outside the outer module), then inline is probably a better name.

Great!

I believe this is the situation on that front:

  • mod can't have generic parameters (fixed in part by this ~proposal, but not for named mods),
  • static can't have generic parameters -- for reasons I touched on before,
  • const can't have generic parameters, but you can have generic consts indirectly by use of inherent or trait impls. Changing this is not so hard I think and could be useful in the future.
  • type inside traits and impls can't have generic parameters (GATs fix this).
  • extern can't have generic parameters, and they shouldn't, because what does a generic FFI function mean? (this means that we have to disallow extern { .. } inside mod<T> { .. } as well, or at least not let them reference the quantified parameters...).

All other forms of items can have generic params, AFAIK.
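
As a small sketch of the "generic consts indirectly" point in the list above: an associated const in a trait impl or an inherent impl can depend on a type parameter today. The names ByteSize and Layout are invented for this example.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Generic const via a trait impl: every sized `T` gets a `BYTES` constant.
trait ByteSize {
    const BYTES: usize;
}

impl<T> ByteSize for T {
    const BYTES: usize = size_of::<T>();
}

// Generic const via an inherent impl on a marker type.
struct Layout<T>(PhantomData<T>);

impl<T> Layout<T> {
    const BITS: usize = size_of::<T>() * 8;
}

fn main() {
    assert_eq!(<u64 as ByteSize>::BYTES, 8);
    assert_eq!(<[u8; 3] as ByteSize>::BYTES, 3);
    assert_eq!(Layout::<u32>::BITS, 32);
}
```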

So that would be <T, const N: usize> { .. }? Unless there is some syntactic ambiguity (none comes to mind), it should be technically feasible. But I feel as though there should be something to hang the quantification on. Just having <T, U> { .. } feels a bit empty. Furthermore, { .. } is not permitted at the top level at the moment. We could allow mod { .. } as a statement (we currently allow (named) modules inside fn), which means that you can do:

fn foo() {
    mod<T> {
        fn bar(x: T) -> T { x }
    }

    bar::<usize>(1);
}

which is a boring example, but it was all I had time to think of ^,-

If you want to quantify for lambdas and such, perhaps for<T> { .. } should be considered, as in:

let identity = for<T> { |x: T| -> T { x } };
// or just:
let identity = for<T> |x: T| -> T { x };

However, this possibly touches on RankNTypes which could be non-trivial to support.
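
For context, a sketch of where things stand today (as far as I know): a closure is inferred at a single type, so a type-generic identity has to be a fn item, and for<..> binders exist only for lifetimes. The helper names identity and apply_to_padded are made up for this example.

```rust
// A generic fn item is the usual stand-in for a type-generic closure.
fn identity<T>(x: T) -> T {
    x
}

// Higher-ranked bounds are available today, but only over lifetimes.
fn apply_to_padded<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let owned = String::from("  hi  ");
    f(&owned).len()
}

fn main() {
    // The closure's parameter type is fixed by its first use (an integer here);
    // calling `id_closure("a")` afterwards would be a type error.
    let id_closure = |x| x;
    assert_eq!(id_closure(1), 1);

    // The fn item can be used at several types.
    assert_eq!(identity(1), 1);
    assert_eq!(identity("a"), "a");

    // `str::trim` works for every lifetime, so it satisfies the `for<'a>` bound.
    assert_eq!(apply_to_padded(str::trim), 2);
}
```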

An even more unbaked idea on top of this: could this be prototyped as a macro/plugin attached to the mod, which just applies/prepends the mod's parameters to each of the members?

And in a different vein: for the initial prototype, would it be better to 'enforce' your first note, and reduce any confusion around note 6, by having 'proxy' generics? I.e. mod<T: Default> { fn foo<T, U>() -> T { T::default() } }, where foo's T specifies the "location" of the mod's T?
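
A very rough sketch of the macro prototype idea, using plain macro_rules rather than a plugin. Everything here is an assumption for illustration: the macro name pim, the square-bracket syntax for the shared parameters, and the restriction to fn items that each have their own <...> list and an explicit return type. It ignores nested items, impls, structs and where clauses.

```rust
// Hypothetical helper: takes the shared parameter list in square brackets
// and prepends it to the generics of every `fn` that follows.
macro_rules! pim {
    // No items left: expand to nothing.
    ([$($shared:tt)*]) => {};
    // Peel off one `fn`, emit it with the shared parameters first, then recurse.
    (
        [$($shared:tt)*]
        fn $name:ident <$($own:ident),*> ($($args:tt)*) -> $ret:ty $body:block
        $($rest:tt)*
    ) => {
        fn $name<$($shared)*, $($own),*>($($args)*) -> $ret $body
        pim! { [$($shared)*] $($rest)* }
    };
}

pim! {
    [T: Default]
    fn foo<U>() -> T { T::default() }
    fn pair<U>(x: U) -> (T, U) { (T::default(), x) }
}

fn main() {
    // Both functions received `T: Default` as their first parameter.
    assert_eq!(foo::<u32, bool>(), 0);
    assert_eq!(pair::<u8, _>("x"), (0, "x"));
}
```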

1 Like

Probably? The complexity of such a macro depends on how far you take it... Do you also have to consider fn inside fn and such scenarios or is it just the immediate stuff inside mod<..> { .. }?

Sounds feasible; however, I would expect the binder in fn foo<T, U> to shadow the first binder in mod<T: Default> { .. } wrt. T, so my hope is that that would remain a prototype but not the final thing.

I'd tend to say that yes, it should be mod {}, because a naked pair of curlies is already an expression in Rust, so even if it's not allowed at the top level (and so it doesn't technically cause syntactic ambiguity from the compiler's PoV), it might confuse programmers or at least look weird. Furthermore, one of the primary bases for "namespaces" in Rust is modules, not blocks, so I think mod {} would mesh better with that aspect of the language too.

3 Likes

I like the idea! Should your example be:

mod<T: Default> {
    fn foo<T, U>() -> T { T::default() }
}

That is foo<T, U> instead of foo<U> ?

I think anonymous (PAM) modules is a better naming and mod { } should be used.

This idea could be even extended to named modules, where a module takes one or multiple types as parameter. Inside the modules these type parameters are used as you describe them but the user is then allowed to have multiple instances of the same module:

use mod1<u32> as m1;
use mod1<bool> as m2;

m1::bar(); // u32 used here inside bar()

m2::bar(); // bool used here inside bar()

Since you mentioned OCaml, IIRC they allow something like this. But this would be a separate RFC (built on top of yours?).
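
For what it's worth, something close to "multiple instances of the same module" can be approximated today with a generic marker struct used purely as a namespace. This is only a sketch of that workaround, not the proposed feature; Mod1, bar and pair are placeholder names, and type aliases stand in for the hypothetical use mod1<u32> as m1.

```rust
use std::fmt::Debug;
use std::marker::PhantomData;

// Stand-in for a hypothetical `mod mod1<T> { .. }`: a struct that only carries `T`.
struct Mod1<T>(PhantomData<T>);

impl<T: Default + Debug> Mod1<T> {
    // Uses the "module" parameter `T` without repeating it in its own signature.
    fn bar() -> String {
        format!("{:?}", T::default())
    }

    // Has its own parameter list, kept separate from the "module" one.
    fn pair<U: Default>() -> (T, U) {
        (T::default(), U::default())
    }
}

// Rough analogue of `use mod1<u32> as m1; use mod1<bool> as m2;`.
type M1 = Mod1<u32>;
type M2 = Mod1<bool>;

fn main() {
    assert_eq!(M1::bar(), "0"); // u32 used here inside bar()
    assert_eq!(M2::bar(), "false"); // bool used here inside bar()
    // Two separate parameter lists, one per path segment.
    assert_eq!(Mod1::<u32>::pair::<bool>(), (0, false));
}
```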

That is foo<T, U> instead of foo<U> ?

I believe the point is that the types attached to mod<> are 'implicitly' prepended to the parameter list when the mod is inlined into the parent module, so they only need to be defined in usage (outside? of the implicit/anonymous module)

1 Like

I get the motivation for parameterized/generic modules, but what's the motivation for also making them anonymous? In other words, why do we want this to be mod<T> { ... } instead of mod foo<T> { ... } use foo::*;, other than slightly less typing?

Is it just to sidestep the awkward questions about how paths and imports work in the presence of named parameterized modules?

4 Likes

I'd definitely appreciate the ability to elide a bunch of repeated generic parameters. It's one of the things I missed from C++, where you can do:

template <typename foo, bunch of other template parameters> struct scope {
  template <typename bar> struct first_object {... details that refer to foo, etc. ...};
  template <typename baz> struct second_object {... details that refer to foo, etc. ...};
  
  // no actual details for struct scope, it's only here to be a scope rather than a regular struct
};

...

typedef scope<uint64_t,...> concrete_scope;
typedef concrete_scope::first_object<...> concrete_first;
typedef concrete_scope::second_object<...> concrete_second;

But this PIM syntax just doesn't feel right to me, and it doesn't fully match my use cases.

  • It's a mod that's not a mod? The way I see it, the defining attributes of "mod" are making a privacy boundary and making paths distinct. This doesn't do either of those things. So for this feature as described, I think it should be called something other than "mod".
  • The automatic merging of the parameter lists also seems confusing. If you just looked at the type signature of foo where it appeared in the file, you'd write the wrong number of generic parameters and then be confused that the compiler was giving you errors for not giving enough generic parameters. (Good error messages could alleviate this a bit, but it still seems counterintuitive.) In particular, you could easily have a fn foo that USES a type parameter from the PIM, but only in its body, and doesn't refer to that type in its signature, so there isn't even a hint that you need to go look for the definition of that type. At first glance, I'd prefer having the 2 lists be separate, like the C++ way where you say scope<parameters>::specific_struct<parameters>. That seems pretty similar to having explicit, named, parameterized modules.

A related thing I miss from C++ is being able to define a struct inside a function and have it use the template parameters from the outer scope. In Rust, you have to make the inner struct be a generic struct that takes all of the parameters it needs. That's another case of parameter list duplication. What's the interaction between PIMs and structs defined inside a function? If those structs inherited the PIM parameters, that would prevent you from just making a regular one-off struct that's useful for the function's algorithm. So presumably, a struct defined inside a function inside a PIM would NOT inherit the parameters from the PIM, which makes such a struct even more second-class when you actually want to use the outer parameters.
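
To make the duplication concrete, here is a small sketch in today's Rust. A struct declared inside a generic function cannot mention the function's T (that is error E0401), so it has to declare its own parameter. The names min_max and Range are invented for the example.

```rust
fn min_max<T: PartialOrd + Copy>(items: &[T]) -> Option<(T, T)> {
    // Writing `struct Range { lo: T, hi: T }` here would fail with E0401,
    // so the helper struct redeclares a parameter of its own.
    struct Range<U> {
        lo: U,
        hi: U,
    }

    let first = *items.first()?;
    let mut range = Range { lo: first, hi: first };
    for &x in &items[1..] {
        if x < range.lo {
            range.lo = x;
        }
        if x > range.hi {
            range.hi = x;
        }
    }
    Some((range.lo, range.hi))
}

fn main() {
    assert_eq!(min_max(&[3, 1, 4, 1, 5]), Some((1, 5)));
    assert_eq!(min_max::<i32>(&[]), None);
}
```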

Let me see if I can come up with a variation on this feature that would meet my particular needs...

3 Likes

As long as we're spitballing syntaxes, here's a random thought: we could use an explicit ...

mod<A, B, C> {
    fn foo<..>() -> B { ... }
    fn bar<.., X>() -> X { ... }
}

2 Likes

Oh, that's a neat way to deal with the implicitness problem. Perhaps you would even be allowed to put the .. in whatever position in the parameter list you wanted. Although maybe standardizing it would make it easier to learn.

The mod<...> {} syntax, and even the name "parametrized inline modules", strongly suggests that this is the intersection of two separate features:

  1. parametrized modules, where the module is named and the generic parameters are associated with it, rather than duplicated into all items inside the module
  2. anonymous/inline modules, which splat their contained items into the parent module

Having this feature but not those two would be pretty misleading and disappointing. So I'd really suggest painting the bikeshed a different color, unless we're sure those two features or something similar will make it into the language too.

3 Likes

Having thought about it for a bit, I think the feature I really want is explicit, named, parameterized modules. I'm thinking you'd declare mod module_name<T: Trait, ...>, and T would automatically become an item inside module_name, as if it were a type alias explicitly written at the beginning of the module. (Why an item? Well, I'm thinking about when you have more submodules over multiple files. Submodules normally have to explicitly use things from their super, so it would be odd for this to be an exception.)

There's still a bit of an inconvenience. I often want to write logically-related pieces of code near each other in the same file, but if some of the logically-related functions DON'T depend on the outer generic parameters, then either I couldn't write them in the same file, or you wouldn't be able to use them without giving the module some explicit, arbitrary generic parameters. UNLESS the parameters of the module were somehow determined lazily(?), i.e. you can import the module without specifying all of the generic parameters, and that's permissible as long as you don't actually USE any of the module's descendants that refer to the parameters you didn't specify.

It's probably worth mentioning that the main reason C++ has anonymous namespaces is that, by default, all top-level items in C++ have "external linkage", i.e. they are potentially visible to all other code in the program. So nowadays, whenever you have an item that needs to be accessible to an entire file but isn't part of that file's public interface, you put it in an anonymous namespace to give it "internal linkage".

Because Rust had mod and pub and was private-by-default on day 1, that motivation does not apply.


I also personally think we should try to solve the awkward path/import questions around named parameterized modules rather than bundle them up with anonymous modules to work around that.


With explicit .. we could easily permit items in the module that lack the ... But this would still create an additional awkward wrinkle for the path/import questions.

1 Like

The .. idea wouldn't work as-is with full parameterized modules, because you could have multiple nested modules that all have parameter lists, so you'd have to disambiguate which one the .. referred to.

I'm probably repeating previous design work here, but the way I imagined the parameterized module syntax is this:

mod foo<T: Display, U: Clone> {
  pub fn bar(input: T) {
    println!("{}", input);
  }
  
  pub fn baz() {
    // does not use T or U
  }
  
  pub mod submodule {
    use super::U as Something;
    
    pub fn quux(something: &Something) -> Something { something.clone() }
  }
}

use foo<i64, u32> as first;
use foo<i32,_> as second;
use foo as third; // all parameters elided

fn main() {
  // always legal because baz doesn't use the generic parameters
  first::baz();
  second::baz();
  third::baz();
  
  // legal because T was specified
  first::bar (3i64);
  second::bar (3i32);
  
  // legal by inferring the generic parameter T, or maybe just a compile error if inferring is too confusing
  third::bar (3i16);
  
  // definitely compile error because it over-constrains the generic parameter T
  third::bar (5.0f64);
  
  // compile error because it infers U = NotClone and that doesn't meet the bounds
  struct NotClone;
  second::submodule::quux(&NotClone);
}

Just a meta note: The term "brainstorming" really is the appropriate description for the prior posts in this thread.

1 Like

I really don't like the "implicit prepending" idea. Something I'd be more comfortable with would be the following strawman:

mod foo<'a, A: Tr1> {
    fn bar<B: Tr2>() -> Baz<'a, A, B> { .. }
}
// ..
foo::<'static, i32>::bar::<f32>();
// or something like
fn bonk<'a, T: Tr1>(_: &'a T) {
    use crate::foo::<'a, T>;
    foo::bar::<f32>();

    // or maybe even:
    use crate::foo::<'a, T>::bar;
    bar::<f32>();
}

Basically, I don't want mod <..> to be a shorthand for repeated type parameters. If you want to use it, you have to parameterize the module, not its contents. If you want to expose the concatenated version, I imagine you could do

mod foo<T> {
    fn bar<U>() { .. }
}

// re-export `bar` with the module's 
// parameters prepended
pub use foo<..>::bar;

bar::<T, U>();

Also, if we allow this sort of syntax, I suppose we'd need to allow general concretization in all use statements, so you could do silly things like

pub use std::Box<FnOnce()> as MyBox;

and oh my god, I've basically replicated C++'s templated using. What have I done? I might as well just say that nonsense like pub use &'_ _ as MyRef is allowed.
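
For reference, the closest thing available today is a plain type alias rather than a use. The sketch below borrows the names MyBox and MyRef from above; the closure's u32 return type is an assumption added only so the example has something to assert on.

```rust
// Concretizing a generic type under a new name.
// (Assumption: the closure returns u32 so that the result can be checked.)
pub type MyBox = Box<dyn FnOnce() -> u32>;

// An alias can keep some parameters open as well.
pub type MyRef<'a, T> = &'a T;

fn main() {
    let boxed: MyBox = Box::new(|| 42);
    assert_eq!(boxed(), 42);

    let n = 5;
    let r: MyRef<'_, i32> = &n;
    assert_eq!(*r, 5);
}
```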

2 Likes

After a little more thinking, since we're throwing around silly ideas, here is an unfiltered idea that hasn't even had the yeast added:

In general, <T, ..> becomes a true path component, which it already sort of is in a lot of situations. In my slightly more reasonable proposal, if you have a path to a module path::to::foo, you need to include type parameters with a turbofish before you can access its contents: path::to::foo::<T>. Alternatively, you can write path::to::foo::<..> to delay application of the arguments further down the chain. What if instead we allow something like

mod foo<T> {
    fn bar<U>() {} 
}

use self::foo::<..>::*;

// the type parameters are forwarded,
// but now there's two parameter lists!
bar::<T>::<U>();

// or even allow multiple type parameter lists
fn baz<T><U>() {}
baz::<T>::<U>();

// useful for HKTs without type closures!
struct Foo<'a><A>;
fn foo<T<_>>() {}
foo::<Foo<'a>>(); // instead of
foo::<|X| Foo<'a><X>>();

// even more nonsensical:
mod foo<T> {
    mod bar<U> {
        fn baz<V>();
    }
}

use self::foo::<..>::bar::<..>::baz;
baz::<T>::<U>::<V>();

This gives us a vaguely sane way to do inline modules:

mod foo {
    mod<T> {
        fn bar<U>() {}
    }
    static BAZ ..;
}

self::foo::<T>::bar::<U>();
self::foo::BAZ = 0;

use self::foo::<..>::bar;
bar::<T>::<U>();

Feel free to take all of this with a grain of salt. I just had this idea and I'm pretty sure most of what I'm suggesting is terrible for readability. Especially the type-currying part.

Edit: Alternative, more reasonable(?) syntax for "forwarding" generic parameters:

use self::foo::<*>::bar;

By analogy with *, we're importing all possible parameterizations of foo, though I think this makes around as much sense as use foo::*::bar;... but this entire reply is off-the-rails brainstorming.

1 Like

Right, that's the whole point! Sorry for the noise, I was thinking of something different...