##Disclaimer: This is not an RFC. This is not even a proto-RFC. I am merely trying to have a discussion about a problem I see in organizing code, and would like some commentary on this, as well as an exploration of the possible solution space. Do not criticize this as harshly as you would an RFC. It is not complete. It does not discuss drawbacks.
Currently datatypes can be parameterized by types. I am going to make the argument that both datatypes and modules should be able to be parameterized by (optionally bounded) types, constant values, and potentially other modules; that this doesn’t increase the complexity of the language because the simple concept of substitution applies uniformly in all cases; and that this strictly increases the utility of the language, although perhaps not the expressiveness.
#Example 1: Threading Constants Throughout Code#
Suppose you are writing some code that must be configured by constant values. Perhaps you are writing a language independent software package and someone has contributed several translations for all messages that might be user-facing. You normally have three options in Rust.
##1.) You can declare a lot of consts in a module.
mod english {
pub const HELLO: &'static str = "Hello";
pub const GOODBYE: &'static str = "Goodbye";
pub const PROBLEM: &'static str = "Expression Problem";
}
mod español {
pub const HELLO: &'static str = "Hola";
pub const GOODBYE: &'static str = "Adios";
pub const PROBLEM: &'static str = "Problema de Expresión";
}
mod français {
pub const HELLO: &'static str = "Bonjour";
pub const GOODBYE: &'static str = "Au Revoir";
pub const PROBLEM: &'static str = "Problème d'Expression";
}
Now, in your main, you just import the correct module (and remember to qualify it!), and use it transparently.
use english as lang;
fn main() {
println!("{} {}", lang::GOODBYE, lang::PROBLEM);
}
Are you shipping your build to Spanish-speaking customers? No problem; just change the use line to `use español as lang;`. If you have misspelled either the identifiers or the module name, the compiler will shriek at you and everything will be caught at compile time.
This is very manual, though. Perhaps your requirements have changed and now you don’t need to express certain phrases in your modules. It is now up to you to remove each obsolete phrase by hand, and no one will help you if you forget to do so (a benign error, admittedly). Moreover, even though you are quite principled and keep your use declarations all at the top of your module, you must now put a region in your code dedicated to configuration.
// Configuration //
use english as lang;
// End Configuration //
use other::datatypes;
fn main() { ... }
It works, but again: this is a fairly manual way of doing this. The worst part of this is that ultimately it breaks down if many, many other modules depend on a global configuration. You cannot, for instance, do this:
use english as lang; // This is in my main module.
mod sub {
use super::lang; // "error: unresolved import `super::lang`. There is no `lang` in `???`"
}
##2.) Use a configuration datatype.
The module qualification didn’t cross over module boundaries, but an identifier does.
mod langs {
pub type s = &'static str;
pub struct Language {
pub hello: s,
pub goodbye: s,
pub problem: s,
}
pub const ENGLISH: Language = Language {
hello: "Hello",
goodbye: "Goodbye",
problem: "Expression Problem",
};
pub const ESPANOL: Language = Language {
hello: "Hola",
goodbye: "Adios",
problem: "Problema de Expresión",
};
pub const FRANCAIS: Language = Language {
hello: "Bonjour",
goodbye: "Au Revoir",
problem: "Problème d'Expression",
};
}
const LANG: langs::Language = langs::ENGLISH;
mod sub {
use super::LANG;
}
It works! But there’s a caveat: now my client is a software developer and they want my general application framework that they can configure on their own. As it is, we can’t do this, because the configuration is passed by referencing the configuring module. But in the same way that you turn an executable into a library by stripping `main`, we need to strip the module out. Hmm.
We could always do this the C way and namespace a `static mut`, which we configure at runtime. Then our clients just need to call `library::set_language(ESPERANTO)` and they’ll be set.
Uhh… goodbye thread safety. It’s like programming in an imperative toolkit all over again. It also doesn’t express the idea that we are offering a configurable constant. If we had Java’s `final`, this would be possible, although I can’t remember if you can defer a final assignment to a client in this way.
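To make the objection concrete, here is a minimal sketch of the `static mut` approach, assuming a small `Language` struct like the one in `langs` above. The `library` module, `greet`, and the Esperanto entry are hypothetical names used only for illustration.

```rust
// A cut-down, hypothetical version of the `Language` config type.
#[derive(Clone, Copy)]
pub struct Language {
    pub hello: &'static str,
}

pub const ENGLISH: Language = Language { hello: "Hello" };
pub const ESPERANTO: Language = Language { hello: "Saluton" };

pub mod library {
    use super::{Language, ENGLISH};

    // Global mutable configuration: nothing stops two threads from
    // racing on this, which is why every access needs `unsafe`.
    static mut LANGUAGE: Language = ENGLISH;

    /// The client "configures" the library at runtime.
    pub fn set_language(lang: Language) {
        // Sound only if no other thread touches LANGUAGE concurrently.
        unsafe {
            LANGUAGE = lang;
        }
    }

    /// Library code reads the global by value on every use.
    pub fn greet() -> String {
        let lang = unsafe { LANGUAGE };
        format!("{}!", lang.hello)
    }
}
```

A client would call `library::set_language(ESPERANTO)` once at startup and `library::greet()` afterwards. The type system records none of that ordering, and the value is not a constant in any sense the compiler can check.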
The solution I would most certainly use would be to write all of my code as an impl on a datatype that is parameterized by a `Language`. This is what that would look like:
struct Application(Language);
impl Application {
pub fn some_fn(&self) { ... }
pub fn some_other_fn(&self, arg: i32) -> i32 { ... }
...
}
And to use it, we simply construct an Application struct with our preferred configuration and call methods on it.
fn main() {
let a = Application(langs::ENGLISH);
a.some_fn();
a.some_other_fn(102);
}
This is a fine way to solve the problem, but there are some quirks. First, we always need a value to call methods on; there is no way to say once that some functions are in scope and then use them thereafter, because there is state that is carried around. But that state is constant, so we lose the benefit of referring to globals implicitly compared to having “free” constants not tied to a structure. Furthermore, we have changed the shape of our code considerably. When we were programming with a module, we wrote free functions that lived in some space and imported them as needed. Now, we are writing functions that live in an inherent impl, we have to construct values to hold our configuration, and we must refer to a “self.0” throughout the methods anywhere we try to use the configuration. This is a loss of terseness.
But not only is it a loss of terseness, it simply isn’t common to program this way in Rust, at least from what I’ve observed.
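For reference, a compilable sketch of that shape might look like the following. The method bodies are invented for illustration (the originals are elided above), and only one language constant is shown.

```rust
pub struct Language {
    pub hello: &'static str,
    pub goodbye: &'static str,
    pub problem: &'static str,
}

pub const ENGLISH: Language = Language {
    hello: "Hello",
    goodbye: "Goodbye",
    problem: "Expression Problem",
};

// The configuration now travels inside a value.
pub struct Application(pub Language);

impl Application {
    // Hypothetical method: every use of the configuration goes through `self.0`.
    pub fn greet(&self) -> String {
        format!("{}!", self.0.hello)
    }

    // Hypothetical method, same pattern.
    pub fn say_victory(&self) -> String {
        format!("{}, {}!", self.0.goodbye, self.0.problem)
    }
}
```

Every call site needs an `Application` value in hand (for example `Application(ENGLISH).greet()`), even though the configuration never changes after construction.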
We did not want our collections of code to be values. We wanted our code to work on a collection of code provided to our library. Luckily, modules are not the only way to organize collections of code.
##3.) Go Full Object Oriented
Sorry if this hurts.
We start with a trait to express the functionality we are parameterizing over.
#![feature(associated_consts)]
trait Language {
const HELLO: &'static str;
const GOODBYE: &'static str;
const PROBLEM: &'static str;
}
We then declare dummy types.
struct English;
struct Espanol;
struct Francais;
And now that we have gained namespaces to put all of our consts, we just write impls.
impl Language for English {
const HELLO: &'static str = "Hello";
const GOODBYE: &'static str = "Goodbye";
const PROBLEM: &'static str = "Expression Problem";
}
impl Language for Espanol {
const HELLO: &'static str = "Hola";
const GOODBYE: &'static str = "Adios";
const PROBLEM: &'static str = "Problema de Expresión";
}
impl Language for Francais {
const HELLO: &'static str = "Bonjour";
const GOODBYE: &'static str = "Au Revoir";
const PROBLEM: &'static str = "Problème d'Expression";
}
Now our application needs to be generic over any kind of language.
struct Application<L: Language>(L);
And after this, we can just write the entirety of our application as an impl within our Application struct. Glorious.
impl<L: Language> Application<L> {
pub fn new(lang: L) -> Application<L> { Application(lang) }
pub fn say_victory(&self) -> String {
format!("{}, {}!", L::GOODBYE, L::PROBLEM)
}
}
Lovely. Our client can now write his own languages, which the compiler will check for completeness, and use our library accordingly.
fn main() {
let a = Application::new(Esperanto);
a.say_victory();
}
The ugly part is having to write structs to carry around our code. If you’ve seen my trick for introducing dependency injection through type hints without passing around structs, then you know that you can move the formal parameter into a generic one, so that you can write the above block as
fn main() {
let a = Application::<Esperanto>::new();
}
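A minimal compilable sketch of this variant, assuming the carrier value is replaced by a `PhantomData` marker so that the language lives only in the type parameter. The use of `PhantomData` is my assumption about how to make the trick type-check; `English` stands in for a client-supplied language such as `Esperanto`.

```rust
use std::marker::PhantomData;

pub trait Language {
    const HELLO: &'static str;
    const GOODBYE: &'static str;
    const PROBLEM: &'static str;
}

// A dummy type whose only job is to carry the constants.
pub struct English;

impl Language for English {
    const HELLO: &'static str = "Hello";
    const GOODBYE: &'static str = "Goodbye";
    const PROBLEM: &'static str = "Expression Problem";
}

// No runtime data: the language exists only at the type level.
pub struct Application<L: Language>(PhantomData<L>);

impl<L: Language> Application<L> {
    pub fn new() -> Self {
        Application(PhantomData)
    }

    pub fn say_victory(&self) -> String {
        format!("{}, {}!", L::GOODBYE, L::PROBLEM)
    }
}
```

With this shape, `Application::<English>::new().say_victory()` picks the constants at compile time, but the code is still organized around a struct and an impl rather than a module.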
There is a fourth solution that is not possible in Rust. Making new types for each collection of phrases was the wrong abstraction. Really, we wanted to pass around a constant configuration datatype at compile time, like in solution 2, but we want this constant to parameterize our module.
##4.) Parameterized Modules
Reusing the langs module from (2), this might look like:
use langs::Language;
mod application(const L: Language) {
// ...code... //
fn main() {
println!("{}", L.hello);
}
}
let spanish_app = application(langs::ESPANOL); // creates a new module synonym
fn main() {
spanish_app::main();
}
This works cleanly for every situation delineated thus far. It clearly expresses that our module is dependent on some constant of type Language in order to be used. It does not require an informally specified “configuration region”, naming schemes, or introducing excessive types and traits. Moreover, we have continued to use modules, as is familiar to most Rust programmers, and we can export our new “module with a hole”. This can also nest arbitrarily.
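The syntax above is hypothetical. The closest approximation I know of in today's Rust is a `macro_rules!` macro that stamps out one module per configuration constant. This is only a sketch of the idea, not the proposal itself: the macro name `application!` and the function `greeting` are made up, and a macro gives none of the type-checked "module signature" guarantees discussed here.

```rust
pub mod langs {
    pub struct Language {
        pub hello: &'static str,
        pub goodbye: &'static str,
        pub problem: &'static str,
    }

    pub const ENGLISH: Language = Language {
        hello: "Hello",
        goodbye: "Goodbye",
        problem: "Expression Problem",
    };

    pub const ESPANOL: Language = Language {
        hello: "Hola",
        goodbye: "Adios",
        problem: "Problema de Expresión",
    };
}

// Hypothetical stand-in for `mod application(const L: Language) { ... }`:
// each invocation generates a module with the constant `L` baked in.
macro_rules! application {
    ($name:ident, $lang:expr) => {
        pub mod $name {
            // Bring `langs` (and anything else from the parent) into scope.
            use super::*;

            const L: langs::Language = $lang;

            // Free function that refers to the configuration implicitly.
            pub fn greeting() -> String {
                format!("{}, {}!", L.hello, L.problem)
            }
        }
    };
}

// "Instantiate" the module twice with different constants.
application!(spanish_app, langs::ESPANOL);
application!(english_app, langs::ENGLISH);
```

Callers then write `spanish_app::greeting()` with no value to construct and no `self.0`, which is the ergonomic property the parameterized-module syntax is after.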
#Example 2: Parameterizing code over bounded types
Suppose you are writing a collection of filesystem utilities that are generic over any type that implements a Filesystem trait. Every Filesystem also has an associated type D: Dir, and every Dir has an associated type F: File. There are three different traits! The gods of polymorphism are proud of you. How do you write out this code?
##1.) Just use generic functions, obviously.
This is so obvious it didn’t even deserve a question. You can pretty obviously just make every function generic over the filesystem you are operating under. Done. Time to pack up. But then again, something feels inconvenient…
// I apologize in advance for this API. It is meant to be illustrative, not accurate.
fn openFile<FS: Filesystem, S: Into<String>>(s: S) -> FS::D::F { ... }
fn closeFile<FS: Filesystem>(f: FS::D::F) { ... }
fn ls_all_rec<D: Dir>(d: D) -> impl Iter<Item=D::F>
// Assumes impl trait feature has landed. Someday!
{ ... }
fn ls_all<D: Dir>(d: D) -> impl Iter<Item=D::F>
{ ... }
fn ls_all_by_name<FS: Filesystem, S: Into<String>>(s: S) ->
impl Iter<Item=FS::D::F> { ... }
Oh, that’s right. Once you commit to making your functions generic, your public API goes to hell. Every function in this module now has its signature polluted by `FS: Filesystem`, `D: Dir`, or `F: File`. Even worse is the overqualification of names. Having to write `FS::D::F` every time is horribly inconvenient. Lastly, the functions that are heterogeneously generic with respect to the rest of my module are obscured because of the extra parameters I have to throw in. What I mean to say is: it should be obvious that every function in the module is generic with respect to filesystems, but what’s really important for users to see is the occasional `Into<String>`, because that is not consistent throughout the module.
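In compilable Rust the noise is, if anything, worse: a nested projection such as `FS::D::F` is rejected as an ambiguous associated type and has to be spelled `<FS::D as Dir>::F`. The sketch below assumes tiny stand-in traits and an in-memory filesystem (`MemFs`, `MemDir`, `MemFile`, `open_dir`, `files`, and `name` are all hypothetical) purely so the signatures can be type-checked.

```rust
// Hypothetical minimal traits standing in for File / Dir / Filesystem.
pub trait File {
    fn name(&self) -> String;
}
pub trait Dir {
    type F: File;
    fn files(&self) -> Vec<Self::F>;
}
pub trait Filesystem {
    type D: Dir;
    fn open_dir(path: &str) -> Self::D;
}

// Generic over the directory type only.
pub fn ls_all<D: Dir>(d: D) -> impl Iterator<Item = D::F> {
    d.files().into_iter()
}

// Generic over the filesystem AND the string-like argument; note the
// fully qualified projection needed in the return type.
pub fn ls_all_by_name<FS: Filesystem, S: Into<String>>(
    s: S,
) -> impl Iterator<Item = <FS::D as Dir>::F> {
    let path: String = s.into();
    ls_all(FS::open_dir(&path))
}

// A toy in-memory filesystem, only here to exercise the functions.
pub struct MemFile(String);
impl File for MemFile {
    fn name(&self) -> String {
        self.0.clone()
    }
}
pub struct MemDir(Vec<String>);
impl Dir for MemDir {
    type F = MemFile;
    fn files(&self) -> Vec<MemFile> {
        self.0.iter().cloned().map(MemFile).collect()
    }
}
pub struct MemFs;
impl Filesystem for MemFs {
    type D = MemDir;
    fn open_dir(path: &str) -> MemDir {
        MemDir(vec![format!("{}/a.png", path), format!("{}/b.png", path)])
    }
}
```

A call site has to name the filesystem explicitly, for example `ls_all_by_name::<MemFs, _>("CatPictures")`, on every single use.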
What I would like to do is scope the generic filesystem parameter across the entire module. There is, of course, a Rust feature that lends itself well to scoping parameters…
##2.) Use a generic datatype.
Just write an adaptor for all Filesystems (adaptor pattern?).
struct FSExt<FS: Filesystem>(FS);
impl <FS: Filesystem> FSExt<FS> {
pub fn openFile<S: Into<String>>(s: S) -> FS::D::F { ... }
pub fn closeFile(f: FS::D::F) { ... }
pub fn ls_all_rec(d: FS::D) -> impl Iter<Item=FS::D::F>
// Assumes impl trait feature has landed. Someday!
{ ... }
pub fn ls_all(d: FS::D) -> impl Iter<Item=FS::D::F>
{ ... }
pub fn ls_all_by_name<S: Into<String>>(s: S) ->
impl Iter<Item=FS::D::F> { ... }
}
This is good; we’ve scoped the Filesystem parameter over the entire codebase. However… because you can’t `use` functions in datatypes even though datatypes are quite functionally modules, you end up having to write things like `FSExt::<Ext4>::openFile("name")`. This won’t do at all. Not only that, but there are still long chains of associated type access. What would be more desirable is
type File = FS::D::F;
type Dir = FS::D;
But this is currently impossible within an impl! It wouldn’t make any sense as an “associated type”, sure, but it would be nice to introduce type synonyms.
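One partial workaround, offered as a sketch rather than a fix, is a generic type alias declared at module level next to the impl. The alias names `FileOf` and `DirOf` are my own, the traits are the same hypothetical stand-ins as before, and the adaptor holds a `PhantomData` marker (an assumption) so that no filesystem value is needed.

```rust
use std::marker::PhantomData;

// Hypothetical minimal traits standing in for File / Dir / Filesystem.
pub trait File {
    fn name(&self) -> String;
}
pub trait Dir {
    type F: File;
    fn files(&self) -> Vec<Self::F>;
}
pub trait Filesystem {
    type D: Dir;
    fn open_dir(path: &str) -> Self::D;
}

// Module-level synonyms: shorter than the projections, but they still
// have to be applied to `FS` at every use.
pub type DirOf<FS> = <FS as Filesystem>::D;
pub type FileOf<FS> = <<FS as Filesystem>::D as Dir>::F;

// The adaptor carries no data; it only scopes the `FS` parameter.
pub struct FSExt<FS: Filesystem>(PhantomData<FS>);

impl<FS: Filesystem> FSExt<FS> {
    pub fn ls_all(d: DirOf<FS>) -> Vec<FileOf<FS>> {
        d.files()
    }

    pub fn ls_all_by_name<S: Into<String>>(s: S) -> Vec<FileOf<FS>> {
        let path: String = s.into();
        Self::ls_all(FS::open_dir(&path))
    }
}

// A toy in-memory filesystem, only here to exercise the adaptor.
pub struct MemFile(String);
impl File for MemFile {
    fn name(&self) -> String {
        self.0.clone()
    }
}
pub struct MemDir(Vec<String>);
impl Dir for MemDir {
    type F = MemFile;
    fn files(&self) -> Vec<MemFile> {
        self.0.iter().cloned().map(MemFile).collect()
    }
}
pub struct MemFs;
impl Filesystem for MemFs {
    type D = MemDir;
    fn open_dir(path: &str) -> MemDir {
        MemDir(vec![format!("{}/a.png", path)])
    }
}
```

The call site is still `FSExt::<MemFs>::ls_all_by_name("CatPictures")`, so the aliases tidy up the signatures without solving the import problem.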
There is a much simpler way of doing this.
##3.) Extend the Filesystem trait with a new trait
Some of you reading this might have jumped straight to this. Well, I didn’t.
trait FilesystemExt {
type F: File;
type D: Dir;
fn openFile<S: Into<String>>(&self, s: S) -> Self::F { ... }
fn closeFile(&self, f: Self::F) { ... }
fn ls_all_rec(&self, d: Self::D) -> impl Iter<Item=Self::F>
// Assumes impl trait feature has landed. Someday!
{ ... }
fn ls_all(&self, d: Self::D) -> impl Iter<Item=Self::F>
{ ... }
fn ls_all_by_name<S: Into<String>>(&self, s: S) ->
impl Iter<Item=Self::F> { ... }
}
impl<T: Filesystem> FilesystemExt for T {
type F = T::D::F;
type D = T::D;
}
I have now extended every type belonging to the family of types Filesystem to have a whole suite of methods. Not only are the methods readable, but they also are not “overly generic”. One of the great tools of expressiveness is that of context: everyone who reads the API of FilesystemExt knows that all the methods are generic over every Filesystem, without having to see the clutter in every function’s signature.
But there are still problems. For one, even though I tried to alleviate the namespacing hell by putting associated types in the trait, I still had to write Self::F, and Self::D. That may or may not be better than FS::D::F.
EDIT: Blanket impls and impl specialization do not work the same way that I thought they did. Besides the minor verbosity of associated types, this is a fine solution. However, I still think it is inconvenient that you cannot `use` the functions in the trait.
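A compilable sketch of the extension-trait idea follows. It differs from the listing above in one respect that I am assuming for the sake of type-checking: the method bodies live in the blanket impl rather than as defaults in the trait, because the trait on its own does not know that `Self::F` is the file type of `Self::D`. The traits and the in-memory filesystem are hypothetical stand-ins.

```rust
// Hypothetical minimal traits standing in for File / Dir / Filesystem.
pub trait File {
    fn name(&self) -> String;
}
pub trait Dir {
    type F: File;
    fn files(&self) -> Vec<Self::F>;
}
pub trait Filesystem {
    type D: Dir;
    fn open_dir(path: &str) -> Self::D;
}

// The extension trait re-exports the associated types under short names.
pub trait FilesystemExt {
    type D: Dir;
    type F: File;
    fn ls_all(d: Self::D) -> Vec<Self::F>;
    fn ls_all_by_name<S: Into<String>>(s: S) -> Vec<Self::F>;
}

// One blanket impl covers every Filesystem.
impl<T: Filesystem> FilesystemExt for T {
    type D = T::D;
    type F = <T::D as Dir>::F;

    fn ls_all(d: T::D) -> Vec<Self::F> {
        d.files()
    }

    fn ls_all_by_name<S: Into<String>>(s: S) -> Vec<Self::F> {
        let path: String = s.into();
        Self::ls_all(T::open_dir(&path))
    }
}

// A toy in-memory filesystem, only here to exercise the trait.
pub struct MemFile(String);
impl File for MemFile {
    fn name(&self) -> String {
        self.0.clone()
    }
}
pub struct MemDir(Vec<String>);
impl Dir for MemDir {
    type F = MemFile;
    fn files(&self) -> Vec<MemFile> {
        self.0.iter().cloned().map(MemFile).collect()
    }
}
pub struct MemFs;
impl Filesystem for MemFs {
    type D = MemDir;
    fn open_dir(path: &str) -> MemDir {
        MemDir(vec![format!("{}/a.png", path)])
    }
}
```

The call is now `MemFs::ls_all_by_name("CatPictures")`: shorter, but still qualified by the type at each use.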
##4.) Write a parameterized module
We wish to provide a collection of code with a hole the shape of any type which satisfies the Filesystem trait that the client can instantiate appropriately. The pattern remains the same as last time:
mod FSExt(type FS: Filesystem) {
type F = FS::D::F;
type D = FS::D;
pub fn openFile<S: Into<String>>(s: S) -> F { ... }
pub fn closeFile(f: F) { ... }
pub fn ls_all_rec(d: D) -> impl Iter<Item=F>
// Assumes impl trait feature has landed. Someday!
{ ... }
pub fn ls_all(d: D) -> impl Iter<Item=F>
{ ... }
pub fn ls_all_by_name<S: Into<String>>(s: S) ->
impl Iter<Item=F> { ... }
}
use FSExt(::std::fs::Ext4)::*; // Assumes an Ext4 filesystem type. I know this is not
// the way things are currently done in Rust. Again, this is illustration.
ls_all_by_name("CatPictures");
This, to me, seems like a nicer abstraction to use. We have scoped the generic type across the entire code base at the cost of having to instantiate our module explicitly, which is a small price to pay. We have presented a clear and precise API that is not “overly generic”. We have not had to choose between A) making static functions but having to write FSExt::fn_name because traits are not usable namespaces and B) adding &self parameters, but having to pass around “dummy structs” that carry their impl with them. The visible generic functions in this module are the ones that are distinctly generic in a way that is different from being generic over filesystems, such as `Into<String>`. Note that what I have done here has both liberated the abstraction from certain arbitrary constraints and made it more general at the same time.
Furthermore, as was mentioned in the bit about blanket impls, I have now restricted this extension. There is only ONE way to create a module that extends the filesystem in this way. This is a function from traits to modules.
Wait! There’s more! If we come up with more extensions for our filesystem utilities in the future, but don’t want to put too many functions in FSExt, we can write yet another parameterized module over this one, so that the modules stack and the parameterization trickles down. This is what I meant by “nesting”.
mod MoreFS(type FS: Filesystem) {
use FSExt(FS)::*;
// More code relying on the concrete parameterization of the FSExt module...
}
Now we have a quite general filesystem utility library that we can put on crates.io. People can choose arbitrary levels of complexity, like stratification. If they need more functionality, they can use `MoreFS`, but if they just need the base functionality, they can just import `FSExt`. And this nesting can keep going for the client, too. Let’s say our application from example 1 should also be filesystem generic. We can just parameterize over as many things as we want.
mod DisruptIndustry(const L: Language, type FS: Filesystem) {
use MoreFS(FS)::*;
openFile(L.hello);
...
}
#Example 3: Parameterizing over Modules that guarantee an interface
It might have struck some of you as odd that I had a Filesystem trait at all. That might be common in Java, but I can’t think of any kind of code like that in Rust. Datatypes are not introduced when there is no data to represent. This is in line with Rust’s focus on low level representations.
In fact, it felt very odd while I was writing that, too, but the reason I had to write it that way was because the only way to group code together and conform it to an interface is via a trait. But all traits must be tied to datatypes! I think eliminating the “carrier struct” pattern would benefit clarity of code and also feel more “clean”. The benefits may seem much smaller than the costs, but I have a few points to make in a follow up post that might convince you otherwise.
Admittedly, using dummy structs gives most of the same benefits as parameterizing modules, but I have listed out the usability problems thus far with using that strategy. In a post below, I have listed the kinds of extensions that would be needed for inherent impls on structs to be as usable as parameterized modules. In fact, those extensions are mostly minor and not nearly as serious as a whole module system makeover, so I am seriously considering developing those extensions into an RFC (which, I will remind you, this is not).
The rest of this is considered separate from the previous paragraphs.
I propose that module “interfaces” be possible to write.
abstract mod FILESYSTEM {
type D: Dir;
fn some_fn(String) -> D;
fn some_other_fn(String) -> D::F;
}
Then we can statically check modules to see if they conform to such a module signature.
mod Ext4: FILESYSTEM {
type D = Ext4Dir;
pub fn some_fn(String) -> Ext4Dir;
pub fn some_other_fn(String) -> Ext4Dir::F;
}
Here’s an example of a client configurable application framework.
mod framework(const L: Language, type DB: Database, mod FS: FILESYSTEM) { ... }
Maybe you are writing a virtual machine that can talk to multiple different filesystems at once.
mod framework(type C: NetworkConnection, mod Real: FILESYSTEM,
mod Virtual: FILESYSTEM) { ... }
Here’s LLVM itself.
mod LLVM(mod FE: FRONTEND, mod BE: BACKEND) { ... }
And if you so wanted, you can even constrain the type of resultant module.
mod ExtendFS(mod FS: FILESYSTEM): FILESYSTEM { ... }
My point is only that, as Steve Yegge’s famous decade-old article Execution in the Kingdom of Nouns argued, it seems overly constraining to only be able to work effectively with code when it is attached to datatypes. Rust has modules, but if you want to parameterize over code the same way you can with impls and traits, you end up having to resort to funny business with datatypes. This is not an object oriented programming language, so I think it’s most appropriate for Rust to move in this direction.
Anyone who is familiar with OCaml or SML knows that I have basically just proposed functors and signatures. I presented this in the way that I did because I wanted the features to arise naturally from a discussion about code reuse and polymorphism.
I am all out of time, but here’s some food for thought before I write a followup:
- impls are just modules with a privileged Self type.
- traits are a way to register a canonical module conforming to a signature to a particular datatype.
- traits with n input parameters just add n more rows to the registry lookup table for that particular signature.
- For each generic parameter, a bounded parametric function requests a module from the registry authority by supplying a number of types.
- There is no truly parametric polymorphism that is useful. There is only bounded parametric polymorphism over modules that conform to a signature with at least one element, where all datatypes are modules that conform to the DATA signature: `abstract mod DATA { type Self; }`
- Generic datatypes are just parameterized modules that take DATA and produce DATA. I.e., the parameterized modules `mod SomeMod(type T) { … }` and `mod SomeMod(mod T: DATA) { … }` are equivalent, as are all of the following: `struct Vec<T> {…}`, `mod Vec(type T): DATA {…}`, and `mod Vec(mod T: DATA): DATA {…}`.
- The need for type level constants is assuaged by modules parameterized by values: `mod Array(const n: isize): DATA`
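As a point of comparison for that last bullet: Rust has since stabilized const generics, which give datatypes (though not modules) a value parameter. The sketch below is an illustration under that assumption and not part of the proposal. It uses `usize` rather than `isize` because array lengths must be `usize`, and the `new` and `len` methods are hypothetical.

```rust
// A datatype parameterized by a type AND a constant value.
pub struct Array<T, const N: usize> {
    pub items: [T; N],
}

impl<T: Copy + Default, const N: usize> Array<T, N> {
    /// Builds an array of N default values.
    pub fn new() -> Self {
        Array { items: [T::default(); N] }
    }

    /// The length is known statically; no field is needed to store it.
    pub fn len(&self) -> usize {
        N
    }
}
```

`Array::<i32, 4>::new()` fixes both parameters at compile time, which is the datatype-level analogue of `mod Array(const n: isize): DATA`.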