Imaginary Rust with open types
To make things a little clearer and easier, let’s invent some syntax for open types in Rust (this is purely pedagogical and should not be read as a proposal for new syntax, or for how the feature should be implemented), then return to the 3 points above and explain why I think open types would help with each (I’ll restate each point later).
I’m not going to get into generics, lifetimes, or anything else that might interact with the base definition, or how you’d extend it with them (these are all really important considerations for a serious proposal, of course). For demonstration purposes, let’s just assume the open type is enum-like and can only be extended with “simple” elements of the kind you would normally be allowed to add to an enum.
Now, let’s suppose the module core::exn defines:
pub enum Exn = ..
This, let us say, declares the enum Exn to be open; that is, downstream clients can extend it with their own extra variants.
To do so, we might write:
use core::exn::Exn;
pub enum Exn := CustomError(String)
At compile time, if only core::exn and the above module extending Exn with CustomError are present (and there are legitimate concerns about how this takes place; again, not my target right now), the enum Exn will look something like:
pub enum Exn {
    CustomError(String)
}
with the caveat that anyone pattern matching on a binding x: Exn must include a catch-all arm. The type can always have a variant you didn’t list: it’s open, so someone else can always extend it with a variant your match doesn’t cover.
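Today’s Rust has no open enums, but there is a rough runtime analogue of this catch-all rule: a boxed trait object, Box<dyn Error>, which any crate can effectively “extend” by implementing std::error::Error for its own type. The sketch below is that approximation, not the proposed feature; CustomError and BadFoo are hypothetical types mirroring the variants in this section, and “matching” is done with downcast_ref, which always leaves a fallback branch for types the matcher has never heard of.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-ins for the CustomError and BadFoo variants, each
// defined as its own type, as separate crates would do.
#[derive(Debug)]
struct CustomError(String);

#[derive(Debug)]
struct BadFoo {
    msg: &'static str,
    foo_size: usize,
}

impl fmt::Display for CustomError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "custom error: {}", self.0)
    }
}

impl fmt::Display for BadFoo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bad foo ({} bytes): {}", self.foo_size, self.msg)
    }
}

// Implementing Error is the only step needed to "join" the open set.
impl Error for CustomError {}
impl Error for BadFoo {}

// "Pattern matching" on the open set: each known type is tried in turn, and
// the final `else` is the catch-all for types we have never seen.
fn describe(e: &(dyn Error + 'static)) -> String {
    if let Some(CustomError(s)) = e.downcast_ref::<CustomError>() {
        format!("CustomError({})", s)
    } else if let Some(b) = e.downcast_ref::<BadFoo>() {
        format!("BadFoo {{ msg: {}, foo_size: {} }}", b.msg, b.foo_size)
    } else {
        format!("unknown error: {}", e)
    }
}

fn main() {
    let errors: Vec<Box<dyn Error>> = vec![
        Box::new(CustomError("oops".to_owned())),
        Box::new(BadFoo { msg: "too big", foo_size: 42 }),
        Box::new(fmt::Error),
    ];
    for e in &errors {
        println!("{}", describe(&**e));
    }
}
```

The difference from the proposal is that nothing here is checked at compile time: forgetting the else branch is simply a logic bug rather than a compiler error.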
So if another crate, foo, extends Exn:
use core::exn::Exn;
pub enum Exn := BadFoo{ msg: &'static str, foo_size: usize }
then, together with the above example, our Exn now perhaps looks like:
pub enum Exn {
    CustomError(String),
    BadFoo{ msg: &'static str, foo_size: usize }
}
I’m not going to get into what happens if two different crates define the exact same variant. This is problematic: two separate crates could define the same Exn variant (i.e., both the constructor name and its payload type match), and if you are unlucky enough to call functions from both inside a function that returns an Exn, and both happen to return that variant with populated values, you can no longer tell which crate the error came from.
One last thing: because the original motivating factor for this was error handling, we (or the module core::result) will define a final, regular convenience type for error-returning functions (like many crates already do, except that the error side is fixed), called EResult (to avoid confusion with the regular Result):
use core::exn::Exn;
pub enum EResult<T> {
    Ok(T),
    Err(Exn)
}
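For comparison, the closest thing to EResult in today’s stable Rust is probably a type alias over a boxed trait object. This is an analogy under that assumption, not the proposal: the EResult below is a local alias, check_nonempty is a hypothetical helper standing in for “another library”, and the whole thing works only because the standard library provides a blanket From impl that turns any Error type into a Box<dyn Error>, which ? then applies automatically.

```rust
use std::error::Error;
use std::io;

// A local stand-in for the hypothetical EResult<T>: the error side is a
// boxed trait object instead of an open enum.
type EResult<T> = Result<T, Box<dyn Error>>;

// Hypothetical helper with its own error type (io::Error).
fn check_nonempty(input: &str) -> io::Result<&str> {
    if input.is_empty() {
        Err(io::Error::new(io::ErrorKind::InvalidInput, "empty input"))
    } else {
        Ok(input)
    }
}

// Two unrelated error types flow through one signature: `?` converts each
// of them into Box<dyn Error>.
fn parse_port(input: &str) -> EResult<u16> {
    let s = check_nonempty(input)?; // io::Error
    let port: u16 = s.trim().parse()?; // ParseIntError
    Ok(port)
}

fn main() {
    for input in ["8080", "", "http"] {
        match parse_port(input) {
            Ok(port) => println!("{:?} -> port {}", input, port),
            Err(e) => println!("{:?} -> error: {}", input, e),
        }
    }
}
```

The trade-off is that the concrete type is erased: callers recover it with downcast_ref, and nothing in the signature says which error types might show up.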
OK, with that out of the way, I’ll move on to the motivating use case, i.e., why I thought this would be fun/interesting/nice, other concerns aside.
Motivating Use Case
I’m a new user; I’m told to use the standard library’s Result and Error types to return errors, and I start playing around. I first encounter this when doing IO, because read_to_end returns an io::Result.
So I write something like this:
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;
pub fn eat_bytes() -> io::Result<Vec<u8>> {
    // read_to_end needs a mutable handle
    let mut f = File::open(Path::new("foo.txt"))?;
    let mut bytes = Vec::new();
    f.read_to_end(&mut bytes)?;
    Ok(bytes)
}
pub fn run() -> io::Result<()> {
    let bytes = eat_bytes()?;
    println!("wow, bytes: {:?}", bytes);
    Ok(())
}
pub fn main() {
    match run() {
        Ok(()) => (),
        Err(e) => {
            println!("Err: {:?}", e);
            ::std::process::exit(1);
        }
    }
}
This is really cool: the annoying walls of pattern matching are gone, replaced with ? in run, which calls eat_bytes, and I can still pattern match on the kind of error that was returned, and so on.
I then start to add more functionality and realize I want logging; I hear about env_logger, read all about how to use it, and then modify my code:
pub fn eat_bytes() -> io::Result<Vec<u8>> {
    let mut f = File::open(Path::new("foo.txt"))?;
    let mut bytes = Vec::new();
    f.read_to_end(&mut bytes)?;
    Ok(bytes)
}
pub fn run() -> io::Result<()> {
    env_logger::init()?;
    let bytes = eat_bytes()?;
    info!("wow, bytes: {:?}", bytes);
    Ok(())
}
Now, suddenly, my code won’t compile. I do some research and learn that env_logger::init() returns Result<(), SetLoggerError>, and at the same time discover that io::Result<T> is a type alias for result::Result<T, io::Error>, and that io::Error is its own type.
In my experience, different people do different things here, and all are legitimate, because honestly, they just want to start coding again!
One thing people do is learn about io::Error, study it, understand its API, and then manually map other errors into this error they understand, using an ErrorKind and a custom message whenever a foreign error is encountered.
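A minimal sketch of that first strategy: the function keeps its io::Result signature and folds a foreign error into io::Error. ParseIntError stands in for the foreign error here (SetLoggerError lives in an external crate), read_count is a hypothetical name, and the choice of ErrorKind::InvalidData is illustrative.

```rust
use std::io;

// Keep the io::Result signature and fold a foreign error into io::Error by
// picking an ErrorKind and boxing the original error as the payload.
fn read_count(text: &str) -> io::Result<u32> {
    text.trim()
        .parse::<u32>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    match read_count("not a number") {
        Ok(n) => println!("count = {}", n),
        Err(e) => println!("{:?}: {}", e.kind(), e),
    }
}
```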
Another thing people do is drop the IO error result type, replace it with Result<(), String>, and manually map other errors into strings, either by pattern matching or by other fancy methods.
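The String strategy can be sketched like this, with a hypothetical sum_lines function: both the IO error from reading a line and the parse error are flattened to their Display text, so ? works everywhere, but the structured error values are gone.

```rust
use std::io::BufRead;

// Sums one integer per line. Every foreign error is turned into a String,
// so callers can only inspect the message, not the error kind.
fn sum_lines<R: BufRead>(reader: R) -> Result<i64, String> {
    let mut total = 0;
    for line in reader.lines() {
        let line = line.map_err(|e| e.to_string())?; // io::Error -> String
        let n = line.trim().parse::<i64>().map_err(|e| e.to_string())?; // ParseIntError -> String
        total += n;
    }
    Ok(total)
}

fn main() {
    println!("{:?}", sum_lines("1\n2\n3\n".as_bytes()));
    println!("{:?}", sum_lines("1\ntwo\n".as_bytes()));
}
```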
Some dig deeper, learn about Rust “best practice” error handling, read the chapter, and understand how errors are enums that can work together when you implement From; or they find error-chain and, with some fidgeting and largely black-magic macros at the top of their crate and elsewhere, create their own custom error type and add IO and any other errors they encounter as links or foreign_links.
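The hand-written “book” approach looks roughly like the sketch below. AppError, AppResult, and parse_first_line are hypothetical names: there is one variant per foreign error type, a Display arm for each, and a From impl for each, which is what lets ? convert automatically.

```rust
use std::fmt;
use std::io::{self, BufRead};
use std::num::ParseIntError;

// A hand-written application error: one variant per foreign error type.
#[derive(Debug)]
enum AppError {
    Io(io::Error),
    Parse(ParseIntError),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Io(e) => write!(f, "I/O error: {}", e),
            AppError::Parse(e) => write!(f, "parse error: {}", e),
        }
    }
}

impl std::error::Error for AppError {}

// These From impls are what let `?` convert each foreign error.
impl From<io::Error> for AppError {
    fn from(e: io::Error) -> Self {
        AppError::Io(e)
    }
}

impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::Parse(e)
    }
}

type AppResult<T> = Result<T, AppError>;

fn parse_first_line<R: BufRead>(mut reader: R) -> AppResult<i64> {
    let mut line = String::new();
    reader.read_line(&mut line)?; // io::Error -> AppError::Io
    Ok(line.trim().parse::<i64>()?) // ParseIntError -> AppError::Parse
}

fn main() {
    match parse_first_line("42\n".as_bytes()) {
        Ok(n) => println!("{}", n),
        Err(e) => println!("{}", e),
    }
}
```

Every new foreign error type means another variant, another Display arm, and another From impl.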
By far the most common response, however, is that people forgo ? and try! and use unwrap instead. Once they’ve studied a bit more (sometimes months later), and maybe finally reach the chapter on error handling in the book and wrap their heads around it, they write their own custom error type, implement From for the errors they encounter, and maintain their own Result<T>, only to have to go back through their code and remove all the unwraps they added while developing and learning.
1. Unified Error Type
Let’s now return to this first time user, but in a parallel universe where open types are used by prominent crate vendors and the std lib.
It now looks like this:
pub fn eat_bytes() -> EResult<Vec<u8>> {
    let mut f = File::open(Path::new("foo.txt"))?;
    let mut bytes = Vec::new();
    f.read_to_end(&mut bytes)?;
    Ok(bytes)
}
pub fn run() -> EResult<()> {
    let bytes = eat_bytes()?;
    println!("wow, bytes: {:?}", bytes);
    Ok(())
}
pub fn main() {
    match run() {
        Ok(()) => (),
        Err(e) => {
            println!("Err: {:?}", e);
            ::std::process::exit(1);
        }
    }
}
When this user realizes they want logging, they write:
pub fn eat_bytes() -> EResult<Vec<u8>> {
    let mut f = File::open(Path::new("foo.txt"))?;
    let mut bytes = Vec::new();
    f.read_to_end(&mut bytes)?;
    Ok(bytes)
}
pub fn run() -> EResult<()> {
    env_logger::init()?;
    let bytes = eat_bytes()?;
    info!("wow, bytes: {:?}", bytes);
    Ok(())
}
It compiles without a hitch, and they’re on their merry way, ready to frob some more bits, because both std::io::Error and log::SetLoggerError extend Exn. As we saw above, EResult carries Exn as its “right-hand” error variant, so the signature () -> EResult<()> is still valid after env_logger::init() is added. In our syntax, the extensions are as simple as:
use core::exn::Exn;
pub struct Error {
    repr: Repr,
}
enum Repr {
    Os(i32),
    Simple(ErrorKind),
    Custom(Box<Custom>),
}
pub enum Exn := IOError(Error)
and
use core::exn::Exn;
pub enum Exn := SetLoggerError(())
2. Mental Overhead Reduction, Developer Flow, and the Bureaucracy of Errors in Rust
Bureaucratic is the best way for me to describe the bookkeeping, initial setup (when starting a new lib/binary), and further adjustment when new error types enter the fray in Rust.
Jean-Yves Girard (the founder/inventor/discoverer of linear logic, whooo!) referred to “Inessential differences in proofs as the bureaucracy of syntax”, and I think this perfectly captures my feelings when I have to change do_something() to Ok(do_something()?), or add new error types to my From impls or error macro in the examples below. I know why I have to do it, but it’s inessential to the task at large, and in my opinion it takes an unmeasured mental toll on me, the developer. I often find myself procrastinating on proper error types in an initial lib or binary, because they distract me and delay me from the fun things.
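To make the Ok(do_something()?) case concrete: when a callee’s error type differs from the caller’s, its Result cannot simply be returned, and the conversion has to be spelled out. In this sketch do_something, caller, and MyError are hypothetical names, and ParseIntError stands in for a foreign error.

```rust
use std::num::ParseIntError;

// A hypothetical caller-side error type.
#[derive(Debug, PartialEq)]
struct MyError(String);

impl From<ParseIntError> for MyError {
    fn from(e: ParseIntError) -> Self {
        MyError(e.to_string())
    }
}

// A hypothetical callee with a different error type.
fn do_something(s: &str) -> Result<u8, ParseIntError> {
    s.parse::<u8>()
}

fn caller(s: &str) -> Result<u8, MyError> {
    // Returning `do_something(s)` directly would not type-check: its error
    // type is ParseIntError, not MyError. `?` applies the From conversion and
    // `Ok(..)` re-wraps the value.
    Ok(do_something(s)?)
}

// The same conversion written out with map_err instead of `?`.
fn caller_map_err(s: &str) -> Result<u8, MyError> {
    do_something(s).map_err(MyError::from)
}

fn main() {
    println!("{:?}", caller("7"));
    println!("{:?}", caller_map_err("300"));
}
```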
Anyway, let’s consider the case where you, the new user, got motivated enough and:
- studied and learned in depth about rust errors
- wrote your own error type by hand (or used error-chain)
Your flow got broken, but you feel better for it, and you’re now back to programming. You realize that you need to add another function, frobulate, but it too returns a custom error. Now, either through helpful compiler messages or the docs for the crate, you add the From impl, or add the type name to the links or foreign_links section, to teach your custom error how to understand it and thereby preserve your return type.
What happens if we had open types and frobulate also extended Exn? It’s the exact same scenario as in 1: your code does not change at all apart from the addition of frobulate; you leave signatures untouched, are unmolested by the compiler, and continue on in your zen state.
You never got distracted adjusting your error type to understand another error type, and it doesn’t need to, because when you learned that Rust models its errors with open types, you just added this one single line to start working with your own custom error type, without adjusting any of your original function return types:
use core::exn::Exn;
pub enum Exn := MyError(String)
pub fn eat_bytes() -> EResult<Vec<u8>> {
    let mut f = File::open(Path::new("foo.txt"))?;
    let mut bytes = Vec::new();
    f.read_to_end(&mut bytes)?;
    let is_frobulant = frobulate(&bytes)?;
    if !is_frobulant {
        Err(Exn::MyError("Is not frobulant enough".to_owned()))
    } else {
        Ok(bytes)
    }
}
pub fn run() -> EResult<()> {
    env_logger::init()?;
    let bytes = eat_bytes()?;
    info!("wow, bytes: {:?}", bytes);
    Ok(())
}
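For comparison, today’s closest analogue to that single MyError(String) line is probably returning a string-backed error through a boxed trait object, since the standard library implements From<&str> and From<String> for Box<dyn Error>. The sketch assumes that alias approach; frobulate and check are hypothetical stand-ins, and “frobulant” here arbitrarily means the data starts with 0xF0.

```rust
use std::error::Error;

type EResult<T> = Result<T, Box<dyn Error>>;

// Hypothetical stand-in for frobulate: fails on empty input and reports
// whether the data is "frobulant" (here: whether it starts with 0xF0).
fn frobulate(bytes: &[u8]) -> EResult<bool> {
    if bytes.is_empty() {
        return Err("nothing to frobulate".into());
    }
    Ok(bytes[0] == 0xF0)
}

fn check(bytes: Vec<u8>) -> EResult<Vec<u8>> {
    let is_frobulant = frobulate(&bytes)?;
    if !is_frobulant {
        // &str converts into Box<dyn Error>, so no custom type is declared.
        return Err("Is not frobulant enough".into());
    }
    Ok(bytes)
}

fn main() {
    for input in [vec![0xF0, 1, 2], vec![1, 2, 3], vec![]] {
        match check(input) {
            Ok(bytes) => println!("ok: {:?}", bytes),
            Err(e) => println!("error: {}", e),
        }
    }
}
```

The catch is that such errors can only be inspected as text; unlike an Exn variant, there is no type a caller could match on.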
Basically, you’re already primed and ready to write custom errors AND integrate other crates’ errors into your “own” error type. You don’t need to stop and think about a crate’s error type, open your error.rs file, add the type to the links section or add a From impl (and then a case in the match for Display, etc.); you just don’t need to do much at all, except concentrate on the programming task at hand and leave all the syntactic bureaucracy where it belongs, on planet Vogsphere.
3. Non-proliferation
Lastly, and perhaps less importantly, another benefit of an open type is that it stops the proliferation of new Error types. You might think this proliferation isn’t actually that big of a deal.
This is straight from the error source code of the amazing walkdir crate:
/// An error produced by recursively walking a directory.
///
/// This error type is a light wrapper around `std::io::Error`. In particular,
/// it adds the following information:
///
/// * The depth at which the error occurred in the file tree, relative to the
/// root.
/// * The path, if any, associated with the IO error.
/// * An indication that a loop occurred when following symbolic links. In this
/// case, there is no underlying IO error.
///
/// To maintain good ergnomics, this type has a
/// `impl From<Error> for std::io::Error` defined so that you may use an
/// `io::Result` with methods in this crate if you don't care about accessing
/// the underlying error data in a structured form.
#[derive(Debug)]
pub struct Error {
depth: usize,
inner: ErrorInner,
}
“To maintain good ergonomics”: they’re being nice! They implemented From<Error> for ::std::io::Error for you, so you don’t have to add it to your foreign_links, nor change your type signature if you started off using std::io::Result because you read the examples and didn’t know any better.
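The pattern walkdir describes can be sketched as follows. WalkError, visit, and caller are hypothetical, simplified stand-ins, not walkdir’s actual types: the error wraps an io::Error with extra context, and a From<WalkError> for io::Error impl lets callers who chose io::Result keep using ? unchanged.

```rust
use std::fmt;
use std::io;
use std::path::PathBuf;

// Hypothetical, simplified wrapper: an I/O error plus the depth and path at
// which it happened.
#[derive(Debug)]
pub struct WalkError {
    depth: usize,
    path: PathBuf,
    inner: io::Error,
}

impl fmt::Display for WalkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} (depth {}, path {})", self.inner, self.depth, self.path.display())
    }
}

impl std::error::Error for WalkError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.inner)
    }
}

// The "good ergonomics" impl: collapse into a plain io::Error with the same
// kind. The WalkError becomes the boxed payload, so its structured details are
// only reachable again by downcasting.
impl From<WalkError> for io::Error {
    fn from(e: WalkError) -> io::Error {
        let kind = e.inner.kind();
        io::Error::new(kind, e)
    }
}

fn visit(depth: usize) -> Result<(), WalkError> {
    Err(WalkError {
        depth,
        path: PathBuf::from("some/dir"),
        inner: io::Error::new(io::ErrorKind::PermissionDenied, "permission denied"),
    })
}

// A caller that started with io::Result needs no changes to use `?`.
fn caller() -> io::Result<()> {
    visit(2)?;
    Ok(())
}

fn main() {
    if let Err(e) = caller() {
        println!("{:?}: {}", e.kind(), e);
    }
}
```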
This entire comment, and its rationale, wouldn’t even have to exist if there were an open type, Exn, that they could extend; they wouldn’t be proliferating, because there’s nothing to proliferate, and there’s nothing to make ergonomic, because it would already be ergonomic.
A final point I’d like to make: I now have a crate, scroll, which another crate, goblin, uses; another project, panopticon, uses goblin.
Scroll has a custom error; goblin has a custom error; panopticon has a custom error. Goblin’s error has From impls for scroll’s error and for IO errors; panopticon’s has From impls for IO errors and a bunch of others.
Now, if panopticon decides to switch to scroll because it’s awesome :D, it will also impl From<scroll::Error> for its error, so that panopticon and the two crates it uses, at different levels, all have enums that wrap the same errors at various levels. I.e., if you wanted to match, you could literally have:
match err {
    Error::Goblin::Scroll(err) => ..,
    Error::Goblin::IO(err) => ..,
    Error::Scroll::IO(err) => ..,
    Error::IO(err) => ..,
    Error::Scroll(err) => ..,
}
This just fundamentally smacks of bad design to me, and it will only get worse the more crates depend on each other and the more they impl each other’s errors for better ergonomics. This is the non-proliferation treaty that open types would enable us all to sign.
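One way to see the cost of this duplication in today’s Rust: with hypothetical scroll-, goblin-, and panopticon-style error enums (simplified names, not the crates’ real types), the same io::Error can sit at several depths, so code that only asks “was this an I/O failure?” ends up walking the source() chain instead of matching every nesting path.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical, simplified error enums for three layered crates.
#[derive(Debug)]
enum ScrollError { Io(io::Error) }

#[derive(Debug)]
enum GoblinError { Scroll(ScrollError), Io(io::Error) }

#[derive(Debug)]
enum AppError { Goblin(GoblinError), Scroll(ScrollError), Io(io::Error) }

impl fmt::Display for ScrollError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { f.write_str("scroll error") }
}
impl fmt::Display for GoblinError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { f.write_str("goblin error") }
}
impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { f.write_str("app error") }
}

// Each layer exposes the error it wraps through source().
impl Error for ScrollError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ScrollError::Io(e) => Some(e),
        }
    }
}
impl Error for GoblinError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            GoblinError::Scroll(e) => Some(e),
            GoblinError::Io(e) => Some(e),
        }
    }
}
impl Error for AppError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AppError::Goblin(e) => Some(e),
            AppError::Scroll(e) => Some(e),
            AppError::Io(e) => Some(e),
        }
    }
}

// Walk the chain until an io::Error turns up, at whatever depth it is.
fn find_io_error<'a>(err: &'a (dyn Error + 'static)) -> Option<&'a io::Error> {
    let mut current: Option<&'a (dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            return Some(io_err);
        }
        current = e.source();
    }
    None
}

fn main() {
    let deep = AppError::Goblin(GoblinError::Scroll(ScrollError::Io(io::Error::new(
        io::ErrorKind::UnexpectedEof,
        "truncated header",
    ))));
    match find_io_error(&deep) {
        Some(e) => println!("I/O error found: {:?}", e.kind()),
        None => println!("not an I/O error"),
    }
}
```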
To be clear, I don’t think this is anyone’s fault; it’s just how things ended up, which is fine. And I am optimistic we’ll come up with a good, robust, forward-looking solution for it. I don’t know if it’s open types or some variant thereof, but I hope that whatever it is, and however it’s implemented, it can, most importantly for me, address the 3 issues I’ve talked about here in as robust and satisfying a manner as I think open types do in this minor pre-pre-RFC.
Thanks folks for your time, and I’ll see you at the after party.
~m4b