Traits that should be in std, but aren't


#1

A common thread that I see in language design over time is that the “standard library” designers are often forced to revisit hard-coded concepts and then convert them into abstract interfaces in later rewrites or “2.0” versions. A particularly smart approach – used by the Microsoft .NET language team, among others – is to define core interfaces in the standard library early on, even if the implementations of such interfaces are trivial, or only expected to be provided in expansion libraries.

These interfaces then serve as a glue between third-party packages, which is particularly important for Rust, a language geared towards the “crates.io” model of package-based code distribution. Secondly, getting these interfaces correct to begin with can save a lot of rework, avoiding a messy standard library littered with deprecated interfaces that can never be removed.

The most familiar example of one of these “low level” interfaces that Rust does provide in the standard library is probably Iterator, which thankfully has been implemented by the Rust team in a forward-thinking way.

However, there are a variety of similar core interfaces that are still missing. These are interfaces that the standard library will have to have sooner or later. These are simply unavoidable, and are required for orthogonal behaviour, or are just commonly needed for practical software of every sort. I see these re-invented a lot, but left out of most “1.0” languages (not just Rust), despite mountains of evidence from other, older languages that this is a mistake.

A list of some proposals (apologies if some of these already exist or have already been proposed):

Observer and Observable: the matching “push” equivalents of the Iterator trait. Incredibly important for event-based and asynchronous code. The entire Reactive Extensions library/concept/manifesto is based around this. Note that I’m not proposing that the std library contain Rx, I’m just suggesting that it contain the two traits. (Correct me if I’m wrong, but as far as I know, Rust doesn’t even have the concept of an event at all in the standard library!)
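To make the "push" idea concrete, here is a rough sketch of what the two traits might look like. Everything in it is an assumption for illustration: the names `Observer`, `Observable`, `Subject` and `Collect`, and the method names `on_next`, `on_completed` and `subscribe`, are borrowed loosely from Rx and are not an existing std API.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical "push" counterpart of `Iterator`: the source calls the observer.
pub trait Observer<T> {
    fn on_next(&mut self, item: T);
    /// Called once when the source has no more values (default: do nothing).
    fn on_completed(&mut self) {}
}

/// Hypothetical source of pushed values.
pub trait Observable<T> {
    fn subscribe(&mut self, observer: Box<dyn Observer<T>>);
}

/// A trivial subject that fans each value out to every subscriber.
pub struct Subject<T> {
    observers: Vec<Box<dyn Observer<T>>>,
}

impl<T: Clone> Subject<T> {
    pub fn new() -> Self {
        Subject { observers: Vec::new() }
    }

    /// Push one value to all current subscribers.
    pub fn emit(&mut self, item: T) {
        for observer in self.observers.iter_mut() {
            observer.on_next(item.clone());
        }
    }

    /// Tell all subscribers that the stream has ended.
    pub fn complete(&mut self) {
        for observer in self.observers.iter_mut() {
            observer.on_completed();
        }
    }
}

impl<T> Observable<T> for Subject<T> {
    fn subscribe(&mut self, observer: Box<dyn Observer<T>>) {
        self.observers.push(observer);
    }
}

/// Example observer that records everything it sees in shared storage.
pub struct Collect<T> {
    pub seen: Rc<RefCell<Vec<T>>>,
}

impl<T> Observer<T> for Collect<T> {
    fn on_next(&mut self, item: T) {
        self.seen.borrow_mut().push(item);
    }
}
```

A real design would also need error propagation and unsubscription, which this sketch leaves out on purpose.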

Clock: I don’t mean a timer, or the system clock API, but the abstraction of now() -> Time, even if “Time” is a type parameter, not a concrete type. Why is this important, you ask? There is a classic “lessons learnt” talk by John Carmack (of Doom/Quake fame), who made an entire class of bugs become vastly easier to reproduce and debug by coming to the realisation that “get the time” is an interface, and should be treated as such. By replacing all calls to the C++ SDK’s “get system time” with a pluggable interface, he could reproduce replays exactly by replaying the time along with other captured events. He figured this out in something like 1999! He’s not the only one. Microsoft’s Reactive Extensions heavily depends on pluggable time sources for testing, reproducibility, and the like.

Compare with the “time” crate: https://doc.rust-lang.org/time/time/fn.now.html Bzzt! Wrong! This can’t be plugged into a future “Rx” implementation. It’ll have to be wrapped in a Trait. Where does that trait belong? In Rx? No. Time? Not really. It belongs in std, the common glue between non-standard crates.
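As a rough illustration of what I mean, a pluggable clock could be as small as the sketch below. The trait `Clock`, the types `SystemClock` and `ManualClock`, and the helper `has_expired` are all made-up names for this post; only `std::time::Instant` and `std::time::Duration` are real std types.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical trait: "get the time" as an interface. `Time` is an
/// associated type so each clock can choose its own representation.
pub trait Clock {
    type Time;
    fn now(&self) -> Self::Time;
}

/// Real clock backed by the monotonic system clock.
pub struct SystemClock;

impl Clock for SystemClock {
    type Time = Instant;
    fn now(&self) -> Instant {
        Instant::now()
    }
}

/// Test clock: time only moves when the test says so.
pub struct ManualClock {
    elapsed: Cell<Duration>,
}

impl ManualClock {
    pub fn new() -> Self {
        ManualClock { elapsed: Cell::new(Duration::from_secs(0)) }
    }

    /// Move the fake time forward by `by`.
    pub fn advance(&self, by: Duration) {
        self.elapsed.set(self.elapsed.get() + by);
    }
}

impl Clock for ManualClock {
    type Time = Duration;
    fn now(&self) -> Duration {
        self.elapsed.get()
    }
}

/// Code under test depends only on the trait, never on the system clock.
pub fn has_expired<C>(clock: &C, deadline: C::Time) -> bool
where
    C: Clock,
    C::Time: PartialOrd,
{
    clock.now() >= deadline
}
```

With something like this, a test can build a `ManualClock`, call `advance`, and check `has_expired` deterministically, while production code passes `SystemClock`.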

Future or Task: I know that there are proposals for adding C#-style asynchronous programming to Rust, but it’s worth noting that Microsoft added the Task type to the .NET standard library in version 4.0, but the async keyword was added later in version 4.5. The concept of a “future result” is incredibly generic, and frankly all IO should be rewritten in terms of it, with or without language support for “async”. Sooner or later, this will be the standard. Java added “NIO2”, .NET added async methods to all IO libraries, and then dropped the synchronous versions for Universal App development. There is a reason!

Speaking of asynchronous programming in general, C# is a goldmine for abstract concepts that belong in the standard Rust library. The CancellationSource, ProgressNotification, Scheduler and Dispatcher concepts are core to Task-based and GUI programming on most platforms.

Transaction, Enlistable, and TransactionScope: these are core abstractions over a wide range of APIs, that not only all behave almost identically, but also are often expected to interact. Transactions are the only multi-threading primitive that compose safely, making them an essential abstraction for safe multi-threaded code, not just databases. What is a “transaction”, really? It’s a Trait inheriting from “Drop” that has a single “commit()” function. If it’s dropped before the commit call, it’s rolled back. That’s it. One method!
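Here is a toy sketch of the "commit or roll back on drop" shape, under some assumptions: the trait name `Transaction` and the type `VecTransaction` are invented for illustration, and the rollback lives in a `Drop` impl on the concrete type rather than in a `Drop` supertrait, since a trait bound on `Drop` does not buy anything useful in Rust.

```rust
/// Hypothetical trait: a transaction is a value with one consuming `commit`.
pub trait Transaction {
    fn commit(self);
}

/// Toy in-memory transaction over a `Vec`: pushes are undone unless committed.
pub struct VecTransaction<'a, T> {
    target: &'a mut Vec<T>,
    original_len: usize,
    committed: bool,
}

impl<'a, T> VecTransaction<'a, T> {
    /// Start a transaction, remembering the length to roll back to.
    pub fn begin(target: &'a mut Vec<T>) -> Self {
        let original_len = target.len();
        VecTransaction { target, original_len, committed: false }
    }

    pub fn push(&mut self, value: T) {
        self.target.push(value);
    }
}

impl<'a, T> Transaction for VecTransaction<'a, T> {
    fn commit(mut self) {
        // Mark as committed; `self` is dropped at the end of this function
        // and the `Drop` impl below will then leave the changes in place.
        self.committed = true;
    }
}

impl<'a, T> Drop for VecTransaction<'a, T> {
    fn drop(&mut self) {
        if !self.committed {
            // Dropped without commit: roll back everything pushed so far.
            self.target.truncate(self.original_len);
        }
    }
}
```

Because `commit` takes `self` by value, a committed transaction cannot be used again, which the compiler enforces for free.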

Lock and/or Monitor: currently, the synchronization primitives are distinct struct types. Many abstractions can be built on top of an abstract concept of a lockable or waitable object, without specifying that it is a Mutex, SpinLock, Semaphore, or even a “Null” lock that does nothing. A brilliant example of this type of library design is the Oswego multi-threading library for Java, which Sun eventually adopted into the standard library. Note the ‘eventually’! Why not start with the elegant design from the beginning?

To add further to the Lock trait example above: currently RwLock has separate methods for acquiring a read or a write lock. The correct abstraction would be to recognise that it is simply a tiered lock with two levels. A RwLock should instead “contain” (or return) two objects that implement “Lock”. This allows better encapsulation in library design – an API could expose just one of the two lock types, and can also abstract away the internal implementation details.
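One possible shape for an abstract lock is sketched below, using a closure-based API to keep it short. The trait `Lock`, its method `with_lock`, and the types `NullLock` and `CountingLock` are assumptions made up for this example; only `std::sync::Mutex` and `AtomicUsize` are real. The counting wrapper is a much-reduced stand-in for a wrapper that does real tracing.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical abstraction over "something lockable".
pub trait Lock {
    /// Run `f` while holding the lock, then release it.
    fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R;
}

/// A plain mutex used purely as a lock (it guards no data of its own).
impl Lock for Mutex<()> {
    fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
        // Panics if the mutex was poisoned; acceptable for a sketch.
        let _guard = self.lock().unwrap();
        f()
    }
}

/// A "null" lock that does nothing, e.g. for single-threaded use.
pub struct NullLock;

impl Lock for NullLock {
    fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
        f()
    }
}

/// Wraps any `Lock` and counts how often it is acquired.
pub struct CountingLock<L> {
    inner: L,
    acquisitions: AtomicUsize,
}

impl<L: Lock> CountingLock<L> {
    pub fn new(inner: L) -> Self {
        CountingLock { inner, acquisitions: AtomicUsize::new(0) }
    }

    pub fn acquisitions(&self) -> usize {
        self.acquisitions.load(Ordering::Relaxed)
    }
}

impl<L: Lock> Lock for CountingLock<L> {
    fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
        self.acquisitions.fetch_add(1, Ordering::Relaxed);
        self.inner.with_lock(f)
    }
}
```

Code written against `Lock` does not care whether it gets a mutex, a no-op lock, or a wrapped one. A guard-returning design would be closer to how `Mutex` works today, but needs more type machinery than fits in a short sketch.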

This kind of abstraction allows things like DebugLock, which wraps any Lock, performing tracing and deadlock analysis. The next set of abstractions below also rely on abstract Lock interfaces:

Sink, Source, and Queue: again, examples from the Oswego library, which contained the abstract multi-threading concept of a Queue split into two interfaces. Queues were “built” out of a lock and a container interface, allowing a single class to implement everything from a “synchronous slot” to a priority queue based on circular buffers.
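A minimal sketch of the split, assuming made-up names throughout (`Sink`, `Source`, `LockedQueue`, `produce`, `drain`). Unlike the Oswego design, `take` here is non-blocking and simply returns `None` when the queue is empty, to keep the example small.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical write-only half of a queue.
pub trait Sink<T> {
    fn put(&self, item: T);
}

/// Hypothetical read-only half of a queue.
pub trait Source<T> {
    /// Returns `None` when nothing is available (non-blocking in this sketch).
    fn take(&self) -> Option<T>;
}

/// A queue "built" out of a lock and a container.
pub struct LockedQueue<T> {
    items: Mutex<VecDeque<T>>,
}

impl<T> LockedQueue<T> {
    pub fn new() -> Self {
        LockedQueue { items: Mutex::new(VecDeque::new()) }
    }
}

impl<T> Sink<T> for LockedQueue<T> {
    fn put(&self, item: T) {
        self.items.lock().unwrap().push_back(item);
    }
}

impl<T> Source<T> for LockedQueue<T> {
    fn take(&self) -> Option<T> {
        self.items.lock().unwrap().pop_front()
    }
}

/// A producer only needs to know about the `Sink` half.
pub fn produce(sink: &impl Sink<u32>, n: u32) {
    for i in 0..n {
        sink.put(i);
    }
}

/// A consumer only needs to know about the `Source` half.
pub fn drain(source: &impl Source<u32>) -> Vec<u32> {
    let mut out = Vec::new();
    while let Some(item) = source.take() {
        out.push(item);
    }
    out
}
```

Swapping `VecDeque` for a priority heap, or `Mutex` for another lock, would not change `produce` or `drain` at all, which is the point of splitting the interface.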


#2

“get the time” is an interface, and should be treated as such.

I totally agree.

A particularly smart approach – used by the Microsoft .NET language team, among others – is to define core interfaces in the standard library early on, even if the implementations of such interfaces are trivial, or only expected to be provided in expansion libraries.

It’s funny that you mention interfaces in C#. I was recently grumbling about the IReadOnly family of interfaces, which were introduced too late (with the unfortunate consequence, for example, that ICollection does not extend IReadOnlyCollection, for backward compatibility reasons). This matches your general remark that early introduction of very general-purpose interfaces is better. But this specific example also reminds me that I have been surprised by how few traits for collections std contains. IntoIterator is nice, but more specific sub-traits may also be required. A collection is something that can be iterated but also something that has a size (this is precisely the difference between IEnumerable and IReadOnlyCollection in C#). Most collections also provide methods to add, remove, or test the presence of an element. If you want to write generic methods on collections that are independent of the underlying data structure, or to create your own type of collection consistent with the existing ones, this kind of interface is very useful. Collections are such a core concept for the standard lib of a language that I would expect a bunch of generic abstractions provided by default.
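To show the kind of thing I have in mind, here is a rough sketch. The trait `Collection`, its methods, and the helper `insert_if_absent` are hypothetical names, not anything in std; the impls just forward to the existing inherent methods of `Vec` and `HashSet`.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical trait: something with a size that supports membership
/// tests and insertion, independent of the underlying data structure.
pub trait Collection {
    type Item;
    fn len(&self) -> usize;
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
    fn contains(&self, item: &Self::Item) -> bool;
    fn insert(&mut self, item: Self::Item);
}

impl<T: PartialEq> Collection for Vec<T> {
    type Item = T;
    fn len(&self) -> usize {
        Vec::len(self)
    }
    fn contains(&self, item: &T) -> bool {
        // Linear search via the slice method.
        self.as_slice().contains(item)
    }
    fn insert(&mut self, item: T) {
        self.push(item);
    }
}

impl<T: Eq + Hash> Collection for HashSet<T> {
    type Item = T;
    fn len(&self) -> usize {
        HashSet::len(self)
    }
    fn contains(&self, item: &T) -> bool {
        HashSet::contains(self, item)
    }
    fn insert(&mut self, item: T) {
        // The inherent method returns a bool, which is ignored here.
        HashSet::insert(self, item);
    }
}

/// Generic code that works with any `Collection`.
/// Returns `true` if the item was newly inserted.
pub fn insert_if_absent<C: Collection>(c: &mut C, item: C::Item) -> bool {
    if c.contains(&item) {
        false
    } else {
        c.insert(item);
        true
    }
}
```

`insert_if_absent` then works unchanged on a `Vec` or a `HashSet`, which is the sort of generic method that is awkward to write today.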


#3

I think it is the intent that we eventually have collection traits (I am experimenting with some at https://github.com/apasel422/eclectic), but they would be more powerful and ergonomic with a few future language features, including higher-kinded types, associated lifetimes, abstract return types, specialization, and by-value self, should these be accepted for inclusion in Rust.


#4

We had several in the Long Ago, but they were removed as part of collections reform in anticipation of better expressing such interfaces with HKT (which we do not have yet, but most seem to assume we will have it one day).


#5

I’ve had this open in a tab for a while because I really like the idea. One of my first toying around things with traits was aiming to define traits for clocks for exactly the reasons mentioned. Some notes on a couple of the ideas:

Clock: strong agreement from me. Benefits: injectable clocks, mockable time. Injecting clocks could allow, eg, an event loop to provide its clock to the event handler. This would let stuff running on the event loop get the updated-per-tick time, and that’s probably sufficient in a lot of cases. Mockable time is just a great way to improve tests that are sensitive to time, so a big ++ here.

Future or Task: we really need something like this, but I don’t think an answer has emerged that’s strong enough for std yet. I definitely agree with C# as an inspiration. There’s also a great post by Joe Duffy from the Midori research project at MSFT: Asynchronous Everything. Go also provides some ideas worth investigating with x/net/context, which had a good intro blog post. It’s another take on propagating cancellation and deadlines.


#6

While the Iterator (and IntoIterator) trait is great, and there are a good number of other traits in std (notably From, Into, Deref, AsRef, etc.), I suspect that something like Java’s Spliterator should make its way into std at some point. Also something like Cursor, which is not quite an iterator but very similar – allowing Rust to iterate with a for loop over those would be awesome.


#7

You can iterate over cursors with their Read methods bytes() and chars(); I think the only reason an IntoIterator implementation isn’t provided for Cursor<T: AsRef<[u8]>> is that it’s not obvious whether Rust should prefer iterating over bytes or code points.


#8

Wow, I thought my suggestion was dead in the water, so I’m pleasantly surprised someone is still interested!

My goal with the original post was to spark a debate on what those interfaces should look like. I think I should have included some primitive sample attempts, with all the warts one would expect from a beginner Rust user – that might have triggered a more vigorous debate from the experienced Rust hackers.

Since my post, I’ve put some thought into it, and some of these interfaces are definitely non-trivial, with many alternative options. However, I think they’re unavoidable, and hence the complexity has to be tackled. Leaving it for later will simply cause a bigger mess.

Look at Java’s Transaction interface: it’s surprisingly rich! Trait methods like commit() are obvious, but what about enlist()? Something like checkpoint() is reasonably useful in some long-running scenarios, but makes little sense for a systems language, and is rarely used in general. Should we leave it out of the std library, or simply add a derived EnlistableTransaction trait?

Ideally, something like Transaction should encompass everything from Intel’s TSX instructions all the way up to a distributed transaction involving databases and queues. This is what I’d like to see in a systems language: interaction between the lowest-level programming, so that memory structures are rolled back if a database transaction fails. That would be very powerful, and possibly a unique feature. I can’t think of any other language with a native memory transaction feature available as of early 2016…


#9

I would personally expect something like this to be developed in a crate outside of std initially. (As in, I kind of think "traits for std" is a bit of a distraction at such a nascent state. I’d like to see them being used in the ecosystem first.)


#10

That’s entirely contradictory to my whole point! The “ecosystem” would result in Atomic, Tran, Transaction, SqlTransaction, and a host of other vaguely similar traits and/or structs that are mutually incompatible.

Instead, I propose that they all derive from a common trait in std, and then they would be compatible, while still leaving the ecosystem free to experiment with various implementations. Generic database code wouldn’t have to be rewritten to support both Postgres and SQL Server. It would be possible to cleanly compose transactions across wildly unrelated APIs, such as transactional registry operations in Windows and databases.

Having a Future (or Task) trait in the std library is even more critical. Callbacks are not good enough! In the absence of native async support everyone will “roll their own”, leading to spaghetti code. No – worse – incompatible, asynchronous, and potentially multi-threaded spaghetti code. Ugh…

I particularly like kamalmarhubi’s link to the “Asynchronous Everything” blog post, which makes a strong point for integrating features like async at the compiler level:

The model we built was one where asynchronous activities ran on linked stacks. These links could start as small as 128 bytes and grow as needed. After much experimentation, we landed on a model where link sizes doubled each time; so, the first link would be 128b, then 256b, …, on up to 8k chunks. Implementing this required deep compiler support. As did getting it to perform well. The compiler knew to hoist link checks, especially out of loops, and probe for larger amounts when it could predict the size of stack frames (accounting for inlining).


#11

@peter_bertok The rationale for letting it bake in the ecosystem first is that we are probably not going to get it right the first time, and you only have one shot at it when adding it to the standard library.


#12

I think what @burntsushi is suggesting is to start with the traits in the ecosystem to figure out what they should look like. For Clock, it’s not too hard to imagine what’s most ergonomic. For some of the others, it’s less clear. Rust’s type system is different enough that transplanting interfaces probably won’t be the best that can be done. And std should have a high bar. Once a good interface is figured out, it can go through the RFC process for inclusion in std.

I agree, but again I think that the design will need to be iterated on. The easy way to do that is to put some ideas out there and see what they feel like. And as a counterpoint to having Future or Task as a primitive, you may want to check out David Beazley’s curio which implements async purely on top of await as added in Python 3.5. The built-in asyncio module has the Future as its basic type.