Libsystem, or, the great libstd refactor

https://github.com/rust-lang/rust/compare/master...arcnmx:libsystem

Overview

Suggestion: skip to the tl;dr at the bottom, but let's start with some assertions...

  • libstd needs some cleaning up. Its implementation of various interfaces is rather ad hoc and relies heavily on implementation details. It's written against libc, and does not consider any platforms beyond windows-like and unix-like.
  • liblibc is far from ideal. See RFC #1291
  • Devs want a way to use Rust and its standard library on platforms where full functionality may not be available. See issue #27701
    • Restricting usage to libcore is often far from ideal when interacting with the greater Rust ecosystem. Cargo and crates.io dependencies rarely work well with it. A true facade with independent features/pieces as discussed over there recently would be great.

Why do I care? I simply love the language, and want to use it anywhere I can. Everywhere from desktop applications to mobile to embedded userspace to kernel to microcontrollers to the web. I feel like it's a perfect language choice for anything, but there's so much friction around the cross-compiling and embedded story that it's often a pain to use.

What do we need?

  • Cargo and crates.io integration with less-capable platforms
    • Crates that only require a subset of libstd should be available for use on embedded platforms.
    • Need the ability to specify a sysroot to Cargo and rustc, or otherwise tell rustc where to find libstd and friends.
      • Cargo could use a --sysroot option, and this needs to interact properly with build dependencies, in the same way that --target does.
      • Alternatively it could just build them for you when missing?
  • A convenient way to build a libstd sysroot for a given target, without having to go through the entire rustc build process.
  • Ease porting libstd to a new platform.

Ways to achieve this and goals

  • Clearly define a platform-independent API that provides all functionality libstd needs without exposing any implementation details.
  • Make porting libstd to new platforms straightforward by isolating all functionality that needs to be provided.
    • bare metal, kernels, microcontrollers, etc.
    • static syscall libstd that doesn't depend on any C libraries
  • Cut the circular dependencies between libstd::sys and libstd
  • Remove all #[cfg(target_os..., unix, etc)] branches in libstd.
  • Remove libstd's dependency on liblibc
  • Implement a bind fallback libsystem implementation entirely made of extern fns in order to...
    1. Have a reference implementation that exposes nothing about itself to ensure no implementation details are being relied upon.
    2. Allow libstd to be used raw without any dependency at all, while lazily linking in any functionality that does end up being necessary.
  • Migrate as much as possible away from rust_builtin (low prio, but would be nice)

Unrelated useless things in this branch:

  • Cargo.toml for all of rustc and its stdlib. Not incredibly useful compared to the ongoing cargo-build branch, but I like it as a quick way to build a libstd sysroot and the rustc compiler without having to deal with the multi-stage build system.
  • target_family flexible target configuration - the whole world isn't just unix and windows. This probably belongs as a standalone PR?
  • liblibc changes for a generic target_family. Probably useless, better to just kill liblibc as a dependency from everything except libsystem.

tl;dr what actually needs doing:

  1. Isolate platform dependent behaviour and ensure it only exists in libstd::sys. Fix up all the current leaky abstractions so there's a single point where the system implementation lives.
  2. Expand the binary target_family so libraries may be aware of other platforms.
  3. Remove the circular dependency between libstd and libstd::sys so that it can be pulled out into libsystem.
    • Type implementations that currently need to be shared and thus make this slightly problematic:
      • sync::Once
      • thread_local!()
      • io::Error, a new sys::Error type should be introduced
    • This is mostly a bikeshed argument, it'd be fine to still live in libstd. Pulling it out simply makes it easier to assert that certain invariants aren't exposed or accidentally relied upon, and makes for cleaner and well defined abstractions.
  4. Avoid unnecessary allocations and general cleanup of the system code.
    • Related: RFC #862
  5. Move as much as possible from rust_builtin into native Rust code.
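To make point 1 a bit more concrete, here is a minimal sketch of the layout being argued for. It is not code from the branch: the `sys`/`imp` module names follow the discussion, while `PATH_SEPARATOR`, `family_name` and `join` are made-up names for illustration only. All `#[cfg]` branching sits inside `sys`, and the platform-independent code only sees the re-exported names.

```rust
// Hypothetical layout: every platform-specific item lives under `sys`.
mod sys {
    // Exactly one `imp` module is compiled in, chosen by the target.
    #[cfg(unix)]
    mod imp {
        pub const PATH_SEPARATOR: char = '/';
        pub fn family_name() -> &'static str {
            "unix"
        }
    }

    #[cfg(windows)]
    mod imp {
        pub const PATH_SEPARATOR: char = '\\';
        pub fn family_name() -> &'static str {
            "windows"
        }
    }

    // Fallback so the sketch builds on targets that are neither of the above.
    #[cfg(not(any(unix, windows)))]
    mod imp {
        pub const PATH_SEPARATOR: char = '/';
        pub fn family_name() -> &'static str {
            "other"
        }
    }

    // The only names the rest of the library is allowed to use.
    pub use self::imp::{family_name, PATH_SEPARATOR};
}

// Platform-independent code: no #[cfg] needed here.
fn join(parent: &str, child: &str) -> String {
    let mut out = String::from(parent);
    if !out.ends_with(sys::PATH_SEPARATOR) {
        out.push(sys::PATH_SEPARATOR);
    }
    out.push_str(child);
    out
}

fn main() {
    println!("{}: {}", sys::family_name(), join("a", "b"));
}
```

Porting to a new platform would then mean adding one more `imp` module, rather than hunting for `#[cfg]` branches across the library.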

Comments, thoughts, feedback?

A good way to start would be to do a minor refactor and implement 1. from the list above first. Does this require an RFC, or does anyone object to the idea of it? Then 2. would come next, though our current #[cfg(...)] story could use a facelift...

I really don't understand the obsession with using libstd in embedded contexts. It's designed for user-land, and cuts many corners that are completely unacceptable for freestanding environments.

Its implementation of various interfaces is rather ad hoc and relies heavily on implementation details.

Can you give concrete examples of these interfaces?

It's written against libc, and does not consider any platforms beyond windows-like and unix-like.

Can you give concrete examples of platforms that we are excluding?

Everywhere from desktop applications to mobile to embedded userspace to kernel to microcontrollers to the web.

I can't imagine sharing almost any code between all of these contexts, any examples?

Crates that only require a subset of libstd should be available for use on embedded platforms.

What subset of libstd that isn't in libcore is viable on embedded platforms?

I hit the wall with std::io::Write not being in libcore. It is sometimes handy to have a convenient way to write to some source (like a raw pointer) in embedded systems. Unfortunately it is impossible to provide the same solution for std::io::Read for now, as it depends on libcollections. That was also mentioned in byteorder #37.
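As a rough illustration of what a core-only writer could look like, here is a minimal sketch. The `Write` trait, `WriteError` and `SliceWriter` below are hypothetical stand-ins defined for this example, not the std::io API, and the sink is a byte slice rather than a raw pointer so that no unsafe code is needed. Nothing in it requires allocation.

```rust
// Hypothetical error type: the only failure is running out of room.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    WriteZero,
}

// Hypothetical, allocation-free stand-in for std::io::Write.
pub trait Write {
    // Writes as many bytes as fit and returns how many were written.
    fn write(&mut self, buf: &[u8]) -> Result<usize, WriteError>;

    // Keeps writing until everything is written or the sink is full.
    fn write_all(&mut self, mut buf: &[u8]) -> Result<(), WriteError> {
        while !buf.is_empty() {
            match self.write(buf)? {
                0 => return Err(WriteError::WriteZero),
                n => buf = &buf[n..],
            }
        }
        Ok(())
    }
}

// A writer over a fixed, caller-provided buffer.
pub struct SliceWriter<'a> {
    buf: &'a mut [u8],
    pos: usize,
}

impl<'a> SliceWriter<'a> {
    pub fn new(buf: &'a mut [u8]) -> Self {
        SliceWriter { buf, pos: 0 }
    }

    pub fn written(&self) -> usize {
        self.pos
    }
}

impl<'a> Write for SliceWriter<'a> {
    fn write(&mut self, data: &[u8]) -> Result<usize, WriteError> {
        let free = self.buf.len() - self.pos;
        let n = core::cmp::min(free, data.len());
        self.buf[self.pos..self.pos + n].copy_from_slice(&data[..n]);
        self.pos += n;
        Ok(n)
    }
}

fn main() {
    let mut storage = [0u8; 8];
    let mut writer = SliceWriter::new(&mut storage);
    writer.write_all(b"hello").unwrap();
    let n = writer.written();
    println!("{:?}", &storage[..n]);
}
```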

[quote="Gankra, post:2, topic:2765, full:true"] I really don't understand the obsession with using libstd in embedded contexts. It's designed for user-land, and cuts many corners that are completely unacceptable for freestanding environments.[/quote]

Well I guess my opinion is that it doesn't need to cut corners! It contains plenty of useful items, of which many external crates rely on, that aren't necessarily part of libcore.

[quote="Gankra, post:2, topic:2765, full:true"]

Its implementation of various interfaces is rather ad hoc and relies heavily on implementation details.

Can you give concrete examples of these interfaces?[/quote]

  • { grep -r 'cfg(' src/libstd; grep -r 'libc::' src/libstd; } | grep -v sys/
  • io::Error makes many assumptions around errno being a thing and what it looks like. It's unfortunate that it's all stable now and not part of os::ext. I'll probably look to try deprecating those APIs.
  • Pretty much anything in my branch's diff that isn't in libstd::sys

Anything that doesn't fall into those two buckets, various non-posix-and-libc platforms. Often platforms that try (poorly) to offer a posix-like interface but it's less than ideal. Various bare metal embedded contexts such as game handhelds and console SDKs, various DSP and other non-userspace platforms, emscripten and web-based platforms...

I... almost can't understand this sentiment. There are so many useful crates out there that may not depend on external environments such as filesystems, but do useful I/O things, only need networking but not threading, or only require allocations, or may just offer utility traits that don't do very much at all. See the above reply and issue for why the io traits are really nice to have, for example.

See the libcore stabilization tracking issue for discussion around why #[no_std] isn't often ideal:

  • Having to PR to all your upstream crates
  • Having to feature-gate for a few months until it's stable
  • Having to feature-gate anyway because it's only applicable to half the crate
  • Having to ensure all crates that depend on it use default-features = false and have their own core feature flags etc so pulling in a dependency doesn't suddenly nullify the no-core-ness.

Many parts of libstd are applicable in the platforms mentioned in the above reply:

  • Allocations
  • Networking
  • Filesystem Access (often in a limited form, or read-only)

But many aren't necessarily available as well:

  • Environment variables
  • Threading
    • Some platforms will only have cooperative threading, or may offer threading but no pthreads-compatible interface
    • TLS kinda fits in here too
  • Networking (this goes both ways)

Getting off my train now, will see if I missed some points later.

Also, to answer @alexcrichton

I would personally want to discuss some of the high-level goals here as well as the general architecture for the structure of the standard library and crates beneath, I'm sure I'll have opinions! This also touches on other topics like the liblibc stabilization, libcore stabilization, bootstrapping with cargo, etc, that'd be good to make sure we're all aligned on before taking such a large step forward.

My only comment on liblibc stabilization is that it should never be made public, and RFC #1291 sounds good to me. Iterate, refactor, etc. over on crates.io with it. I'm not recommending any changes to it with this refactor.

Cargo bootstrap will be awesome, and I have a number of opinions there. Somewhat irrelevant though, as I certainly don't plan on including my Cargo setup with this refactor. It's simply in the branch because it's useful for development, but certainly not polished or designed well enough to land in tree.

libcore stabilization is probably the most relevant, the discussion about "profiles" and feature sets exposed via std are interesting. In general I'd like to see std usable on platforms that may tend more toward libcore right now, so that crates.io dependencies may work without much friction. That's a whole other discussion though, and doesn't really change my suggestions here. The refactor would certainly make it easier to implement and isolate those functionality profiles, though!

Here's a much pared-down version of the refactor, simply shifting things around so most of the platform code now actually resides under libstd::sys. Circular dependencies weren't really addressed, beyond the sys::error::Error type being used over io::Error. I could leave this out for less diff noise, though it's something that should be in there anyway...

Still a WIP, only Linux is done at this point. I need to go back through and fix up a few issues with the other unix platforms, and then do Windows. Also need to throw rustfmt at it to fix up styling, add copyright headers, etc.

Also, style questions...

  • All these newtypes should probably be standardized to either tuple structs or { inner: T }... Which way should I go with that?
  • There's a lot of map(From::from).map_err(From::from) noise going on here, with conversions between the two error types, and back/forth conversions around Path and PathBuf types. Should this be done some other way?
    • Ok(try!(...).into()) and try!(...); Ok(()) for example?
    • Some magical conv(...) function that just magically does everything for us?
  • Is the sys::etc re-exporting from sys::imp pattern too obnoxious? Too much going on in sys/mod.rs? Should I just flatten it instead?
  • I am really bad at ordering my use lines. Hoping rustfmt can deal with this for me :neutral_face:
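For readers following the conversion question, here is a small sketch comparing the two styles. `SysError`, `IoError` and `sys_current_dir` are made-up stand-ins for sys::error::Error, io::Error and a platform call, and the `?` operator is used in place of `try!`, which it later replaced in the language.

```rust
use std::path::PathBuf;

// Hypothetical stand-in for the platform-level error type.
#[derive(Debug, PartialEq)]
struct SysError(i32);

// Hypothetical stand-in for the public io::Error type.
#[derive(Debug, PartialEq)]
struct IoError {
    code: i32,
}

impl From<SysError> for IoError {
    fn from(e: SysError) -> IoError {
        IoError { code: e.0 }
    }
}

// Hypothetical platform call returning platform-level types.
fn sys_current_dir(fail: bool) -> Result<String, SysError> {
    if fail {
        Err(SysError(2))
    } else {
        Ok(String::from("/tmp"))
    }
}

// Style 1: convert both the success and the error side explicitly.
fn current_dir_map(fail: bool) -> Result<PathBuf, IoError> {
    sys_current_dir(fail).map(From::from).map_err(From::from)
}

// Style 2: `?` converts the error through From, `.into()` converts the value.
fn current_dir_try(fail: bool) -> Result<PathBuf, IoError> {
    Ok(sys_current_dir(fail)?.into())
}

fn main() {
    println!("{:?}", current_dir_map(false));
    println!("{:?}", current_dir_try(true));
}
```

Both functions behave identically; the second simply hides the error conversion inside the operator, which is most of the noise being complained about.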

Overall is it still too large? Where to go from here wrt discussions/opinions/PRs/RFCs/etc? I'm not sure that it can be toned down much more than this :confused:

Sorry for taking a while to get back to you, but I'm quite interested to hear more about this! I'm a little surprised that you see the need for such a large refactoring of the standard library, it's been designed quite carefully up to now with many of the use cases you're mentioning in mind, so I'm at least not under the impression that it's due for a refactoring. I totally agree on the sentiment that Rust should run everywhere, however, and would love to ensure that the process is as seamless as possible!

I want to take a second and drill a bit more into some of the pain points you're running into. Organizations like what you're mentioning have been suggested in the past, but in my opinion it largely boils down to shuffling code around and not being a net win in terms of ease of portability, but I could be wrong!

Cargo and crates.io integration with less-capable platforms

I very much want to use Cargo as a build system (as you're aware), but we need to tread carefully here. Right now a stable compiler cannot build the standard library (due to unstable features), and this is not an easy hurdle to overcome. We've got efforts in the works to at least get precompiled std binaries for more platforms than just Linux/Windows/OSX, but I do realize that won't get us quite all the way there.

Other than that, however, this is in my opinion "done" in terms of rustc/cargo implementation work. Cross compilation has always been first-class in both the compiler and Cargo and much of the infrastructure is already there to add support for new platforms.

Clearly define a platform-independent API that provides all functionality libstd needs without exposing any implementation details.

I'm not sure I understand this in terms of an implementation detail goal, isn't std::{io, fs, ...} this exact API? The implementation is free to use whatever it wants, but it seems like perhaps overkill to have yet another layer of abstraction which other platforms can plug into.

I think this plays a lot into the message of "making libstd easier to port", but it's fundamentally just a lot of functionality that needs to get into the right place. The language has all the necessary tools via #[cfg] and the standard library should be quite amenable to adding new platforms as in theory all of the platform-specific meaty code is in std::sys (there's certainly some cleanup that could happen here though!).

I'm not sure it's a necessarily desirable goal to remove all #[cfg] in the standard library, they've all gotta be somewhere right? I would love to minimize them to the absolute smallest set possible, but getting down to 0 seems like it may be overkill.

Migrate as much as possible away from rust_builtin (low prio, but would be nice)

I agree! We've basically already done this except for a few small pieces, and it's just up to anyone to take care of the rest, however :smile:


Overall I think before diving down into more implementation-detail related questions, we may want to refine a crisper set of goals for something like this. For example:

  • Why is the current organization insufficient? How would a new structure improve these problems while retaining the same pros of today?
  • What are precisely the problems this is trying to solve? This'll help flesh out precisely where the pain points are today, and perhaps allow us to see some less invasive solutions.
  • etc

This is basically the kind of thought process that goes into writing an RFC, and I definitely think that a major restructuring of the standard library will require an RFC (regardless of it being a private implementation detail). Excited to see progress here though!

Yeah, sorry, I'm pretty terrible at explaining myself. Let's see if I can clear it up a bit...

Right, there's no other work to be done after that. My concern is simply that it's currently difficult to port libstd to other platforms, and only using libcore isn't a nice solution because the story around crates.io and #[no_std] is far from ideal. Stabilizing libcore without using some sort of libstd facade doesn't really solve the underlying issues.

It's really not, as those modules mix the platform abstractions with library helper methods, traits, and other unrelated things. It's useful to define the exact abstraction layer that needs to be implemented by a port, and how.

Making libstd easier to port is my only concrete goal. The rest is all implementation details, and picking the way forward that generates the least churn in libstd is probably for the best. Basically, I want to make sure std::sys is the only part of libstd that contains platform-specific code.

Also for the record I feel the current #[cfg] setup is unsatisfactory for porting to new platforms with the way everything is gated by platform rather than platform capabilities (and they can't really be macro'd well) :stuck_out_tongue:

#[cfg] blocks are strewn throughout the library rather than having platform-specific behaviour be in one nice manageable chunk (std::sys).

Make porting libstd to other platforms easier, and more straightforward, while keeping platforms from polluting various parts of libstd with even more #[cfg] blocks.

I want to move platform-specific behaviour to a confined area (std::sys), that's really it!

I agree that it's not so great today, but with the stabilization of libcore I would expect the experience here to get much better (as an ecosystem around this would actually exist!)

Ok this sounds like a good idea to me! The standard library has been through a lot of iterations and we've never really had a chance once things have settled to take a look at the story of all the littered #[cfg] directives, and it'd be nice to consolidate!

I can, by the way, recommend walking the walk. I'm a huge proponent of not developing on nightly unless absolutely necessary, and I promote that by trying to port everything I see to stable and sending pull requests. I can see similar things happening for libs that just require core (although that is obviously harder, as things like String and Vec are in std, rightfully, but they are often used and people might be hesitant to code without them).

Well this is roughly my final draft of the refactor. Just a ton of moving things around... Is there a nice way to build/test/run this against as many platforms as possible? :confused:

https://github.com/arcnmx/rust/commit/4aefe0d0833e635a53487c19ab34aa585c3b2212

Also, see this commit as an example/outline of what needs to be implemented to port libstd to an entirely new platform...

https://github.com/arcnmx/rust/commit/30d290f6cd456bf64a5be0930463dc7525ab4a90

I like some of the ideas here, but there were some things that would help me understand what’s going on:

Is your refactoring 100% backward compatible with the current (at that time) libstd? If not, what are the backward-incompatible differences?

Your patch changes 166 files, +3342/-3030. This is too much for somebody to read at once. Indeed, GitHub won’t even display the whole patch in its web UI. Is there any way that the patch could be broken up into parts?

For example, could we deep-dive into, say, just the part that refactors the implementation of std::thread, to see what kind of new capability we would gain from the refactoring? In particular, you mentioned above the idea of a libc-dependency-free implementation of libstd. What would the libc-dependency-free direct-syscall implementation of std::thread on Linux look like?

If std::thread is too big of a chunk for that kind of exercise, then what’s a smaller thing that we could do it on that would show the benefit of your approach?

It is fully compatible, yes. It's a pure refactor: I simply move things around so that there's a platform-independent API layer (libsystem or libstd::sys) that libstd uses to implement various interfaces.

I'm thinking about trying that if the design is accepted. It's tough though, there tends to be a lot of interdependency between these parts for some reason, so moving them a bit at a time can be awkward.

Well I mean, you can git diff src/libstd/sys/*/thread* src/libstd/thread/* or whatever. There isn't any new functionality being gained here though. It's just things being moved. The idea being that you can look at sys/thread and see exactly what functions you need to implement in order to port libstd to a given platform.

Basically look at https://github.com/arcnmx/rust/commit/30d290f6cd456bf64a5be0930463dc7525ab4a90#diff-9adecdd6e4eaa5e6e34f584c3f0c9c5dR387

You'd need to fill that in and implement thread::new, Thread::{join, set_name, yield_, sleep}
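To give a feel for the shape of that surface, here is a rough sketch. Only the names `thread::new` and `Thread::{join, set_name, yield_, sleep}` come from the post; the signatures below are guesses, and the bodies simply delegate to std::thread so that the example runs. A real port would call the platform's own primitives instead, and the linked commit is the reference for the actual prototypes.

```rust
use std::sync::mpsc::channel;
use std::time::Duration;

mod sys {
    pub mod thread {
        use std::time::Duration;

        // Handle to a running thread. Backed by std here only so the sketch runs.
        pub struct Thread(std::thread::JoinHandle<()>);

        // Assumed signature: start a thread running `f`.
        pub fn new(f: Box<dyn FnOnce() + Send + 'static>) -> std::io::Result<Thread> {
            std::thread::Builder::new().spawn(f).map(Thread)
        }

        impl Thread {
            // Wait for the thread to finish, ignoring a panic in it.
            pub fn join(self) {
                let _ = self.0.join();
            }

            // Name the current thread. A no-op in this sketch.
            pub fn set_name(_name: &str) {}

            // Give up the rest of the current time slice.
            pub fn yield_() {
                std::thread::yield_now()
            }

            // Block the current thread for at least `dur`.
            pub fn sleep(dur: Duration) {
                std::thread::sleep(dur)
            }
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    let worker = sys::thread::new(Box::new(move || {
        sys::thread::Thread::set_name("worker");
        tx.send(42u32).unwrap();
    }))
    .expect("failed to start thread");
    sys::thread::Thread::sleep(Duration::from_millis(1));
    worker.join();
    println!("{}", rx.recv().unwrap());
}
```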

Awesome.

I'm not a Rust reviewer, but IMO that's probably got the order of operations backwards. The main benefit, as I see it, of the refactoring is that it makes it easier to create alternative implementations of libstd not based on libc, either for a port to a non-unixy platform, or for some other reason. I think that is an important advantage, and it doesn't seem to have negative consequences, but it's not a high priority for others. Regardless, what you proposed seems too big for people to evaluate as one item. Also, I think "don't ask for permission" applies here. So, I recommend just breaking out a few changes into small-enough-to-review pieces and submitting PRs.

Also, people might have read your initial comments as "the structure of libstd is bad and I'm fixing it" whereas it might be better to convey the message "Here's some patches to make porting libstd easier."

As far as I understand your idea, shouldn't most of what was added to liballoc_system/lib.rs be in a separate file, similar to the added bind.rs? Then the added section of liballoc_system/lib.rs would be something like:

#[cfg(target_family = "bind")]
pub use super::bind::{
    allocate,
    deallocate, 
    reallocate,
    reallocate_inplace,
    usable_size
};

And, in fact, that could be encapsulated in a macro that takes the module from which to re-export items as a parameter, I think.
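A minimal sketch of such a macro might look like the following. The macro name `reexport_allocator`, the wrapping `alloc_system` module and the function signatures are assumptions made for this example, and the `#[cfg(target_family = "bind")]` gate is left out so that it compiles on any target. Only three of the five functions are shown to keep it short.

```rust
mod alloc_system {
    // Hypothetical backend module standing in for `bind.rs`.
    mod bind {
        use std::alloc::{alloc, dealloc, Layout};

        // Safety: `size` must be non-zero and `align` a power of two.
        pub unsafe fn allocate(size: usize, align: usize) -> *mut u8 {
            unsafe { alloc(Layout::from_size_align(size, align).unwrap()) }
        }

        // Safety: `ptr` must come from `allocate` with the same size and align.
        pub unsafe fn deallocate(ptr: *mut u8, old_size: usize, align: usize) {
            unsafe { dealloc(ptr, Layout::from_size_align(old_size, align).unwrap()) }
        }

        // This backend never hands out more than was asked for.
        pub fn usable_size(size: usize, _align: usize) -> usize {
            size
        }
    }

    // Re-export the allocator entry points from whichever module is named.
    macro_rules! reexport_allocator {
        ($backend:ident) => {
            pub use self::$backend::{allocate, deallocate, usable_size};
        };
    }

    reexport_allocator!(bind);
}

fn main() {
    unsafe {
        let p = alloc_system::allocate(16, 8);
        assert!(!p.is_null());
        alloc_system::deallocate(p, 16, 8);
    }
    println!("{}", alloc_system::usable_size(16, 8));
}
```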

Your patch mixes two ideas:

  1. It should be possible to implement libstd without depending on the libc crate.
  2. The libc-less replacement will be primarily implemented in non-Rust code.

I think it would be useful to split these two ideas, because the late-binding-to-another-library approach isn't the only interesting one. In particular, the invoke-syscall-directly-from-Rust case doesn't require all the late binding machinery.

Also, your change to OsRng hints at another refactoring: If there was a sys::rand::fill_bytes standalone function, then we could share one implementation of OsRng, and each libstd port would only have to implement sys::rand::fill_bytes. In fact, I think that this refactoring should be done separately, probably before the other refactorings.
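The OsRng idea could be sketched roughly like this. Only the name `sys::rand::fill_bytes` comes from the suggestion; the `OsRng` methods are assumed for illustration, and the body of `fill_bytes` is a deterministic placeholder (a counter run through a mixing function) that is NOT a secure random source. A real port would ask the operating system for entropy there.

```rust
mod sys {
    pub mod rand {
        use std::sync::atomic::{AtomicU64, Ordering};

        // Placeholder state. NOT secure: a real port would query the OS.
        static STATE: AtomicU64 = AtomicU64::new(0x9E37_79B9_7F4A_7C15);

        // The single function each port would have to provide.
        pub fn fill_bytes(buf: &mut [u8]) {
            for chunk in buf.chunks_mut(8) {
                // Advance the counter, then scramble its previous value.
                let mut z = STATE.fetch_add(0x9E37_79B9_7F4A_7C15, Ordering::Relaxed);
                z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
                z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
                z ^= z >> 31;
                let bytes = z.to_le_bytes();
                chunk.copy_from_slice(&bytes[..chunk.len()]);
            }
        }
    }
}

// Shared, platform-independent OsRng built on the one sys function.
pub struct OsRng;

impl OsRng {
    pub fn new() -> OsRng {
        OsRng
    }

    pub fn fill_bytes(&mut self, buf: &mut [u8]) {
        sys::rand::fill_bytes(buf)
    }

    pub fn next_u32(&mut self) -> u32 {
        let mut bytes = [0u8; 4];
        self.fill_bytes(&mut bytes);
        u32::from_ne_bytes(bytes)
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut bytes = [0u8; 8];
        self.fill_bytes(&mut bytes);
        u64::from_ne_bytes(bytes)
    }
}

fn main() {
    let mut rng = OsRng::new();
    println!("{} {}", rng.next_u32(), rng.next_u64());
}
```

With this split, everything except `sys::rand::fill_bytes` is shared between ports.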

Anyway, I think that your overall idea is good and I encourage you to move forward with it. I'm happy to look over your pull requests. (Keep in mind I'm not a Rust stdlib reviewer, so I can't approve changes.)

Note that the late-binding commit is not something I plan to ever be included in tree. It was for my own curiosity to see what the absolute minimum is required for porting. I link to it as an example of what Rust prototypes need to be implemented in order to port to a new platform, not as an exemplar that should actually be taken seriously for anything, and certainly not a style guide/suggestion as far as the module and file layout is concerned :stuck_out_tongue:

I guess I can try that... It's just a massive amount of work that I can't reasonably start on again without some indication that it's the right direction to go ahead with.

I think this is a great idea to support increased exposure in the future, e.g. for embedded devices.

Alex already said "Ok this sounds like a good idea to me! The standard library has been through a lot of iterations and we've never really had a chance once things have settled to take a look at the story of all the littered #[cfg] directives, and it'd be nice to consolidate!" I don't think you're going to get a much more positive go-ahead than that. Nobody's going to promise that patches get accepted before they see the actual patches.

I'm looking forward to seeing the PRs if you make them.

Well his more recent comment was pretty dismissive so it seems unlikely that's a good approach to take... I'm not so much looking for approval as much as indication from someone that I'm not doing something very wrong here, but no one seems to care enough to look. As for an RFC... "cut-paste anything in libstd within an ad hoc conditional #[cfg(target_os = ...)] to the sys module" doesn't seem like a very productive one to me so meh.

I don’t think his comment was dismissive of the idea in general. The current form of the patch isn’t really possible to review; a single giant patch with no discussion whatsoever.

This change is big enough that an RFC does seem reasonable: it affects much of the internals of the std library and how easy it is to port, and it would force any other WIP branches to be merged manually, so landing it would need some timing and coordination. Basically, something discussing why this move is a good idea, what porting use cases it would help out, if there are any alternatives, etc.

In addition to that, once the RFC process is done, I think this would be better done split up into a series of patches; one huge patch makes it hard to break down and verify that each move is sound and doesn’t change anything important; particular concerns might be privacy, error reporting, and the like.

I would not take his response as dismissive; just saying that once seeing the change, it’s clear that it will require a little more discussion before it’s reviewable and mergeable.

Sorry, by that I just meant dismissive of the actual design and layout being proposed here. I simply haven't received any feedback on that whatsoever on any front, which is kind of what I'm looking for... The lack of any discussion or feedback on the changeset itself simply doesn't instill much hope that an RFC would generate much more discussion. I get that it's large but that doesn't preclude being able to see what kind of changes have been made and whether they're a good idea or not.

Agreed.

I guess my view is that it's currently impossibly difficult to port, and libstd changes are few and far enough between right now that it'd be a good time to land. Though mostly I'm just impatient because after the few months it takes an RFC to get through I won't have the time or energy to actually make the changes anymore - I'm currently on a rare break with some actual time to get things done.