Thoughts on Rust stdlib and C interfacing

Hey there!

Just wanted to give my thoughts some space and maybe some of you in the rust community can change my mind or give me some other thought I haven’t considered yet.

I just recently “discovered” Rust and also Go, which got me thinking about which language to learn first, and I have to say that at first sight Rust seemed more promising, apart from its syntax, which I sometimes find ugly, though I think one gets used to that as well.

Then I started looking into the std library and noticed that a lot of features I expect a modern systems programming language to have built in are missing. For example: UNIX sockets. I get that Rust wants to be compatible with all platforms, namely Windows and *NIX systems. But why not let the users of the language decide what they want to use? With some googling I discovered https://github.com/rust-lang-nursery/unix-socket which seems to implement UNIX sockets, fair enough. BUT: as far as I can see, it actually interfaces with libc. Same for a library I found which handles UNIX signals: it interfaces with C. But why interface with C instead of doing a pure Rust implementation which does not rely on C? After all, I want to work with a self-sufficient systems programming language.

Go, on the other hand, has a very large std library and in most cases seems not to use the C library for that, which I have to say seems more tempting and promising to me. I don’t want to badmouth Rust, I actually like it. But this was my train of thought when I looked at both Go and Rust. And I still haven’t figured out which one I should learn, because there are a lot of language concepts in Rust I really like which Go does not have.

Maybe one of you can shed some light on why the std library lacks these features and whether it is actually necessary to interface that much with C? I mean, I wouldn’t say anything if you gave me a library with C bindings for GTK: since GTK is written in C, there is no other way. But why resort to C for such low-level things as UNIX sockets? Or signal handling?

All the best, David

The Rust standard library tries to be as cross platform as possible and also tries to be minimal. It doesn't expose all the options of stuff like sockets yet because it takes time and effort to figure out whether std should attempt to offer it in a cross platform manner or with OS specific extension traits, and then design such an API and have it implemented and then eventually stabilized. The user of the language can already do whatever they want. They can call syscalls themselves or link to system libraries and do whatever. Most things don't need special language support and can be developed and used outside the standard library just fine. Plus with Cargo and crates.io, depending on other libraries is fairly trivial, so there is very little push to get new functionality into std.
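To make the "OS specific extension traits" idea concrete, here is a minimal sketch, assuming a Unix target. The helper name `unix_permission_bits` is made up for illustration; `std::os::unix::fs::MetadataExt` is the real extension trait that adds Unix-only accessors to the portable `std::fs::Metadata` type.

```rust
use std::fs;
use std::io;
// Unix-only extension trait: it adds methods such as `mode()` to the
// cross-platform `std::fs::Metadata` type. It does not exist on Windows.
use std::os::unix::fs::MetadataExt;

/// Hypothetical helper: returns the permission bits of `path` (e.g. 0o755).
fn unix_permission_bits(path: &str) -> io::Result<u32> {
    let meta = fs::metadata(path)?; // portable std API
    Ok(meta.mode() & 0o7777) // `mode()` comes from the Unix extension trait
}

fn main() -> io::Result<()> {
    println!("permissions of '.': {:o}", unix_permission_bits(".")?);
    Ok(())
}
```

The portable type stays the same on every platform, and the platform-specific bits only show up once you import the trait, so code that never imports it keeps compiling everywhere.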

As for why Rust interfaces with C to do things like sockets, well, what else is it supposed to do? How else do you use sockets? Rust could theoretically use syscalls directly instead of using libc for some things on linux platforms, although there are questions about things like syscall stability and whatnot. On Windows, using syscalls is entirely out of the question as the syscall interface is explicitly unstable and cannot be relied upon. We don't use libc on Windows for things like sockets though, instead we use the various system libraries like ws2_32 and kernel32 to do most things, and they will internally syscall to the kernel as needed. Also many system APIs are implemented entirely in user mode, such as the console API, so we couldn't replace them with syscalls even if we really wanted to.

But really, if you have the choice between a very stable glibc function that does quite a bit of work for you and has been battle tested and optimized for years, and a potentially unstable kernel syscall interface, why not just use the glibc function? Calling C functions from Rust is just as efficient as calling other Rust functions. I know Go does a lot of stuff on linux through syscalls because of the way it handles its green threads and function calls which makes calling external C functions somewhat expensive, but even on Windows Go has to go through the system libraries and not use syscalls.
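As a rough illustration of how little ceremony a libc call needs, here is a minimal sketch that declares `getpid` by hand instead of pulling in the libc crate. It assumes a Unix target where `pid_t` is a C `int`, and a toolchain that accepts `unsafe extern` blocks (Rust 1.82 or later; on older compilers write plain `extern "C"`). The wrapper name `pid_via_libc` is made up for illustration.

```rust
use std::os::raw::c_int;

// Declaration of a function that lives in the platform's C library.
// Assumption: pid_t is a C int on this target (true on Linux and macOS).
unsafe extern "C" {
    fn getpid() -> c_int;
}

/// Hypothetical wrapper: asks libc for the current process id.
fn pid_via_libc() -> u32 {
    // An ordinary direct call through the C ABI; no marshalling layer involved.
    unsafe { getpid() as u32 }
}

fn main() {
    println!("libc says {}, std says {}", pid_via_libc(), std::process::id());
}
```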

Just look at what Go has to do, maintaining huge syscall interfaces for every single OS and architecture combination. That's a lot of work for very little gain.

Just because something is not in the standard library does not mean that it never will be, or that it’s not wanted there.

A lot of things were moved out of std when Rust was approaching 1.0, since we were not confident about stabilizing them just yet. The Rust Blog post "Stability as a Deliverable" has more background.

Please excuse me while I rant on the relationship of Rust and C.

It's not smart to avoid C just for the sake of avoiding C. Go has technical reasons to use syscalls directly instead of libc wrappers. Rust does not. Oh, of course Rust needs to work on target platforms that don't have a libc, that's a feature and the motivation for quite a bit of work happening on the core crate. But any target platform that has Unix sockets has a libc, and thus avoiding libc for Unix sockets does not actually get you any portability, nor any technical advantages. It doesn't even really free you from C, since the kernel is C as well. If you want a 100% Rust system, you can do that, starting from core and perhaps an OS written in Rust (such as Redox). Meanwhile, std is designed for mainstream operating systems and that includes libc. Making use of that doesn't mean Rust is reliant on C, it just means Rust is a team player.

There are a couple of OSes where the system call interface, that is, the kernel/libc interface, is unstable/undocumented and the documented interface is libc itself. OS X and Solaris(/Illumos/SmartOS/etc.) are two major examples. Linux’s syscall interface is stable, but often undocumented: someone will submit a patch to the kernel and to glibc to do something together, and document the glibc interface, but not document why glibc is interacting with the kernel in a specific way.

Of course in practice the interface doesn’t change gratuitously, and the source for these libcs and kernels is public. So you can reimplement these functions yourself, which is Go’s approach. There are advantages and disadvantages to that. Here’s a blog post from the founder of Illumos complaining about how Go uses raw syscalls instead of the libc interface, and therefore additional work is needed to make it work on Illumos that wouldn’t be needed if you were targeting the libc interface. The idiomatic Rust code to do the same thing would work on Illumos with no changes.

Similarly, on Linux, a sigaction(2) signal handler usually needs the undocumented sa_restorer field to point to a correct restorer function. Certainly Rust could figure out what it needs to be set to and implement its own version, but libc has already done that work. (Note that this affects any code bypassing the usual libc, whether through an alternate libc, or raw syscalls, or anything else.)
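Here is a minimal sketch of leaning on libc for signal handling. To keep it short it uses the simpler `signal` and `raise` functions rather than `sigaction` itself, so it shows the general approach, not the exact call discussed above. It assumes a Unix target, SIGINT being signal number 2 (true on Linux, macOS and the BSDs), and a toolchain that accepts `unsafe extern` blocks (Rust 1.82 or later). The names `on_signal`, `GOT_SIGNAL` and `install_and_raise` are made up for illustration.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicBool, Ordering};

// Assumption: SIGINT is 2 on this platform.
const SIGINT: c_int = 2;

// Flag set by the handler; atomics are safe to touch from a signal handler.
static GOT_SIGNAL: AtomicBool = AtomicBool::new(false);

unsafe extern "C" {
    // Declared by hand; the return value (the previous handler) is
    // pointer-sized, so it is modelled as usize here.
    fn signal(signum: c_int, handler: extern "C" fn(c_int)) -> usize;
    fn raise(signum: c_int) -> c_int;
}

/// Hypothetical handler: only records that the signal arrived.
extern "C" fn on_signal(_signum: c_int) {
    GOT_SIGNAL.store(true, Ordering::SeqCst);
}

/// Installs the handler through libc, sends SIGINT to ourselves, and
/// reports whether the handler ran.
fn install_and_raise() -> bool {
    unsafe {
        signal(SIGINT, on_signal); // libc deals with the kernel-facing details
        raise(SIGINT); // the handler runs before raise() returns
    }
    GOT_SIGNAL.load(Ordering::SeqCst)
}

fn main() {
    println!("handler ran: {}", install_and_raise());
}
```

Nothing in this code has to know how the kernel expects the handler to return, which is exactly the part libc has already worked out.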

It’s also worth mentioning that Rust code usually doesn’t use libc for stuff that is specific to libc: for instance, Rust doesn’t interact with <stdio.h> buffering at all. For Rust, libc is primarily a stable interface to the kernel.

BTW, for UNIX sockets I’d recommend looking at mio (or perhaps one of the libraries on top of it), which you may want for multiplexed sockets anyway. mio has support for UNIX sockets just like TCP/UDP ones.

Unix sockets will be added to the standard library - they just haven’t been yet.
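For reference, a minimal sketch of what this looks like on a toolchain where std already ships Unix sockets (the `std::os::unix::net` module, available since Rust 1.10). It assumes a Unix target; the helper name `echo_over_pair` is made up for illustration.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical helper: sends `msg` through a connected pair of Unix
/// sockets and returns whatever arrives on the other end.
fn echo_over_pair(msg: &[u8]) -> io::Result<Vec<u8>> {
    // Two already-connected, unnamed Unix domain sockets.
    let (mut writer, mut reader) = UnixStream::pair()?;
    writer.write_all(msg)?;
    drop(writer); // close the sending side so the reader sees end-of-file
    let mut received = Vec::new();
    reader.read_to_end(&mut received)?;
    Ok(received)
}

fn main() -> io::Result<()> {
    let reply = echo_over_pair(b"hello over a unix socket")?;
    println!("{}", String::from_utf8_lossy(&reply));
    Ok(())
}
```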

There are some subtleties here: one can get more portability by not being tied to specific versions of the system's glibc, and an easy way to do this is to not use that glibc at all. For instance, the linux binaries Rust distributes are (or were, historically, not sure about the current status) compiled on an old distribution with an old version of glibc, which is designed/hoped to be compatible with most existing linux installations around (glibc is very serious about backwards-compatibility).

However, the best way to handle this isn't necessarily to avoid C, but instead change how one interfaces with the C code, or better, which C code. Statically linking is the general idea, either to glibc (not recommended and somewhat unreliable), or to an alternate libc such as musl which is designed for this purpose. Rust has nascent support for musl, but the ability to use it nicely will build on improved cross-compilation support (it's a new target: x86_64-unknown-linux-musl instead of x86_64-unknown-linux-gnu).
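Since the libc is part of the target triple, a program can check at compile time which flavor it was built against. A minimal sketch, using the built-in `target_env` configuration value; the function name `libc_flavor` and the labels it returns are made up for illustration.

```rust
/// Hypothetical helper: names the C runtime environment this binary was
/// compiled for, based on the target triple's environment component.
fn libc_flavor() -> &'static str {
    if cfg!(target_env = "musl") {
        "musl" // e.g. x86_64-unknown-linux-musl
    } else if cfg!(target_env = "gnu") {
        "gnu" // e.g. x86_64-unknown-linux-gnu (glibc), or MinGW on Windows
    } else if cfg!(target_env = "msvc") {
        "msvc" // e.g. x86_64-pc-windows-msvc
    } else {
        "other" // targets with a different or empty environment component
    }
}

fn main() {
    println!("built for the '{}' environment", libc_flavor());
}
```

The same source builds for either target, so whoever produces the binary picks the libc with `--target` and the code does not have to change.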

(Avoiding dynamically linking against the system libc is how Go has such a reputation for ease of deployment: any Go linux binary will run on essentially any linux computer, no matter the state of installed native libraries.)

I, for one, would love to have the Linux port, at least, changed to do raw syscalls instead of going through (g)libc. On Windows, you can use libc for some things, but you’re often (usually? almost always?) better off using Windows APIs directly. Sure, use libc for Mac OS X and Solaris and whatnot that require it, but otherwise Rust is better off not using it.

Would porting musl to Rust be a (good?) way to do this?

It’s a trade-off that should be in the hands of whoever’s building the binary.

  • Your binary will be bigger when it includes syscall wrappers (whether they’re statically linked C code from musl, or implementations provided by the Rust runtime, which may or may not be pure Rust)
  • Your build won’t benefit from improvements to the dynamically linked libc syscall wrappers (glibc may make improvements between backwards compatible releases, e.g. using the VDSO-provided version of gettimeofday, not to mention security fixes)
  • Using Linux syscalls directly means you no longer have to worry about glibc compat (particularly when you want to use new features), but you do have to worry about kernel ABI compat (hopefully not a problem), and wrapper compat / adaptability (you’re leaning on musl or the Rust runtime to do the right thing based on the Linux kernel your software’s running on, ability to use available features correctly, forwards compat after your software is built, etc.)

A good way to keep this choice in the user’s hands is cross-compilation with --target x86_64-unknown-linux-musl as @huon suggests above.

Aside from “why depend on C”, I think there’s a bigger question lurking in the original post: Why isn’t Rust a “batteries included” language, like Python or Go, where the standard libraries have pretty much everything one might need for a pretty broad range of hacking?

I’ve found that it’s best to treat crates.io as a first-class part of the language. For example, doing without scoped_threadpool and crossbeam is a pain. I wouldn’t hesitate to throw a libc = "0.2.4" line into my Cargo.toml file.

It takes a long time to build up a high quality standard library, for one. Python has been around for 24 years.

I, for one, would love to have the Linux port, at least, changed to do raw syscalls instead of going through (g)libc.

Yeah, if you wanted to do this I think this would be fine for most functions, though for stuff like sigaction(2) I'd still suggest going through libc. (And I think it'd be fine on OS X and Solaris if you want to put effort into it.)

However, what would the motivation for this change be? I've seen two implied in this thread, "Rust shouldn't depend on C for ideological reasons" and "It should be possible to build Rust binaries that can run on older systems." The first one doesn't seem compelling when the platform is defined in terms of the C ABI, but maybe I'm missing an argument for it.

The second one is a good goal, but statically linking a libc works there, and outsources the work of keeping up with the kernel to a project that's actively spending time on it. You can use musl; I also suspect that you can build a hacked glibc that's suitable for this purpose without too much difficulty (probably just statically link libnss_files and remove the dynamic-library code from nsswitch). It's also worth noting that it's possible to dynamically link glibc while maintaining compatibility with older systems; just don't have any extern symbols newer than the RHEL 5 (or whatever) glibc ABI version. I am actually sort of curious whether this Just Works already, but if not, it should be pretty tractable.

Is there another motivation for avoiding the system call wrappers from libc? Note that there is a good chunk of functionality that is purely from libc that Rust would still use: all the threading and concurrency stuff is built on top of pthread, std::net does hostname resolution, std::env::home_dir uses NSS, std::dynamic_lib wouldn't work at all, backtracing uses dladdr, etc. If you wanted to have a completely freestanding libstd that only makes raw syscalls, you'd need pure-Rust reimplementations of these things. Certainly doable with work, but I'm not sure what the use case would be.

There's also one notable downside to switching to native syscall wrappers: you can no longer easily LD_PRELOAD a Rust application and hook system calls.

Totally.

When I see that a function is "in the standard library", it gives me a warm fuzzy feeling. Trying to analyze that, I think I'm making a bunch of assumptions:

  • Its design was seriously haggled over by experts.

  • It will be available for the foreseeable future.

  • I don't need to make any effort to install it.

  • Using it in my code won't cause portability hassles for other people building my code.

Availability (third and fourth points) is just not as big a deal now as it was in, say, 1990. Requiring someone to have a net connection to do an effortless build (with workarounds for unconnected use), is completely okay. "You already have a local copy!" doesn't justify too many warm fuzzy feelings anymore.

One of crates.io's goals is to keep published versions of crates available in perpetuity. (Or at least I thought so; I couldn't find this in the FAQ...) So that addresses the second point.

So we're left with the first point. I think the "standard library" is really (forgive me) a brand. It's an expectation of quality based on the reputation of the source. For example, in this view, the Boost C++ libraries are essentially a second brand that's managed to establish itself in parallel to (not really in competition with) the C++ standard.

It would be really cool if Rust could separate out the idea of "endorsed by The Good Folks Who Brought You Rust" from "appears under the std:: prefix". The libc crate is already in this state; you can't really use the FFI without it. And if, say, crossbeam is enough of a success that the people with responsibility for std:: feel it merits their endorsement, why shouldn't its author just hand over control to the project, rather than making everyone change their code?

If I could do crates.io searches narrowed by endorsement, with the heavy hitter projects (Rust, Servo, ...) treating endorsement as one of their responsibilities, I would totally do that all the time.

It is, yes. This is why it's append-only, and why we don't let you publish crates with non-crates.io dependencies.

We try to do this already with "crates provided under the rust-lang organization".

I, for one, would love to have the Linux port, at least, changed to do raw syscalls instead of going through (g)libc.

Is there another motivation for avoiding the system call wrappers from libc?

When you are building a system where you have to security audit everything, including even the libc, it is nice if most of what libc does is in Rust so that you can take advantage of the safety features of Rust that made one choose it in the first place. glibc is probably OK, but that doesn't help at all on a platform that uses musl libc. And, when we build on a Linux variant where libc isn't glibc, we have to worry about what glibc-specific decisions have been made in the Rust standard library.

Maybe you think the above is overkill, but it is basically what one has to do to use seccomp-bpf most effectively.

Longer term, I am also interested in operating systems in Rust. IMO, the way to get a viable set of Rust operating systems is to code both from the bottom up (kernel and drivers first) and from the top down (userspace libraries and applications first) and have them meet in the middle. I am approaching this from the top-down angle, as it seems there are many others approaching it from the bottom-up angle.

Note that almost all people I've asked "Go or Rust for server-side development" have answered "Go". There are lots of reasons for this, but one is particularly relevant here: They don't have to worry about DLL hell when compiling a Go program and then deploying it on their systems because Go executables are (usually) self-contained. IMO, this is evidence of problems with how these people are doing deployment in the first place, but it was an issue that has been brought up by several people.

Regarding the size issue: IMO, the size tradeoff can be managed by making the Rust standard library available as either a statically-linked or dynamically-linked thing, just like libc is.

Note that there is a good chunk of functionality that is purely from libc that Rust would still use: all the threading and concurrency stuff is built on top of pthread, std::net does hostname resolution, std::env::home_dir uses NSS, std::dynamic_lib wouldn't work at all, backtracing uses dladdr, etc. If you wanted to have a completely freestanding libstd that only makes raw syscalls, you'd need pure-Rust reimplementations of these things. Certainly doable with work, but I'm not sure what the use case would be.

I certainly think all of those things should be done in Rust. But, it is a matter of prioritization and resources. I expect that the core Rust team wouldn't have time to replace libpthread with Rust code. But, would they accept a Rust-coded replacement for libpthread--not one that is just transliterated from C to Rust, but actually optimized for safety using Rust's language and library? Or, is the Rust team committed to the libc approach?

There's also one notable downside to switching to native syscall wrappers: you can no longer easily LD_PRELOAD a Rust application and hook system calls.

There are many ways to do that without LD_PRELOAD.

Anyway, I think it would be helpful for the Rust team to indicate whether they are categorically opposed to the bypass-libc approach--i.e. whether they would accept patches that build a framework for doing that switch and/or otherwise make incremental progress towards that goal.

There are a few questions here.

  1. Why are certain things missing from the stdlib? Usually one of 3 reasons: either it’s unfinished but planned, or the interface is too niche / optional to consider “standard”, or implementing a “nice” rust-y interface will involve enough iterations that the author wants the freedom to release on their own schedule and without the stability-for-all-time guarantees you have to put on stdlib things. In the case of unix sockets, I think it’s just unfinished. But the other reasons emerge now and then.

  2. Why doesn’t rust just call syscalls, why go through C libraries? I initially started rustboot this way but changed course for three reasons: first because of expedience, because we were busy bringing up a language and realized we didn’t really want to write a libc along with it. Second because of portability of users: when someone writes a library to a linux, windows or OSX-specific API, they have (perhaps only through sloth) walled their library off from users on other systems. We wanted to make the most-likely case be that a rust library written by someone on unix would run on windows and vice versa. Finally, some platforms literally only expose or support a C library as their interface; they forbid or refuse to support “random syscalls originating in random applications”.

  3. Why use libc rather than a more appropriate platform C interface (kernel32.dll on windows, say)? Again, partly expedience and partly user-code portability. Though these arguments have weakened over time, especially as the modules involved have been repeatedly expanded / rewritten / restructured. At the time I initially wrote it, rust’s libc interface was structured in a way to get maximum portability / common platform subset with minimal code. Observe the very minimal initial form of it here. That took one person a couple days to generate and was easy to fix bugs in (despite the spec-lawyer-y structure). It covered 3 platforms x 2 architectures with a measly 900 lines of bindings, and let us write programs that worked on all of them, and focus on the language itself.

  4. Why isn’t the rust stdlib huge / more “batteries included”? I initially wanted it to be, but we basically ran into somewhat predictable scaling limits around this approach: the stdlib takes core-team attention, cycles on their build farm, and commitment to long-term API stability. External libraries, by contrast, can iterate at their own pace, can stop getting updated / continuously rebuilt when they’re “done”, can stop being supported when they’re obsolete or obviously no longer a desirable API, etc. So the team took the decision to draw a tighter line than they had originally hoped. I think it was actually a wise decision, though of course I love a good stdlib as much as the next person. There are also plenty of “bad, but standard” stdlibs out there for other languages that illustrate the scalability and stability problems well.

For those close-to-the-linux users who do not care about the factors that pushed us to using C / libc, there’s an alternative rust stdlib project called lrs that you might enjoy.

This is a bit obscure, isn't it? I'm having a hard time imagining a first-time user searching crates.io, seeing (say) rand, noticing the rust-lang-nursery organization in the GitHub URL, and inferring that the Rust project has made some sort of commitment to the crate.

We have control of the full system, so there could be, e.g., badges indicating rust-lang crates. There’s very limited developer time for implementing such things, but the code is of course open: https://github.com/rust-lang/crates.io

Don’t miss the section of the RFC discussing exactly this point: https://github.com/rust-lang/rfcs/blob/master/text/1242-rust-lang-crates.md#advertising
