Thoughts on Rust stdlib and C interfacing

Would porting musl to Rust be a (good?) way to do this?

1 Like

It’s a trade-off that should be in the hands of whoever’s building the binary.

  • Your binary will be bigger when it includes syscall wrappers (whether they’re statically linked C code from musl, or implementations provided by the Rust runtime, which may or may not be pure Rust)
  • Your build won’t benefit from improvements to the dynamically linked libc syscall wrappers (glibc may make improvements between backwards compatible releases, e.g. using the VDSO-provided version of gettimeofday, not to mention security fixes)
  • Using Linux syscalls directly means you no longer have to worry about glibc compat (particularly when you want to use new features), but you do have to worry about kernel ABI compat (hopefully not a problem), and wrapper compat / adaptability (you’re leaning on musl or the Rust runtime to do the right thing based on the Linux kernel your software’s running on, ability to use available features correctly, forwards compat after your software is built, etc.)

A good way to keep this choice in the user’s hands is cross-compilation with --target x86_64-unknown-linux-musl as @huon suggests above.
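For concreteness, the musl route looks roughly like this (a sketch assuming a toolchain with the musl target installed; the binary name is a placeholder):

```
# build a fully statically linked binary against musl
cargo build --release --target x86_64-unknown-linux-musl

# sanity check: no dynamic libc dependency should remain
ldd target/x86_64-unknown-linux-musl/release/<your-binary>
# expected: "not a dynamic executable"
```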

2 Likes

Aside from “why depend on C”, I think there’s a bigger question lurking in the original post: Why isn’t Rust a “batteries included” language, like Python or Go, where the standard library has pretty much everything one might need for a broad range of hacking?

I’ve found that it’s best to treat crates.io as a first-class part of the language. For example, doing without scoped_threadpool and crossbeam is a pain. I wouldn’t hesitate to throw a libc = "0.2.4" line into my Cargo.toml file.
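To illustrate how low the barrier is, here is a minimal sketch (getpid is just an arbitrary wrapper to call):

```rust
// Cargo.toml: [dependencies] libc = "0.2"
extern crate libc; // needed on the 2015 edition; later editions can omit this

fn main() {
    // call the libc wrapper for getpid(2) through the crate
    let pid = unsafe { libc::getpid() };
    println!("running as pid {}", pid);
}
```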

It takes a long time to build up a high quality standard library, for one. Python has been around for 24 years.

2 Likes

I, for one, would love to have the Linux port, at least, changed to do raw syscalls instead of going through (g)libc.

Yeah, if you wanted to do this I think this would be fine for most functions, though for stuff like sigaction(2) I’d still suggest going through libc. (And I think it’d be fine on OS X and Solaris if you want to put effort into it.)
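For reference, the per-call boilerplate for a raw syscall is small. A minimal sketch for x86-64 Linux, written with the asm! macro as it exists in current Rust (std does not do this today; it calls libc’s wrappers):

```rust
use std::arch::asm;

// write(2) issued directly via the `syscall` instruction, bypassing libc.
// __NR_write is 1 on x86-64 Linux.
fn raw_write(fd: i32, buf: &[u8]) -> isize {
    let mut ret: isize = 1; // rax: syscall number in, return value out
    unsafe {
        asm!(
            "syscall",
            inout("rax") ret,
            in("rdi") fd,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            out("rcx") _, // rcx and r11 are clobbered by `syscall`
            out("r11") _,
            options(nostack),
        );
    }
    ret
}

fn main() {
    let msg = b"hello from a raw syscall\n";
    let n = raw_write(1, msg);
    assert_eq!(n, msg.len() as isize);
}
```

The boilerplate isn’t the hard part; the hard part is owning the per-platform details (errno conventions, new flags, fallbacks on older kernels) that a maintained libc handles today.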

However, what would the motivation for this change be? I’ve seen two implied in this thread, “Rust shouldn’t depend on C for ideological reasons” and “It should be possible to build Rust binaries that can run on older systems.” The first one doesn’t seem compelling when the platform is defined in terms of the C ABI, but maybe I’m missing an argument for it.

The second one is a good goal, but statically linking a libc works there, and outsources the work of keeping up with the kernel to a project that’s actively spending time on it. You can use musl; I also suspect that you can build a hacked glibc that’s suitable for this purpose without too much difficulty (probably just statically link libnss_files and remove the dynamic-library code from nsswitch). It’s also worth noting that it’s possible to dynamically link glibc while maintaining compatibility with older systems; just don’t have any extern symbols newer than the RHEL 5 (or whatever) glibc ABI version. I am actually sort of curious whether this Just Works already, but if not, it should be pretty tractable.
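One quick way to find out is to list the glibc symbol versions a built binary actually requires (the binary path here is hypothetical):

```
objdump -T target/release/myapp | grep -o 'GLIBC_[0-9.]*' | sort -Vu
```

If nothing newer than the oldest target system’s glibc shows up, the binary should load there (assuming its other dynamic dependencies are equally conservative).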

Is there another motivation for avoiding the system call wrappers from libc? Note that there is a good chunk of functionality that is purely from libc that Rust would still use: all the threading and concurrency stuff is built on top of pthread, std::net does hostname resolution, std::env::home_dir uses NSS, std::dynamic_lib wouldn’t work at all, backtracing uses dladdr, etc. If you wanted to have a completely freestanding libstd that only makes raw syscalls, you’d need pure-Rust reimplementations of these things. Certainly doable with work, but I’m not sure what the use case would be.

There’s also one notable downside to switching to native syscall wrappers: you can no longer easily LD_PRELOAD a Rust application and hook system calls.
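(For anyone who hasn’t used that trick: the shim can itself be a Rust cdylib. A hypothetical sketch, assuming a crate with crate-type = ["cdylib"] and a libc dependency; it forwards through the generic syscall entry point so it doesn’t recurse into the symbol it replaces.)

```rust
// Because std currently reaches the kernel through libc's dynamic symbols,
// a preloaded `write` like this observes every write a Rust program makes;
// if std issued raw syscalls itself, it would not.
#[no_mangle]
pub unsafe extern "C" fn write(
    fd: libc::c_int,
    buf: *const libc::c_void,
    count: libc::size_t,
) -> libc::ssize_t {
    // ...hooking / recording logic would go here...
    libc::syscall(libc::SYS_write, fd, buf, count) as libc::ssize_t
}
```

Run it as LD_PRELOAD=./libhook.so ./my-program (both names hypothetical).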

2 Likes

Totally.

When I see that a function is “in the standard library”, it gives me a warm fuzzy feeling. Trying to analyze that, I think I’m making a bunch of assumptions:

  • Its design was seriously haggled over by experts.

  • It will be available for the foreseeable future.

  • I don’t need to make any effort to install it.

  • Using it in my code won’t cause portability hassles for other people building my code.

Availability (third and fourth points) is just not as big a deal now as it was in, say, 1990. Requiring someone to have a net connection for an effortless build (with workarounds for offline use) is completely okay. “You already have a local copy!” doesn’t justify too many warm fuzzy feelings anymore.

One of crates.io’s goals is to keep published versions of crates available in perpetuity. (Or at least I thought so; I couldn’t find this in the FAQ…) So that addresses the second point.

So we’re left with the first point. I think the “standard library” is really (forgive me) a brand. It’s an expectation of quality based on the reputation of the source. In this view, for example, the Boost C++ libraries are essentially a second brand that’s managed to establish itself in parallel to (not really in competition with) the C++ standard.

It would be really cool if Rust could separate out the idea of “endorsed by The Good Folks Who Brought You Rust” from “appears under the std:: prefix”. The libc crate is already in this state; you can’t really use the FFI without it. And if, say, crossbeam is enough of a success that the people with responsibility for std:: feel it merits their endorsement, why shouldn’t its author just hand over control to the project, rather than making everyone change their code?

If I could do crates.io searches narrowed by endorsement, with the heavy hitter projects (Rust, Servo, …) treating endorsement as one of their responsibilities, I would totally do that all the time.

1 Like

It is, yes. This is why it’s append-only, and why we don’t let you publish crates with non-crates.io dependencies.

We try to do this already with “crates provided under the rust-lang organization”.

I, for one, would love to have the Linux port, at least, changed to do raw syscalls instead of going through (g)libc.

Is there another motivation for avoiding the system call wrappers from libc?

When you are building a system where you have to security audit everything, including even the libc, it is nice if most of what libc does is in Rust so that you can take advantage of the safety features of Rust that made one choose it in the first place. glibc is probably OK, but that doesn’t help at all on a platform that uses musl libc. And, when we build on a Linux variant where libc isn’t glibc, we have to worry about what glibc-specific decisions have been made in the Rust standard library.

Maybe you think the above is overkill, but it is basically what one has to do to use seccomp-bpf most effectively.

Longer term, I am also interested in operating systems in Rust. IMO, the way to get a viable set of Rust operating systems is to code both from the bottom up (kernel and drivers first) and from the top down (userspace libraries and applications first) and have them meet in the middle. I am approaching this from the top-down angle, as it seems there are many others approaching it from the bottom-up angle.

Note that almost all people I’ve asked “Go or Rust for server-side development” have answered “Go”. There are lots of reasons for this, but one is particularly relevant here: They don’t have to worry about DLL hell when compiling a Go program and then deploying it on their systems because Go executables are (usually) self-contained. IMO, this is evidence of problems with how these people are doing deployment in the first place, but it was an issue that has been brought up by several people.

Regarding the size issue: IMO, the size tradeoff can be managed by making the Rust standard library available as either a statically-linked or dynamically-linked thing, just like libc is.

Note that there is a good chunk of functionality that is purely from libc that Rust would still use: all the threading and concurrency stuff is built on top of pthread, std::net does hostname resolution, std::env::home_dir uses NSS, std::dynamic_lib wouldn’t work at all, backtracing uses dladdr, etc. If you wanted to have a completely freestanding libstd that only makes raw syscalls, you’d need pure-Rust reimplementations of these things. Certainly doable with work, but I’m not sure what the use case would be.

I certainly think all of those things should be done in Rust. But, it is a matter of prioritization and resources. I expect that the core Rust team wouldn’t have time to replace libpthread with Rust code. But, would they accept a Rust-coded replacement for libpthread–not one that is just transliterated from C to Rust, but actually optimized for safety using Rust’s language and library? Or, is the Rust team committed to the libc approach?

There’s also one notable downside to switching to native syscall wrappers: you can no longer easily LD_PRELOAD a Rust application and hook system calls.

There are many ways to do that without LD_PRELOAD.

Anyway, I think it would be helpful for the Rust team to indicate whether they are categorically opposed to the bypass-libc approach–i.e. whether they would accept patches that build a framework for doing that switch and/or otherwise make incremental progress towards that goal.

There are a few questions here.

  1. Why are certain things missing from the stdlib? Usually one of 3 reasons: either it’s unfinished but planned, or the interface is too niche / optional to consider “standard”, or implementing a “nice” rust-y interface will involve enough iterations that the author wants the freedom to release on their own schedule and without the stability-for-all-time guarantees you have to put on stdlib things. In the case of unix sockets, I think it’s just unfinished. But the other reasons emerge now and then.

  2. Why doesn’t rust just call syscalls, why go through C libraries? I initially started rustboot this way but changed course for three reasons. First, expedience: we were busy bringing up a language and realized we didn’t really want to write a libc along with it. Second, portability for users: when someone writes a library against a linux-, windows- or OSX-specific API, they have (perhaps only through sloth) walled their library off from users on other systems. We wanted to make the most-likely case be that a rust library written by someone on unix would run on windows and vice versa. Finally, some platforms literally only expose or support a C library as their interface; they forbid or refuse to support “random syscalls originating in random applications”.

  3. Why use libc rather than a more appropriate platform C interface (kernel32.dll on windows, say)? Again, partly expedience and partly user-code portability. Though these arguments have weakened over time, especially as the modules involved have been repeatedly expanded / rewritten / restructured. At the time I initially wrote it, rust’s libc interface was structured in a way to get maximum portability / common platform subset with minimal code. Observe the very minimal initial form of it here. That took one person a couple days to generate and was easy to fix bugs in (despite the spec-lawyer-y structure). It covered 3 platforms x 2 architectures with a measly 900 lines of bindings, and let us write programs that worked on all of them, and focus on the language itself.

  4. Why isn’t the rust stdlib huge / more “batteries included”? I initially wanted it to be, but we basically ran into somewhat predictable scaling limits with this approach: the stdlib takes core-team attention, cycles on their build farm, and a commitment to long-term API stability. External libraries, by contrast, can iterate at their own pace, can stop getting updated / continuously rebuilt when they’re “done”, can stop being supported when they’re obsolete or obviously no longer a desirable API, etc. So the team took the decision to draw a tighter line than they had originally hoped. I think it was actually a wise decision, though of course I love a good stdlib as much as the next person. There are also plenty of “bad, but standard” stdlibs out there for other languages that illustrate the scalability and stability problems well.

For those close-to-the-linux users who do not care about the factors that pushed us to using C / libc, there’s an alternative rust stdlib project called lrs that you might enjoy.

5 Likes

This is a bit obscure, isn’t it? I’m having a hard time imagining a first-time user searching crates.io, seeing (say) rand, noticing the rust-lang-nursery organization in the GitHub URL, and inferring that the Rust project has made some sort of commitment to the crate.

We have control of the full system, e.g. there can be badges indicating rust-lang crates. However, there’s very limited developer time for implementing such things; the code is of course open: https://github.com/rust-lang/crates.io

Don’t miss the section of the RFC discussing exactly this point: https://github.com/rust-lang/rfcs/blob/master/text/1242-rust-lang-crates.md#advertising

2 Likes

Yes, exactly. Most of the things that lrs is doing are very good. The problem is that having two standard libraries (three, actually, because Rust also has the core subset of std) is bad. Also, I get the impression that there are some uncomfortable politics between the Rust team and the lrs team, but maybe I’m reading too much into it. Regardless, I hope that Rust adopts the approach to syscalls and avoiding libc on Linux that lrs does.

Maybe you think the above is overkill, but it is basically what one has to do to use seccomp-bpf most effectively.

seccomp-bpf is a very good point I hadn’t thought of! I guess this means you need to know exactly what the system calls are, and you don’t want e.g. open being rewritten to openat(...AT_FDCWD) behind your back.
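To make that concrete, here is a minimal, hedged sketch of an allowlist filter installed through the libc crate. The BPF opcodes and seccomp return values are spelled out locally (values from the kernel headers) rather than assumed to be re-exported by libc, the syscall list is illustrative and incomplete, and a real filter must also verify seccomp_data.arch. The open/openat pair is the point: which of the two File::open actually issues is the wrapper’s decision, so the allowlist has to match what libc/std emits, not what your source says.

```rust
fn allow_only(syscalls: &[libc::c_long]) -> std::io::Result<()> {
    const LD_W_ABS: u16 = 0x20;         // BPF_LD | BPF_W | BPF_ABS
    const JEQ_K: u16 = 0x15;            // BPF_JMP | BPF_JEQ | BPF_K
    const RET_K: u16 = 0x06;            // BPF_RET | BPF_K
    const RET_ALLOW: u32 = 0x7fff_0000; // SECCOMP_RET_ALLOW
    const RET_KILL: u32 = 0x0000_0000;  // SECCOMP_RET_KILL
    const SECCOMP_MODE_FILTER: libc::c_ulong = 2;

    // load the syscall number (offset 0 of struct seccomp_data)
    let mut prog = vec![libc::sock_filter { code: LD_W_ABS, jt: 0, jf: 0, k: 0 }];
    for &nr in syscalls {
        // on a match, fall through to the ALLOW return; otherwise skip it
        prog.push(libc::sock_filter { code: JEQ_K, jt: 0, jf: 1, k: nr as u32 });
        prog.push(libc::sock_filter { code: RET_K, jt: 0, jf: 0, k: RET_ALLOW });
    }
    prog.push(libc::sock_filter { code: RET_K, jt: 0, jf: 0, k: RET_KILL });

    let fprog = libc::sock_fprog {
        len: prog.len() as libc::c_ushort,
        filter: prog.as_mut_ptr(),
    };
    unsafe {
        if libc::prctl(libc::PR_SET_NO_NEW_PRIVS,
                       1 as libc::c_ulong, 0 as libc::c_ulong,
                       0 as libc::c_ulong, 0 as libc::c_ulong) != 0
            || libc::prctl(libc::PR_SET_SECCOMP, SECCOMP_MODE_FILTER,
                           &fprog as *const libc::sock_fprog) != 0
        {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    allow_only(&[
        libc::SYS_read, libc::SYS_write, libc::SYS_close, libc::SYS_fstat,
        libc::SYS_mmap, libc::SYS_munmap, libc::SYS_brk, libc::SYS_sigaltstack,
        libc::SYS_exit, libc::SYS_exit_group,
        libc::SYS_open, libc::SYS_openat, // which one is used depends on the libc
    ])?;
    let _f = std::fs::File::open("/etc/hostname")?;
    Ok(())
}
```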

Incidentally, Issue 24975, Audit and document the system calls made for each “primitive” IO API, sounds relevant to this discussion.

Note that almost all people I’ve asked “Go or Rust for server-side development” have answered “Go”. There are lots of reasons for this, but one is particularly relevant here: They don’t have to worry about DLL hell when compiling a Go program and then deploying it on their systems because Go executables are (usually) self-contained.

But so are Rust executables! Rust-language dependencies are statically linked; it’s only libc that’s dynamically linked, plus whatever third-party C libraries other crates pull in. If you’re running pure-Rust code, you shouldn’t have DLL hell per se: the only thing the executable depends on is the platform (kernel + libc).

If the problem is Rust binaries depending on a newer version of glibc (via symbol versioning), it seems like it should be doable to get rustc to build things that only require an older glibc. At least, this is a relatively small amount of work compared to implementing an entire libc. I’m assuming that the target systems do have a glibc, just an older one (e.g., RHEL 5 has glibc 2.5), but if there’s a use case for systems with no glibc installed (empty chroots/containers? Android?) then this wouldn’t work.

Go executables also link against glibc if you use networking stuff IIRC.

I have a contrary opinion: I’d prefer Rust to use as much libc as possible to minimize overhead of its own stdlib (edit: on OS X and Linux specifically).

I write Rust programs that depend on many system libraries (which are in C), and I write Rust plugins/libraries for use in other C/C++/ObjC programs, so for all my purposes libc is already present in these programs, and it’s “free”.

I don’t see a point in removing libc as a dependency until the Rust ecosystem is large enough that it’s feasible to write a complex program using only pure Rust dependencies, so that the Rust stdlib is the only thing needing libc (and it might never be possible for programs using platforms’ standard GUI toolkits).

I’d prefer Rust to use as much libc as possible to minimize overhead of its own stdlib.

Can you be more specific? What sort of libc features would you like to see used? Are you motivated by performance?

Regardless, I think it’s a bad idea for us to use more C than is absolutely necessary. Rust is a systems language, and most of the logic you’d be pulling in from C could probably be more easily and more safely implemented in Rust. Even if it were a little harder to write in Rust, I think that’s a price we should pay for having a self-supporting, self-contained memory-safe systems language.

Unless I’ve missed it, no one in this conversation has pointed out that C standard libraries can and do have memory- and overflow-related bugs. Serguey Parkhomovsky and I patched an out-of-bounds read in OpenBSD’s nlist(3) just last week. Glibc had a trivial buffer overflow in gethostbyname(3) this year. An integer overflow in its strncat(3) was reported two weeks ago. If we keep pulling C logic into Rust programs for the sake of performance or code reuse, we stay exposed to these, and they’re exactly the class of bugs Rust is meant to prevent.

1 Like

Please, not on Windows. On Windows you gain nothing by using libc: everything is available through system libraries (aside from math functions and memcpy + friends), and libc is just an additional layer of restrictive overhead that can’t even do everything we need.

fn main(){} compiles to a 300KB executable even with -Clto. I realize it’s not that much in the grand scheme of things, and that C has an unfair advantage here, but it still bothers me, and I’m afraid it also affects how others perceive Rust.

I know a 33KB C library with no dependencies is an easy sell for everyone. I’d like to rewrite it in Rust, but Rust makes it literally 10 times larger and adds a second stdlib to non-Rust programs using it.

The majority of that 300k is jemalloc. Switching to the system allocator pulls a couple hundred KB off of that.
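For reference, opting into the system allocator in today’s Rust is the stable #[global_allocator] attribute (at the time of this thread it meant the unstable alloc_system crate on nightly, IIRC), e.g.:

```rust
use std::alloc::System;

// use the platform malloc instead of a bundled allocator
#[global_allocator]
static GLOBAL: System = System;

fn main() {}
```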

1 Like