x86_64-unknown-linux-unknown?

Recently I’ve been toying with a few ideas, mostly around a (massive) project: a type-1 hypervisor written in Rust (à la ESXi).

While thinking about this and switching between ideas such as a free-standing Rust kernel, I’ve also had the idea of packaging a very small Linux distribution consisting of the Linux kernel and a Rust-compiled init program that calls upon decades-tested kernel modules such as kvm.

However, with that, I’ve come to realise that glibc is taken for granted, and that a libc is required in some form for Rust programs (either -gnu or -musl). What’s the reason that a “truly” freestanding Rust binary can’t be created on Linux? Either by statically linking against the aforementioned glibc library (though I’ve seen advice against it), or by somehow re-implementing and providing all libc functionality through Rust, creating no dependency on it?

I understand if this is somewhat of a “why would you do that” issue, but I’m just curious, as this looks like a “missing piece” to me. What’s the dependency on gnu on Linux all about? Historical context notwithstanding.

(If this is in the wrong label, please transfer it)

I just realised an alternative (more accurate) target could be x86_64-unknown-linux-elf here

The linux-musl targets do not depend on glibc. They statically link musl libc, and can be compiled and run on systems that don't even have glibc installed (e.g. Alpine Linux).

$ rustc --target x86_64-unknown-linux-musl hello.rs 
$ ldd hello
	statically linked

(There have also been various projects like relibc to implement a libc replacement in pure Rust.)

Thanks! I’d wager that statically compiling against musl is a better alternative than trying to implement everything again; I’ll keep that in mind when compiling those projects I mentioned.

Relibc looks interesting. If the library grows mature and stable, would it be possible for it to become another Linux target selector?

Static linking doesn't change the target; you can statically link glibc (with -C target-feature=+crt-static) or musl (currently by default, but in the future it'll require -C target-feature=+crt-static).
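
As a small illustration of the flag above: rustc exposes crt-static to the program as a cfg value, so code can check at compile time how the C runtime is being linked. This is only a minimal sketch; the function name crt_linkage is made up for the example.

```rust
/// Reports whether this crate was compiled with the `crt-static`
/// target feature enabled, i.e. with the C runtime linked statically.
/// (`crt_linkage` is a hypothetical helper name, not a std API.)
pub fn crt_linkage() -> &'static str {
    if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    }
}
```

Building with -C target-feature=+crt-static should make this return "static" on targets that support static C runtime linkage.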

Separate from that, there'd be value in a Linux target that didn't use libc at all, and made system calls directly. That should be a separate target name, but not linux-unknown.
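
To give an idea of what "making system calls directly" means, here is a rough sketch for x86_64 Linux that asks the kernel for the process ID without going through libc. The function name raw_getpid and the fallback branch are assumptions made for illustration; 39 is the getpid number in the x86_64 Linux syscall table.

```rust
/// Calls `getpid` through the raw `syscall` instruction (x86_64 Linux only).
/// `raw_getpid` is a hypothetical name used for this sketch.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
pub fn raw_getpid() -> i64 {
    use std::arch::asm;

    const SYS_GETPID: i64 = 39; // x86_64 Linux syscall number for getpid
    let ret: i64;
    // SAFETY: getpid takes no arguments and does not touch user memory.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") SYS_GETPID => ret, // syscall number in, result out
            lateout("rcx") _,                   // clobbered by `syscall`
            lateout("r11") _,                   // clobbered by `syscall`
            options(nostack),
        );
    }
    ret
}

/// Fallback so the sketch still compiles on other platforms. It goes through
/// std (and therefore usually libc), so it is NOT a direct syscall.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
pub fn raw_getpid() -> i64 {
    std::process::id() as i64
}
```

A libc-free target would need a wrapper like this for every syscall std relies on, and per architecture, which is a large part of the work involved.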

And separate from that, we also need a target like x86_64-unknown-elf, which doesn't depend on Linux at all, and just builds standalone elf binaries for things like kernels and firmware.

unknown-elf targets for each architecture sound like a good idea, though as the BlogOS blog series has taught me, there’s a bunch of intrinsics and strings attached to that. I also imagine that std functionality (such as env::args) would have to be stubbed or fitted to work with GRUB? (Would GRUB then become part of the target?) (IIRC, GRUB has the functionality to boot Linux with kernel arguments; I’m curious how a target like this could tap into that.)

However, targets like these sound pretty obvious, so I imagine they have been discussed before. I’m also curious how the above problems (that come with a freestanding binary) were considered for the RISC -elf targets, for which no std support exists right now.

x86_64-unknown-elf wouldn't have std. It might be able to have alloc, if you supply an allocator.
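
As a sketch of what "supplying an allocator" could look like: the type below implements core's GlobalAlloc trait over a fixed buffer. In a no_std crate it would be registered with #[global_allocator] so that alloc types such as Box and Vec work. BumpAllocator and HEAP_SIZE are names and values invented for this example, and a real kernel would want something that can actually free memory.

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::ptr;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Size of the fixed backing buffer (an arbitrary choice for this sketch).
pub const HEAP_SIZE: usize = 4096;

#[repr(align(16))]
struct Heap(UnsafeCell<[u8; HEAP_SIZE]>);

/// Hypothetical bump allocator: hands out memory front to back and never
/// reuses it.
pub struct BumpAllocator {
    heap: Heap,
    next: AtomicUsize, // offset of the first free byte in `heap`
}

// SAFETY: the buffer is only handed out in disjoint chunks reserved via `next`.
unsafe impl Sync for BumpAllocator {}

impl BumpAllocator {
    pub const fn new() -> Self {
        BumpAllocator {
            heap: Heap(UnsafeCell::new([0; HEAP_SIZE])),
            next: AtomicUsize::new(0),
        }
    }
}

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.heap.0.get() as *mut u8;
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            // Round the absolute address up to the requested alignment.
            let addr = base as usize + current;
            let aligned = (addr + layout.align() - 1) & !(layout.align() - 1);
            let start = aligned - base as usize;
            let end = match start.checked_add(layout.size()) {
                Some(end) if end <= HEAP_SIZE => end,
                _ => return ptr::null_mut(), // out of memory
            };
            // Reserve [start, end) unless another thread moved `next` first.
            match self.next.compare_exchange_weak(
                current,
                end,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => return unsafe { base.add(start) },
                Err(observed) => current = observed,
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator never frees individual allocations.
    }
}
```

In a freestanding crate this would be wired up with something like `#[global_allocator] static ALLOC: BumpAllocator = BumpAllocator::new();`.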

Cont: scouring the RFC and Rust repos some, I see that the -elf OS part of host triplets isn’t formally defined anywhere that’s easy to find, and that -none is preferred instead in most cases? Shouldn’t this then be x86_64-unknown-none? With -elf as an additional fourth (env) part to allow for GRUB booting (if I read correctly), or -multiboot to add a multiboot header to the binary.

Or the default emitted binary could be -elf, with other tools wrapping a multiboot header around the binary. I’m not exactly sure about that, though, because that’d leave the binary in an almost “half-baked” state.

So: -none as the OS part of the triplet to indicate it’s freestanding, and then an additional environment component of elf (or multiboot) to indicate how the binary header should look. How would that look? I’d personally be interested in drafting an RFC for something like that for amd64 and arm64 targets. (If only to formalise all of the current ongoing efforts for kernel development in Rust, and encourage more.)

Edit: additionally, maybe a new -self OS part could be proposed to indicate that the binary will be freestanding, and handles all OS functionality itself.

(It appears I’ve gone quite off-topic :sweat_smile:, should I move this to a new thread?)

ELF is a general-purpose binary format, not specific to any particular environment, so there shouldn't be anything in an -elf target specific to any particular environment. -elf would just determine the binary format (and the calling convention).

If you want a target for some particular bootable environment, you'd either need a separate target for that, or you'd need to post-process the binary.

Note that even when targeting an -elf or -none environment the compiler is going to expect it can use a handful of low-level library routines: to first order, everything defined in libcompiler_rt / libgcc that doesn't itself depend on a hosted C implementation (these tend to do things like multiplication wider than the CPU has instructions for), plus memcpy and memset.
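
To give a feel for what such a routine does, here is an illustrative widening multiply that builds a 64x64 to 128-bit product out of 32-bit pieces, which is roughly the kind of decomposition a runtime helper performs when the hardware lacks a wide enough multiply. The name widening_mul_u64 is made up for this sketch; it is not the actual libgcc or compiler-rt code.

```rust
/// Multiplies two u64 values and returns the 128-bit product as (high, low)
/// 64-bit halves, using only 64-bit arithmetic on 32-bit "digits".
/// Illustrative only; `widening_mul_u64` is a hypothetical name.
pub fn widening_mul_u64(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xFFFF_FFFF;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    // Four partial products, each of which fits in 64 bits.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Middle column (bits 32..64) plus the carry coming out of `ll`.
    let mid = (ll >> 32) + (lh & MASK) + (hl & MASK);

    let lo = (ll & MASK) | (mid << 32);
    let hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    (hi, lo)
}
```

On a hosted target the result can be checked against Rust's native u128 multiplication, which is what the compiler would otherwise lower to such a helper where needed.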

The compiler_builtins crate does this for rust. You need nightly to even be able to opt out of depending on it.

Aha, then it's already handled. Excellent.

There are languages out there whose runtimes / standard libraries interact directly with the Linux syscall API rather than via a POSIX C library (or something close to that), notably Zig and Go, so from a technical point of view glibc is not needed. Rust currently does not do this for the following reasons:

a) Simplified standard library implementation: the "libc" API on Unix systems is largely standardized via the C standard and the POSIX standard (even though Linux is not fully POSIX compliant). As such, std can use a single interface for all Unix platforms. The syscall interface is Linux-specific and would require a unique abstraction layer implemented in std.

b) When linking .dlls (shared libraries) or statically linked C components into Rust programs, these assume a fixed C library to be present. In this case even Go and Zig have to swap their standard library for a libc-based one.

c) glibc is more or less present on all Linux systems, so not using it feels a little bit like bringing tools to a bike shop.

d) If static linking is required, musl is available.

There has been an attempt to add a non-libc alternative: japaric/steed on GitHub ("[INACTIVE] Rust's standard library, free of C dependencies, for Linux syst"). But this didn't obtain enough momentum to keep up with rustc's std development. A full Rust target has some merits, so you could consider implementing a new Rust target, but this will be a lot of work and apparently hasn't been high enough on the wish list so far.

In case anyone's interested, rsix is such an abstraction layer. It has a low-level syscall-oriented and Rust-friendly API that can be configured to use either direct linux syscalls or libc. The direct linux syscall configuration avoids depending on libc.

Would it be feasible to use rsix as the backend for a libc-free Rust std (for new linux targets for the architectures rsix supports "directly")?

rsix could do part of it!

rsix has some maturing to do before it would make sense to start talking about actually putting it in std, but I do think that will be something to consider, with I/O safety, ergonomics, and modest performance gains, and its ability to be adopted incrementally, being key motivators. If anyone is interested in helping move this forward, I'd be happy to mentor :-).

That said, rsix is just a system call library, so it isn't aiming to cover other things libc provides, including:

  • malloc
  • program startup (crt0.o) and getenv/setenv
  • pthreads support

I expect there are options for malloc and other things, and beyond that I've heard the steed repo has code that might be a helpful starting point for program startup and libpthread. If anyone's interested in working on this, I'd be happy to help map out possible paths forward.

I likely don’t have the time or expertise to work on this, but providing a freestanding Linux target could be an interesting evolution step in the Rust ecosystem, so I definitely want to watch and follow it (if anyone else is interested in pursuing it).

Like I said before in this thread, I have minimalisation use cases, and would absolutely use this target if it pops up.

Furthermore, while I don’t question the good that glibc has brought Linux, I do wonder what experimenting with new libraries would yield, and if in turn it could inform rust compiler/language design.

Whatever benefits this provides on Linux, it can't provide them on other OSes, as their syscall interfaces are not stable. Windows and macOS sometimes change the syscall interface. OpenBSD even checks that each syscall originates from one of a select few memory regions, so that an attacker has to find the address of the respective libc function: "syscall call-from verification" [LWN.net]. Go tried to use direct syscalls on platforms other than Linux and had to backtrack to using libc in several cases: "all: stop using direct syscalls on OpenBSD" (Issue #36435, golang/go, GitHub) and "runtime: failed build, compile and use compiled binary on macOS Sierra Beta 4(16A270f)" (Issue #16570, golang/go, GitHub).

Rsix falls back to using libc on platforms other than Linux, so I don't think it will provide much benefit there. That is not to say that it can't provide a lot of benefit on Linux; in fact I think it does. I was already toying around a bit with getting rsix to compile as part of libstd and making the necessary changes for this. For example, its dependency linux-raw-sys is currently not marked as #![no_std] yet.

Could RSIX be used as a common library throughout std instead of libc for all *IX libc functions that are just thin wrappers around syscalls? For Linux RSIX would use syscalls directly and on other UNIX-like operating systems it would use the platform libc.

Also, while the syscall interface on Windows is not stable, getting rid of the Windows MSVC runtime dependency and instead only calling directly into user32.dll, kernel32.dll, etc. would be a worthy goal IMHO. But that is a separate discussion.