How stdlib is found by cargo -Z build-std

Hello internals. I'm attempting to look into how cargo finds the stdlib for -Z build-std, and how it may be improved for differing installation layouts potentially used by non-rustc implementations, in particular, those with high degrees of end-user customizability.

From what I can tell, it finds the libraries in <sysroot>/lib/rust-src (or something similar, I currently am unable to check from my computer). Is this path hardcoded/configured, found relative to either cargo or rustc, or does rustc report (part of) it in some way?

As for improving and generalizing it, my idea would be some compiler option that prints the build-std info, which includes:

  • The path to the workspace for build-std
  • Default build-std-features
  • Environment Variables (in particular, whether to pass RUSTC_BOOTSTRAP or not).

This could be produced in a form that cargo (and other build systems) understands, such as json.
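As a rough illustration of what such a report could carry, here is a minimal sketch of the three items above as a Rust data structure, rendered to JSON by hand so that it needs only the standard library. The type name `BuildStdInfo`, its field names, and the JSON key names are all placeholders chosen for this example, not an existing rustc or cargo interface; environment values are modeled as strings, which is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of the information a compiler could report for build-std.
struct BuildStdInfo {
    /// Root of the workspace containing the standard library sources.
    workspace: String,
    /// Features enabled when the user does not pass build-std-features.
    default_features: Vec<String>,
    /// Environment variables to set when compiling the standard library.
    env: BTreeMap<String, String>,
}

/// Quote a string as a JSON string literal.
/// Escapes only `"`, `\` and ASCII control characters.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl BuildStdInfo {
    /// Render the report as a single-line JSON object (key names are placeholders).
    fn to_json(&self) -> String {
        let features: Vec<String> = self
            .default_features
            .iter()
            .map(|f| json_string(f))
            .collect();
        let env: Vec<String> = self
            .env
            .iter()
            .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
            .collect();
        format!(
            "{{\"workspace\":{},\"env\":{{{}}},\"default-features\":[{}]}}",
            json_string(&self.workspace),
            env.join(","),
            features.join(",")
        )
    }
}

fn main() {
    let mut env = BTreeMap::new();
    env.insert("RUSTC_BOOTSTRAP".to_string(), "1".to_string());
    let info = BuildStdInfo {
        workspace: "<sysroot>/lib/rustlib/src/rust".to_string(),
        default_features: vec!["backtrace".to_string()],
        env,
    };
    println!("{}", info.to_json());
}
```

A real implementation would more likely use an existing JSON library; the hand-written rendering is only there to keep the sketch self-contained.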

It looks like the relevant code is here:

Ok. Am I correct in assuming that sysroot is obtained from $RUSTC --print sysroot then?

Yes
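For readers following along, the lookup being discussed can be sketched roughly as below: ask the compiler named by `$RUSTC` (falling back to `rustc`) for its sysroot, then append the fixed relative path at which rustc's rust-src component places the library sources. The helper names are made up for this example, and this is not cargo's actual code, only an approximation of the behaviour described in this thread.

```rust
use std::env;
use std::ffi::OsString;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// The compiler to query: `$RUSTC` if set, otherwise `rustc` from `PATH`.
fn compiler() -> OsString {
    env::var_os("RUSTC").unwrap_or_else(|| OsString::from("rustc"))
}

/// Turn the raw stdout of `--print sysroot` into a path (drops the trailing newline).
fn parse_sysroot(stdout: &[u8]) -> PathBuf {
    PathBuf::from(String::from_utf8_lossy(stdout).trim_end())
}

/// Run `<compiler> --print sysroot` and return the reported directory.
fn query_sysroot(compiler: &OsString) -> io::Result<PathBuf> {
    let output = Command::new(compiler)
        .arg("--print")
        .arg("sysroot")
        .output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "`--print sysroot` failed",
        ));
    }
    Ok(parse_sysroot(&output.stdout))
}

/// The hardcoded part: where the library workspace sits under rustc's sysroot.
fn rust_src_dir(sysroot: &Path) -> PathBuf {
    sysroot.join("lib").join("rustlib").join("src").join("rust")
}

fn main() {
    match query_sysroot(&compiler()) {
        Ok(sysroot) => println!("{}", rust_src_dir(&sysroot).display()),
        Err(err) => eprintln!("could not query the sysroot: {}", err),
    }
}
```

The only compiler-provided input here is the sysroot itself; everything after it is knowledge baked into the caller, which is the part the rest of this thread is about.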

Would there be a reason not to unhardcode that? For lccc (LightningCreations/lccc on GitHub, the Lightning Creations Compiler Frontend for various languages), the "sysroot" (which, on top of controlling the search path for rust libraries, also controls the system include path for the C and C++ frontends, and linker search paths) is not related to the directory in which the standard library source code can be found (internally called the libsrc directory), which is within the host prefix, rather than the sysroot prefix (unless modified at any one of 4 levels). This is the basis for my proposed change, to allow different implementations to use different directory structures (this would also allow multiple implementations to coexist within the same prefix, even when they may have different standard libraries).

Rust's sysroot and its structure are intended for the Rust compiler to use. The exact contents of the Rust sysroot are an implementation detail not just of the Rust compiler but of the exact version of the Rust compiler. We don't, to my knowledge, have any stability guarantees there, nor should we. As such, if you're trying to find things for use with Rust, it's appropriate to ask the Rust compiler where to find them. If you're trying to find things for use with a compiler other than the Rust compiler, I think it would make more sense and generate less conflict to have a separate sysroot.

I assume this is supporting removing the hardcoded path, if only to reduce the reliance on any particular layout within the rust sysroot.

I'm unsure whether this is referring to "the Rust compiler" as merely rustc, or as $RUSTC (any rust implementation, particularly one with a rustc-like CLI). In either case, I don't see much point in bifurcating the sysroot option in lccc, particularly since it would just add to the list of unstable options it has. It also affects sysroot cross-compilation which, on rustc, has a lot of redundant points of failure (I seem to remember setting the linker sysroot in 3 places in the cargo rustc invocation, and still using -Z build-std, and hoping I put the right directory in all four, failing at least 5 times).

Between rust compilers, I do agree, as there are inherently significant differences between the compiled artifacts (and also, in some cases, the stdlib source code). However, there are mechanisms for distinguishing the artifacts of different compilers and standard libraries when their sysroots overlap (which could occur when multiple compilers are installed to the same prefix, even when the sysroot is defined similarly to how rustc does it), and tools like cargo should be designed to allow use of these mechanisms (even where not required).

-Zbuild-std has to depend on the layout one way or another. It is not even guaranteed that the standard library uses Cargo.toml. -Zbuild-std will depend on implementation details of rustc for as long as the interface between the standard library and the compiler is not stabilized (and it will likely never be stabilized). -Zbuild-std also has to use RUSTC_BOOTSTRAP to enable using nightly features on a stable compiler. This is an implementation detail in itself.

You could use a different directory than lib/rustlib to put your artifacts in. lib/lccc-rustlib or something like that. As for multiple versions of rustc in the same directory, that is already mostly supported by the fact that the compiled standard library files contain a hash in their filename and the metadata contains the exact compiler version. Binaries and the standard library source can't be installed in the same directory currently. For binaries you would have to rename them. For the source I guess the directory could be changed to include the rustc version.

The proposal here is to reduce that dependency. For Cargo.toml I think it's reasonable for cargo to expect it (though as a result, I need to maintain 3 different build systems for lccc's stdlib, and perform feature detection in a rust build.rs, which is not fun). If $RUSTC can tell cargo what it needs to do in order to build the standard library, then it doesn't need to rely on anything else. On rustc, a fictional --print build-std-info could produce something like {"workspace":"<sysroot>/lib/rustlib/src/rust","env":{"RUSTC_BOOTSTRAP":1},"default-features":["backtrace",...]}. lccc would produce {"workspace":"<lccc_libsrcdir>/rust/0"} (noting that it currently does not set RUSTC_BOOTSTRAP, and has no default-enabled features yet).
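On the consuming side, a build tool could prefer whatever the compiler reports and only fall back to the current rustc layout when nothing is reported. The sketch below assumes the reply to the fictional `--print build-std-info` has already been parsed into a struct; `BuildStdInfo`, `std_workspace` and `compiler_command` are hypothetical names for this example, and the fallback behaviour is a design assumption rather than anything cargo does today.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical, already-parsed reply to the fictional `--print build-std-info`.
#[derive(Debug, Default)]
struct BuildStdInfo {
    /// Workspace root reported by the compiler, if it reported one.
    workspace: Option<PathBuf>,
    /// Environment variables the compiler asks for (e.g. RUSTC_BOOTSTRAP on rustc).
    env: BTreeMap<String, String>,
}

/// Prefer the reported workspace; otherwise fall back to the rustc sysroot layout.
fn std_workspace(info: &BuildStdInfo, sysroot: &Path) -> PathBuf {
    match &info.workspace {
        Some(dir) => dir.clone(),
        None => sysroot.join("lib").join("rustlib").join("src").join("rust"),
    }
}

/// Build a compiler invocation that carries only the requested environment variables.
fn compiler_command(rustc: &str, info: &BuildStdInfo) -> Command {
    let mut cmd = Command::new(rustc);
    cmd.envs(&info.env);
    cmd
}

fn main() {
    // What a rustc-like reply might look like once parsed.
    let mut info = BuildStdInfo::default();
    info.env.insert("RUSTC_BOOTSTRAP".to_string(), "1".to_string());

    let workspace = std_workspace(&info, Path::new("/usr"));
    let cmd = compiler_command("rustc", &info);
    println!("workspace: {}", workspace.display());
    println!("command: {:?}", cmd);
}
```

With this split, an implementation that reports its own workspace and no environment variables never touches the rustc-specific fallback, and the build tool does not need to know whether RUSTC_BOOTSTRAP means anything to the compiler it is driving.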

Currently, binary artifacts are in the rustlib dir, which defaults to <libdir>/rust/lcrust/v0 (though, the directory is configurable with the rustlib option for stdlib builds, and the LCRUST_SEARCH_PATH_SUFFIX option for the frontend). At search time, <libdir> refers to each directory on the default search path and in LIBRARY_PATH/SEARCH_PATH, except on msvc targets, where it uses <sysroot>/<prefix_in_sysroot>/lib/<target> (and the aforementioned environment variables).
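To make the search rule concrete, here is a small sketch that expands a list of library directories into candidate rustlib directories by appending the suffix. It is not lccc's code: the function name is invented, the ordering (default directories first, then the environment entries) is an assumption, and the msvc special case is not modeled.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Default suffix appended to each library directory, per the description above.
const DEFAULT_SUFFIX: &str = "rust/lcrust/v0";

/// Expand library directories into candidate rustlib directories.
/// Assumption: default directories are searched before the environment entries.
fn rustlib_candidates(
    default_libdirs: &[PathBuf],
    library_path: Option<&OsStr>,
    suffix: &str,
) -> Vec<PathBuf> {
    // Split the PATH-style variable using the platform's separator.
    let from_env: Vec<PathBuf> = match library_path {
        Some(value) => env::split_paths(value).collect(),
        None => Vec::new(),
    };
    default_libdirs
        .iter()
        .cloned()
        .chain(from_env)
        // Skip empty entries such as those produced by a trailing separator.
        .filter(|dir| !dir.as_os_str().is_empty())
        .map(|dir| dir.join(suffix))
        .collect()
}

fn main() {
    // Illustrative defaults only; a real frontend would take these from its configuration.
    let defaults = vec![PathBuf::from("/usr/lib"), PathBuf::from("/usr/local/lib")];
    let library_path = env::var_os("LIBRARY_PATH");
    for dir in rustlib_candidates(&defaults, library_path.as_deref(), DEFAULT_SUFFIX) {
        println!("{}", dir.display());
    }
}
```

The point of the sketch is that none of these candidates is derived from a single sysroot directory, which is why a sysroot-relative source path does not map cleanly onto this layout.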

I assume this is supporting removing the hardcoded path, if only to reduce the reliance on any particular layout within the rust sysroot.

I think it's possible that we're referring to two different things here. My primary concern is that nothing outside of Rust should be treating the Rust sysroot as anything other than an opaque directory for Rust itself to find things in. Its contents or layout could change in future versions of Rust, to accommodate whatever the corresponding version of Rust that owns that sysroot needs; it isn't part of any Rust stability guarantee (as far as I know). So I'm in favor of having tools remove hardcoded paths to anything inside it, insofar as they should just ask the Rust compiler for what they need. (Cargo is part of Rust, and it could potentially rely on some of those details if it needed to and did so carefully, but it may be useful and more robust not to.)

Separately, there's the question of "how do you find that top-level opaque path". I'm not sure how much it makes sense to try to find that path at all, since the only useful thing to do with it is pass it right back to rustc, and rustc already knows where to find it. If you're looking for where to find the standard library source, it's probably best to query that rather than querying for the sysroot and then trying to find the standard library source there yourself.

I'm unsure whether this is referring to "the Rust compiler" as merely rustc,

Yes, I was specifically distinguishing between "the Rust compiler" (rustc) and "a compiler other than the Rust compiler". The whole point of this message was to state that the Rust sysroot is in large part a compiler-specific implementation detail of the Rust compiler. By "have a separate sysroot", I mean that if some other compiler needs to have a sysroot, it should have its own sysroot and not rely on Rust's, and especially not on Rust's internal layout.

Makes sense. I've been treating it more like a sysroot for gcc-likes, at least in terms of what lccc does, but even then it's just an implementation detail of "where does the frontend search for rust libraries and where are linker artifacts pulled from" (it's just more stable, since the layout is imposed by various standards and conventions). Since it's the thing invoking the linker (normally), it's allowed to internally use the knowledge that the rust sysroot is also the linker sysroot.

With the development of non-rustc implementations underway, it may be a good idea if cargo wasn't too attached to the specifics of the rust compiler that aren't stabilized (it may or may not also be a good idea to decide which things are stabilized and are to be stabilized with respect to the external interface of rustc, but that's out of scope here). A potential argument is that cargo (the specific program developed in tandem with rustc) is specifically for use with rustc and alternative implementations should provide their own versions if they choose to (or, do what gcc-rs seems to be doing, which is provide a cargo subcommand to do that job). If that is the case, I think it would be a good idea to have it documented. That would likewise be out of scope here, though, beyond its applicability to the question of whether or not (and how) to generalize -Z build-std.

Ok, thanks for the clarification. For languages like C++, "the C++ compiler" typically refers to "whatever C++ compiler you happen to use", hence my confusion, but for Rust the use here is reasonable.

It would be a good idea to handle the case where they match, by accident (or by design of the end-user, who, in my opinion, should have the ultimate decision in how to lay out their system). Depending on the internals of both implementations, this may or may not be feasible (especially when both the sysroot and the layout match). However, I do agree that, in general, different rust implementations should not intentionally overlap sysroots (unless the implementation that is doing so accepts the burden of compatibility in that case, like clang and gcc). This is just as applicable between alternative implementations, and from an alternative implementation to rustc, as from rustc to an alternative implementation. (if rustc started expecting to be able to find things it understands at <libdir>/rust/lcrust/v0/, it's probably going to be disappointed when it searches for libcore.rlib!/.rmeta or tries to read libcore.rlib!/.rmanifest expecting whatever format it uses for .rmeta, just like lccc would be, reading the libcore.rlib that rustc produces in <sysroot>/<libdir>/rustlib/<target>).

Do you count rust-analyzer in that exception? It also digs up sources in the sysroot.

I'd assume it depends on whether rust-analyzer is intended for use with rustc in specific, or $RUSTC in general. Though if build-std information is made available, that argument could be mooted, as rust-analyzer could just ask for that information, and extract the source root (and I'm unsure there would be a benefit to hardcoding the information if it is made available, even if the program is intended specifically for use with rustc).

Yes, and as far as I know rust-analyzer works with various other bits of Rust that aren't covered by stability guarantees, as well. But as with Cargo, just because it can rely on such things doesn't mean it should, or that it would benefit from doing so.

In general, stable features of Cargo should only use stable features of rustc, and nightly features of Cargo might in turn use nightly features of rustc.

We haven't yet had much conversation on whether Cargo should directly support compilers other than rustc. There are tradeoffs there, and we'll want to set clear expectations. It's already the case that Cargo features often build substantially on corresponding support from rustc; I wouldn't want to additionally have to worry about whether a non-Rust compiler has support for a given command-line argument yet. The critical question: "who carries the compatibility burden and has to adapt: the non-Rust compiler keeping up with Rust's command-line interfaces and similar, or Cargo restricting itself to options available by a non-Rust compiler?". Stated that way, I believe the compatibility burden should fall on a non-Rust compiler attempting to be compatible with Rust, and Cargo should be able to immediately use functionality supported by Rust. (This isn't unique to non-Rust compilers, either; the same question also applies to whether new Cargo should be compatible with old Rust by detecting available features, or just call functionality from the corresponding Rust and fail if it doesn't work.) But all that said, I don't think Cargo should gratuitously prevent use of a non-Rust compiler; it's more "feel free to try, it may or may not work depending on if your compiler supports the latest Rust features that Cargo is using".

Returning to the topic of -Z build-std (as much as I'd love to discuss general usability of cargo with alternative rust implementations), are there any potential issues with my proposal (a json structure reporting the root of the workspace, environment variables, features)? One thing I can think of is that it should also be able to supply additional RUSTFLAGS.