Ship `clang` in rustup too

Hello, Rust uses a custom fork of LLVM and builds it every time, so this should not be a big change. Sometimes you need to use clang too, and that is tricky: building it yourself to match Rust's LLVM version is not scalable.

What are your thoughts on this?

Rust isn't really using a custom fork of LLVM; almost all patches to LLVM get merged upstream first. That being said, a while ago I added a flag to build clang along with rustc, and I work with some people who use rustc, clang, and LLVM plugins together (so we need a matching LLVM). I'd be happy if clang were shipped as part of the binary tools we already provide.

@josh Since you're on the Rust team and also seem to be interested, what would be the right step forward? I'll have a bit of time in December once my autodiff upstreaming is done. I assume shipping clang is pretty straightforward from a technical point of view, so shall I just ask around on Zulip, or does this need an MCP or something?

I'm not on any Rust team that's in a position to approve this. I don't actually know what team is responsible for approving new rustup components. Best guess: t-release maybe?

I want to clarify one thing: if this goes ahead, which architectures will it be shipped for? For example, we need clang compilers targeting both x86 and Arm.

Not a rustup developer, but if we're going to do this I wonder if it wouldn't make sense to have a unified way to handle target CPU arches in rustup so that target-specific std and clang builds are handled in a uniform fashion.

I have a concern that it will be shipping pre-release clang binaries. Is there any hope of shipping the latest-tagged clang (that is compatible) with it rather than some snapshot? As a CMake developer/maintainer, I really don't want to have to make a RustupClang compiler ID to encapsulate this custom build.

I expect LLVM's C++ API to be sufficiently unstable that you can't use a pre-release version of LLVM with the latest stable release of clang. Rustc only works across multiple LLVM versions thanks to conditional compilation, and it uses the much more stable C API where possible. Clang doesn't use any conditional compilation based on the LLVM version, as it is expected to be built against a single LLVM version, and it exclusively uses LLVM's C++ API.

Why would that be needed? If the end user doesn't specify this clang as the C compiler, it won't be used. I expect the clang copy to either be put somewhere in lib/rustlib in the rustc sysroot (which isn't part of PATH) or to be called rust-clang to avoid conflicts with the system clang installation. (Both measures are taken for Rust's copy of lld.)

If it's not going to be used, what are we shipping it for? Or are you saying no one would use such a clang to build anything with CMake? Because CMake has version checks for Clang and, AFAIK, all in-development code is versioned the same without distinguishing characteristics for CMake to deal with broken stuff. So maybe a disclaimer of "clang is a development snapshot and may not represent any level of stability recognized by full releases" would be good? But then…is that any good for those asking for this?

I did expect it would be used if and only if the user decides to use it, not automatically just because you have rustc installed on your system.

Is clang broken so often that you actually need to distinguish between patch releases, as opposed to only distinguishing between major releases to know whether you can use newly introduced features?

Google, for example, maintains infrastructure just to test that LLVM HEAD can be used to compile both rustc and clang based on the same LLVM commit. Anything other than exactly the same LLVM commit has a chance of causing very weird issues, so shipping a clang that is merely based on an LLVM similar to the one used by the corresponding rustc is pretty much unusable in my opinion. The motivation of Google, me, and I'd guess other people supporting this is cross-language LTO. For that, the relevant factor is that you use one LLVM commit for both clang and rustc; whether that clang is an "official" release or not doesn't matter. Does that answer what we need it for?
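To make the mismatch problem concrete, here is a minimal Python sketch that compares the LLVM version reported by `rustc --version --verbose` (its `LLVM version:` line) with the one reported by `clang --version`. The function names and sample strings are made up for illustration. Note that a version match is only a necessary check: it cannot prove that both tools were built from the same LLVM commit, and vendor builds such as Apple clang use their own version numbers.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def _first_version(pattern: str, text: str) -> Optional[Version]:
    """Return the first (major, minor, patch) matched by pattern, or None."""
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def rustc_llvm_version(verbose_output: str) -> Optional[Version]:
    """Parse the 'LLVM version: X.Y.Z' line of `rustc --version --verbose`."""
    return _first_version(r"^LLVM version:\s*(\d+)\.(\d+)\.(\d+)", verbose_output)


def clang_version(version_output: str) -> Optional[Version]:
    """Parse 'clang version X.Y.Z' from `clang --version`."""
    return _first_version(r"clang version\s+(\d+)\.(\d+)\.(\d+)", version_output)


def versions_match(rustc_output: str, clang_output: str) -> bool:
    """True only if both versions parse and are identical."""
    rustc_llvm = rustc_llvm_version(rustc_output)
    clang = clang_version(clang_output)
    return rustc_llvm is not None and rustc_llvm == clang


# Illustrative sample outputs (assumed values, not taken from a real toolchain).
SAMPLE_RUSTC = "\n".join([
    "rustc 1.82.0 (f6e511eec 2024-10-15)",
    "binary: rustc",
    "host: x86_64-unknown-linux-gnu",
    "release: 1.82.0",
    "LLVM version: 19.1.1",
])
SAMPLE_CLANG = "clang version 19.1.1\nTarget: x86_64-unknown-linux-gnu"
SAMPLE_OLD_CLANG = "Ubuntu clang version 18.1.3 (1ubuntu1)"

print(versions_match(SAMPLE_RUSTC, SAMPLE_CLANG))
print(versions_match(SAMPLE_RUSTC, SAMPLE_OLD_CLANG))
```

In practice you would feed in the real outputs, for example `subprocess.run(["rustc", "--version", "--verbose"], capture_output=True, text=True).stdout`, and treat a mismatch as a reason not to attempt cross-language LTO with that pair.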

That is very true on LLVM HEAD, but it's less of a problem on the release branches, even before the actual release. Rust's bundled LLVM has only been pulling from release branches for a long while now.

Oh sure, it being the default CC would be silly, but that just makes its use from CMake less frequent, not zero.

There are checks in CMake for things that are more bleeding edge (especially around modules) and without a way to detect which side of the "fixes a bug that matters" commit is in use, it's hard. With custom builds, one can say "just get a newer LLVM/Clang", but when something is distributed like this, it's harder to say "you need to update your development snapshot".

Are you concerned that rustup would ship a commit between two releases, like between 19.1.3 and 19.1.4? AFAIK that is not going to happen.

A commit in that range is unlikely to cause issues. My concern is more about commits between 18.0.0 and 19.0.0. If LLVM's release branch is used and a suitably tagged clang could be used with it, that resolves a lot of my concern.