Rust vs. LLVM

Current rust nightly is using a snapshot of future LLVM 8, current stable rust is using a snapshot of LLVM 7 (while LLVM 7 is now released). AFAIK, in both cases, LLVM is patched for multiple things, some of which can be miscompilation issues.

Parallel to that, rust supports building against a system LLVM, which downstreams (e.g. Linux distros, BSDs, etc.) are very likely to do. The current minimal version of LLVM supported is 5.0. Which means when you get rust from a downstream, you may be using a LLVM that is much older than the one used if you get rust via e.g. rustup. Or newer, but that’d usually be less of a problem (and much less likely).

Now, for, I guess, time reasons, Travis builds that happen when you create PRs against the rust repository are using a system LLVM: building LLVM for every PR would make those builds probably take twice as long as they currently take, if not more.

All would be fine if the last part of the first paragraph was not happening: the bundled LLVM is patched, and in some cases it’s patched for miscompilation problems. The consequence of which is that downstream rust may have some miscompilation problems because it’s using an unpatched system LLVM.

You might think it’s mostly an hypothetical problem and is not causing practical issues. Let me introduce you to PR50882. That was an attempt to make the Box type aware of custom allocators. It failed miserably in a seemingly completely unrelated way on Travis. Why? As it turns out, after some investigation, because of a miscompilation bug that was fixed in LLVM 7. To add insult to injury, it had been found in older versions of rust, and backported to rust-llvm back when it was based on LLVM 6. It was found early enough that it should have been in a dot-release of LLVM 6, but never was. Even better, it’s a regression that happened some time between LLVM 3.9 and LLVM 5. So in fact, my very first attempt at PR50882 worked fine. Things were left to bitrot, and PR51899 changed the version of LLVM used on Travis and suddenly my problems started. I recently refreshed the patches in a new PR, and unsurprisingly, it still fails for the same reason.

One way out of this current mess would be to bump the system LLVM requirement to version 7. But:

  • It’s barely a month old, and it’s barely available downstream.
  • As a matter of fact, it’s not even in Ubuntu at all.
  • Even less in the version of Ubuntu that’s used for Travis builds (16.04). I did open PR55176 with an tentative patch using apt.llvm.org, but those repositories are shipping the tip of the release branches, which don’t actually provide fixed releases, but daily snapshot of the release branches, so depending on the day you pull the packages, you get a different version, which is not really desirable for CI.

In any case, I’m not aware of an official policy wrt. LLVM, and it’s about time to have one.

I don’t think picking random trunk versions of LLVM is sustainable, but I can understand why it’s being done. But we have to acknowledge the problems it causes, and the fact that we’re not testing those possible setups. In fact, if anything, my problems also highlight that rust-llvm was fixed for miscompilations without a unittest, which would have caught the problem when upgrading system LLVM to version 5.0, instead of hitting some poor random developer trying to just have their stuff done.

I also don’t think the Travis builds should be testing a setup that, from rust-lang.org’s perspective, is not standard. I can understand using system LLVM saves time, but it feels like we should build the in-tree LLVM and cache it instead. It is, IMHO, desirable to test system LLVM as well, but it shouldn’t be the first thing that is tested. And surely, testing only a single version of system LLVM is not really getting us anywhere, as the history of PR50882 showed.

Cc: @alexcrichton

14 Likes

As a Gentoo packager, I also dislike the current situation. So far I’ve used the bundled LLVM exactly because I don’t want to get bug reports about issues that have been patched in Rust’s LLVM, but since my users do their own compiles, this makes installing Rust expensive (especially on slower/older/weird architecture machines, which is one audience that Gentoo caters to, I guess) and testing hard.

I guess my strategy would be more aimed at making sure LLVM patches needed for Rust land upstream (in LLVM) first and/or soon. Ideally at least Rust releases released around the same time as LLVM releases work without fault with that LLVM release.

I definitely feel your pain, discovering that a bug was fixed after-the-fact is never a pleasant situation! I think there’s always room for improvement, but the constraints we’ve gathered around our LLVM usage so far is:

  • We do not want to be blocked when picking up new LLVM features. If we’re willing to do the work to update LLVM and fix all the bugs, then we want to be able to take advantage of this. This has manifested in major features for rustc like parallel codegen with ThinLTO and first-class WebAssembly support.
  • We want to be compatible with a range of system LLVM installations. This is useful for packagers and such, and we want to be sure that our code which compiled against LLVM is correct and rustc produces at least semi-functional code.
  • We do not want to wait 6 months for a bug to get fixed, we want the freedom to backport bug fixes in LLVM to our branch which fix real issues we’re running into.

We can possibly track bugfixes and backports in a more first-class fashion, but otherwise I’m not really sure personally how we can do much better here. It’s not a fantastic situation but LLVM is a big project (and so is rustc!). Interesting things just come about from depending on LLVM and we do our best to deal with it.

2 Likes

That's a low bar. Are you sure that's what you mean? That sounds completely counter to the idea of "Safe".

4 Likes

Having good transparency in how rustc's LLVM currently deviates from mainline (in terms of upstream commit + patches applied) would at least help a lot with my packaging issues.

For bugs specifically, do you also liaise with LLVM's maintenance release team to make sure that fixes we need land in there?

2 Likes

As an Exherbo packager, I leave the choice to users to use either the system llvm or the bundled one.

If the bundled llvm is llvm7, even rust pretends to be compatible with llvm5, I’ll require llvm7 if the user chooses to build with system llvm, and if llvm7 is not yet available, I’ll force-enable the bundled one until it is.

This way, even when built with system llvm, we get results closer to the one we’d expect with the bundled one.

Apart from the emscripten part which is lagging behind, when I see that the bundled llvm is 7 for stable and even 8 for nightly (well, a snapshot of the development tree of 8), I fail t understand why rust pretends to be compatible with llvm5. The pull request mentioned in this post really shows that well: if the required system version was bumped to 7 (which is the closer to the bundled one), the problem would disappear.

llvm7 not available downstream? Fine, downstream can either update it if they want a newer rust, or use the bundled one if they can’t.

3 Likes

Unfortunately it is indeed what I mean! We put for a huge amount of effort and CI time to make sure the LLVM submodule we have works. We don't have the resources to do that for other releases. It's on downstream packagers if they decide to not use the LLVM that we use.

Not currently, no, but we have a policy of only taking patches which are already in LLVM itself for backports. We don't currently recommend or advise such patches making their way into point release to LLVM itself.

Our compatibility is primarily to ensure that rustc's C++ continues to compile against older versions of LLVM. It just so happens that this Travis builder is one of the fastest and most comprehensive, which is why we run it on PRs. We don't run the full exhaustive suite against older versions of LLVM

I was trying to figure out how to actually build with system LLVM and couldn’t quite figure it out from looking at config.toml.example and the Travis config used for rust-lang/rust. Any hints?

The simplest way is to set llvm-config to point to the system llvm-config executable:

llvm-config = "/bin/llvm-config-64"

If you’re using configure, this is the --llvm-config option.

My plan was to bump the minimum LLVM version from version 5 to version 6 in december depending on what @cuviper and the other distro packagers are able to cope with. If they all agree that it's ok to bump it to version 7, it would be great to do that.

Now, for, I guess, time reasons, Travis builds that happen when you create PRs against the rust repository are using a system LLVM:

@alexcrichton this sounds very wrong. We could have one build bot testing the minimum system LLVM version that we support, but all others should be using the bundled LLVM "somehow".

Once Fedora 27 goes EOL in December, I won't need LLVM 5 support any more. Fedora 28 still has only LLVM 6 though, and will be supported until around next June. We aren't planning to update F28 to LLVM 7, as it tends to be rather disruptive. The imminent Fedora 29 is coming with LLVM 7.

On the RHEL side, I would also appreciate continued LLVM 6 support for the time being. I think we'll be fine with LLVM 7 here sooner than the F28 constraint though.

The initial build for any new PR -- the one that gives you a basic CI checkmark -- only has the LLVM compat build and a quick mingw check. See any of the Travis CI PR builds. Once bors gets approval, you get the full build and check on all targets, which you see in the "auto" and "try" branches.

1 Like

The problem is that both system LLVM 5 and LLVM 6 are practically broken, except if you apply patches to them. Maybe even system LLVM 7 is broken, I don’t know, I haven’t looked. I got LLVM 6 patched in Debian for https://github.com/rust-lang/llvm/pull/106, but LLVM 5 won’t likely be fixed ever. We can’t expect all downstreams to keep track of every patch applied to rust LLVM and apply them. Although one could argue that if they go around their way to use system LLVM, they should ensure it is fit for the task… But how can they do that when some of the patches are non-trivial to backport? (e.g. https://github.com/rust-lang/llvm/pull/106 is not trivially backportable to LLVM 5)

So the problem is multi-dimensional:

  • When you open a PR, your immediate pre-bors result is that of a non-standard configuration. Independently of whether rust should bump its LLVM requirement, it seems to me the first thing a PR should trigger is a build that uses rust LLVM. PR55176 only bumped the system LLVM to the tip of the upsteam release_70 branch, which is a) not deterministic and b) still not rust-llvm. @alexcrichton, can you point me in the right direction to setup a cached build of the in-tree rust-llvm?
  • There is actually no testing of the various supported system LLVM versions. As I noted earlier, for PR50882, everything was fine when the PR was opened back when the system LLVM used on the travis build was still 3.9. Meaning, LLVM 3.9 was good, and LLVM 5 and 6 were broken, and 7 fixed. IOW, when we have a span of multiple LLVM versions supported, even only testing one is doing us a disservice.
  • Then there’s the separate problem of using random trunk versions of LLVM for rust-llvm, and patching them on top of that.
2 Likes

Given that current master won't be released until 1.32 in January, after Fedora 27 goes EOL, I'd be OK with dropping LLVM 5 now. I don't need to build beta/nightlies for the distro.

We don't track every rust-llvm patch in our system LLVM. Some have relatively low impact, and some aren't even relevant on Linux. And frankly, a lot of times it's reactive -- we haven't applied your example JumpThreading patch yet, because we haven't seen that issue. (I guess because rustc also worked around it.)

But yes, I do see this as the job of a distro maintainer. I recognize that it may be harder for those who volunteer vs. paid roles, but we can also collaborate on this.

We can and should work together on this. I'm happy to attempt backports myself, and I also have teammates to consult who work on LLVM directly. Your particular example for LLVM 5 may be irrelevant soon, but the offer stands in general.

(edit to add) All that said, I do know the overall problem is still not trivial, but I think it's manageable.

One thing that would help maintainers know which patch to backport to llvm or not would be, in the rust llvm fork, to add to rust specifics commits the reason why it’s not upstream in llvm, or a link to where it was submitted or backported from.

For example, why is this patch not upstream? https://github.com/rust-lang/llvm/commit/deaf37e407d6c54e6c8ac60de2a0a4ca47b95c41

This patch alone would help a rustc built with system llvm to be closer to one built with the embedded llvm, feature wise.

2 Likes

For most patches, the answer is that nobody cared enough about upstreaming them to put in the work. If you care about this, help upstreaming all these patches would be very welcomed. I don't think anyone involved believes that upstreaming patches is a bad idea - it is just a process that takes some work (even for "trivial" patches), and most people just have higher-priority things to do.

There is also a case where some of our “patches” are hacks, workarounds and otherwise unsuitable for upstreaming. LLVM upstream might be working on a design of some sort that would solve the problem in a more general manner or otherwise a better way, but we often cannot afford to wait for the feature to get designed, implemented and then 6 more months for the next LLVM release.

I can’t see ourselves refraining from patching our LLVM.

1 Like

If you are following this thread you may also be interested in this issue about bumping the minimum LLVM version.

So I’ve added an option in the Gentoo package for building against system LLVM, and while it works fine for me and for a number of other users, I also got the first segfault today: https://bugs.gentoo.org/671266

What information do I need to gather in order to file useful bug reports about something like this?

A backtrace from the segfault would be the first step.

Can you tell whether their LLVM was built with Clang or GCC? There’s a known ABI problem with mixing the two: https://bugs.llvm.org/show_bug.cgi?id=39427

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.