Consider shipping libclang with Rust toolchain?

A common complaint I see with projects that make use of bindgen is the dependency on libclang. It ends up frustrating a lot of users because it requires the user to go and install libclang, which on some systems requires installing the whole LLVM toolchain.

So, perhaps crazy idea - since bindgen is such a common tool in the Rust ecosystem, and Rust already builds (admittedly, a subset of) the LLVM toolchain, why not build and ship libclang with the Rust toolchain? This has the additional benefit(?) of bindgen being able to depend on the toolchain version of libclang.

4 Likes

Note also that this could (and probably should) be an optional component of the toolchain.

But you need to consider windows-msvc as well: as I understand it, that toolchain has to use the MSVC build tools and not clang.

3 Likes

Well, even when building for *-windows-msvc, you still use libclang to generate bindings using bindgen. So it's useful for every toolchain, regardless of what host compiler you're using to build any related C or C++ code - as long as the tools agree on the ABI.

1 Like

I agree on this front, too. This still makes it very cheap to add libclang (rustup component add libclang)

1 Like

Zig goes even further than this and bakes a bindgen-like use of Clang into the compiler, and then ships its own Clang driver command on the side because why not at that point.

Which raises an oft-overlooked aspect of this approach: Zig goes to great effort to ship a cross-compiled libc for all of its supported targets, so that Zig programs themselves remain effortlessly cross-compilable.

Even with a Rust-shipped libclang, bindgen would still need enough of a C++ toolchain for libclang to be able to parse what bindgen feeds it.

Presumably, anyone who is using bindgen also already has such a C++ toolchain to build the library they're running bindgen on. So unless Rust also wants to start shipping cross-compiled libc for every target, the utility of shipping libclang is probably limited to cases where the library is built with some non-Clang toolchain but still linked with LLVM-built Rust.

2 Likes

Of course rustup might eventually offer Zig as well, as a C replacement. I'm one of the Zig development funders. I consider it a younger niece of Rust, with Zig-acknowledged shared DNA, and a reasonable stepping-stone from C to Rust.

I'm not sure this is a particular issue for including libclang in the toolchain. Rust already requires that a compatible version of libc (or musl, for the musl target) is available on the host - it's necessary to run any Rust code that's not #![no_std]. Could you elaborate on why this is an issue only when including libclang?

This is pretty common, at least in my field. Vendor compilers from Intel, etc. are often preferred, and the open alternative is usually GCC. Clang is sometimes available on these systems, but it's not a sure thing, and often it's out of date. It's just not as typical, as far as I've seen, to use clang as your primary compiler. The big sticking point, at least for my organization, is a Fortran compiler, which LLVM does not yet ship.

Personally, I would love to see bindgen shipped with rustc, or even fully integrated into rustc, using the same LLVM. That would substantially simplify integration with C code.

2 Likes

I don't think this makes sense today. bindgen is still under heavy development with breaking API changes (it's at version 0.53 today). I don't see how that can be combined with the stability needed for shipping something with Rust.

2 Likes

Yeah, I wouldn't want Rust to ship bindgen until stability and backwards compatibility could be ensured. But it would reduce a lot of build times if this were to be eventually done! :slight_smile:

bindgen seems to assume that libclang produces consistent-enough results given that it doesn't lock the supported libclang version, so there's presumably less risk with libclang. Though I don't know if bindgen's assumption about libclang returning consistent results between releases is valid.

Toolchain and bindgen's dependencies are separate problems. It's easy to have a C/C++ toolchain, but having LLVM and libclang is a pain.

For example, I can't use bindgen on some of my machines that have an old Debian, because it ships with a prehistoric LLVM version that can't be upgraded. I do have a working gcc there, easily.

Feeding gcc headers to libclang may be problematic, but not all bindings require system headers. It's also relatively easy to replace system types with opaque type definitions for bindgen.
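To illustrate the opaque-type point: a type that is only ever used behind a pointer does not need its real definition, so the system header that defines it can be left out. Below is a minimal hand-written sketch of what that looks like on the Rust side. It is not actual bindgen output, and `OpaqueFile`, `Logger`, and `has_no_sink` are hypothetical names made up for the example.

```rust
use std::marker::{PhantomData, PhantomPinned};
use std::mem::size_of;
use std::ptr;

/// Hand-written stand-in for an incomplete C type such as `struct FILE;`.
/// It is zero-sized and has no public fields, so Rust code can only
/// handle it through raw pointers.
#[repr(C)]
pub struct OpaqueFile {
    _data: [u8; 0],
    // Opts out of Send, Sync and Unpin, since nothing is known about the C type.
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

/// Hypothetical struct from a library header that only stores a `FILE *`.
#[repr(C)]
pub struct Logger {
    pub sink: *mut OpaqueFile,
    pub level: i32,
}

/// Returns true when the logger has no output stream attached.
pub fn has_no_sink(logger: &Logger) -> bool {
    logger.sink.is_null()
}

fn main() {
    let logger = Logger { sink: ptr::null_mut(), level: 2 };
    // The opaque type itself carries no data; only the pointer has a size.
    assert_eq!(size_of::<OpaqueFile>(), 0);
    assert_eq!(size_of::<*mut OpaqueFile>(), size_of::<usize>());
    println!("no sink: {}, level: {}", has_no_sink(&logger), logger.level);
}
```

This only works for types used by pointer. Anything embedded by value still needs its real size and alignment, which is where the system headers come back in.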

2 Likes

We shipped rustfmt-preview and clippy-preview much sooner than they became stable; I think there's value in starting to work towards a bindgen-preview.

2 Likes

rustfmt and clippy (and dev tools in general) don't have "accidental ossification" risk. If we ship bindgen and then remove/change it, it'll break a ton of code without a clean way to fix it. Breaking rustfmt, in contrast, does not break existing code.

5 Likes

Clang is compatible with libstdc++, and on Linux it uses libstdc++ by default, so at least the C++ header files should be fine. Unless you meant something else?

1 Like

For the idea of shipping bindgen itself, how would that work with clients that use bindgen as a library rather than invoking the binary?

(I still think a long-term goal should be binary caching of arbitrary crates. Then someone could upload a crate to crates.io that just consists of LLVM's source code together with a build script, and bindgen could depend on it. crates.io would automatically build binary artifacts for common OSes, and Cargo would download them whenever possible instead of actually building from source.)

I'd rewrite my build.rs in a heartbeat to use the bindgen binary if that's what was required. But I don't see why you couldn't also ship the compiled crate at that point.
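For a sense of how small that rewrite could be, here is a rough sketch of a build.rs that shells out to a `bindgen` executable instead of linking the crate. It assumes a `bindgen` binary is on `PATH` and that the crate has a `wrapper.h` next to `Cargo.toml`; both are assumptions for illustration, as is the helper name `bindgen_command`. The binary still needs to find libclang at run time, so this changes how bindgen is obtained, not what it depends on.

```rust
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Builds the `bindgen <header> -o <out_file>` invocation without running it.
fn bindgen_command(header: &Path, out_file: &Path) -> Command {
    let mut cmd = Command::new("bindgen");
    cmd.arg(header).arg("-o").arg(out_file);
    cmd
}

fn main() {
    // Cargo sets OUT_DIR for build scripts; fall back to the temp dir so
    // the sketch can also be run on its own.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);
    let out_file = out_dir.join("bindings.rs");

    // Ask Cargo to rerun the script when the (assumed) header changes.
    println!("cargo:rerun-if-changed=wrapper.h");

    // This sketch only emits a Cargo warning on failure so it can run
    // standalone; a real build script would probably want to fail here.
    match bindgen_command(Path::new("wrapper.h"), &out_file).status() {
        Ok(status) if status.success() => {}
        Ok(status) => println!("cargo:warning=bindgen exited with {}", status),
        Err(err) => println!("cargo:warning=could not run bindgen: {}", err),
    }
}
```

The crate would then pull the result in with `include!(concat!(env!("OUT_DIR"), "/bindings.rs"));`, the same as with the library-based approach.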

I won't speak for the bindgen developers, but it still seems too early to me to ship bindgen. Shipping libclang seems less fraught to me.

2 Likes

I don't want to prescribe any particular action, but I think a smoother FFI integration than we have would be a big win, and probably should've gotten more attention a long time ago. I really envy languages like Swift and Zig where you literally just have include directives for C headers. Getting something close to that for a lot of use cases would be pretty sweet.

To me the best outcome of this thread would be for there to be enough motivated people to form a working group on making binding to C libraries as smooth as it can be (distinct from the work already being done on other FFI issues like defining unwinding semantics and so forth).

26 Likes

It just so happens that I've been working on a PR to convert bindgen to using the Clang libtooling interface (the C++ libraries for Clang, often distributed as static libraries) rather than the libclang C interface. The libclang interface exposes a rather limited set of information about the C/C++ AST, so bindgen has to work around a lot of missing information. Switching to libtooling would make fixing a bunch of outstanding issues and adding some new features in bindgen significantly simpler. I've already fixed bindgen's layout of C++ classes with virtual bases and started adding support for generation of C++ vtables on top of this conversion. If we can make this happen I think bindgen will be substantially better for it.

However, work on that switchover has stalled a bit due to not having a good solution yet for building bindgen against libtooling on Windows. The official Windows LLVM installer package doesn't include the Clang C++ libraries and only ships the C libclang library, so users would currently need to compile their own build of Clang on Windows to link bindgen against. Most binary distributions on Linux and macOS include the libtooling C++ libraries, so switching wouldn't affect most people on UNIXes.

I've been trying to get the libtooling libraries added to the Windows packages. However, one workaround that @emilio thought of was shipping appropriate Clang libraries in the Rust binary distributions, as is being suggested here. All this to say, if we could make libtooling (instead of or in addition to libclang) an optional toolchain component, I would be super excited.

TL;DR Yes, please! We can make bindgen a lot better if we can just get the Clang C++ libs shipping in the toolchain.

12 Likes

I am extremely interested in this, for both C and C++, and would love to help out in any way I can.

6 Likes

I would be happy to help with such work; I very much want to see first-class C integration.

10 Likes