The goal is to explore the current situation of crates including statically linked C/C++ libraries and to start a discussion about ways to make it easier to import external code in crates in a secure and reliable manner.
Overview
To get an idea of the extent of this pattern, let's explore crates.io content with an analysis of the crates with more than 100k downloads on 2022-08-07 (the top 4.7k crates, see the methodology for more details).
There are currently 70 C/C++ native libraries included with git submodules in 58 crates from the top 4.7k crates. Some of them are widely used, like libz-sys with 20M downloads and 46 reverse dependencies, or libgit2-sys with 11M downloads. Among these crates:
- 6 have the -src suffix in their name:
  - boringssl-src, openblas-src, sqlite3-src, openssl-src, zeromq-src and luajit-src
  - A total of 47 crates in crates.io have the -src suffix.
- 38 are -sys crates that include a library directly:
  - libevent-sys, pcre2-sys, lmdb-sys, lzma-sys, lmdb-rkv-sys, croaring-sys, openvino-sys, zstd-sys, cloudflare-zlib-sys, mozjpeg-sys, boring-sys, libsodium-sys, librocksdb-sys, libz-sys, libnghttp2-sys, libgit2-sys, sdl2-sys, curl-sys, rpmalloc-sys, sass-sys, rdkafka-sys, snmalloc-sys, wabt-sys, z3-sys, ckb-librocksdb-sys, libssh2-sys, libbpf-sys, oboe-sys, lz4-sys, tikv-jemalloc-sys, fasthash-sys, libusb1-sys, shaderc-sys, minimp3-sys, jemalloc-sys, liblmdb-sys, aom-sys, brotli-sys
  - A total of 2288 crates in crates.io have the -sys suffix.
- 2 have a _sys suffix, a -sys variant:
  - audiopus_sys and onig_sys
- 1 has an -ffi suffix, a -sys variant:
  - wepoll-ffi
- 12 have no specific name pattern:
  - afl, hidapi, khronos_api, mimalloc, parity-secp256k1, rust-htslib, rusty_v8, souper-ir, spirv-reflect, sprs, tflite, twox-hash
Two main patterns appear:
- standard -sys crates which are also able to compile the library they are providing an interface for, either by default or only when enabled by a feature flag (see this blog post for details on how it's done).
- dedicated crates containing -src in the name, depended on by -sys crates
Note: This only covers the crates containing submodules, but sometimes the code is vendored directly into the repository, like freetype-sys which has a copy of freetype2 sources. In any case, the source becomes part of the crate uploaded to the registry.
Case studies
Let's have a closer look at a few representative crates.
mozjpeg-sys
- The source is included through a git submodule.
- The version number of the crate, 1.0.2, is not related to the upstream version, 4.0.3.
- The license of the crate is IJG which broadly matches the source crate (but seems incomplete).
- It always builds mozjpeg as a static dependency.
curl-sys
- The source is included through a git submodule.
- The crate versions are built as the following SemVer string: 0.4.56+curl-7.83.1, defined as MAJOR.MINOR.PATCH+BUILD, with BUILD being curl- followed by the upstream curl version.
- By default, it will try to dynamically link to the system curl and openssl, and fall back on static linking. It also has static-curl/static-ssl features to enforce static linking.
- There is no way to enforce dynamic linking (i.e. make the build fail if the library is missing on the system).
- The crate documents an MIT license, while curl is licensed under a custom license (but close to MIT).
openssl-src
- The source is included through a git submodule.
- The crate only contains the logic to build openssl. The API is in openssl-sys which depends on openssl-src when the vendored feature is enabled (disabled by default). Some crates depending on openssl-sys (like openssl and native-tls) expose a similar flag too.
- The crate documents an MIT OR Apache-2.0 license, while openssl is licensed under:
  - Apache-2.0 starting from 3.0
  - Dual OpenSSL and SSLeay licenses before, which are in particular not compatible with the GPL. The release/111 branch providing versions under this license is still maintained.
- The crate versions are built as the following SemVer string: 111.16.0+1.1.1l, defined as MAJOR.MINOR.PATCH+BUILD, with BUILD being the upstream openssl version.
Issues
A lot of widely-used crates include third-party libraries, with little consistency. It causes problems in terms of:
- Visibility: It is not always easy to know if a library was statically linked (and which version) as it does not appear in the crates tree, cargo-auditable data, or any automated SBOM (like cargo-spdx).
- Usability: The way to select static vs. dynamic compilation varies, and is sometimes not even actionable. Some -sys crates fall back to statically linking the bundled library when it is not detected on the build system, without a way to force dynamic linking.
- Licenses: The core problem here is that the license documented in Cargo.toml (which is presumably meant to cover only the build code) is sometimes different from the licenses applicable to the library itself, meaning the crate metadata does not match reality. In this case the library licenses are not easily discoverable, and tools like cargo deny check licenses cannot check them. A good example is the OpenSSL license for versions before 3.0, which is incompatible with the GPL.
- Vulnerabilities: Except for dedicated source crates, there is no accurate visibility over vulnerabilities affecting the included library in the usual Rust tooling (cargo-audit and cargo-deny).
- Trust: The code is included from external sources, written by unidentified people, and is not visible in tooling like cargo-supply-chain, rust-audit or cargo-crev.
Possible improvements
Just like -sys crates have an official definition in cargo docs, with a set of recommended practices, a first step could be to write an RFC with similar guidelines for external source crates. This could build upon existing implementations, and allow an easy convergence for libraries using different patterns. It could then be improved by additional tooling or metadata.
Dedicated -src crates
Having dedicated crates (with the -src suffix for discoverability) seems to have quite a few advantages:
- Allow independent versioning, releases, licenses, security advisories
- Give visibility over included code in all cargo-based tooling
One obvious big drawback is the maintenance overhead.
Consistent feature-based configuration
Ideally there should be a recommended way (through features of -sys crates) to:
- Allow enforcing either static or dynamic linking
- Keep the convenient default used in most existing crates (dynamic linking with static fallback)
There is already a pre-RFC by @kornel to discuss this.
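The two requirements above can be sketched as a small decision function that a -sys build script might call. This is only an illustration: the feature names `force-static` and `force-dynamic`, the type names and the error messages are all hypothetical, and are not taken from any existing crate or from the pre-RFC.

```rust
/// Linking policy requested through two hypothetical Cargo features.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkPolicy {
    /// Use the system library if found, otherwise build the bundled sources.
    PreferDynamic,
    /// Always build and statically link the bundled sources.
    ForceStatic,
    /// Only accept the system library; fail the build if it is missing.
    ForceDynamic,
}

/// What the build script ends up doing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkDecision {
    Dynamic,
    Static,
}

/// Map the two hypothetical feature flags to a policy.
pub fn policy_from_features(force_static: bool, force_dynamic: bool) -> Result<LinkPolicy, String> {
    match (force_static, force_dynamic) {
        (true, true) => Err("features `force-static` and `force-dynamic` conflict".to_string()),
        (true, false) => Ok(LinkPolicy::ForceStatic),
        (false, true) => Ok(LinkPolicy::ForceDynamic),
        // No feature set: keep the convenient default.
        (false, false) => Ok(LinkPolicy::PreferDynamic),
    }
}

/// Decide how to link, given the policy and the result of system detection.
pub fn decide(policy: LinkPolicy, system_lib_found: bool) -> Result<LinkDecision, String> {
    match (policy, system_lib_found) {
        (LinkPolicy::ForceStatic, _) => Ok(LinkDecision::Static),
        (LinkPolicy::ForceDynamic, true) | (LinkPolicy::PreferDynamic, true) => {
            Ok(LinkDecision::Dynamic)
        }
        // Default behaviour: fall back to the bundled sources.
        (LinkPolicy::PreferDynamic, false) => Ok(LinkDecision::Static),
        // The case that many existing crates cannot express today.
        (LinkPolicy::ForceDynamic, false) => {
            Err("system library not found and static fallback is disabled".to_string())
        }
    }
}

/// Build the `rustc-link-lib` directive a build script prints on stdout.
pub fn link_directive(decision: LinkDecision, lib: &str) -> String {
    match decision {
        LinkDecision::Dynamic => format!("cargo:rustc-link-lib=dylib={}", lib),
        LinkDecision::Static => format!("cargo:rustc-link-lib=static={}", lib),
    }
}
```

In a real build script the two booleans would come from the `CARGO_FEATURE_FORCE_STATIC` and `CARGO_FEATURE_FORCE_DYNAMIC` environment variables that Cargo sets for enabled features, and `system_lib_found` from whatever detection the crate already performs. Since Cargo unifies features across the dependency graph, both flags can end up enabled at once, which is why this sketch reports the conflict as an error instead of silently picking one.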
Accurate metadata
License
The license of a crate should cover all files included in the crate archive, including external embedded files.
Using a dedicated crate makes this easier, since the external code and the -sys crate can each document their own license.
Source identification
The other missing information is a way to identify the included software, if possible in a machine-readable manner (CPE, SWID tags, PURL, etc.). It would make it possible to integrate properly with SBOM, automate CVE detection, automate upstream version update, etc.
Note that it would be possible to identify statically linked libraries at compile time already, but this does not work on sources only and does not provide a proper software identifier, just a library name.
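To make the idea of a machine-readable identifier more concrete, here is a minimal sketch that derives a Package URL (using the `generic` PURL type) and a CPE 2.3 formatted string from a vendor, a name and a version. The `EmbeddedSource` type is hypothetical, and the sketch assumes plain ASCII alphanumeric values: it does not implement the percent-encoding and escaping rules that the PURL and CPE specifications require for other characters.

```rust
/// Hypothetical description of a library embedded in a crate.
pub struct EmbeddedSource<'a> {
    pub vendor: &'a str,
    pub name: &'a str,
    pub version: &'a str,
}

impl<'a> EmbeddedSource<'a> {
    /// Package URL with the `generic` type: `pkg:generic/<name>@<version>`.
    /// Assumes values that need no percent-encoding.
    pub fn purl(&self) -> String {
        format!("pkg:generic/{}@{}", self.name, self.version)
    }

    /// CPE 2.3 formatted string for an application (`a`), with the
    /// remaining attributes left as the `*` wildcard.
    /// Assumes values that need no escaping.
    pub fn cpe23(&self) -> String {
        format!(
            "cpe:2.3:a:{}:{}:{}:*:*:*:*:*:*:*",
            self.vendor, self.name, self.version
        )
    }
}
```

For instance, `EmbeddedSource { vendor: "openssl", name: "openssl", version: "1.1.1l" }` would yield `pkg:generic/openssl@1.1.1l`. Where such values would be declared (for example in the `[package.metadata]` table of `Cargo.toml`, which cargo itself ignores) is left open here.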
Versioning
Most existing -src crates use the SemVer build metadata to provide upstream version. Build metadata is defined as a series of dot separated identifiers using only ASCII alphanumerics and hyphens, which are ignored when determining version precedence. Hence, the format is quite flexible, but cannot be used for actual version comparisons (which need to rely on the base SemVer version).
Using the upstream version directly as the crate version would cause some trouble:
- Not all software uses SemVer-compatible versioning
- We need to keep a way to publish updated build code without bumping the embedded code
It could also be a separate metadata field (maybe part of the source software id), but it would make the upstream version invisible in most use cases.
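As an illustration of the build metadata convention described in this section, the following sketch extracts the upstream version from version strings such as the ones used by curl-sys and openssl-src. The function names are hypothetical, and the precedence check is deliberately simplified: it only compares the part before `+` as a string, rather than implementing full SemVer ordering.

```rust
/// Split a version string into the part used for precedence and the
/// optional build metadata (everything after the first `+`).
pub fn split_build_metadata(version: &str) -> (&str, Option<&str>) {
    match version.split_once('+') {
        Some((base, build)) => (base, Some(build)),
        None => (version, None),
    }
}

/// Extract the upstream version from the build metadata, dropping an
/// optional crate-specific prefix such as `curl-`.
/// Returns `None` when the version carries no build metadata.
pub fn upstream_version<'a>(crate_version: &'a str, prefix: &str) -> Option<&'a str> {
    let (_, build) = split_build_metadata(crate_version);
    let build = build?;
    Some(build.strip_prefix(prefix).unwrap_or(build))
}

/// Simplified check: two versions that differ only by build metadata
/// have the same precedence, so a resolver cannot order them.
pub fn same_precedence(a: &str, b: &str) -> bool {
    split_build_metadata(a).0 == split_build_metadata(b).0
}
```

With this sketch, `upstream_version("0.4.56+curl-7.83.1", "curl-")` gives `Some("7.83.1")`, while `same_precedence("111.16.0+1.1.1l", "111.16.0+1.1.1k")` is true, which shows why the upstream version in the build metadata cannot drive version selection on its own.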
Source embedding
There are two ways:
- git submodule
  - Some libraries have different contents in git compared to release tarballs, and may have different build procedures (git vs. tarball).
- Source import directly in tree
  - Makes it harder to know and check where the source comes from
Improved tooling
Some cargo-based tooling could learn to detect -src crates and implement special handling (extract upstream version, etc.), maybe using additional metadata.
It could also provide automation to alleviate the maintenance burden (automate PRs for upstream version update, security advisories based on CVEs, etc.).
And now?
crates.io is a widely used repository of C/C++ libraries, providing a great experience for Rust developers who rely on them. But the current usage patterns have shortcomings, and are not a great fit for current software supply-chain security and traceability needs.
I'm particularly interested in feedback from -sys and -src crates maintainers about the upstream library handling, how it could be improved and their opinion on the discussed issues.
Potential next steps:
- Work on documentation for -sys crate developers, including the -src crate pattern
- Work with existing -sys and -src crate maintainers to improve the situation and discuss recommended practices
- Work on an RFC to add the -src pattern as part of the official cargo documentation
Creating a project group could help coordinate future work on this topic.
Thanks to @Shnatsel for feedback on the initial draft of this post, and to @tofay for feedback on software identifiers for SBOM.