The goal is to explore the current situation of crates including statically linked C/C++ libraries and to start a discussion about ways to make it easier to import external code in crates in a secure and reliable manner.
Overview
To get an idea of the extent of this pattern, let's explore the content of crates.io by analyzing the crates with more than 100k downloads on 2022-08-07 (the top 4.7k crates; see the methodology for more details).
There are currently 70 C/C++ native libraries included with git submodules in 58 crates from the top 4.7k crates. Some of them are widely used, like libz-sys with 20M downloads and 46 reverse dependencies, or libgit2-sys with 11M downloads. Among these crates:
- 6 have the -src suffix in their name: boringssl-src, openblas-src, sqlite3-src, openssl-src, zeromq-src and luajit-src
  - A total of 47 crates in crates.io have the -src suffix.
- 38 are -sys crates that include a library directly: libevent-sys, pcre2-sys, lmdb-sys, lzma-sys, lmdb-rkv-sys, croaring-sys, openvino-sys, zstd-sys, cloudflare-zlib-sys, mozjpeg-sys, boring-sys, libsodium-sys, librocksdb-sys, libz-sys, libnghttp2-sys, libgit2-sys, sdl2-sys, curl-sys, rpmalloc-sys, sass-sys, rdkafka-sys, snmalloc-sys, wabt-sys, z3-sys, ckb-librocksdb-sys, libssh2-sys, libbpf-sys, oboe-sys, lz4-sys, tikv-jemalloc-sys, fasthash-sys, libusb1-sys, shaderc-sys, minimp3-sys, jemalloc-sys, liblmdb-sys, aom-sys, brotli-sys
  - A total of 2288 crates in crates.io have the -sys suffix.
- 2 have a _sys suffix, a -sys variant: audiopus_sys and onig_sys
- 1 has an -ffi suffix, a -sys variant: wepoll-ffi
- 12 have no specific name pattern: afl, hidapi, khronos_api, mimalloc, parity-secp256k1, rust-htslib, rusty_v8, souper-ir, spirv-reflect, sprs, tflite, twox-hash
Two main patterns appear:
- standard -sys crates which are also able to compile the library they are providing an interface for, either by default or only when enabled by a feature flag (see this blog post for details on how it's done).
- dedicated crates containing -src in the name, depended on by -sys crates
Note: This only covers the crates containing submodules, but sometimes the code is vendored directly into the repository, like freetype-sys, which has a copy of freetype2 sources. In any case, the source becomes part of the crate uploaded to the registry.
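To illustrate what "included with git submodules" means in practice, here is a minimal sketch that lists the submodules declared in a repository's .gitmodules file. It is not the methodology used for the analysis above; the `Submodule` type and `parse_gitmodules` function are hypothetical names, and the parser assumes the simple `key = value` layout that git writes by default.

```rust
/// One `[submodule "..."]` entry of a `.gitmodules` file.
#[derive(Debug, Default, PartialEq)]
struct Submodule {
    path: String,
    url: String,
}

/// Extracts the `path` and `url` of every submodule section.
/// Assumption: one `key = value` pair per line, as written by `git submodule add`.
fn parse_gitmodules(content: &str) -> Vec<Submodule> {
    let mut modules = Vec::new();
    for line in content.lines().map(str::trim) {
        if line.starts_with("[submodule") {
            // A new section starts: following keys belong to this entry.
            modules.push(Submodule::default());
        } else if let (Some((key, value)), Some(current)) =
            (line.split_once('='), modules.last_mut())
        {
            match key.trim() {
                "path" => current.path = value.trim().to_string(),
                "url" => current.url = value.trim().to_string(),
                _ => {} // Other keys (branch, update, ...) are ignored here.
            }
        }
    }
    modules
}
```

Running such a function over the repository of each crate would give the list of embedded upstream sources, but not the sources vendored directly in tree.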
Case studies
Let's take a closer look at a few representative crates.
mozjpeg-sys
- The source is included through a git submodule.
- The version number of the crate, 1.0.2, is not related to the upstream version, 4.0.3.
- The license of the crate is IJG, which broadly matches the license of the upstream source (but seems incomplete).
- It always builds mozjpeg as a static dependency.
curl-sys
- The source is included through a git submodule.
- The crate versions are built as the following SemVer string: 0.4.56+curl-7.83.1, defined as MAJOR.MINOR.PATCH+BUILD, with BUILD being curl- followed by the upstream curl version.
- By default, it will try to dynamically link to the system curl and openssl, and fall back on static linking. It also has static-curl/static-ssl features to enforce static linking.
- There is no way to enforce dynamic linking (i.e. make the build fail if the library is missing on the system).
- The crate documents an MIT license, while curl is licensed under a custom license (but close to MIT).
openssl-src
- The source is included through a git submodule.
- The crate only contains the logic to build openssl. The API is in openssl-sys which depends on openssl-src when the vendored feature is enabled (disabled by default). Some crates depending on openssl-sys (like openssl and native-tls) expose a similar flag too.
- The crate documents an MIT OR Apache-2.0 license, while openssl is licensed under:
  - Apache-2.0 starting from 3.0
  - Dual OpenSSL and SSLeay licenses before, which are in particular not compatible with the GPL. The release/111 branch providing versions under this license is still maintained.
- The crate versions are built as the following SemVer string: 111.16.0+1.1.1l, defined as MAJOR.MINOR.PATCH+BUILD, with BUILD being the upstream openssl version.
Issues
A lot of widely-used crates include third-party libraries, with little consistency. This causes problems in terms of:
- Visibility: It is not always easy to know if a library was statically linked (and which version) as it does not appear in the crates tree, cargo-auditable data, or any automated SBOM (like cargo-spdx).
- Usability: The way to select static vs. dynamic compilation varies, and is sometimes not even actionable. Some -sys crates fall back to statically linking the library if it is not detected on the build system, without a way to force dynamic linking.
- Licenses: The core problem here is that the license documented in the Cargo.toml (which is presumably meant to cover only the build code) is sometimes different from the licenses applicable to the library itself, meaning the crate metadata does not match reality. In this case the library's licenses are not easily discoverable, and tools like cargo deny check licenses cannot check them. A good example is the OpenSSL license for versions before 3.0, which is incompatible with the GPL.
- Vulnerabilities: Except for dedicated source crates, there is no accurate visibility over vulnerabilities affecting the included library in the usual Rust tooling (cargo-audit and cargo-deny).
- Trust: The code is included from external sources, written by unidentified people, and is not visible in tooling like cargo-supply-chain, rust-audit or cargo-crev.
Possible improvements
Just as -sys crates have an official definition in the cargo docs, with a set of recommended practices, a first step could be to write an RFC with similar guidelines for external source crates. This could build upon existing implementations, and allow an easy convergence for libraries using different patterns. It could then be improved by additional tooling or metadata.
Dedicated -src crates
Having dedicated crates (with the -src suffix for discoverability) seems to have quite a few advantages:
- Allow independent versioning, releases, licenses, security advisories
- Give visibility over included code in all cargo-based tooling
One obvious big drawback is the maintenance overhead.
Consistent feature-based configuration
Ideally there should be a recommended way (through features of -sys crates) to:
- Enforce either static or dynamic linking
- Keep the convenient default used in most existing crates (dynamic linking with static fallback)
There is already a pre-RFC by @kornel to discuss this.
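As a rough illustration of the behavior described above, the decision a -sys build script would have to make can be sketched as follows. This is only a sketch under assumptions: the feature names `static` and `dynamic` are hypothetical and not taken from the pre-RFC, and the `Linking`, `choose_linking` and `feature_enabled` names are invented for the example. The only Cargo behavior relied upon is that enabled features are exposed to build scripts as `CARGO_FEATURE_<NAME>` environment variables.

```rust
use std::env;

/// How the native library ends up being linked.
#[derive(Debug, PartialEq)]
enum Linking {
    Dynamic,
    Static,
}

/// Decides the linking mode from two hypothetical features and the result
/// of probing the build system for the library (e.g. through pkg-config).
fn choose_linking(
    force_static: bool,
    force_dynamic: bool,
    system_lib_found: bool,
) -> Result<Linking, String> {
    match (force_static, force_dynamic) {
        (true, true) => Err("features `static` and `dynamic` cannot be combined".into()),
        (true, false) => Ok(Linking::Static),
        (false, true) if system_lib_found => Ok(Linking::Dynamic),
        // Enforced dynamic linking: fail instead of silently building from source.
        (false, true) => Err("`dynamic` requested but the system library is missing".into()),
        // Default: dynamic linking with static fallback.
        (false, false) if system_lib_found => Ok(Linking::Dynamic),
        (false, false) => Ok(Linking::Static),
    }
}

/// Returns true when Cargo enabled the given feature for this build script.
fn feature_enabled(name: &str) -> bool {
    let var = format!("CARGO_FEATURE_{}", name.to_uppercase().replace('-', "_"));
    env::var_os(var).is_some()
}
```

A build.rs would call `choose_linking(feature_enabled("static"), feature_enabled("dynamic"), found)` and turn an `Err` into a build failure. Since Cargo features are meant to be additive, how to handle both features being enabled at once is one of the points a guideline would need to settle.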
Accurate metadata
License
The license of a crate should cover all files included in the crate archive, including external embedded files.
Using a dedicated crate makes this easier, as different licenses can be documented for the external code and for the -sys crate.
Source identification
The other missing information is a way to identify the included software, if possible in a machine-readable manner (CPE, SWID tags, PURL, etc.). It would make it possible to integrate properly with SBOM, automate CVE detection, automate upstream version update, etc.
Note that it would be possible to identify statically linked libraries at compile time already, but this does not work on sources only and does not provide a proper software identifier, just a library name.
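To give an idea of what such machine-readable identifiers look like, here is a minimal sketch that formats a PURL and a CPE 2.3 string for an embedded library. The `EmbeddedSource` type and its methods are hypothetical, the PURL uses the `generic` package type as an assumption, and no escaping of special characters is performed, so it only works for simple names and versions.

```rust
/// Describes an upstream library embedded in a crate (hypothetical type).
struct EmbeddedSource<'a> {
    vendor: &'a str,
    name: &'a str,
    version: &'a str,
}

impl EmbeddedSource<'_> {
    /// Package URL with the `generic` type: `pkg:generic/<name>@<version>`.
    /// Assumption: name and version need no percent-encoding.
    fn purl(&self) -> String {
        format!("pkg:generic/{}@{}", self.name, self.version)
    }

    /// CPE 2.3 formatted string for an application (`a`), with all the
    /// fields after the version left as wildcards.
    fn cpe(&self) -> String {
        format!(
            "cpe:2.3:a:{}:{}:{}:*:*:*:*:*:*:*",
            self.vendor, self.name, self.version
        )
    }
}
```

For the openssl version embedded in openssl-src 111.16.0, this would give pkg:generic/openssl@1.1.1l and cpe:2.3:a:openssl:openssl:1.1.1l:*:*:*:*:*:*:*.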
Versioning
Most existing -src crates use the SemVer build metadata to provide the upstream version. Build metadata is defined as a series of dot separated identifiers using only ASCII alphanumerics and hyphens, which are ignored when determining version precedence. Hence, the format is quite flexible, but cannot be used for actual version comparisons (which need to rely on the base SemVer version).
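The convention can be sketched with a few lines of string handling. This is a simplified illustration, not a full SemVer parser: the function names are hypothetical, and `same_precedence` only checks that two versions share the same base, which is enough to show that build metadata plays no role in ordering.

```rust
/// Splits a crate version such as "111.16.0+1.1.1l" into the base SemVer
/// part and the optional build metadata (everything after the first '+').
fn split_build_metadata(version: &str) -> (&str, Option<&str>) {
    match version.split_once('+') {
        Some((base, build)) => (base, Some(build)),
        None => (version, None),
    }
}

/// Extracts the upstream version from the build metadata, dropping an
/// optional "<name>-" prefix such as the "curl-" used by curl-sys.
fn upstream_version<'a>(version: &'a str, prefix: &str) -> Option<&'a str> {
    let (_, build) = split_build_metadata(version);
    let build = build?;
    Some(build.strip_prefix(prefix).unwrap_or(build))
}

/// Two versions differing only by build metadata have the same precedence.
fn same_precedence(a: &str, b: &str) -> bool {
    split_build_metadata(a).0 == split_build_metadata(b).0
}
```

With these helpers, `upstream_version("0.4.56+curl-7.83.1", "curl-")` yields 7.83.1, while two hypothetical releases 111.16.0+1.1.1k and 111.16.0+1.1.1l would be considered equal in precedence, which is why an upstream bump also requires a bump of the base version.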
Using the upstream version directly as the crate version would cause some trouble:
- Not all software uses SemVer-compatible versioning
- We need to keep a way to publish updated build code without bumping the embedded code
It could also be a separate metadata (maybe part of the source software id), but it would make upstream version invisible in most use cases.
Source embedding
There are two ways:
- git submodule
  - Some libraries have different contents in git compared to release tarballs, and may have different build procedures (git vs. tarball).
- Source import directly in tree
  - Makes it harder to know and check where the source comes from
Improved tooling
Some cargo-based tooling could learn to detect -src crates and implement special handling (extract upstream version, etc.), maybe using additional metadata.
It could also provide automation to alleviate the maintenance burden (automate PRs for upstream version update, security advisories based on CVEs, etc.).
And now?
crates.io is a widely used repository of C/C++ libraries, providing a great experience for Rust developers who rely on them. But the current usage patterns have shortcomings, and are not a great fit for current software supply-chain security and traceability needs.
I'm particularly interested in feedback from -sys and -src crate maintainers about how they handle upstream libraries, how it could be improved, and their opinion on the issues discussed here.
Potential next steps:
- Work on documentation for -sys crate developers, including the -src crate pattern
- Work with existing -sys and -src crate maintainers to improve the situation and discuss recommended practices
- Work on an RFC to add the -src pattern to the official cargo documentation
Creating a project group could help coordinate future work on this topic.
Thanks to @Shnatsel for feedback on the initial draft of this post, and to @tofay for feedback on software identifiers for SBOM.