Multiple libraries in a cargo project

withoutboats · August 21, 2018, 10:31pm

It is currently the case that every cargo package corresponds to exactly 0 or 1 library crates. It's not possible to use cargo to create a package containing multiple libraries. @alexcrichton and @wycats discussed the motivation for this back in 2015:

How about changing [lib] to [[lib]], to allow multiple library in a crate?

This was initially considered when designing cargo, but we chose to only allow one for a variety of reasons:

If there are two libraries, how do you select only one to build if you don’t want both?

Encouraging separate projects for each crate means dependencies are generally more re-usable than if they were bundled together. Crates which want to provide multiple libraries almost always end up getting large enough that they should be separated anyway as well.

With multiple libraries in one package there would need to be a method of specifying dependencies amongst them.

A number of features have been added after-the-fact which have made multiple libraries tricky. For example, if a package has a build script, how does it know which crate to link the native libraries into?

These comments enumerate some of the design constraints and drawbacks of allowing a single package to contain multiple libraries, but they also contain some motivations for the limitation that I think might be reasonable to challenge at this point.

That is, I think there are good reasons to want to have multiple libraries in a package that are uploaded to crates.io as a single unit:

Proc macro crates: Proc macros have to be contained in a special crate, leading to splits like serde vs serde_derive. With macro re-exporting, though, it's possible to expose proc macros through the non-macro crate. In theory, the "derive" crate could be completely eliminated by having it subsumed as a "sub-library" of the main crate.
Internal privacy boundaries: Often, a project will have a subsection which exposes a simple interface for very complex internals. Many items inside that submodule will want to be exposed throughout it but not outside it, leading to a lot of pub(in module) declarations. These could be abbreviated to pub(crate) (or even further if we stabilize crate visibility or similar) by making that submodule its own crate.
Improved internal dependency organization: Crates are required to form a DAG. By breaking off subcrates, you guarantee a certain relationship of dependencies between them, helping you maintain a certain order within your project.
Improved compile times: Crates are compiled separately and in parallel; it can be worth making a module its own crate for that reason alone.
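To make the privacy point concrete, here is a minimal sketch of the status quo inside a single crate. The names `engine`, `parser`, and `tokenize` are made up for illustration; they do not come from any project mentioned in this thread.

```rust
// One crate today: the `engine` subsystem keeps its internals private
// by scoping them with `pub(in path)`.
pub mod engine {
    /// The simple public interface of the subsystem.
    pub fn run(input: &str) -> usize {
        parser::tokenize(input).len()
    }

    mod parser {
        // Visible anywhere inside `engine`, but nowhere outside it.
        // If `engine` were its own crate (a sublibrary under this
        // proposal), this annotation could shrink to `pub(crate)`.
        pub(in crate::engine) fn tokenize(input: &str) -> Vec<&str> {
            input.split_whitespace().collect()
        }
    }
}
```

Calling `engine::run("a b c")` from elsewhere in the crate works, while `engine::parser::tokenize` is rejected by the compiler outside of `engine`.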

I don't have a complete design, but I think a solution in this space is worth pursuing.

Constraints

Here's a list of design constraints I've come up with:

These libraries should all be versioned and packaged together. When uploaded to crates.io, they form a single entry. If you want them to be separate, you want a workspace, not this feature.
Largely as an implication of the first item, the design is constrained to having a "main" library (unless it's a binary project), which is probably treated specially. I'm going to refer to the non-main libraries as "sublibraries."
There should be an easy and automatic way to do this without enumerating them all in your Cargo.toml, so that creating a new sublibrary is as easy as creating a new submodule. Ideally, each library need not have its own Cargo.toml either.
It should be possible (with annotations) for these sublibraries to depend on one another.
It should be possible for sublibraries to have dependencies that your other libraries don't, but they should also have automatic access to the dependencies of the main library.

Initial sketch

Underlying mechanism (`[[sublib]]`).

Cargo.toml gains a new section [[sublib]], which is just like [[bin]] et al. Every sublibrary is available as a dependency to the main library by default. A sublibrary does not need to have a Cargo.toml.

The [[sublib]] section has a new entry the other target entries don't have: manifest-path, which points to a manifest for that sublibrary. A manifest for a sublibrary contains only a subset of Cargo.toml.

The sublibrary manifest contains the dependencies table. Somehow, through the dependencies table, a sublibrary can depend both on external packages and on other sublibraries (is a path dependency adequate for the latter case or do we need a new type of dependency?).

The sublibrary manifest does not generate its own lockfile: all external dependencies are versioned in the main Cargo.lock for the package.

Sublibrary manifests are optional: without one, that sublibrary has access to all the dependencies in the main library's Cargo.toml and none of the other sublibraries.

Additionally, the sublibrary manifest contains a [lib] table, which has all the options that the main [lib] table would have. Having a [lib] table in the sublibrary manifest as well as a [[sublib]] section is an error: only the implicit form, described below, uses the [lib] section.
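The dependency rule described in this section can be modelled in a few lines. This is only a sketch of one reading of the proposal: the `SublibManifest` type and the `visible_crates` function are hypothetical and correspond to nothing in cargo.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of the [dependencies] table in a sublibrary manifest.
pub struct SublibManifest {
    /// Extra external packages only this sublibrary depends on.
    pub external: Vec<String>,
    /// Sibling sublibraries this sublibrary explicitly depends on.
    pub sublibs: Vec<String>,
}

/// Crates a sublibrary may name, under the rule sketched above:
/// it always sees the main library's dependencies; anything else
/// (extra packages, sibling sublibraries) requires a manifest entry.
pub fn visible_crates(
    main_deps: &[&str],
    manifest: Option<&SublibManifest>,
) -> BTreeSet<String> {
    let mut visible: BTreeSet<String> =
        main_deps.iter().map(|d| d.to_string()).collect();
    if let Some(m) = manifest {
        visible.extend(m.external.iter().cloned());
        visible.extend(m.sublibs.iter().cloned());
    }
    visible
}
```

With no manifest, the result is exactly the main library's dependency list, so no sibling sublibrary is reachable.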

Automated implicit form

The src/lib/ directory acts a lot like the src/bin directory. Every subdirectory of src/lib is an automatic sublib with the name of that directory, rooted at src/lib/$name/lib.rs. That directory can also contain a toml file for the sublibrary manifest path, maybe named Cargo.toml but maybe named something else like Sublibrary.toml or something?

As a result, a user can create a new sublibrary by creating a new directory under src/lib, no other work necessary. When they want more complexity, they can create a toml file in that directory. Only if they want a different file structure do they need to move into the [[sublib]] form.

Having a [[sublib]] section in your Cargo.toml turns off the implicit form (just like [[bin]] does).
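As a rough illustration of the implicit form, the discovery rule could look something like the following. Everything here is an assumption layered on the sketch above: the `ImplicitSublib` type and `discover_sublibs` function are hypothetical, and the manifest file name `Cargo.toml` is just one of the candidates mentioned (the name is still an open question).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// A sublibrary found by the implicit rule (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ImplicitSublib {
    pub name: String,
    /// src/lib/$name/lib.rs
    pub root: PathBuf,
    /// Optional manifest; the file name is an assumption.
    pub manifest: Option<PathBuf>,
}

/// Scan `src/lib/*/lib.rs` under `package_root`.
/// `has_explicit_sublibs` models a `[[sublib]]` section in Cargo.toml,
/// which turns the implicit form off.
pub fn discover_sublibs(
    package_root: &Path,
    has_explicit_sublibs: bool,
) -> io::Result<Vec<ImplicitSublib>> {
    let lib_dir = package_root.join("src").join("lib");
    let mut found = Vec::new();
    if has_explicit_sublibs || !lib_dir.is_dir() {
        return Ok(found);
    }
    for entry in fs::read_dir(&lib_dir)? {
        let dir = entry?.path();
        let root = dir.join("lib.rs");
        // Only directories that contain a lib.rs count as sublibraries.
        if !dir.is_dir() || !root.is_file() {
            continue;
        }
        let name = match dir.file_name().and_then(|n| n.to_str()) {
            Some(n) => n.to_owned(),
            None => continue,
        };
        let manifest = Some(dir.join("Cargo.toml")).filter(|p| p.is_file());
        found.push(ImplicitSublib { name, root, manifest });
    }
    // read_dir order is unspecified, so sort for a stable result.
    found.sort_by(|a, b| a.name.cmp(&b.name));
    Ok(found)
}
```

Note that a plain module directory such as `src/lib/mod.rs` is ignored by this rule, since only subdirectories containing a `lib.rs` are picked up.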

Backward compatibility

I'm sure there are already some projects with a toplevel module named lib. Not sure the best way to handle this. Maybe we can act fast to reserve that directory name in the edition for now?

Revisiting Alex & Yehuda's design problems:

Building separate libraries: since sublibraries are targets, they can be built the same way any other target can; when building the main target, sublibraries will be built since they are dependencies of it.
Specifying dependencies among them: handled by the sublibraries' manifest file.
Build scripts: Open question! I haven't tried to solve this yet.
Yehuda's "semantic gap:" I think this comment refers to a version in which multiple libraries are exposed from a single crates.io package. My proposal avoids this problem by having a single "main" package that is exposed, and sublibraries are just for internal organization.

cc the cargo team not previously mentioned: @aturon @matklad @ag_dubs @nrc @Eh2406

withoutboats · August 21, 2018, 10:47pm

As an example, the failure project currently tracks a separate failure_derive crate in $root/failure_derive. With this proposal, it could (if it didn’t want to keep uploading failure_derive to crates.io), instead move that to src/lib/failure_derive and replace its Cargo.toml with:

[dependencies]
quote = "0.6.3"
syn = "0.14.4"
synstructure = "0.9.0"
proc-macro2 = "0.4.8"

[lib]
proc-macro = true

bascule · August 21, 2018, 11:03pm

I just ran into the frustrations of this after splitting a single monolithic crate, which was using a feature-per-backend model, into several crates which are now free to have their own features. I feel like the resulting project is a lot cleaner, but I’ve also generally been releasing all of the crates at once as I evolve the API, and right now that’s really painful.

When I was dealing with this what I thought would be nice is an atomic cargo publish --all that could release all of the crates in a given workspace, but if any of them fail to publish the whole thing fails.

I know in the past people have discussed the possibility of some sort of (off-by-default) quarantine/preflighting feature for crates.io for other reasons, for example allowing one person to upload a given crate, but requiring approval from another before a final release. I think something like that could be useful to implement an atomic-ish cargo publish --all, as crates uploaded in a batch could remain in quarantine prior to publication. If all crates succeed in uploading, then you mark them all as published. If any of them fail, just delete the ones that were uploaded out of the quarantine.
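The quarantine idea amounts to a two-phase publish. Below is a minimal sketch of that control flow only. The `Registry` trait and its methods are hypothetical (crates.io has no such quarantine API), and `FakeRegistry` is an in-memory stand-in used to exercise the logic.

```rust
/// Hypothetical registry interface with a quarantine area.
pub trait Registry {
    fn upload_to_quarantine(&mut self, krate: &str) -> Result<(), String>;
    fn release(&mut self, krate: &str);
    fn discard(&mut self, krate: &str);
}

/// Upload every crate to quarantine first; publish only if all uploads
/// succeed, otherwise delete whatever was already uploaded.
pub fn publish_all<R: Registry>(registry: &mut R, crates: &[&str]) -> Result<(), String> {
    let mut staged: Vec<&str> = Vec::new();
    for &krate in crates {
        if let Err(err) = registry.upload_to_quarantine(krate) {
            for &done in &staged {
                registry.discard(done);
            }
            return Err(format!("publishing {} failed: {}", krate, err));
        }
        staged.push(krate);
    }
    for &krate in &staged {
        registry.release(krate);
    }
    Ok(())
}

/// In-memory stand-in; `reject` names a crate whose upload should fail.
#[derive(Default)]
pub struct FakeRegistry {
    pub reject: Option<String>,
    pub quarantined: Vec<String>,
    pub published: Vec<String>,
}

impl Registry for FakeRegistry {
    fn upload_to_quarantine(&mut self, krate: &str) -> Result<(), String> {
        if self.reject.as_deref() == Some(krate) {
            return Err("rejected by registry".to_string());
        }
        self.quarantined.push(krate.to_string());
        Ok(())
    }

    fn release(&mut self, krate: &str) {
        self.quarantined.retain(|k| k.as_str() != krate);
        self.published.push(krate.to_string());
    }

    fn discard(&mut self, krate: &str) {
        self.quarantined.retain(|k| k.as_str() != krate);
    }
}
```

This is only "atomic-ish", as described above: the release loop itself is not transactional, so a real registry would still need to make the final step all-or-nothing on its side.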

qmx · August 21, 2018, 11:41pm

that’s one of the features I miss from maven/java world - being able to promote several artifacts as a single “transaction”

glandium · August 21, 2018, 11:42pm

A related issue that I have is that it’s not possible to choose the crate type through configuration. That is, I have a crate that I’d like to be able to build either as a cdylib or a staticlib. The best I can do at the moment is to generate a Cargo.toml in some way… not really appealing. Or create two sub-crates, one for the cdylib and one for the staticlib, but then I’ll end up with both being built when I only want one.

CAD97 · August 22, 2018, 1:23am

Whatever solution is used, it would be nice to support multiple libraries coming from a workspace as well. In existing projects using workspaces or even future projects that need more individual control over the sublibraries, it would be nice to support bundling workspace path dependencies as sublibraries.

withoutboats · August 22, 2018, 5:23am

I don’t understand the situation you’re describing. If you have multiple packages in a workspace, you can just depend on them: there’s no need for one to be a “sublibrary,” and it wouldn’t make sense for it to be.

matklad · August 22, 2018, 6:13am

Previous RFC in this space: https://github.com/rust-lang/rfcs/pull/2224

matklad · August 22, 2018, 6:55am

I definitely agree that we should solve the private libraries problem! It is an unfortunate restriction that, as soon as you split your code into crates, the split itself becomes a public API. As for the approach to solve this, I kind of like the “private path dependencies” of the postponed RFC more. In a nutshell, the approach is roughly “if you omit the version field from the [package] section of a workspace member, it becomes a “private dependency” and is packaged into the .crate file with your main package”.

In the rest of the post, I’d like to just discuss package/crate separation in detail.

Cargo has a concept of a “package”, and I’ve always had a love-hate relationship with it. In reality, all Rust code is a set of compilation units/crates, which form a DAG. The “package” concept is a Cargo-specific addition, which, in theory, could have not existed.

In practice a package as a way to add “supporting” crates to the main library crate is really, really useful. It feels very appropriate that examples, tests and build scripts are implemented as special-cased built-ins.

One of the more problematic aspects of the package concept is dependency management. Because a package has at most a single library crate, depending on the package, and not on the individual crate, works perfectly. However, the fact that you specify dependencies per-package is less than ideal: a common request is to add binary-specific dependencies.

In broad strokes, it seems that the way forward is either to stick with the “package as a unit of dependency” model and make package creation more lightweight (postponed RFC), or to expose the underlying crate concept more (current proposal).

Without examining the tradeoffs too closely, I’d expect the “lightweight package” route to be more promising, for three reasons:

It reuses an existing mechanism for specifying dependencies, which makes Cargo’s interface simpler.
It does not abandon the existing benefits of the packages (you can have dedicated tests and build-scripts for the private libraries)
It makes transition from “private library” to “stand-alone package” more straightforward.

kornel · August 22, 2018, 12:10pm

The problem is very real, but I think you may be looking at the solution from a wrong perspective. There already is a way to have multiple libraries per project, and have excellent internal boundaries: workspaces.

So the problem is not that one crate can’t have multiple libraries, but that crates.io doesn’t support workspaces.

So I think crates.io needs to be extended to allow publishing multiple crates per package.

Cargo has also moved away from recommending lib and binaries sharing a crate in favor of separate crates, because it doesn’t support separate dependencies, features, etc. for individual targets within a crate. The same problems would affect multiple libraries.

Out of 1600 most popular (>300 downloads per month) crates, 240 (~15%) have a repository with more than one crate published to crates.io. 7% have 3 or more separately published crates per repo.

djc · August 22, 2018, 2:10pm

I agree with @kornel that the conceptually more simple approach lies the other way: rather than introducing another layer of complexity, we could instead strengthen the abstractions we already have and fix their leaks. From the list of reasons for wanting to have multiple libraries per package, the latter three are directly fixed by having things in different crates.

While the initial post seems pretty extensive in terms of its discussion of a potential solution, I found it a bit light in terms of talking about the problems or use cases that need to be addressed. The second post discusses a concrete use case, but seems to mainly revolve around the proc-macro issue.

As the author of a somewhat popular custom derive crate, I definitely feel the pain about not being able to distribute procedural macros as part of a larger library. I also agree with @bascule that publishing a number of libraries from a workspace is painful today. However, I don’t think sublibraries are the best way to address these problems.

withoutboats · August 22, 2018, 5:12pm

@matklad gets pretty well at the heart of the issue: the relationship between packages and crates:

crate: a crate is a unit of compilation for a Rust compiler
package: a package is a set of crates with a single entry in a cargo registry's index

The relationship between these is often muddled. @matklad is right to say that we have a love/hate relationship with the entire "package" idea, but it's built into cargo at this point and I can't imagine moving away from it. I think that the current relationship between packages and library crates is a contributor to the muddlement: because a package can only contain 1 library, which is a crate, and because the solution is to create multiple packages and compile them in a workspace, the idea that a package is a crate and if you want to have multiple crates you use a workspace is sort of the most obvious interpretation.

Workspaces are about sharing a version resolution among multiple packages, not multiple crates. And I think workspaces are a heavyweight solution for the problem we're coming at here. The downsides of using a workspace for an internal library are:

That library and its public API become part of your public API that you are responsible for semantically versioning and publishing to crates.io. This is a big maintenance burden.
Setting it up initially takes more work: you have to create a proper manifest file for your new project at least, possibly deal with the [workspace] section.
Now you have to maintain dependency lists between each manifest, which will diverge with time probably, leading to annoyances like "I have to copy my dependency on foobar_baz from Cargo.toml to src/quuxlib/Cargo.toml to get this to compile."

All of these are burdensome for an internal division, and in my experience what most often happens is that users just don't break up their crate, even when they want to, because it's not worth the cost. That's very unfortunate: we've made it too expensive to do something good.

In contrast, this proposal is designed around making it extremely cheap to create a new crate, essentially as cheap as creating a new module:

Create a file at src/lib/foobar/lib.rs. You have a new crate!
The rest of your project can depend on it immediately, no additions to your Cargo.toml.
That crate has access to all of the dependencies of the package, you don't need to worry about "exposing" them to it in its manifest.
If you need to, eventually you can make it more complicated by creating its own crate-level manifest for specifying its other dependencies.

Diving into more specifics:

The problem of build scripts for separate crates within a package I think should be solved regardless of this change. I'm not really clear on the benefits of dedicated tests, is it just an organizational thing?

I think this is actually not such a hard distinction in practice. I'm thinking possibly that a sublibrary's manifest would be a Cargo.toml with no [package] section. In that case, all you'd need to do to transition from a sublibrary to a separate package is add [package] to your toml, this is only slightly more heavyweight than adding a version number, at the benefit of making creating the sublibrary initially more lightweight.

I can go into some concrete examples:

I've been working on implementing signed registries in my spare time, which involves expanding the sha256 utility into a whole crypto submodule with sha512 and ed25519 primitives as well. This means adding several dependencies and creating a toplevel crypto module, which is actually totally isolated from the rest of cargo. I'd like to make this its own crate to enforce that division, but for the reasons I enumerated above I did not.
The original seed of this idea actually came from a conversation with Niko. He was complaining that he wanted some "cratelike" boundary inside of his crate, so that he could easily control visibility by saying pub(crate) but actually meaning this particular module, and so on. I argued that what he actually wanted was not a new language feature, but just for subcrates to be as lightweight to make as modules are.
Looking over my own projects, I've never actually wanted workspaces at all (and I've always found creating them a little confusing.. I'm never certain if I've actually made it work at first). One of my largest personal projects, cargonauts, was divided into seven crates, none of which I actually wanted to be separate packages, it was just an internal organizational aspect of the project.

matklad · August 22, 2018, 6:20pm

Yeah. I think that if you have the reason to split a part of the library into a separate crate, you might want to split tests as well? For projects which I am not publishing to crates.io, I use workspace heavily (libsyntax2 is seven crates), and each crate typically has dedicated tests. I also often make use of cd crates/some_crate && hack && cargo test && hack && cargo test workflow.

withoutboats · August 22, 2018, 6:21pm

We could teach cargo about tests for specific crates as well, e.g. having it treat src/lib/$name/tests specially or whatever.

matklad · August 22, 2018, 6:41pm

Yep! The same way, I think we can teach Cargo that libs/mylib/lib.rs creates some kind of virtual default Cargo.toml for a workspace member. Like, I feel we have some kind of “either sublibraries or private workspace members” model of thinking, while the end result might actually look almost the same for both approaches?

kornel · August 22, 2018, 6:44pm

Cargo’s path dependencies foo = { path = "../foo" } are IMHO very easy to use and work pretty well for most things except: version bumps and publishing, which are chores that still have to be done per crate.

@matklad’s proposal to remove version from them and bundle them in the package sounds great to me, as it solves two main gripes in one go.

withoutboats · August 22, 2018, 9:12pm

We discussed this in the cargo meeting today, mainly to think about the backward compatibility hazard as it relates to the edition. We determined that this is backward compatible so long as you always somehow specify something in your Cargo.toml section for each subcrate.

jjpe · August 23, 2018, 7:40am

withoutboats:

These comments enumerate some of the design constraints and drawbacks of allowing a single package to contain multiple libraries, but they also contain some motivations for the limitation that I think might be reasonable to challenge at this point.

That is, I think there are good reasons to want to have multiple libraries in a package that are uploaded to crates.io as a single unit:

Proc macro crates: Proc macros have to be contained in a special crate, leading to splits like serde vs serde_derive. With macro re-exporting, though, it's possible to expose proc macros through the non-macro crate. In theory, the “derive” crate could be completely eliminated by having it subsumed as a “sub-library” of the main crate.

Internal privacy boundaries: Often, a project will have a subsection which exposes a simple interface for very complex internals. Many items inside that submodule will want to be exposed throughout it but not outside it, leading to a lot of pub(in module) declarations. These could be abbreviated to pub(crate) (or even further if we stabilize crate visibility or similar) by making that submodule its own crate.

Improved internal dependency organization: Crates are required to form a DAG. By breaking off subcrates, you guarantee a certain relationship of dependencies between them, helping you maintain a certain order within your project.

Improved compile times: Crates are compiled separately and in parallel; it can be worth making a module its own crate for that reason alone.

Maybe it's just me, but points 3 and 4 seem more like they argue in favor of keeping the status quo, rather than arguing for allowing multiple libraries in a cargo project.

Specifically, point 3 essentially means no cycles on the crate level. As the only reason for cyclic dependencies I've found to date is if you have 1 coherent whole but the source file grows too large (try editing a 17 KSLOC file as I've recently had to, it's not a nice experience in any editor due to asymptotic scaling of editing algorithms), allowing cyclic imports on the crate level would seem to me a misfeature.

Point 4 says that separate crates are compiled separately and in parallel. This is true, but not the entire truth: the dependency relation between the importing and imported crates means that they will never be compiled in parallel, or at least I don't think they should (as an error in a dependency crate must mean a failure to compile of the dependent crate).

FTR, I do agree with allowing multiple libs in a cargo project for e.g. proc_macros so I'm not exactly opposed to this feature, but it seems like it's easily abused for the wrong reasons.

rocallahan · August 23, 2018, 10:18am

the dependency relation between the importing and imported crates means that they will never be compiled in parallel

This is true, but it has to be fixed at some point if Rust compilation speed is to be competitive with C++. I think in principle compilation of an importing crate can begin as soon as an imported crate has had its interfaces parsed and types resolved.

Returning to the topic, it seems to me that if you break a crate into two crates A and B where A depends on B, you actually slow down non-incremental compilation.
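The scheduling argument in the last few posts can be illustrated with a small critical-path calculation. This is a toy model under stated assumptions: unlimited cores, no pipelining between a crate and its dependents, no per-crate overhead, and made-up cost units. The function names are hypothetical.

```rust
use std::collections::HashMap;

/// Earliest time at which `krate` finishes compiling, given that it can
/// only start once all of its dependencies have finished.
/// Assumes the dependency graph is a DAG, as crates are required to be.
fn finish_time<'a>(
    krate: &'a str,
    cost: &HashMap<&'a str, u32>,
    deps: &HashMap<&'a str, Vec<&'a str>>,
    memo: &mut HashMap<&'a str, u32>,
) -> u32 {
    if let Some(&t) = memo.get(krate) {
        return t;
    }
    let mut start = 0;
    if let Some(ds) = deps.get(krate) {
        for &d in ds {
            start = start.max(finish_time(d, cost, deps, memo));
        }
    }
    let t = start + cost.get(krate).copied().unwrap_or(0);
    memo.insert(krate, t);
    t
}

/// Wall-clock build time with unlimited cores: the length of the
/// longest (critical) path through the crate graph.
pub fn build_time<'a>(
    cost: &HashMap<&'a str, u32>,
    deps: &HashMap<&'a str, Vec<&'a str>>,
) -> u32 {
    let mut memo = HashMap::new();
    cost.keys()
        .map(|&k| finish_time(k, cost, deps, &mut memo))
        .max()
        .unwrap_or(0)
}
```

In this model, splitting a crate of cost 10 into A (4) depending on B (6) still takes 10, so the split buys nothing on a clean build. Splitting it into independent siblings A (4) and B (6) under a thin top crate of cost 1 takes 7, which is the case where separate crates actually help.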

jjpe · August 23, 2018, 9:43pm

The thing is, it is unclear how much that could deliver in the way of speedups: the last time I heard anything about rustc performance, roughly half of the total time was being spent inside of LLVM to do code generation. Code generation cannot start before type checking finishes successfully, and should not be run at all otherwise.
Given that, it means that all dependencies of a crate C should at least have type checked successfully before type checking C itself can finish, and all dependencies must have code generated before C itself can have code generated. So I think a kind of dependency-based, pipelined approach* might be feasible, but it’s not clear that that would actually improve compile times all that much, given the amount of time spent in LLVM (which cannot be parallelized from what I understand).

*By this I mean that the different stages of compiling a Rust program (i.e. parsing, borrow checking, type checking and name resolution, desugaring, code generation etc) could take into account the crate-level dependency graph and schedule operations such that the appropriate dependencies exist between parsing a crate and parsing all of its dependencies, as well as a dependency between each 2 consecutive stages, e.g. between parsing a crate C and borrow/type checking C. Compilation as a whole would then walk this mega-dependency graph (which would look like a number of interlinked variants of the crate-level dependency graph, 1 variant for each compilation stage) and could find the critical path using this (which is important as it’s pointless trying to parallelize that on this level). It could then also parallelize all the non-LLVM things accordingly, and maybe invoke LLVM as a whole in parallel multiple times on different parts of the code-gen subgraph in a manner reminiscent of SIMD-the-idea (in contrast to e.g. any specific CPU implementation).
