Crates.io "first-come, first-served" for plain crate names - why and how should it be changed?

I would like to point out that this behavior is pretty confusing, especially for anyone who just casually reads the Cargo.toml and expects package names to be the same as paths in the source. And some tools have problems with it, like cargo-machete for instance.

As part of brainstorming what could be done, cargo new foo could create a manifest like:

[package]
name = "$USER-foo"
...

[lib]
name = "foo"

I've not dug too much into splitting package.name and lib.name before, so some open questions:

  • What does renaming do when the names are split like this?
  • How do we help ensure the use name is discoverable? I think a one-to-one mapping between the declaration and the use of anything is important for understanding code. In this case in particular, that covers adding a dependency, trying to look up the dependency from the code you are looking at, etc.
  • If we de-emphasize package.name, we likely will also need to emphasize each bin.name

Note that while it would be trivial for registries to pick up on a custom lib.name, anything more (bin.name, whether a package has a lib, etc.) would require the registry to re-implement the auto-target discovery code from cargo. I have been toying with the idea of finding a way to perform all auto-discovery as part of cargo package (and disabling the auto fields) so crates.io wouldn't have to re-implement anything.
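To make that re-implementation cost a bit more concrete, here is a rough Rust sketch of just the lib/bin part of Cargo's conventional layout (src/lib.rs, src/main.rs, src/bin/NAME.rs and src/bin/NAME/main.rs). The Target enum and the discover_targets function are made up for illustration and are not Cargo's API; explicit [lib]/[[bin]] entries, examples, tests and benches are ignored entirely.

```rust
/// Hypothetical, simplified model of a discovered target.
#[derive(Debug, PartialEq)]
enum Target {
    Lib(String),
    Bin(String),
}

/// Sketch of lib/bin auto-discovery from a list of package-relative paths.
/// Only the conventional locations are handled; this is an assumption-laden
/// toy, not what Cargo actually runs.
fn discover_targets(package_name: &str, files: &[&str]) -> Vec<Target> {
    let mut targets = Vec::new();
    for file in files {
        if *file == "src/lib.rs" {
            // The default lib name is the package name with '-' replaced by '_'.
            targets.push(Target::Lib(package_name.replace('-', "_")));
        } else if *file == "src/main.rs" {
            // The default bin takes the package name as-is.
            targets.push(Target::Bin(package_name.to_string()));
        } else if let Some(rest) = file.strip_prefix("src/bin/") {
            // src/bin/NAME/main.rs or src/bin/NAME.rs, one level deep only.
            let name = rest
                .strip_suffix("/main.rs")
                .or_else(|| rest.strip_suffix(".rs"));
            if let Some(name) = name {
                if !name.is_empty() && !name.contains('/') {
                    targets.push(Target::Bin(name.to_string()));
                }
            }
        }
    }
    targets
}
```

Even this cut-down version has to know several path conventions, which is the part a registry would otherwise need to duplicate.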

If you specify a dependency as name = { package = "package" }, the crate name used in code is always name, regardless of what the package has set as lib.name.

If you don't set an explicit lib.name, it defaults to package.name with every - replaced by _; nothing else about the behavior changes.
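The two rules above (renamed dependencies and the default lib.name) can be written down as a small Rust sketch. The function names and parameters here are hypothetical and only model the behavior as described in this post; they are not part of Cargo.

```rust
/// Default library crate name: the package name with '-' replaced by '_'.
fn default_lib_name(package_name: &str) -> String {
    package_name.replace('-', "_")
}

/// Name a dependency is referred to by in source code.
///
/// - `dep_key`: the key on the left-hand side in [dependencies].
/// - `package_override`: the value of `package = "..."`, if present.
/// - `explicit_lib_name`: the dependency's own lib.name, if it set one.
fn name_in_code(
    dep_key: &str,
    package_override: Option<&str>,
    explicit_lib_name: Option<&str>,
) -> String {
    match package_override {
        // Renamed dependency: the key wins, whatever lib.name says.
        Some(_) => dep_key.replace('-', "_"),
        // Not renamed: the key is the package name, so use lib.name,
        // falling back to the default derived from the package name.
        None => explicit_lib_name
            .map(str::to_string)
            .unwrap_or_else(|| default_lib_name(dep_key)),
    }
}
```

For example, name_in_code("glm", Some("nalgebra-glm"), None) gives "glm", while name_in_code("nalgebra-glm", None, None) gives "nalgebra_glm".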

We already don't have this property, so if it's considered important, then there should be a warning for setting an explicit lib.name.

But I actually agree; many people don't know that package.name and lib.name can be set independently, so running into packages which do so tends to be surprising (and because of this, people tend to not set lib.name even if it would improve use of the crate; e.g. nalgebra recommends that consumers use extern crate nalgebra as na; and extern crate nalgebra_glm as glm;).

At one point I toyed with just always using the { package = "package" } form of dependencies, to make the fact that package and lib name are distinct more immediate. In a world where package names differing from lib names had been the norm from the beginning, I'd expect this to be the only way to declare dependencies, and the lib.name key would essentially be what cargo add uses when first adding a package.

It's already fairly common for bin.name to be different than package.name. cargo already isn't really meant to be a binary package repository, but rather a library one.

I think a reasonable heuristic would be to display as

  • bin.name (package.name) if a single bin.name is set explicitly and no lib.name is set,
  • lib.name (package.name) if lib.name is set explicitly,
  • package.name if no lib.name or bin.name are set, and
  • package.name if multiple bin.name are set but no lib.name is set.

This gives developers a reasonable amount of control to get the result they prefer.
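Spelled out as code, the heuristic might look like the following Rust sketch. The Names struct and display_name function are invented for this illustration (crates.io has nothing like this today), and "explicit" means the name was written in the manifest rather than inferred.

```rust
/// Hypothetical view of the names a package declares explicitly.
struct Names<'a> {
    package: &'a str,
    /// lib.name, only if set explicitly in the manifest.
    explicit_lib: Option<&'a str>,
    /// Every bin.name that was set explicitly in the manifest.
    explicit_bins: &'a [&'a str],
}

/// Display name following the four rules listed above.
fn display_name(names: &Names<'_>) -> String {
    // An explicit lib.name always takes priority.
    if let Some(lib) = names.explicit_lib {
        return format!("{} ({})", lib, names.package);
    }
    // Exactly one explicit bin.name and no explicit lib.name.
    if let [bin] = names.explicit_bins {
        return format!("{} ({})", bin, names.package);
    }
    // Nothing explicit, or several bins: fall back to package.name.
    names.package.to_string()
}
```

With a made-up package myorg-foo, an explicit lib.name of foo would display as foo (myorg-foo), and two explicit bins with no lib.name would display as plain myorg-foo.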

This also leads to thoughts of what it would look like for a package to contain multiple lib crates versioned, packaged, and distributed as a single unit. Existing discussion of such has mostly limited itself to encapsulated/private crate units, since in the case of multiple public crates it's mostly sufficient to just publish them as separate crates with = dependencies to keep them in lockstep.

What if two crates use the same display name of "serde"? That would be very confusing as they are incompatible with each other even if they have the exact same source. You can't take data structures with a Serialize impl of one version and use them with a serializer that uses the other crate that had a display name of serde. The current method that forces a different crate name prevents this confusion.

That's what happens when a feature is hidden: it is available to hard-core users, but it isn't well documented or promoted, so it goes unnoticed and its potential problems never get handled.

One might be serde (serde) and the other would be serde (myorg-serde) or serde (serde-5678). But yeah, I know there are people who don't like the idea, just like how they think different users sharing the same (display) name would confuse them.

I found the Rust concept of crate/package extremely confusing.

I'll try to show what I got from The Rust Programming Language 7.1. Packages and Crates (a), 14.3. Cargo Workspaces (b), The Rust Reference 14. Linkage (c), The Cargo Book 3.2. The Manifest Format (d), 3.2.1. Cargo Targets (e) and 3.3. Workspaces (f), and how it confused me. Please correct me if I made a mistake.

  1. A crate is the smallest amount of code that the Rust compiler considers at a time. [a]
  2. A package is a bundle of one or more crates that provides a set of functionality. [a]
  3. Each Cargo.toml file defines a package. [a][d][f]
  4. A workspace is a set of packages that share the same Cargo.lock and output directory. [b]
  5. You can define a workspace in a Cargo.toml file that consists of one or more packages that are defined by their respective Cargo.toml files. [f]
  6. The [lib] section in Cargo.toml specifies the library target which defines a “library” that can be used and linked by other libraries and executables. [e]
  7. There's a field called crate-type under this [lib] section, which controls how the compiler generates artifacts. [c]
  8. Cargo packages consist of targets which correspond to source files which can be compiled into a crate. The list of targets can be configured in the Cargo.toml manifest. [e]
  9. You can only specify one library (target) for each package (Cargo.toml). [e]
  10. A package can contain as many binary crates as you like, but at most only one library crate. [a]

...

I mean, WTF? From what I've seen, the term crate sometimes means build unit, sometimes package, sometimes library, sometimes target, which is really ambiguous.

I'll take the file hierarchy of the repo github.com/hawkw/mycelium as an example.

  • The root Cargo.toml defines a workspace, with subdirectory bitfield being one of its members.
  • The root Cargo.toml defines a root package named mycelium-kernel
  • The root Cargo.toml specifies a [lib] target with the name mycelium-kernel for the workspace. I suppose this name specification can be omitted?
  • The Cargo.toml under subdirectory bitfield defines a package named mycelium-bitfield.
  • Subpackage mycelium-bitfield specifies no target, but it is published on crates.io individually.
  • According to the doc example, you use mycelium-bitfield via use mycelium_bitfield;
  • cordyceps is mycelium-bitfield's peer subpackage under the subdirectory cordyceps.
  • If I installed cordyceps along with mycelium-bitfield, what is the target lib? Is it cordyceps + mycelium-bitfield, or is it mycelium-kernel as specified by the [lib] section in the root Cargo.toml of the workspace? Should I use cordyceps; use mycelium_bitfield;, or use mycelium_kernel;? Why? (According to various people here, specifying lib.name = "mycelium-kernel" would result in you use mycelium_kernel;)
  • What would happen if I specified lib.name = "bitfield" in the Cargo.toml under subdirectory bitfield? Would that even compile?
  • If you have installed mycelium + mycelium-bitfield, does that mean you can access mycelium-bitfield via either mycelium-kernel or mycelium-bitfield?

From The Rust Programming Language 14.3. Cargo Workspaces:

The workspace has one target directory at the top level that the compiled artifacts will be placed into; the adder package doesn’t have its own target directory. Even if we were to run cargo build from inside the adder directory, the compiled artifacts would still end up in add/target rather than add/adder/target .

From The Rust Reference 14. Linkage:

With all these different kinds of outputs, if crate A depends on crate B, then the compiler could find B in various different forms throughout the system. The only forms looked for by the compiler, however, are the rlib format and the dynamic library format.

So here are my questions:

  • what does the lib.name field really do?
  • how do we know the boundary of a crate, as in the smallest amount of code that the Rust compiler considers at a time?
  • what is the most accepted definition of the term crate?
  • how does single-repo-multiple-published-packages(crates) work?
  • how should we understand all these concepts?

I'd be interested to see the history but it didn't seem too bad? A crate is a Cargo.toml with a [package] section, so it seems pretty clear to me that "crate" is just a cutesy name for a package.

Since the only thing you can depend on with cargo is a library target of a crate, and you can only have one library target for a crate, there's no distinction drawn between depending on a crate and depending on its target.

As I understand things:

  • lib.name is the name that you use in source code
  • You follow the included mod statements from the root file for the target, eg. main.rs vs lib.rs by default. These can use cfg and path attributes, so there's no simpler accurate method.
  • As I mentioned, there's little need to distinguish between crates as proper packages and their library targets, so the two are generally conflated depending on context.
  • Generally a monorepo of packages will use workspaces so they can all be developed with dependencies on each other locally without publishing, but at publish time they are treated as independent crates (paths in dependencies are removed, etc) and crates.io doesn't care. The only other time this comes up is git dependencies, where you have to provide the repository root, and cargo tree searches for the package name (which I find a bit lame).
  • In general, if you need to worry about this, you've probably done something wrong! Having multiple targets in a crate is somewhat less favored now, workspaces cover the same need with less confusion.
  • There exists a glossary which may help.
  • Generally I don't think a normal user needs to know these concepts all at once. It's the proposal that is trying to solve a deep-rooted issue, so the author needs to understand more than usual.

Maybe we could have a "cargo-by-example" book teaching people common patterns for setting up projects, so that they can check example-style docs instead of the current reference-like docs. That is also a way to "promote" features.

Check The Rust Programming Language 14.3. Cargo Workspaces:

The workspace has one target directory at the top level that the compiled artifacts will be placed into; the adder package doesn’t have its own target directory. Even if we were to run cargo build from inside the adder directory, the compiled artifacts would still end up in add/target rather than add/adder/target .

From what I've learnt, you can only specify 1 library target for the entire workspace, or am I wrong about that?

Because I've only seen examples of workspaces that have one [lib] section specified in the root Cargo.toml.

So if I specified a [lib] section in a subpackage Cargo.toml,

  • will that be ignored by Cargo because the subpackage doesn’t have its own target directory?
  • or is it a valid specification of the subpackage's target (e.g. lib.name, which controls the name you use in source code), and the only thing ignored by Cargo is the target directory, so compiled artifacts will not be placed into the subpackage's target directory, even if we have these lines:
[lib]
name = "subpackage_name"

in /subpackage/Cargo.toml, while we have

[lib]
name = "rootpackage_name"

in /Cargo.toml?

Let me rephrase my questions:

  • Is it true that while you can have only 1 target directory specified according to workspace definition, each package can specify its own library target (that would have the compiled artifacts be placed together in the workspace target directory)?
  • If you install a package from crates.io, whose Cargo.toml defines a workspace, how does that workspace interact with your own workspace? Or is that workspace definition in the Cargo.toml file from the crate/package repo removed during the cargo package process?

Yes, one target directory, where all the targets are placed on build; and workspaces are removed from crates on publish, they're a completely local concept, at least so far as I know!


Keep in mind also you don't need to have a package at the root of a workspace, that is you can have a Cargo.toml with only a [workspace] section.

Ah, right, I missed it, from The Cargo Book 4.5.4. cargo package DESCRIPTION

This command will create a distributable, compressed .crate file ... This performs the following steps:

  1. ...

  2. Create the compressed .crate file.

  • The original Cargo.toml file is rewritten and normalized.
  • [patch], [replace], and [workspace] sections are removed from the manifest.
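As a rough picture of what "sections are removed" means, here is a toy Rust filter that drops those three tables from manifest text. This is only a line-based sketch under loud assumptions: strip_local_sections is a made-up name, real Cargo works on a parsed manifest and does much more normalization, and this version would be fooled by things like multi-line arrays whose lines start with '['.

```rust
/// Toy sketch: drop [patch], [replace] and [workspace] tables (including
/// dotted sub-tables such as [patch.crates-io]) from manifest text.
/// NOT how Cargo does it; purely illustrative and line-based.
fn strip_local_sections(manifest: &str) -> String {
    const DROPPED: [&str; 3] = ["patch", "replace", "workspace"];
    let mut out = String::new();
    let mut skipping = false;
    for line in manifest.lines() {
        let trimmed = line.trim();
        // Treat any line of the form [..] or [[..]] as a table header.
        if trimmed.starts_with('[') && trimmed.ends_with(']') {
            let header = trimmed.trim_matches(|c: char| c == '[' || c == ']');
            // Only the first dotted segment decides whether to drop the table.
            let top = header.split('.').next().unwrap_or("").trim();
            skipping = DROPPED.iter().any(|dropped| *dropped == top);
        }
        if !skipping {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

The point is just that everything workspace-related lives in tables that never reach the published .crate file.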

So in the end it is possible to have a [lib] section for each subpackage, since target ≠ target directory so a subpackage can have its own lib target, and it is no longer a subpackage once it's packaged for publishing. It just happened that I failed to find a good example showing how this could be done.

And this can be further simplified to:

  1. :jigsaw:lib.name (package.name), when a lib target exists, no matter whether lib.name is set explicitly or not, (lib.name defaults to package.name)
  2. then :cd:bin.name (package.name) (if we haven't got the name from previous step) when a single bin target exists, (bin.name for src/main.rs defaults to package.name. Caution: if multiple bin targets exist and only one has its name set explicitly, I don't think that the name explicitly set should be the "representative name" for a "bin-type crate")
  3. then :package:package.name for multi-bin packages

I think the usage of such icons is good for distinguishing different types of crates. Currently I don't know how to tell a lib crate from a non-lib crate on crates.io at the first glance!
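Here is a minimal Rust sketch of that simplified three-step rule, with the icon kind modeled as an enum variant. The Shown enum, Targets struct and representative function are all hypothetical names for this illustration; bins is assumed to hold the already-resolved names of every bin target, whether set explicitly or defaulted.

```rust
/// Which icon/kind the listing would show, plus the representative name.
#[derive(Debug, PartialEq)]
enum Shown {
    Lib(String),     // :jigsaw:
    Bin(String),     // :cd:
    Package(String), // :package:
}

/// Hypothetical summary of a package's targets.
struct Targets<'a> {
    package: &'a str,
    has_lib: bool,
    /// Explicit lib.name, if any.
    lib_name: Option<&'a str>,
    /// Resolved names of all bin targets (explicit or defaulted).
    bins: &'a [&'a str],
}

fn representative(t: &Targets<'_>) -> Shown {
    // Step 1: any lib target wins; lib.name defaults from package.name.
    if t.has_lib {
        let name = t
            .lib_name
            .map(str::to_string)
            .unwrap_or_else(|| t.package.replace('-', "_"));
        return Shown::Lib(name);
    }
    match t.bins {
        // Step 2: exactly one bin target.
        [only] => Shown::Bin((*only).to_string()),
        // Step 3: several bins (or, in this sketch, none at all).
        _ => Shown::Package(t.package.to_string()),
    }
}
```

Because step 2 counts bin targets rather than explicit names, a multi-bin package where only one bin has an explicit name still falls through to step 3, matching the caution above.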


The only thing that isn't intuitive enough is that lib.name overrides everything.

But again, most people go to crates.io to look for libraries, and it would already have been confusing to them if a "crate" had a package.name (crate name), a lib.name and bin.names which all differ from each other.

And I think there's a benefit if developers are discouraged from publishing packages with both a [lib] target and [bin] targets. They don't look intuitive even now without a crates.io ui change!

On the other hand, a developer who can understand the relationship between bin.name, lib.name and package.name would have no difficulty with the new crates.io ui mentioned above.


Whether we should omit the "(package.name)" part, if that name equals the display-name (either lib.name or bin.name) before it, is a matter of ui design.

I agree that we habitually conflate the concept of a package/dependency and the library crate which it exports, and don't do much of anything to prevent this. Especially with edition2018 paths and the extern prelude removing the need for extern crate, the concept of crates as the unit of coherence and compilation, especially w.r.t. how it differs from the package as the unit of versioning and distribution, is an advanced topic that most developers manage to go without clarifying.

This, plus the annoying fact that English words can mean more than one thing depending on context (and sometimes be ambiguous even with context), means that the answer is usually "it's complicated." Packaging is complicated, and as much as Cargo has done to make the common case simple/easy, it also has a tendency to make the complexity cliff when you step outside the goldilocks zone feel even steeper.

I wonder, now that we have come up with a draft of the change, should I file an issue here referring to this thread, or wait for the rust dev team to do that?

I mean, are rust dev team members collecting feedback on this forum?

While on the surface this can be viewed as just a UX change in crates.io, I think it is a major change to how we view crates, and it has an impact on cargo documentation, likely cargo new, etc. Personally, I lean towards this going through the RFC process, starting with a more formal PreRFC here.

Ok then, what can I do to help for now?

One use case that I think we should be careful of with bin crates is that sometimes the bin crate is subordinate to the lib crate within the package, rather than the other way around.

pulldown-cmark is a great example of this, though in that case the explicit bin crate is named the same as the package.