- Start Date: 2015-01-17
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Summary
Resolve support for hyphens in crate names.
Motivation
Currently, crate names are required by rustc to be valid Rust identifiers except that they may include hyphens. It is my contention that allowing hyphens in crate names provides zero technical benefit, a marginal (at best) aesthetic benefit, and imposes technical and non-technical costs on the compiler and its users. As such, it should be removed from the language if for no other reason than not pulling its weight.
To elaborate on the above, let us first consider the one and only argument in defense of hyphenated crate names which I was able to find:
[Hyphenated crate names] are often more aesthetically appealing and are sometimes more natural to type as well.
For the record, I completely agree with this position. Aside from the entirely subjective benefits, a hyphen requires slightly less effort to type (with US keyboard layouts, at least) than an underscore, as it does not require using the Shift key. This is an extremely flimsy justification, but a reasonable one if there are few or no drawbacks. Such is not the case.
First, hyphenated crate names are not directly usable. Since paths may only contain valid identifiers, users cannot simply write extern crate foo-bar; to link against a hyphenated crate. This is distinct from all non-hyphenated crates, whose names are otherwise valid identifiers. Cargo issue #380 shows that this behaviour is confusing to at least some users in that the toolchain allows them to create crates which cannot be used with the standard syntax taught to them by Rust's own introductory material. In the "Crates and Modules" chapter of the Rust Book, how to use a hyphenated crate name is never mentioned.
Secondly, hyphenated crate names require the definition and maintenance of specialised syntax in the language, specifically the form extern crate "foo-bar" as foo_bar; which names the crate with a string literal. It is inarguable that being able to rename crates upon linking is a useful feature, but the following must be considered:
- There is no reason for the "true" crate name to be a string literal except for hyphenated crate names. In lieu of those, it could be a regular identifier.
- The implied ability to use arbitrary filenames does not, in fact, exist: crates cannot have arbitrary filenames, and the only exception is for hyphens in an otherwise valid identifier.
- One might expect this to allow for the use of multiple crates with conflicting names, but this is not true in practice, either. Cargo does not allow for the existence of multiple crates with the same name, nor can you specify the filename of a crate's binary.
As such, the "crate renaming" syntax has only two practical functions:
- Shortening the name of a crate in lieu of more general path aliasing syntax, a function which is redundant given the ability for use to define path aliases, and
- linking against crates with hyphens in their name.
Thus, of its two practical functions, one of them is redundant.
It is also worth emphasising that a hyphenated crate must be renamed on every single use. In other words, the one definite effect of using a hyphen in a crate name is to create additional work for every single user of that crate forever more.
Finally, that crates can contain hyphens and only hyphens in addition to regular identifier characters does not appear to be documented well, if at all. I was unable to find anything in the official Rust Reference or the Rust Book that even noted this in passing.
In conclusion, hyphenated crate names appear to be an almost totally undocumented aspect of the language which continues to exist not for any technical, productivity or expressivity reason, but because some people think they look a bit nicer. I believe the bar for features in the language should be somewhat higher than that, especially for something as foundational as what names are valid for crates, something that will affect almost every single user of Rust going forward.
Detailed design
There are several potential ways to move forward on this. I present them here in no particular order; I believe that so long as something is done about hyphenated crate names, the precise approach is not critical.
Make extern crate Match Fuzzily
This would involve making extern crate match crate filenames inexactly. Specifically, the compiler should treat both the underscore (_) and hyphen (-) in a crate's name as being acceptable matches to an underscore in a crate's import identifier. That is, extern crate bz2_sys; would link against a crate passed to the compiler under the name bz2-sys.
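The matching rule described above can be sketched as a small function. This is only an illustration of the proposed rule, not how rustc resolves crates; the name fuzzy_matches is hypothetical.

```rust
/// Hypothetical helper: returns true if the identifier written in
/// `extern crate <ident>;` should match a crate named `crate_name`
/// under the fuzzy rule (an underscore in the identifier matches
/// either an underscore or a hyphen in the crate name).
fn fuzzy_matches(ident: &str, crate_name: &str) -> bool {
    // Folding every hyphen in the crate name to an underscore and then
    // comparing exactly is equivalent to the per-character rule.
    crate_name.replace('-', "_") == ident
}
```

Under this sketch, the identifier bz2_sys matches both bz2-sys and bz2_sys, while an identifier that itself contains a hyphen never matches, since hyphens remain invalid in identifiers.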
This behaviour would need to be documented in the reference, and introduced when crates are used in introductory material.
Changes should also be made to the Cargo tool and index to ensure that crate names cannot collide (i.e. if there exists a bz2-sys crate, one cannot upload a new crate called bz2_sys).
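One way the index could reject colliding names is to store each name in a normalized form. The following is a minimal sketch under that assumption; the Index type and its methods are hypothetical and do not reflect the actual Cargo or crates.io implementation.

```rust
use std::collections::HashSet;

/// Hypothetical normalization: hyphens and underscores are treated as
/// the same character for the purpose of uniqueness.
fn normalize(name: &str) -> String {
    name.replace('-', "_")
}

/// Hypothetical stand-in for the crate index.
struct Index {
    taken: HashSet<String>,
}

impl Index {
    fn new() -> Self {
        Index { taken: HashSet::new() }
    }

    /// Registers `name`, failing if a crate with an equivalent name
    /// (differing only in hyphens versus underscores) already exists.
    fn register(&mut self, name: &str) -> Result<(), String> {
        if self.taken.insert(normalize(name)) {
            Ok(())
        } else {
            Err(format!("crate name `{}` collides with an existing crate", name))
        }
    }
}
```

With this sketch, registering bz2-sys succeeds and a later attempt to register bz2_sys is rejected.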
Make Hyphen Canonical For Names
That is, mandate that hyphens are always used in a crate's name and filename, whilst underscores are always used when that name is represented as an identifier. This is distinct from the previous solution in that there is no fuzzy matching: extern crate bz2_sys; will only match against a crate named bz2-sys.
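Under this option the mapping between identifier and crate name is a fixed, one-to-one translation. A minimal sketch, with hypothetical function names:

```rust
/// Hypothetical helper: the single crate name that the identifier in
/// `extern crate <ident>;` refers to when hyphens are canonical.
fn crate_name_for(ident: &str) -> String {
    ident.replace('_', "-")
}

/// Hypothetical check the Cargo tool or index could apply: a canonical
/// crate name never contains an underscore.
fn is_canonical_crate_name(name: &str) -> bool {
    !name.contains('_')
}
```

Here bz2_sys maps only to bz2-sys, and a crate published as bz2_sys would be rejected as non-canonical.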
This behaviour would need to be documented in the reference, and introduced when crates are used in introductory material.
Changes should also be made to the Cargo tool and index to ensure that underscores are not used in crate names. This may require a transition period, or possibly just migrating existing crates with underscores in their names and notifying users who attempt to build packages that depend on crates with underscored names.
Remove Hyphens
Mandate that crate names must be valid Rust identifiers. The Cargo tool and index would need to be updated; the cargo tool should stop accepting now-invalid crate names, and the index would likely need to migrate existing crates to new names.
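The check the cargo tool would need, and one possible migration rule for the index, might look like the sketch below. It assumes a simplified, ASCII-only notion of "valid identifier" and ignores reserved keywords; both function names are hypothetical.

```rust
/// Hypothetical validity check, simplified to ASCII: the name must start
/// with a letter or underscore, continue with letters, digits, or
/// underscores, and must not be a lone underscore.
fn is_valid_crate_name(name: &str) -> bool {
    let mut chars = name.chars();
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_alphabetic() || c == '_',
        None => false, // the empty string is not a valid name
    };
    first_ok && name != "_" && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical migration rule: rename an existing hyphenated crate by
/// replacing each hyphen with an underscore.
fn migrated_name(name: &str) -> String {
    name.replace('-', "_")
}
```

In this sketch bz2-sys is rejected as a new crate name, and an existing crate of that name would be migrated to bz2_sys.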
Note On Renaming Syntax
A tangential change would be to remove the crate renaming syntax (i.e. extern crate "bz2-sys" as bz2_sys;) in favour of just using the use A as B; syntax. Renaming library crates would be more useful as a Cargo feature, in any case. Doing so would allow, for example, linking multiple versions of a crate to a program, something the renaming syntax would not be capable of by itself anyway. It would also allow for linking against multiple packages which happen to share a name; this might happen with packages taken from different Git repositories, or conflicts between packages on crates.io and project-local dependencies.
Drawbacks
Because hyphenated crate names are already in use, any of these changes would cause a fair amount of breakage close to the 1.0.0 release.
Although I have tried to make the case for these changes in a reasonably passionate fashion, it is inarguable that this is far from a pressing issue. If nothing is done, it will likely result in the odd confused programmer, questions about whether people should or should not use hyphens, an unending bikeshedding war over the issue with calls to "fix" it for Rust 2.0.0 (because after 1.0.0, changing this will be a backward-incompatible change), and a requirement for more careful language in documentation to explain the distinction. None of these is particularly problematic, except possibly from a PR and programmer annoyance perspective.
Alternatives
Do nothing, as noted in "Drawbacks".
Crate names could also have all restrictions on them removed; after all, if aesthetic concerns are good enough for the hyphen, what about the rest of Unicode? Why not allow ! in macro crates? If Rust is going to have syntax that theoretically allows for arbitrary names (i.e. the string literal in the renaming syntax), why impose restrictions at all? This would also be a backward-compatible change.
Unresolved questions
Should the hyphen get a memorial, or should it be buried without ceremony in an unmarked grave and good riddance? Note: this is an entirely facetious question.