[Pre-RFC] Resolve support for hyphens in crate names

  • Start Date: 2015-01-17
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

Resolve support for hyphens in crate names.

Motivation

Currently, crate names are required by rustc to be valid Rust identifiers except that they may include hyphens. It is my contention that allowing hyphens in crate names provides zero technical benefit, a marginal (at best) aesthetic benefit, and imposes technical and non-technical costs on the compiler and its users. As such, it should be removed from the language if for no other reason than not pulling its weight.

To elaborate on the above, let us first consider the one and only argument in defense of hyphenated crate names which I was able to find:

[Hyphenated crate names] are often more aesthetically appealing and are sometimes more natural to type as well.

For the record, I completely agree with this position. Aside from the entirely subjective benefits, a hyphen requires slightly less effort to type (with US keyboard layouts, at least) than an underscore, as it does not require using the Shift key. This is an extremely flimsy justification, but a reasonable one if there are few or no drawbacks. Such is not the case.

First, hyphenated crate names are not directly usable. Since paths may only contain valid identifiers, users cannot simply use extern crate foo-bar; to link against a hyphenated crate. This is distinct from all non-hyphenated crates, whose names are otherwise valid identifiers. Cargo issue #380 shows that this behaviour is confusing to at least some users in that the toolchain allows them to create crates which cannot be used with the standard syntax taught to them by Rust's own introductory material. In the "Crates and Modules" chapter of the Rust Book, how to use a hyphenated crate name is never mentioned.

Secondly, hyphenated crate names require the definition and maintenance of specialised syntax in the language. Specifically, extern crate "foo-bar" as foo_bar;. It is inarguable that being able to rename crates upon linking is a useful feature, but the following must be considered:

  1. There is no reason for the "true" crate name to be a string literal except for hyphenated crate names. In lieu of those, it could be a regular identifier.
  2. The implied ability to use arbitrary filenames does not, in fact, exist: crates cannot have arbitrary filenames; the only exception is for hyphens in an otherwise valid identifier.
  3. One might expect this to allow for the use of multiple crates with conflicting names, but this is not true in practice, either. Cargo does not allow for the existence of multiple crates with the same name, nor can you specify the filename of a crate's binary.

As such, the "crate renaming" syntax has only two practical functions:

  1. Shortening the name of a crate in lieu of more general path aliasing syntax, a function which is redundant given that use declarations can already define path aliases, and
  2. linking against crates with hyphens in their name.

Thus, of its two practical functions, one of them is redundant.

It is also worth emphasising that a hyphenated crate must be renamed on every single use. In other words, the one definite effect of using a hyphen in a crate name is to create additional work for every single user of that crate forever more.

Finally, that crates can contain hyphens and only hyphens in addition to regular identifier characters does not appear to be documented well, if at all. I was unable to find anything in the official Rust Reference or the Rust Book that even noted this in passing.

In conclusion, hyphenated crate names appear to be an almost totally undocumented aspect of the language which continues to exist not for any technical, productivity or expressivity reason, but because some people think they look a bit nicer. I believe the bar for features in the language should be somewhat higher than that, especially for something as foundational as what names are valid for crates, something that will affect almost every single user of Rust going forward.

Detailed design

There are several potential ways to move forward on this. I present them here in no particular order; I believe that so long as something is done about hyphenated crate names, the precise approach is not critical.

Make extern crate Match Fuzzily

This would involve making extern crate match crate filenames inexactly. Specifically, the compiler should treat both the underscore (_) and hyphen (-) in a crate's name as being acceptable matches to an underscore in a crate's import identifier. That is, extern crate bz2_sys; would link against a crate passed to the compiler under the name bz2-sys.
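As a rough illustration of the matching rule described above, here is a minimal sketch. The function name fuzzy_matches is hypothetical and is not part of rustc; it only assumes that an underscore in the identifier may stand for either an underscore or a hyphen in the crate's name, and that every other character must match exactly.

```rust
/// Hypothetical helper (not a rustc API): does the identifier written in
/// `extern crate <ident>;` match a crate name under the fuzzy rule?
///
/// An underscore in the identifier matches either `_` or `-` in the crate
/// name; every other character must match exactly.
fn fuzzy_matches(ident: &str, crate_name: &str) -> bool {
    // Both `_` and `-` are one byte long, so two matching names always
    // have the same byte length.
    ident.len() == crate_name.len()
        && ident
            .chars()
            .zip(crate_name.chars())
            .all(|(i, c)| i == c || (i == '_' && c == '-'))
}
```

Under this sketch, the identifier bz2_sys matches crates named bz2-sys or bz2_sys, but a hyphen can never appear on the identifier side.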

This behaviour would need to be documented in the reference, and introduced when crates are used in introductory material.

Changes should also be made to the Cargo tool and index to ensure that crate names cannot collide (i.e. if there exists a bz2-sys crate, one cannot upload a new crate called bz2_sys).
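The collision rule could be sketched along these lines. This is only an assumption about how such a check might look; collision_key and can_upload are hypothetical names, not functions in Cargo or crates.io.

```rust
use std::collections::HashSet;

/// Hypothetical normalisation: treat `-` and `_` as the same character
/// for the purpose of detecting name collisions.
fn collision_key(name: &str) -> String {
    name.replace('-', "_")
}

/// Hypothetical registry check: returns true if `new_name` does not
/// collide with any existing crate name once hyphens and underscores
/// are treated as equivalent.
fn can_upload(existing: &HashSet<String>, new_name: &str) -> bool {
    let key = collision_key(new_name);
    !existing.iter().any(|name| collision_key(name) == key)
}
```

With bz2-sys already in the set, can_upload rejects bz2_sys but still accepts an unrelated name such as bz2.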

Make Hyphen Canonical For Names

That is, mandate that hyphens are always used in a crate's name and filename, whilst underscores are always used when that name is represented as an identifier. This is distinct from the previous solution in that there is no fuzzy matching: extern crate bz2_sys; will only match against a crate named bz2-sys.
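A minimal sketch of this one-to-one mapping, assuming hypothetical helper names that do not exist in rustc or Cargo:

```rust
/// Hypothetical: the crate name that an identifier refers to when
/// hyphens are canonical. Every underscore becomes a hyphen.
fn crate_name_for_ident(ident: &str) -> String {
    ident.replace('_', "-")
}

/// Hypothetical: is this an acceptable crate name under the rule?
/// Names containing underscores would be rejected.
fn is_canonical_name(name: &str) -> bool {
    !name.contains('_')
}
```

Because the mapping is exact, extern crate bz2_sys; resolves to bz2-sys and nothing else, and a crate named bz2_sys could not exist.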

This behaviour would need to be documented in the reference, and introduced when crates are used in introductory material.

Changes should also be made to the Cargo tool and index to ensure that underscores are not used in crate names. This may require a transition period, or possibly just migrating existing crates with underscores in their names and notifying users who attempt to build packages that depend on crates with underscored names.

Remove Hyphens

Mandate that crate names must be valid Rust identifiers. The Cargo tool and index would need to be updated; the cargo tool should stop accepting now-invalid crate names, and the index would likely need to migrate existing crates to new names.
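The check the tools would need could look roughly like the sketch below. It is deliberately simplified: it assumes ASCII-only identifiers and does not reject keywords, and is_valid_crate_name is a hypothetical name rather than an existing Cargo or rustc function.

```rust
/// Hypothetical validity check: a crate name must look like an ASCII
/// Rust identifier (a letter or underscore, followed by letters, digits
/// or underscores). Keywords are not rejected in this sketch.
fn is_valid_crate_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        // Empty names and names starting with a digit or symbol fail.
        _ => return false,
    }
    // A lone underscore is not a usable identifier.
    name != "_" && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```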

Note On Renaming Syntax

A tangential change would be to remove the crate renaming syntax (i.e. extern crate "bz2-sys" as bz2_sys;) in favour of just using the use A as B; syntax. Renaming library crates would be more useful as a Cargo feature, in any case. Doing so would allow, for example, linking multiple versions of a crate to a program, something the renaming syntax would not be capable of by itself anyway. It would also allow for linking against multiple packages which happen to share a name; this might happen with packages taken from different Git repositories, or conflicts between packages on crates.io and project-local dependencies.

Drawbacks

Because hyphenated crate names are already in use, any of these changes would cause a fair amount of breakage close to the 1.0.0 release.

Although I have tried to make the case for these changes in a reasonably passionate fashion, it is inarguable that this is far from a pressing issue. If nothing is done, it will likely result in the odd confused programmer, questions about whether people should or should not use hyphens, an unending bikeshedding war over the issue with calls to "fix" it for Rust 2.0.0 (because after 1.0.0, changing this will be a backward-incompatible change), and a requirement for more careful language in documentation to explain the distinction. None of these is particularly problematic, except possibly from a PR and programmer annoyance perspective.

Alternatives

Do nothing, as noted in "Drawbacks".

Crate names could also have all restrictions on them removed; after all, if aesthetic concerns are good enough for the hyphen, what about the rest of Unicode? Why not allow ! in macro crates? If Rust is going to have syntax that theoretically allows for arbitrary names (i.e. the string literal in the renaming syntax), why impose restrictions at all? This would also be a backward-compatible change.

Unresolved questions

Should the hyphen get a memorial, or should it be buried without ceremony in an unmarked grave and good riddance? Note: this is an entirely facetious question.

The first time I saw extern crate "rustc-serialize" as rustc_serialize; I had to stare at it for 10 min before I realized that one had a dash and the other an underscore (which was unexpected). It just looked like someone wrote the name twice for some bizarre reason.

Honestly, it reminds me of accidentally trying to run this which also gives a confusing result where the error message isn’t helpful:

print1n!("{}", 3);
//   ^ should be a lowercase L and not a one

Technically, the similarity of characters (- to _) probably makes it a bad idea in general but this is an infrequent operation so it’s less of a problem.

I agree that allowing hyphens and only hyphens adds complexity and confusion, and does not carry its weight.

Seems like hyphens in crate names is really a wart. The fact that it has to be special-cased with the string syntax is needless complication to the language.

Either go in the direction of “crates can be named with spaces in them and all kinds of crazy characters” or make them valid identifiers

I think it's definitely worth considering that there are 588/1284 (46%) crates using a hyphen in their name right now on crates.io. If this RFC recommends forbidding it entirely, it would probably want to think through a more concrete migration plan as the current wording is a bit vague with:

The Cargo tool and index would need to be updated; the cargo tool should stop accepting now-invalid crate names, and the index would likely need to migrate existing crates to new names.

Personally, I’m emotionally in favour of just nuking them entirely, since it saves having to ever explain the rules. But the fuzzy matching and hyphen canonicalisation ideas actually came directly out of concerns that actually migrating all those crates would be a bit completely hideous.

As for the migration strategy in that particular case… yeah, I don’t know what would be most tractable for crates.io. Now that I think about it, as soon as you bring integration tests into the picture, any sort of automated migration is probably out of the question.

I personally use hyphens for all my multi-word crates, and strongly prefer it to underscores for basically entirely aesthetic reasons.

@alexcrichton what percentage of crates contain an underscore, and what percentage contain neither?

One possible strategy would be to bake logic into cargo that maps foo-bar onto --extern foo_bar for example. We could keep this for a limited time (or perhaps permanently!) while making changes to the compiler as well.

Another strategy would be to rename all crates on crates.io at once after we bake in support to Cargo itself to map foo-bar to foo_bar (for all packages). Whenever cargo would map a package name it would issue a warning, and eventually this behavior could be removed (before 1.0)
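To make the mapping idea concrete, here is a minimal sketch. It assumes hypothetical function names that are not taken from Cargo's source; the only real interface it relies on is rustc's --extern NAME=PATH flag. The warning text is also just a placeholder.

```rust
/// Hypothetical: map a package name to the identifier passed to rustc,
/// returning a warning message whenever a mapping actually took place.
fn map_package_name(package: &str) -> (String, Option<String>) {
    if package.contains('-') {
        let mapped = package.replace('-', "_");
        let warning = format!(
            "package `{}` was mapped to crate name `{}`",
            package, mapped
        );
        (mapped, Some(warning))
    } else {
        (package.to_string(), None)
    }
}

/// Hypothetical: build the `--extern NAME=PATH` arguments for one dependency.
fn extern_args(package: &str, lib_path: &str) -> Vec<String> {
    let (ident, _warning) = map_package_name(package);
    vec!["--extern".to_string(), format!("{}={}", ident, lib_path)]
}
```

Returning the warning separately keeps the choice open between the two strategies: it can be shown for a limited transition period, or dropped if the mapping stays permanently.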

Just a few possible ideas though, I'm sure there's more!

Some stats (all learned from the rust-lang/crates.io-index repository on GitHub, the registry index for crates.io):

  • packages with an underscore: 154/1285 - 12%
  • packages with a hyphen: 589/1285 - 46%
    • *-sys packages: 476/1285 - 37%
  • packages without a hyphen or underscore: 572/1285 - 45%

There's definitely a bit of a skew with *-sys packages for the hyphens (I think @retep998 uploaded a good number recently) so the numbers should probably be taken with a grain of salt, but those are the stats as of right now :)

Accounting for the ~400 -sys packages uploaded by @retep998, that leaves about the same frequency of _ and -, with names with neither making up the vast majority.

I suggest we recommend single-word names, falling back to hyphens only when necessary (rarely).

It’s incredibly surprising to me that we allow hyphens in crate names at all. The argument for hyphenated names being easier to type or just looking better doesn’t really apply to what is probably the main use of crate names—extern crate statements—which are both harder to type (due to being longer and containing quotes) and also look significantly less clear when using a hyphenated crate name.

If we do disallow hyphens in crate names, I think the renaming syntax should remain, but accept an identifier instead of a string literal (e.g., extern crate foo as bar;).

As the owner of 420 of those hyphens, I personally like how hyphens look in the crate name. kernel32_sys just looks wrong. A few of my crates do have underscores, but that’s only because the library name has underscores in it, for example aux_ulib-sys.

Regardless of whether you settle on hyphens or underscores or both, and possibly auto-mapping of hyphens to underscores, Cargo needs to have a way to rename dependencies in Cargo.toml.

EDIT: Also I have some libraries that in reality have periods in their names, but due to naming limitations, I had to rename the periods to hyphens. windows.data.pdf.lib -> windows-data-pdf-sys. Very unfortunate.

This is where I'm conflicted. Part of me wants to bat for completely removing restrictions so you can name a crate whatever you want. Another part of me wants crate names to be identifiers simply because it's really conceptually simple.

I'm hesitant to suggest expanding the fuzzy matching idea to other punctuation; I'm concerned that it'll be a weird corner case that might trip up newcomers as it is.

The question is, do we want to allow crate names to be emoji? Considering there’s no namespacing on crates.io, we might just have to allow arbitrary unicode in crate names.

When I was naming a couple crates (not on crates.io yet), I first used hyphens to separate words because I think they look a little better than underscores.

I tried using extern crate crate-name, got a compile error, and had to do multiple Google searches to find the right syntax. Even if it was added to the beginner tutorial, it would be fairly easy to forget by the time you end up using a crate with a hyphen. When I found out the special syntax needed for crates with hyphens, I renamed the crates to use underscores.

I think the complications of having crate names not be identifiers or of using fuzzy matching outweigh their benefits.

So thus far, the opinions on this are roughly summarised as:

  • Hyphens in crate names are bad, something should be done about them: 5.
  • Hyphens in crate names are pretty: 2.

Just to note: this is not to denigrate anyone’s opinion. It’s just that no one who said they liked how hyphens looked commented on the practical problems they cause. Again, literally the only defense of hyphens in the abstract is that they look nice.

As for expressed opinions on possible solutions…

  • Forbid them: 2.
  • Nothing, just recommend against them: 1.
  • Not stated: 4.
  • N/A: 1.

I hardly feel this is a good sample size. :P I was hoping to get some feedback to maybe help guide which specific proposal to go with, but I think that would be premature. I’m going to post this to the Reddit later today, and see if I can’t get some more input on this.

At present, I’m leaning toward a more well-defined fuzzy matching, where crate names are canonicalised into identifiers by replacing all punctuation symbols with underscores and continuing to forbid non-ASCII letter and digit codepoints. The idea being that symbols that Rust is unlikely to ever support in identifiers get turned into underscores, whilst letters and digits that might be allowed in future don’t.
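A minimal sketch of that canonicalisation, under stated assumptions: canonicalise is a hypothetical name, ASCII punctuation is taken as the set of symbols to replace, and whitespace and control characters are rejected along with non-ASCII codepoints (the proposal itself does not say how those should be handled).

```rust
/// Hypothetical canonicalisation of a crate name into an identifier.
///
/// ASCII letters and digits are kept, ASCII punctuation (including `-`
/// and `.`) becomes `_`, and anything else (non-ASCII codepoints,
/// whitespace, control characters) makes the name invalid.
fn canonicalise(name: &str) -> Option<String> {
    let mut out = String::with_capacity(name.len());
    for c in name.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c);
        } else if c.is_ascii_punctuation() {
            // `_` is itself ASCII punctuation, so it maps to itself.
            out.push('_');
        } else {
            return None;
        }
    }
    if out.is_empty() {
        None
    } else {
        Some(out)
    }
}
```

Note that this sketch does not check that the result starts with a letter or underscore, so a real implementation would still need an identifier validity check afterwards.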

I think hyphens should be removed and forbidden. I also think that this should be changed quickly before there are even more crates with hyphens in the name. People are used to rewriting a lot of code every few days at the moment. Later on that would be even more of a pain to do. Using hyphens also looks a bit like simulating namespaces to me: rustc-something?

There is also an aesthetic reason I’m generally against the hyphens: If you have to use extern crate "..." as ...;, it mixes hyphen-fixes with actual renames. Though it’s not a big one, it just bugs me a bit.

That being said, has it been proposed to just give extern crate an identifier and a quoted string form, with the latter having a simplifier? Examples:

// loads foo_bar as foo_bar
extern crate foo_bar;

// loads foo-bar as foo_bar
extern crate "foo-bar";

// loads foo_bar-sys as foo_bar_sys
extern crate "foo_bar-sys";

This could probably be extended to other characters, like the above mentioned dot:

// loads windows.data.pdf.lib as windows_data_pdf_lib
extern crate "windows.data.pdf.lib";

Though at that point, I’d expect the deciding factor will be what cargo can support for crate identifiers. At least the dot seems semantically important there.

I agree that dashes are prettier than underscores, but I think we should keep crate names consistent with identifiers - we should forbid dashes entirely. I’m against anything like fuzzy matching, because it seems like a hack and it adds needless complexity to the language. Forbidding dashes will result in some churn, but I think it’s preferable to the alternatives.

I’m in favor of getting rid of hyphens on the condition that I can use namespaces on crates.io. If I can turn something-sys into winapi/something then I will be happy (with some bikeshedding over winapi vs winsdk vs wsdk vs win32 vs win vs windows. Gah, I’ll just start a poll when you guys decide to add namespaces and get rid of hyphens).

+1 for forbidding hyphens entirely. I think we should all just use underscores and accept the minor aesthetic burden.

For a typical chunk of Rust code, we already have the following names:

  • Source code repository name - can contain hyphens.
  • Cargo package name - as declared in Cargo.toml and used on crates.io.
  • Crate name(s) - should be identifiers.

I think unifying these names for simple packages is desirable. Bubbling up the crate name makes the most sense, and is least surprising for users.
