Maybe pre-RFC: improving "-"/"_" in cargo/crates.io

Context

I dusted off an older crate of mine and published it on GitHub and crates.io. To keep with the style of GitHub and most new crates, I renamed it to checked-command (from checked_command). I made just one small mistake: I forgot to rename it in the Cargo.toml name field…

The Problem

Once a crate is published with _ instead of - (or the other way around) there is no way to fix this. Publishing a new crate with the fix won’t work, as the name collides with the existing crate. Removing or renaming the existing crate is also not possible, as it would break potential dependent projects.

The Actual Problem

For imports (e.g. extern crate) Rust does not distinguish between _ and - in crate names (it’s always _), so I was surprised to notice that it matters in the [dependencies] section for crates.io dependencies.

This is also a “problem” for older crates which might want to update their naming convention but can’t, as well as (the other way around) for new users, who might see an extern crate some_think in the code and then be surprised that adding a some_think dependency won’t work (as it e.g. expects some-think).

Solution

  1. Treat _ / - the same in cargo, allowing users to e.g. write lazy-static even though it’s actually lazy_static, or some-crate even though it’s actually some_crate
    • though this should not be limited to Cargo.toml but apply to cargo+crates.io in general
  2. At some point maybe allow renaming _ <-> -
    • this might [1] still break dependencies for users using outdated cargo versions, and support for it is, I think, less important than 1.

[1]: I don’t know enough about cargo/crates.io’s inner workings to judge this

Sidenote

It might generally be a good idea to consider adding renaming support independent of this, as there might be other reasons to rename a crate, e.g. trademark problems, or noticing that the acronym/name you used has another, very inappropriate meaning (e.g. a sexual or racist meaning).

(Note that implementing general renaming support would be noticeably more complex than _ / - support, as cargo can detect such a renaming even without being told about it: seeing a crate named a-b while looking for a_b, with no a_b present, tells cargo everything it has to know, whereas for general renaming cargo needs to be told what the new name is.)


It seems the crates.io web UI does not differentiate between underscores and hyphens: serde-derive and serde_derive both give the serde_derive crate. I have mistakenly put serde-derive in my Cargo.toml before and been disappointed by the error cargo gave.

I don’t know enough about cargo to know what the implications of changing this would be, but I do know that Python/PyPI considers underscores and hyphens to be equivalent in package names[0][1] and that works well.


I believe it would be possible to do this without touching cargo, just changing crates.io to add both versions of the name to the index and run a migration doing that for all existing crates. That would avoid any backwards compatibility issues with old cargo versions now.

When cargo is updated to a new index format (which I believe is almost certainly going to happen at some point) then that could be a clean break to say “after this point all tools MUST treat - and _ as equivalent in package names”.

One question is what to do with crates that use both characters in their canonical name, quine-mc_cluskey for example; https://crates.io/crates/quine_mc-cluskey works as a url to access it as well, so should we allow any combination of - and _ in the package names?

I'm not sure it works without cargo, or that it is a good idea:

  • e.g. if two dependencies use some-crate (with compatible versions) cargo could (does?) do some deduplication, statically linking in only one version; though if one gets shipped as some_crate, cargo wouldn't know that it is the same crate.
  • it won't work with path/git dependencies (though the focus lies on crates.io)
  • you would need two "packages" for each crate, one with a Cargo.toml file with - and one with _ in the name field
  • it won't scale to packages with multiple -/_ characters, since every additional separator doubles the number of names

If cargo knows about -/_ this would not be a problem at all, as a request for some_crate-name could just return a crate with some-crate_name in its name field and cargo would be fine with it.

  • wrt. the implementation of the index: a straightforward approach would be to normalize crate names in queries by e.g. replacing all - with _. Queries would then return a "display name" or hyphen mask in addition to metadata like the version, author etc.
    • processes like cargo add would also allow a query with -/_ in any combination, but add whatever is specified through the "display name"/name+hyphen mask to the Cargo.toml file
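A minimal sketch of that normalize-then-return-display-name lookup, assuming an in-memory map in place of the real index. `Index`, `publish`, `resolve`, and `normalize` are hypothetical names made up for illustration, not part of cargo or crates.io.

```rust
use std::collections::HashMap;

/// Canonical lookup key: every '-' becomes '_' (the "replace all - with _"
/// rule suggested above; an assumption, not cargo's actual behavior).
fn normalize(name: &str) -> String {
    name.replace('-', "_")
}

/// Hypothetical in-memory stand-in for the registry index. It maps the
/// normalized key to the "display name" the author published.
struct Index {
    display_names: HashMap<String, String>,
}

impl Index {
    fn new() -> Self {
        Index { display_names: HashMap::new() }
    }

    /// Registers a crate; returns false if an equivalent name is already taken.
    fn publish(&mut self, display_name: &str) -> bool {
        let key = normalize(display_name);
        if self.display_names.contains_key(&key) {
            return false;
        }
        self.display_names.insert(key, display_name.to_string());
        true
    }

    /// Resolves any '-'/'_' spelling to the published display name.
    fn resolve(&self, query: &str) -> Option<&str> {
        self.display_names.get(&normalize(query)).map(String::as_str)
    }
}
```

With this shape, publishing quine-mc_cluskey makes every other spelling resolve to that exact display name, and a later attempt to publish quine_mc-cluskey is rejected. A tool like cargo add could then write the resolved display name into Cargo.toml regardless of what the user typed.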

I agree and would like to see Cargo treat “-” and “_” as equivalent. Relevant Cargo issue:


https://github.com/rust-lang/cargo/issues/2775


I too would prefer if the choice of - vs _ were irrelevant. I find it annoying to remember whether some particular crate chose to use _ or not.


I can’t think of any advantage to the current behavior over this; it seems like a bugfix even.


+1, the inconsistency and having to remember is annoying. Better to collapse them everywhere.


Note that on the linked issue, it is revealed that the “-”/"_" agnosticism of crates.io is part of a more general feature in which crates.io behaves differently from cargo: Case insensitivity.

I’m all for treating -/_ the same in cargo—mostly because I hate underscores with a passion—but if the goal is to make crates.io and cargo consistent, then we should perhaps consider whether we also want to allow the following:

[dependencies]
SerDe = "1.0"

Not really. You could end up in a situation where you have a crate called, say, a-crate, which defines the following:

trait Trait {
    fn a_method(&self);
}

Then you have another crate called something, which depends on a_crate and implements Trait for some struct of its own.

And then there is another crate which uses a-crate and something. It tries to use a Trait implementation provided by something, but because a_crate and a-crate are treated as different crates, a confusing incompatibility occurs, even if the same version of a-crate/a_crate is used.

It seems that there is general agreement that this is a good idea and that it is not possible to do without changing Cargo.

There are still some open questions, though, mainly how far to go with treating “-”/"_" as the same character:

  • for crates.io dependencies (yes, that’s what this thread is about :wink: )
  • for “git” dependencies? (probably)
  • for “path” dependencies? (probably)
  • for all dependencies??
    • while I think it is a good idea to also do so for git and path dependencies to prevent any surprises, I’m not sure if standardizing it for all possible crate sources is a good idea; e.g. it might lead to problems if cargo at some point supports a crate source that is not language-specific, though this could still be changed if that hypothetical scenario ever arises

Also if it treats “-”/"_" the same for package dependencies, maybe it should do so also for other dependencies:

  • cargo features?
    • i.e. “some-feature” == “some_feature”?
    • this would need changes not only in cargo but also in rustc
  • some fields in Cargo.toml?
    • e.g. badges, allowing both travis-ci and travis_ci
  • all cargo fields?
    • probably not; especially for custom metadata fields this could cause trouble

Also, as @ExpHP mentioned, crates.io treats crate names case-insensitively, which cargo could adopt as well.

  • But given that Rust crate names shouldn’t ever contain any capital letters, this is maybe something we want to keep the current way (and it’s probably also the reason why crates.io converts the names to lower case)

EDIT: to clarify, by “git” / “path” I mean crate names specified through “git”/“path”, not the whole URL/path, as that hardly makes sense :wink:


I like the idea of normalizing _ and - in crate names!

I'm not sure how that could work?

If I specify a dependency with

[dependencies]
foobarbaz = { git = "https://github.com/rust-lang-nursery/foo-bar-baz" }

should Cargo then try downloading all four names (foo-bar-baz, foo_bar-baz, foo_bar_baz, and foo-bar_baz)? What about the - earlier in the path, are they also normalized?

Cargo will have to probe the server for the right name, which generates unnecessary requests and takes extra time.
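To make the cost of client-side probing concrete, here is a small sketch that enumerates every spelling a client would have to try. The function name `spellings` is made up for illustration; nothing like it is claimed to exist in Cargo.

```rust
/// Returns every spelling of `name` obtained by independently writing each
/// separator as '-' or '_'. A name with n separators yields 2^n spellings.
fn spellings(name: &str) -> Vec<String> {
    let mut results = vec![String::new()];
    for ch in name.chars() {
        if ch == '-' || ch == '_' {
            // Each separator doubles the number of candidates.
            let mut next = Vec::with_capacity(results.len() * 2);
            for prefix in &results {
                next.push(format!("{}-", prefix));
                next.push(format!("{}_", prefix));
            }
            results = next;
        } else {
            // Ordinary characters are appended to every candidate so far.
            for candidate in &mut results {
                candidate.push(ch);
            }
        }
    }
    results
}
```

For foo-bar-baz this yields exactly the four names listed above. A name with three separators already needs eight probes, which is why doing the normalization on the server side looks more attractive.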

Treating _ and - the same makes sense in a crate name, which is sent to a registry like crates.io. There some server-side logic can do the normalization. I don't think it makes sense in other contexts since it will be the client that will have to attempt to do the normalization.

Yes, I was not clear enough with what I mean.

I meant crate names in context of crates specified through git/path and possible other ways introduced in the future.

I completely agree that treating _/- the same in paths or URLs for git hardly makes any sense.


I originally wanted to push an RFC yesterday (treating _/- the same for crate names in any context) but noticed that this is a breaking change, at least for cargo.

Currently people can depend on both some-crate and some_crate in cargo, which won’t compile unless both are optional and only one of them is ever enabled at a time (oh, and at least one of them has to be specified through git or path).

This is quite an unlikely scenario but still has to be considered. Possible solutions include:

  1. Proceed with breaking it anyway, as I believe no one uses this (but then, who knows)
  2. Detect such situations and behave the old way if found (which would make the code more complex and create a wtf!? situation if someone accidentally stumbles over it in the future, aka technical and design debt)
  3. Detect such situations and warn + behave the old way for some time, then turn the warning into an error

Another thing I stumbled upon is that extending it to feature names makes a lot of sense (but is breaking, as you can have both a-b and a_b as features).

The reason for this is that, when a feature declares which other features it depends on, those are mixed with its dependencies on optional crates:

some_feature = [ "some-crate", "some_feature" ]

Here some-crate could also be written some_crate, but some_feature has to match the _ / - spelling exactly, which could be confusing.

Again, this would be breaking if someone has both some_feature and some-feature, which is quite unlikely but not impossible, and probably more likely than the crate name case (as you can use both names for features in parallel).

My current plan would be to mention this in the RFC but postpone any decision about it to a further RFC.

Using the crates.io index, one could write a script checking whether any two crates exist that differ only by a “-” or a “_”. This would imho be the most proper solution.
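The core of such a check could look roughly like the sketch below. It assumes the crate names have already been extracted from the index into a slice; how the index is actually read is left out, and `find_collisions` is a hypothetical helper, not an existing tool.

```rust
use std::collections::BTreeMap;

/// Groups crate names that are identical once '-' and '_' are treated as the
/// same character, and returns only the groups with more than one member.
fn find_collisions(names: &[&str]) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for name in names {
        groups
            .entry(name.replace('-', "_")) // normalized key
            .or_default()
            .push((*name).to_string());
    }
    // BTreeMap iterates in key order, so the output is deterministic.
    groups
        .into_values()
        .filter(|group| group.len() > 1)
        .collect()
}
```

An empty result would mean that no two names in the input differ only by their separators. The same function could be run over the dependency names of a single manifest to find the some-crate/some_crate situation discussed earlier.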

crates.io already treats - / _ equally (maybe except in some cases?), therefore there can’t be two crates on crates.io whose names differ only in - / _.

The script still could find collisions based on non crates.io-dependencies in crates on crates.io, but then uploading a crate with git/path dependencies is (I think) rarely done.

So if something breaks, it’s likely in a non-published crate.


As I understand it, you can't actually depend on a git or path dependency in crates.io. You can upload a crate with both a git/path dependency and a version, in which case it will use the crates.io crate at that version.

This is so that none of the crates on crates.io fail to build because some git source stopped being served, etc.

I figure in the feature section, cargo would require sensitivity - that the feature name matches exactly the dependency name.

For instance, both of these would be allowed:

[dependencies]
tokio-core = { version = "0.1", optional = true }

[features]
sync = ["tokio-core"]

[dependencies]
tokio_core = { version = "0.1", optional = true }

[features]
sync = ["tokio_core"]

and the following would be an error:

[dependencies]
tokio-core = { version = "0.1", optional = true }

[features]
sync = ["tokio_core"]
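One way to read this rule is as a small check on each entry of a feature list, sketched below. The `FeatureRef` enum and `check_feature_ref` function are hypothetical and only illustrate the proposal; they do not describe how Cargo validates features today.

```rust
/// Outcome of checking one entry in a [features] list against the declared
/// optional dependencies (hypothetical types, for illustration only).
#[derive(Debug, PartialEq)]
enum FeatureRef {
    /// The entry names a dependency with exactly the same spelling.
    Exact,
    /// The entry matches a dependency only after '-'/'_' normalization;
    /// under the proposed rule this would be reported as an error.
    Mismatch { declared: String },
    /// No dependency matches, so the entry presumably names another feature.
    NotADependency,
}

fn check_feature_ref(dependencies: &[&str], entry: &str) -> FeatureRef {
    if dependencies.contains(&entry) {
        return FeatureRef::Exact;
    }
    let wanted = entry.replace('-', "_");
    match dependencies.iter().find(|dep| dep.replace('-', "_") == wanted) {
        Some(dep) => FeatureRef::Mismatch { declared: (*dep).to_string() },
        None => FeatureRef::NotADependency,
    }
}
```

Reporting the declared spelling in the mismatch case would let the error message point at the exact name the manifest should use.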
