Cargo package aliases


#1

I just accidentally used error_chain instead of error-chain in my Cargo.toml. I was wondering if error_chain should be reserved, so that if you try to use it, you get an error telling you to use error-chain instead. If someone published a crate named error_chain that implemented different functionality, you could end up with unexpected errors and wasted time. Or someone might upload a crate that looks like error-chain but does something malicious.

So would it be advisable, for certain popular packages, to reserve aliases for those packages (I think the most common error is using the wrong one of - and _), and print an error saying “this package name is reserved because it is very similar to <x>, did you mean to use that package?”?
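To make the idea concrete, here is a rough sketch of what such a check could look like. Everything in it is an assumption for illustration: `canonical_name`, `check_reserved`, and the hand-written `popular` list are hypothetical names, not anything Cargo or crates.io actually exposes.

```rust
/// Hypothetical canonical form: ASCII-lowercased, with '_' mapped to '-'.
fn canonical_name(name: &str) -> String {
    name.to_ascii_lowercase().replace('_', "-")
}

/// Returns an error message when `requested` is not exactly a popular name
/// but becomes identical to one after canonicalization.
fn check_reserved(requested: &str, popular: &[&str]) -> Result<(), String> {
    for &p in popular {
        if requested != p && canonical_name(requested) == canonical_name(p) {
            return Err(format!(
                "package name `{}` is reserved because it is very similar to `{}`; \
                 did you mean to use that package?",
                requested, p
            ));
        }
    }
    Ok(())
}

fn main() {
    // Hand-written stand-in for a list of popular packages (assumption).
    let popular = ["error-chain", "serde_json"];
    for name in ["error_chain", "error-chain", "serde-json", "rand"] {
        match check_reserved(name, &popular) {
            Ok(()) => println!("{}: ok", name),
            Err(msg) => println!("{}: error: {}", name, msg),
        }
    }
}
```

Names that match a popular package exactly, or that are unrelated to all of them, pass through; only the -/_ lookalikes are rejected.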


#2

I think https://github.com/rust-lang/cargo/issues/2775 and https://github.com/rust-lang/rfcs/pull/940 contain a lot of the past discussion on this topic.


#3

Thanks for the links. They address the issue of using - in crate names (now disallowed), but they allow - to be used in package names in cargo.

I think this still leaves the problem open - I read about including '-' == '_' in case-insensitive equivalences, which seems sensible, but is not backwards compatible. However, even if you don’t apply this rule for all crates, you could still apply it for very popular crates.


#4

Cargo will prevent you from publishing an error_chain crate if error-chain already exists, and vice versa.

I still think Cargo.toml should resolve to error-chain if you write error_chain, and resolve to serde_json if you write serde-json. Then it becomes up to the developer’s preference what they write in Cargo.toml, and they can be consistent. Currently every Cargo.toml will contain a confusing mix of ‘-’ and ‘_’ crates. Would love to see someone write an RFC for this!
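A minimal sketch of that resolution step, assuming the registry can be indexed by a canonical form of each name. The `build_index` and `resolve` helpers are made up for this example and are not Cargo's actual implementation.

```rust
use std::collections::HashMap;

/// Assumed canonical form: ASCII-lowercased, with '_' mapped to '-'.
fn canonical_name(name: &str) -> String {
    name.to_ascii_lowercase().replace('_', "-")
}

/// Index the published names by their canonical form.
fn build_index(published: &[&str]) -> HashMap<String, String> {
    published
        .iter()
        .map(|name| (canonical_name(name), name.to_string()))
        .collect()
}

/// Map whatever the developer wrote to the name that was actually published.
fn resolve<'a>(index: &'a HashMap<String, String>, written: &str) -> Option<&'a str> {
    index.get(&canonical_name(written)).map(String::as_str)
}

fn main() {
    let index = build_index(&["error-chain", "serde_json"]);
    println!("{:?}", resolve(&index, "error_chain")); // Some("error-chain")
    println!("{:?}", resolve(&index, "serde-json")); // Some("serde_json")
    println!("{:?}", resolve(&index, "no-such-crate")); // None
}
```

Because publishing two names that differ only in - and _ is already rejected, each canonical key maps to at most one published name, so the lookup is unambiguous.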


#5

Brilliant! I didn’t realise that you were already not allowed to publish crate_a and crate-a, so that means there’s no problem. I also agree that it would be nice if you could use all crate-a format or crate_a format.


#6

Well, with the malicious npm packages, I don’t think that “there’s no problem.” We still don’t have a typo-proof system, just a -/_-proof one.

How many crates are within Levenshtein distance 1 of each other?

So I descended into the BurntSushiVers…

  1. check out crates.io-index and use @burntsushi’s walkdir to list all the names.
  2. convert all the names to lowercase and _ -> -
  3. use @burntsushi’s fst to make the list searchable.
  4. use @burntsushi’s fst-levenshtein to search the fst for each name.
  5. save the results to a gist.
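For anyone who wants to try the idea on a small list without the index checkout, steps 2 to 4 can be approximated with the standard library only. This is not the fst-based search described above: it swaps in a brute-force pairwise comparison with a plain dynamic-programming Levenshtein distance, which is fine for a handful of names but far too slow for the whole index. The function names and the sample list are assumptions for illustration.

```rust
use std::collections::BTreeSet;

/// Step 2: lowercase the name and map '_' to '-'.
fn normalize(name: &str) -> String {
    name.to_ascii_lowercase().replace('_', "-")
}

/// Two-row dynamic-programming Levenshtein distance over bytes
/// (this sketch assumes ASCII names).
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Brute-force stand-in for steps 3 and 4: every pair of distinct
/// normalized names whose distance is at most `max_distance`.
fn near_pairs(names: &[&str], max_distance: usize) -> Vec<(String, String)> {
    // The BTreeSet removes duplicates created by normalization and sorts.
    let normalized: Vec<String> = names
        .iter()
        .map(|n| normalize(n))
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();
    let mut pairs = Vec::new();
    for (i, a) in normalized.iter().enumerate() {
        for b in &normalized[i + 1..] {
            if levenshtein(a, b) <= max_distance {
                pairs.push((a.clone(), b.clone()));
            }
        }
    }
    pairs
}

fn main() {
    // Hand-written sample, not the real crates.io index.
    let names = ["num-traits", "numtraits", "Serde_JSON", "serde-json", "rand", "log"];
    println!("{:?}", near_pairs(&names, 1)); // [("num-traits", "numtraits")]
}
```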

Tada 3909 matches!


#7

So that file is overwhelming. Let me see if I can find some more interesting subsets.

There are 97 that match one of the top 100 crates by recent downloads, and 105 that match one of the top 100 by downloads.
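Narrowing the full match list down to such a subset is a simple filter. A sketch, assuming the matches are available as pairs of names and the popular crates as a plain list; `involving_popular` and the sample data are hypothetical.

```rust
use std::collections::HashSet;

/// Keep only the pairs in which at least one side is a popular crate.
fn involving_popular(pairs: &[(String, String)], popular: &[&str]) -> Vec<(String, String)> {
    let popular: HashSet<&str> = popular.iter().copied().collect();
    pairs
        .iter()
        .filter(|(a, b)| popular.contains(a.as_str()) || popular.contains(b.as_str()))
        .cloned()
        .collect()
}

fn main() {
    // Made-up sample data; the real lists come from the index and download counts.
    let pairs = vec![
        ("num-traits".to_string(), "numtraits".to_string()),
        ("foo-bar".to_string(), "foobar".to_string()),
    ];
    let popular = ["num-traits", "serde"];
    println!("{:?}", involving_popular(&pairs, &popular)); // [("num-traits", "numtraits")]
}
```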

What are other good questions we could be asking?


#8

Do you think there should be a manual audit of the crates with names similar to popular ones? I think the most dangerous cases are extra hyphens (e.g. mycrate and my-crate).
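Candidates for that kind of audit could be found by stripping the separators and grouping names that collapse to the same string. A rough sketch, where `skeleton` and `separator_collisions` are names invented for this example:

```rust
use std::collections::BTreeMap;

/// Hypothetical "skeleton" of a name: lowercased, with every '-' and '_' removed.
fn skeleton(name: &str) -> String {
    name.chars()
        .filter(|c| *c != '-' && *c != '_')
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Group names by skeleton and keep the groups with more than one member.
fn separator_collisions(names: &[&str]) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &name in names {
        groups.entry(skeleton(name)).or_default().push(name.to_string());
    }
    groups.into_values().filter(|group| group.len() > 1).collect()
}

fn main() {
    // Hand-written sample list, not real index data.
    let names = ["mycrate", "my-crate", "serde", "rand"];
    println!("{:?}", separator_collisions(&names)); // [["mycrate", "my-crate"]]
}
```

This only flags names for a human to look at; a collision says nothing about whether either crate is malicious.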


#9

I can see numtraits and num-traits

Edit: not saying that is malicious, it just follows the pattern.


#10

A related idea that was discussed a while ago is to have post-moderation of the crates repository.