Crates.io squatting


#82

Can we collect the ‘reasonable’ strategies for undoing crate squatting once it has happened?

So far I see:

  • Decay: if a project has seen no activity for some time, its name becomes available for challenge
  • Community: add a button to crates.io that says ‘This is a bad crate’; once enough people vote, someone looks at it
  • Case by case: there is an email address like squatting@crates.io where people can send complaints, and they get looked at

All of these have their upsides and downsides. Decay could hit the wrong projects. Community could become an incentive for witch hunting or the like, and case by case could be personnel-intensive and lack transparency.

Decay could potentially be the least personnel-intensive and most transparent, plus there are other community-driven package repositories that do it this way. A button on crates.io could also help with packages that are malicious in other ways. Case by case could be the most accurate, but due to a potential lack of transparency likely the least satisfying for the community.


#83

I think case by case is really the only way it can be done, but it can be augmented by having a reporting function on crates.io that would make it easier for the folks deciding to focus their attention.

People have aired concerns earlier about crates becoming available for reuse: if you depend on some crate and that crate suddenly changes ownership, there’s all kinds of tricky trust issues. Are all the downstream users okay with giving as much trust to the new owner as they did to the previous one? In the end a crate can execute arbitrary code on your machine, so there are big risks here.

At the same time, I do think that shrinking away from trying to fix these situations is not a good long-term strategy. The same person 5 years later might also be less trustworthy for whatever reason; better to hand off their useful crate to some other maintainer who has proved to do good work.

No matter what, the arbitration would likely (1) have to provide transparency on what decisions are made, and roughly on what basis, but (2) must be allowed some leeway; very strict rules can easily be gamed, and that’s a risk, which is why I think actual people deciding together would be the best way forward.


#84

You can combine 2 and 3. Community clicks a button and there is a review team which studies the request.


#85

A combination of all these would be best.

  1. Define which cases are definitely fine (e.g. you get a grace period of 6 months after reserving a crate, but after that a report may be accepted)
  2. Define which cases are definitely bad (spam, typosquatting)
  3. The gray area in between will need human input.
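
As a rough sketch of how these three rules could be encoded for a first-pass triage: the `Report` fields, the `Triage` enum, and the 183-day approximation of "6 months" are all assumptions made up for illustration, not anything crates.io has.

```rust
/// Hypothetical facts a reviewer might know about a reported crate.
pub struct Report {
    /// Days since the name was first reserved.
    pub days_since_reserved: u32,
    /// Already flagged as spam or as a typo of a popular crate name.
    pub is_spam_or_typosquat: bool,
}

#[derive(Debug, PartialEq)]
pub enum Triage {
    Fine,
    Bad,
    NeedsHumanReview,
}

/// Grace period from rule 1, approximated as 183 days (an assumption).
pub const GRACE_PERIOD_DAYS: u32 = 183;

pub fn triage(report: &Report) -> Triage {
    if report.is_spam_or_typosquat {
        // Rule 2: definitely bad.
        return Triage::Bad;
    }
    if report.days_since_reserved <= GRACE_PERIOD_DAYS {
        // Rule 1: still inside the grace period, so definitely fine.
        return Triage::Fine;
    }
    // Rule 3: everything else is the gray area and goes to a human.
    Triage::NeedsHumanReview
}
```

Only the two clear-cut cases are decided automatically here; anything past the grace period that is not obvious spam still ends up with a person.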

#86

I think forcefully taking ownership of crates in anything more than the most trivial cases will be very contentious, and I’m not in favour of it. We absolutely should not take a package from a person only because that package has <5 active users, the publisher has <20 packages, and the community mistrusts the publisher. Things can break for real people when you un-register a package that’s in use. On top of that, the moderation schemes that are currently being proposed can easily take 1 FTE of continued effort, which is way too much for a small organisation like Mozilla Rust to pay.

And if we pursue a policy of un-registering only trivial cases, then is this committee really worth the effort? Determined people can still get around it. And there might still be boatloads of squatting.

Of course there isn’t so much spam right now. I understand that some people are afraid that there will be an “explosion” in spam or whatever, but this is not a given, and it’s also not something that’s much harder to fix retroactively.

But I’m in favour of proactive non-committee solutions like CAPTCHAs, phone validation and rate limiting. All of those I think will be generally accepted by the community. Perhaps even a fee schedule.


#87

Hi! I stopped following this topic after 20 replies, so maybe someone already mentioned this, but what if we identified a crate not only by its name but by the combination of author and crate name? That is how GitHub and other similar services do it.

So in Cargo.toml one should use:

[dependencies]
regex = { version = "1", author = "JohnSmith" }

For backward compatibility, if the author is not specified, cargo should use the regex crate from the author who registered it first.

This approach would also prevent the problem of a crate changing ownership.

Of course this will make cargo a bit more complex, but I think it is worth it.
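
A minimal sketch of the lookup rule described above, assuming the registry keeps some ordering of registrations. The `Registration` struct and its `registered_at` field are hypothetical; nothing like this exists in cargo today.

```rust
/// One registered crate; `registered_at` is a hypothetical ordering key
/// (for example a timestamp or a sequence number).
#[derive(Debug, Clone, PartialEq)]
pub struct Registration {
    pub name: String,
    pub author: String,
    pub registered_at: u64,
}

/// Resolve a dependency: an exact (name, author) match if an author is given,
/// otherwise the earliest registration of that name (the backward-compatible case).
pub fn resolve<'a>(
    registry: &'a [Registration],
    name: &str,
    author: Option<&str>,
) -> Option<&'a Registration> {
    let mut candidates = registry.iter().filter(|r| r.name == name);
    match author {
        Some(a) => candidates.find(|r| r.author == a),
        None => candidates.min_by_key(|r| r.registered_at),
    }
}
```

With two `regex` registrations, `resolve(&registry, "regex", None)` picks whichever was registered first, while passing `Some("JohnSmith")` selects that author's crate regardless of order.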


#88

@HaronK I was thinking the same, but with the original author’s GitHub username. The Cargo.toml author isn’t necessarily unique, and email addresses can change.

AFAIK publishing still requires a GitHub account - assuming other identity providers become supported in the future, that would imply another level of namespacing. So the full logical name would be something like:

idprovider.username.cratename

e.g.

github.fredbloggs.frobnify
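
To make the three-part name concrete, here is a small parsing sketch. It assumes none of the three parts may contain a dot, so the dot is an unambiguous separator; `QualifiedName` and `parse_qualified` are made-up names.

```rust
/// The three parts of a hypothetical `idprovider.username.cratename` identifier.
#[derive(Debug, PartialEq)]
pub struct QualifiedName<'a> {
    pub provider: &'a str,
    pub username: &'a str,
    pub crate_name: &'a str,
}

/// Split on dots and accept only exactly three non-empty parts.
pub fn parse_qualified(s: &str) -> Option<QualifiedName<'_>> {
    let mut parts = s.split('.');
    let name = QualifiedName {
        provider: parts.next()?,
        username: parts.next()?,
        crate_name: parts.next()?,
    };
    let any_empty =
        name.provider.is_empty() || name.username.is_empty() || name.crate_name.is_empty();
    // A fourth part means the input had too many dots.
    if any_empty || parts.next().is_some() {
        return None;
    }
    Some(name)
}
```

An identity provider that allows dots in usernames would need a different separator or an escaping rule.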

#89

@mrec You’re right. I also see that there can be more than one crate author. Is there a ‘main’ one? Oh! On top of that, there are also the owners of the crate…


#90

In fact the original author, or even the publisher of the last version does not need to have any permissions to the repository :slight_smile:


#91

We could also introduce crates.io authors. All currently existing crates would have no author, but new ones would. And if no author is mentioned in Cargo.toml, cargo would look for a crate without an author.


#92

  1. Create a list of “reserved” namespaces, like rust, lib, std, etc. to prevent obvious misuse.
  2. Create a list of supported identity providers, e. g. GitHub, GitLab, etc.
  3. Namespaces are allocated on a first-come, first-served basis: the first user called foobar gets the “foobar” namespace. This is not changeable.
  4. If another user from a different identity provider tries to claim an existing namespace, there are various options:
       • the user gets his name with an increasing number appended to it
       • the user is allowed to pick a custom postfix to his name
       • the user is allowed to choose a completely different name
       • the user can create a request for a specific name that is reviewed

As step 4 should not happen too often, I think it is possible to think about optimizing user experience for that step without burdening moderators with substantial amounts of work.
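
Steps 1, 3 and the first option of step 4 could look roughly like this. The `Namespaces` type is hypothetical, the reserved list just repeats the examples above, and starting the numbering at 2 is an arbitrary choice.

```rust
use std::collections::HashMap;

/// Reserved namespaces from step 1 (illustrative, not an official list).
const RESERVED: [&str; 3] = ["rust", "lib", "std"];

/// Maps an allocated namespace to its owner as (identity provider, username).
#[derive(Default)]
pub struct Namespaces {
    owners: HashMap<String, (String, String)>,
}

impl Namespaces {
    /// Allocate a namespace for `username` at `provider`.
    /// Returns the allocated namespace, or None if the name is reserved.
    pub fn claim(&mut self, provider: &str, username: &str) -> Option<String> {
        if RESERVED.iter().any(|r| *r == username) {
            return None;
        }
        let owner = (provider.to_string(), username.to_string());
        let mut candidate = username.to_string();
        let mut suffix = 1u32;
        loop {
            match self.owners.get(&candidate) {
                // The same identity asking again gets the same namespace (step 3).
                Some(existing) if *existing == owner => return Some(candidate),
                // Taken by someone else: try the next numbered name (step 4, option 1).
                Some(_) => {
                    suffix += 1;
                    candidate = format!("{}{}", username, suffix);
                }
                // Free: allocate it permanently.
                None => {
                    self.owners.insert(candidate.clone(), owner);
                    return Some(candidate);
                }
            }
        }
    }
}
```

With this sketch, a GitHub user `foobar` gets `foobar`, a later GitLab user `foobar` gets `foobar2`, and neither allocation changes afterwards.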


#93

I really like how Docker Hub has done it. I’m not that into the details, but I know they have official Repositories and then they have the user ones.


#94

If you’re going to put github.com/username/cratename in the namespaced identifier, you can just as well add https:// to the identifier and use Git URLs instead :slight_smile:


#95

Using the domain name has been working flawlessly for 10 to 15 years in tools like Maven, Gradle, Leiningen, SBT, etc. Squatting just doesn’t exist there.

It’s the right solution. (Yes, there are many other bad things about Maven and its ecosystem, but they got this detail right.)

My suggestion to derive some identifier from a different service is just a very watered down approach of that, but in these times in which worse-is-better reigns supreme, one has to adjust the expectations.

Using the domain (or reverse domain) would also avoid a clash between the existing crate names and the “new” namespaced crate names, and it would be fairly easy to flip a switch between them:

Disallow publishing new crates under the global namespace after a cut-off date, except where the new version adds a “redirect-to” property to the new crate location in its Cargo.toml.

This would allow us to automatically migrate/upgrade users of the existing crates to the new namespaced versions when running cargo update.

Then, at a certain point in time it would be disallowed to specify “global” crates as dependencies in the Cargo.toml.
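
A sketch of how a tool could follow such “redirect-to” entries during an update, assuming the registry exposes them as a simple map from old name to new location. The map, the `org.example/regex` naming format, and the hop limit are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Follow hypothetical "redirect-to" entries from a global crate name to its
/// namespaced location. Returns None if the chain loops or exceeds `max_hops`.
pub fn follow_redirects<'a>(
    redirects: &'a HashMap<String, String>,
    start: &'a str,
    max_hops: usize,
) -> Option<&'a str> {
    let mut current = start;
    for _ in 0..=max_hops {
        match redirects.get(current) {
            // This name has moved: continue from the new location.
            Some(next) => current = next.as_str(),
            // No redirect registered: this is the final location.
            None => return Some(current),
        }
    }
    None
}
```

A name without a redirect resolves to itself, so dependencies that have not migrated yet keep working until the final cut-off.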

Ownership could either be verified by sending a mail to postmaster@domain-in-questi.on or by placing the public key at a well-known location on the server belonging to the domain, such that a crate would be signed with a private key, and crates.io would verify the validity by checking with the public key on the server.

This would also answer the question of what happens when domains change hands: the person who no longer owns the domain would not be able to publish crates under that domain anymore. And the new owner would have to publish crates using a new key, making the change of ownership easy to display on crates.io.


#96

Why not github.com/notriddle :arrow_right: notriddle.github.io ?

Second of all, aren’t you just moving the squatting problem over to GitHub? You can register names there just as freely as you can register them on crates.io. Top-level domains have cost, which limits squatters in a way that GitHub accounts don’t.


#97

Top-level domains also cost money that not every programmer has.


#98

That’s what people have done for years. It works without any issues: http://central.maven.org/maven2/io/github/ http://central.maven.org/maven2/io/gitlab/

No, I don’t.

  • GitHub limits free accounts to one per person – even if they didn’t, who cares? If people register 20 names on GitHub it doesn’t prevent anyone else from using that name on their domain.
  • If people register 20 crate names under their account it isn’t squatting, because they are not taking away anyone else’s opportunity to use that name under their own account.

People can use a shared-hosting provider like github.io. It’s slightly more work to allow a list of “known-good hosts” on the crates.io side, but I think the general expectation is that it should work from day one.
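
For illustration, the Maven-style namespace for a shared host is just the host name with its labels reversed, which is where paths like io/github come from. A tiny sketch (the function name is made up):

```rust
/// Turn a host name into a Maven-style reverse-domain namespace by
/// reversing its dot-separated labels.
pub fn reverse_domain(host: &str) -> String {
    host.split('.').rev().collect::<Vec<_>>().join(".")
}
```

For example, `reverse_domain("notriddle.github.io")` gives `io.github.notriddle`.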


#99

There is a decent amount to think through in Cargo’s behavior so that it does not automatically upgrade to an otherwise compatible version when a change of ownership occurs. Having a public key hosted by the registry has other (solvable) issues to address.

Cargo could handle this by:

  • Including the appropriate public key signature in the Cargo.lock file; cargo would check that new releases are signed against the key. Otherwise the user would be prompted/warned/etc.
  • To allow for key rotation for projects which do not change ownership, new public keys can be signed with the old private key. A registry host would prevent new releases being published signed with an old key.
  • In the case that a key is compromised, crates-io (and self-hosted registries) can prevent uploading new packages signed by the poisoned key.
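
The client-side part of these checks could be sketched as below. Keys are plain strings here, and the actual signature verification (including checking that each rotation really was signed by the old private key) is assumed to happen elsewhere; `Decision` and `check_release_key` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Release key is the pinned key or reachable from it by rotation.
    Accept,
    /// Unknown key: possible change of ownership, so prompt or warn the user.
    WarnOwnershipChange,
    /// Key is on the registry's list of compromised keys.
    RejectRevoked,
}

/// `locked` is the key pinned in Cargo.lock, `release` the key of the new release.
/// `rotations` maps an old key to the new key it endorsed (assumed already verified).
pub fn check_release_key(
    locked: &str,
    release: &str,
    rotations: &HashMap<String, String>,
    revoked: &HashSet<String>,
) -> Decision {
    if revoked.contains(release) {
        return Decision::RejectRevoked;
    }
    let mut current = locked;
    let mut seen = HashSet::new();
    // Walk the rotation chain from the pinned key; `seen` stops us on a cycle.
    while seen.insert(current) {
        if current == release {
            return Decision::Accept;
        }
        match rotations.get(current) {
            Some(next) => current = next.as_str(),
            None => break,
        }
    }
    Decision::WarnOwnershipChange
}
```

Rejecting releases signed with an already-rotated key is left to the registry side, as described in the second bullet.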

While I generally like the domain name solution, I think it complicates the story around finding crates that are well supported by the community (such as serde, etc.), and I think it’s valuable to at least come up with a story about how this could be improved in the context of domain naming (or a similar idea such as pet-naming, which was mentioned in a previous name squatting discussion).


#100

I totally agree with you on this.

My solution to this issue would be that people (or the community, or even the Rust team) could just publish empty crates using the mentioned redirect mechanism.

So if you have a user asking “what are the 20 most common libraries I should probably use” then you could point him to e. g. the rust-community.org namespace, which contains the crates http, json, xml, parsing, serialization etc., all of them redirecting to other people’s crates which have been vetted by the community and checked that they work well with each other.

Combined with pinning the crates, users could even out-source the task of upgrading their dependencies individually.

Instead they just update from e. g. rust-community.org/xml v1 and rust-community.org/http v1 to rust-community.org/xml v2 and rust-community.org/http v2, having the guarantee that they update from one set of libraries that has been vetted and tested in exactly this configuration to the next set of libraries, with the same assurances.

The great thing is that everyone can come up with their own set of libraries and versions, so we don’t even have to worry about one-size-fits-all.


#101

Such curation takes waaaay more work than writing “don’t squat” in the rules and deleting the worst spam from time to time.