Crates.io squatting


#122

Shouldn’t the namespaces be based upon another namespace that we don’t control? That way it’s someone else’s problem. For example, we could make it your username on GitHub or something…

FWIW, I think that adding a namespace like that is the right solution here. It allows people to use obvious names like futures if they want to and it also solves squatting.


#123

@kornel I tend to agree, but I can see a lot of merit in crates.io remaining unopinionated given its status as “universal infrastructure”. Back when the namespacing issue first got raised, years ago, I was almost tempted to argue that if you’re not going to enforce any kind of name hygiene you may as well not bother with names at all; just use GUIDs instead:

  • Squatting becomes a non-issue (including typosquatting, since nobody thinks they can type GUIDs from memory).
  • No admin needed.
  • May not be friendly, but most of the friendly names have gone already anyway.
  • Can still specify a friendly name, but not as an identifier, just a not-necessarily-unique aide-memoire when reading manifest dependencies or imports. A bit like the common URL pattern of using a GUID or integer id plus an irrelevant but human-readable slug.
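The GUID-plus-slug scheme in the last bullet can be sketched in a few lines of Rust. This is only an illustration under assumptions: the `<32 hex digits>-<slug>` text form and the `CrateRef` type are hypothetical, and a plain `u128` stands in for a real UUID type. The property that matters is that identity, equality and hashing depend only on the id, never on the slug.

```rust
use std::fmt;
use std::hash::{Hash, Hasher};

/// Hypothetical crate reference: the 128-bit `id` is the only identifier;
/// `slug` is a non-unique, human-readable reminder and is never used for lookup.
#[derive(Debug, Clone)]
pub struct CrateRef {
    pub id: u128,
    pub slug: String,
}

impl CrateRef {
    /// Parses the assumed textual form "<32 hex digits>-<slug>".
    pub fn parse(s: &str) -> Option<CrateRef> {
        let (hex, slug) = s.split_once('-')?;
        if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
            return None;
        }
        let id = u128::from_str_radix(hex, 16).ok()?;
        Some(CrateRef { id, slug: slug.to_string() })
    }
}

// Equality and hashing look only at the id, so two references with
// different slugs still name the same crate.
impl PartialEq for CrateRef {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}
impl Eq for CrateRef {}

impl Hash for CrateRef {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id.hash(state);
    }
}

impl fmt::Display for CrateRef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Zero-padded lowercase hex id, then the slug.
        write!(f, "{:032x}-{}", self.id, self.slug)
    }
}
```

Two references with the same id and different slugs compare equal, while two crates that both happen to use the slug `futures` stay distinct, so grabbing a slug gains a squatter nothing.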

On the downside, it may traumatize people who remember COM. (Which was solving a different problem, the lack of any central registry.)

@mark-i-m Yes, this was discussed earlier in this thread, e.g. here.


#125

One of my worries is that the situation could easily and quickly get much worse. How long would it take for one person with a script to claim every single dictionary word, for example? I don’t like being in the situation where it takes only one bad actor to make life annoying for everybody. And since we’ve already had a number of users publish hundreds of empty crates, I’m not confident that we won’t see more extreme efforts in the future.

Of course, the admins could always intervene after the fact if they judge that a line has been crossed. But I would suggest that some relatively simple technical measures could prevent some of the worst-case abuses before they happen. For example, limiting each user to 500 crates by default. This could be done without any major changes to the system, and is easy to reverse if it causes problems.
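A default cap with exceptions is indeed simple to express. A minimal sketch, assuming the registry can count how many crates an owner already has; `OwnershipPolicy` and the exception table are hypothetical names, not anything crates.io actually has. Only registering a *new* crate name is checked, so publishing new versions of existing crates is unaffected.

```rust
use std::collections::HashMap;

/// Default cap suggested above; a hypothetical policy value.
pub const DEFAULT_CRATE_LIMIT: usize = 500;

/// Hypothetical owner policy: a default limit plus admin-granted exceptions.
pub struct OwnershipPolicy {
    /// Owner name -> raised limit.
    pub exceptions: HashMap<String, usize>,
}

impl OwnershipPolicy {
    /// The limit that applies to `owner`: their exception if any, else the default.
    pub fn limit_for(&self, owner: &str) -> usize {
        self.exceptions.get(owner).copied().unwrap_or(DEFAULT_CRATE_LIMIT)
    }

    /// Checks whether `owner`, who already owns `owned` crates, may register
    /// one more new crate name.
    pub fn check_new_crate(&self, owner: &str, owned: usize) -> Result<(), String> {
        let limit = self.limit_for(owner);
        if owned < limit {
            Ok(())
        } else {
            Err(format!(
                "{owner} already owns {owned} crates (limit {limit}); request an exception"
            ))
        }
    }
}
```

Reversing the policy, as the post suggests, would amount to raising `DEFAULT_CRATE_LIMIT` or granting more exceptions.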


#126

There are already legitimate users approaching this limit. And of course GitHub teams are allowed to own crates too. Would it really be surprising if the Servo team someday owned more than 500 crates? What about other teams heavily invested in Rust (e.g. what if Google put Fuchsia on crates.io)? I definitely think there are tons of legitimate reasons for hundreds or even thousands of crates to be owned by a single entity.

Not to mention that the source of truth for crates.io users (GitHub user accounts and teams) is not exactly something that’s hard to get more than one of.

However, I personally think some kind of rate limit on new crate publication would be perfectly reasonable, and I wouldn’t be surprised if the crates.io team would accept a PR.
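One common way to implement such a rate limit is a token bucket, which allows a burst of new crates and then throttles to a steady rate. The sketch below is hypothetical: the thread does not specify any particular limiter, and `TokenBucket` is an illustrative name. Time is passed in as seconds so that the logic is deterministic and easy to test.

```rust
/// Token bucket: up to `capacity` new crates in a burst, then refilled at
/// `refill_per_sec` tokens per second.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: f64,
}

impl TokenBucket {
    /// Starts full, so a new account can publish `capacity` crates at once.
    pub fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last_refill: now }
    }

    /// Returns `true` (and spends one token) if a new crate may be published at `now`.
    pub fn try_publish(&mut self, now: f64) -> bool {
        // Refill for the time elapsed since the last refill, capped at capacity.
        // A timestamp earlier than the last refill adds nothing, so the bucket
        // never refills twice for the same interval.
        if now > self.last_refill {
            let elapsed = now - self.last_refill;
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last_refill = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

For example, `TokenBucket::new(3.0, 1.0 / 60.0, now)` allows three new crate names at once and then one per minute. The burst capacity is the knob that matters for the mass-rename case raised in the next post: a generous burst keeps legitimate bulk publishes working while still slowing a script that tries to claim thousands of names.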


#127

A limit like that doesn’t need to be a hard rule, just a default. Users and groups who legitimately need that many crates could apply for an exception (which would probably only be denied if they were obviously squatting on names they don’t intend to use). However, rate limiting might cause automation headaches that could be much harder to work around. There are legitimate reasons to mass-rename and publish a whole mess of crates at once, too.


#128

On the specific issue of company names and trademarks, I would point out that there is a much larger issue:

  • there are different jurisdictions but a single (central) crates.io.
  • it is possible for two companies in different domains to have the same name.

My current company fought for years to acquire the <company>.com domain name, simply because another company with the same name had registered it first and was not keen on selling it. And from our research, two other companies appeared to be trying to acquire it too.

Which one would you give the crate name to? All four have good reasons to vie for the name.

I would note that even namespacing doesn’t “solve” the issue; the four companies would have fought for the same namespace much like they fought for the same domain name.


#129

I would note that even namespacing doesn’t “solve” the issue; the four companies would have fought for the same namespace much like they fought for the same domain name.

The company with the <company>.com domain gets the com.<company> namespace.

There is neither a reason nor a point for crates.io to get involved in this; that’s one of the core benefits of namespacing over a free-for-all across a single, global namespace.
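The reverse-domain mapping is the same convention Java uses for package names. Here is a hedged sketch of deriving the namespace from a domain. It assumes that domain ownership has already been verified by some external mechanism (for example a DNS TXT record), and that external check is exactly what keeps crates.io out of the dispute. `namespace_for_domain` is a hypothetical helper.

```rust
/// Derives a reverse-DNS namespace from a domain, e.g. "example.com" -> "com.example".
/// Proving control of the domain is assumed to happen elsewhere and is not shown.
pub fn namespace_for_domain(domain: &str) -> Option<String> {
    // Normalise: DNS names are case-insensitive and may carry a trailing dot.
    let domain = domain.trim_end_matches('.').to_ascii_lowercase();
    let labels: Vec<&str> = domain.split('.').collect();
    // Require at least "name.tld".
    if labels.len() < 2 {
        return None;
    }
    // Basic hostname label rules: 1..=63 chars, ASCII alphanumeric or '-',
    // and no leading or trailing '-'.
    let valid = labels.iter().all(|l| {
        !l.is_empty()
            && l.len() <= 63
            && l.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
            && !l.starts_with('-')
            && !l.ends_with('-')
    });
    if !valid {
        return None;
    }
    let reversed: Vec<&str> = labels.into_iter().rev().collect();
    Some(reversed.join("."))
}
```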


#130

This forum we’re currently using requires a certain reputation rank before granting certain privileges. Why not reserve certain crate naming privileges for authors with a high enough reputation?

Based on the discussion, it appears that crates.io fails to provide a means of attaching a reputation score to crates and crate authors. Crowdsourcing crate reputation to separate the good from the bad is a proven approach (TripAdvisor, Amazon ratings, etc.).

Personally, I’m not a fan of any statement containing the words “it cannot be done” or an equivalent. Yes, it can; you just don’t want to (which can be OK, perhaps).

a) Add reputation to authors and crates, and reserve specific crate naming conventions for higher-reputation authors in a graded system (higher reputation lets you register more names, with more freedom in naming).

b) Allow user reviews and rankings of crates (this helps solve crate discoverability problems too).

c) Beef up the search systems to filter out low-reputation crates (e.g. only 3 stars and up).

d) Provide tooling to inspect the included crates and their reputation, and warn if low-reputation crates are used anywhere in the dependencies (these are bad for security, if nothing else).

e) Try experimental techniques like AI to suggest relevant crates for a project or to detect deviant behaviour. And try allowing paid-for crate curation lists. The sky is the limit: try stuff and see what works.
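Point d) could start out very simple. A sketch, assuming a hypothetical source of per-crate star ratings. The dependency names could come from the resolved graph that `cargo metadata` reports, but collecting them is not shown here. Unrated crates are flagged separately rather than being treated as bad.

```rust
use std::collections::HashMap;

/// A warning about one dependency.
#[derive(Debug, PartialEq)]
pub enum Warning {
    LowReputation { name: String, stars: f64 },
    Unrated { name: String },
}

/// Checks each dependency against a hypothetical rating table
/// (crate name -> average stars, 0.0..=5.0) and a minimum acceptable rating.
pub fn audit_dependencies(
    deps: &[&str],
    ratings: &HashMap<String, f64>,
    min_stars: f64,
) -> Vec<Warning> {
    deps.iter()
        .filter_map(|&name| match ratings.get(name) {
            // Rated, but below the threshold.
            Some(&stars) if stars < min_stars => Some(Warning::LowReputation {
                name: name.to_string(),
                stars,
            }),
            // Rated and good enough: no warning.
            Some(_) => None,
            // No rating data at all.
            None => Some(Warning::Unrated { name: name.to_string() }),
        })
        .collect()
}
```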


#131

I disagree. The best rating is “links”, in other words, usage. The most-used crates should get the highest rating, and the authors of the most-used crates should get higher ratings. That’s it. Anything else is gameable.
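The “links” metric is essentially a reverse-dependency count. A minimal sketch, assuming the input is a list of `(dependent, dependency)` edges taken from the registry index plus a list of `(author, crate)` ownership pairs (both input formats are hypothetical). Each dependent is counted once per crate, so listing the same dependency twice does not inflate the score, and self-edges are ignored.

```rust
use std::collections::{HashMap, HashSet};

/// Counts distinct reverse dependencies ("links") per crate from
/// (dependent, dependency) edges. Self-edges and duplicate edges are ignored.
pub fn link_counts(edges: &[(&str, &str)]) -> HashMap<String, usize> {
    let mut dependents: HashMap<&str, HashSet<&str>> = HashMap::new();
    for &(from, to) in edges {
        if from != to {
            dependents.entry(to).or_default().insert(from);
        }
    }
    dependents
        .into_iter()
        .map(|(krate, users)| (krate.to_string(), users.len()))
        .collect()
}

/// Author score = sum of link counts over the crates the author owns.
pub fn author_scores(
    links: &HashMap<String, usize>,
    owners: &[(&str, &str)],
) -> HashMap<String, usize> {
    let mut scores: HashMap<String, usize> = HashMap::new();
    for &(author, krate) in owners {
        // Crates nobody depends on contribute 0 but still register the author.
        *scores.entry(author.to_string()).or_insert(0) += links.get(krate).copied().unwrap_or(0);
    }
    scores
}
```

A bare count like this is easy to inflate by publishing throwaway crates that depend on yours, which is the kind of gaming the following replies discuss.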


#132

Except that Amazon ratings are heavily, heavily gamed.

You should visit Black Hat World and see the enormous amount of effort that gets put into gaming PageRank.


#133

When shooting down ranking/reputation ideas because they’re not perfect, can be gamed, etc., please compare them to the status quo, not to an imaginary 100% bulletproof ideal.

For example, the discussion about crates.io crate ranking became paralyzed, because votes, reviews, links, metadata, IP addresses, user accounts, crate content, usage numbers, etc. can all be faked and gamed in some way.

So, because no proposal was an absolutely ideal, bulletproof, impossible-to-game one, all were shot down, and we’ve stayed with a status quo that is worse than all of them: raw, unfiltered, game-all-you-want download numbers without any attempt to even filter out bots and repeated installs in CI (and a disclaimer that it’s not a ranking; the crates just happen to be sorted by this not-a-ranking number…)
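For the bots-and-CI point, here is a hedged sketch of what even minimal filtering could look like. The log schema (`Download`, a hashed client identifier, a day number) and the user-agent markers are assumptions, not how crates.io actually records downloads. Each client counts at most once per crate per day, and records whose user agent matches a bot marker are dropped.

```rust
use std::collections::HashSet;

/// One raw download record (hypothetical schema).
pub struct Download<'a> {
    pub krate: &'a str,
    /// Day number since some fixed epoch.
    pub day: u32,
    /// Opaque client identifier, e.g. a salted hash of the IP address.
    pub client: &'a str,
    pub user_agent: &'a str,
}

/// Counts downloads of `krate`, dropping records whose user agent contains any
/// of `bot_markers` (compared case-insensitively; markers should be lowercase)
/// and counting each client at most once per day.
pub fn adjusted_count(log: &[Download<'_>], krate: &str, bot_markers: &[&str]) -> usize {
    let mut seen: HashSet<(u32, &str)> = HashSet::new();
    for d in log {
        if d.krate != krate {
            continue;
        }
        let ua = d.user_agent.to_ascii_lowercase();
        if bot_markers.iter().any(|m| ua.contains(*m)) {
            continue;
        }
        // Repeated (day, client) pairs collapse into one entry.
        seen.insert((d.day, d.client));
    }
    seen.len()
}
```

With this, a CI runner that rebuilds a project a hundred times a day contributes one download instead of a hundred. The trade-off is undercounting many real users who share a single address.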


#134

That seems problematic if a domain is actually fought over: you wouldn’t want to have to rename your software package, especially if it’s already widely used.


#135

Indeed, takeovers and transfers would be a nightmare for contributors, especially if two crate names inside the package collide (or nearly collide). And I do believe collisions are likely: programmers are not the most imaginative at naming, so names like core, base, etc. are likely to show up in many a namespace.


#136

Guilty.

There’s a reason I use “foo”, “bar”, … :wink:


#137

The status quo is that people get their recommendations through third parties like the Awesome List, GitHub, Reddit, the forum, Stack Overflow, and general search engines. Not only is the community not centralized around any one of those places (which, in itself, makes gaming it trickier), they are all better at abuse mitigation than crates.io will ever be.

If I had my way, crates.io wouldn’t even have search or categories, and it would address everything with UUIDs.


#138

Crates have human-oriented identifiers, which at this point are baked heavily into the language and the Cargo ecosystem, so the boat on UUIDs has unfortunately sailed.

Awesome Lists add some level of moderation for discovery, but, going back to the original problem, they can’t fix abuse on crates.io itself.


#139

Is abuse on crates.io a theoretical or an actual problem? It’s not clear to me that there is an actual problem (yet). I’m just curious what your feelings on this are and how that should influence the discussion and the solution.


#140

It’s perhaps a problem, but more importantly an opportunity: the opportunity to turn it from a working system into a system that makes programmers’ work easier.

Exactly. To say a system should not be used without proposing an alternative is to propose doing nothing. And doing nothing is the worst idea ever, IMO.

The fact that Google still exists and has not been burned down by angry mobs proves that ranking can be done successfully. Sure, there are ways to cheat, but if that were a reason to shut down Google… oh boy, finding stuff on the internet would be a mess.

Why not implement a selection of ways to track the reputation of crates by reviews, links, usage, etc and let people set up their own custom filters?

Amazon has implemented verified reviews to work against this (the reviewing account must have the purchase registered). A similar thing could be done by designing a web of trust for Rust, where you can filter reviews based on author reputation and exclude reviews from low-reputation authors.
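A sketch of the user-defined review filter described in the last two paragraphs. The `Review` fields are hypothetical: `author_reputation` stands for whatever score a web of trust would produce, and `verified` is one possible crates.io analogue of a registered purchase (say, the reviewer publishes a crate that depends on the reviewed one). Each reader picks their own thresholds.

```rust
/// A review of a crate (hypothetical schema).
pub struct Review {
    /// 1..=5 stars.
    pub stars: u8,
    /// Reviewer's reputation in some web of trust.
    pub author_reputation: u32,
    /// Analogue of a "verified purchase": the reviewer actually uses the crate.
    pub verified: bool,
}

/// A reader-chosen filter, so each person decides which signals to trust.
pub struct ReviewFilter {
    pub min_author_reputation: u32,
    pub verified_only: bool,
}

impl ReviewFilter {
    /// Keeps reviews from reputable enough authors, and only verified ones if requested.
    pub fn accepts(&self, r: &Review) -> bool {
        r.author_reputation >= self.min_author_reputation && (r.verified || !self.verified_only)
    }

    /// Average star rating over accepted reviews, or `None` if none pass the filter.
    pub fn average(&self, reviews: &[Review]) -> Option<f64> {
        let kept: Vec<f64> = reviews
            .iter()
            .filter(|r| self.accepts(r))
            .map(|r| f64::from(r.stars))
            .collect();
        if kept.is_empty() {
            None
        } else {
            Some(kept.iter().sum::<f64>() / kept.len() as f64)
        }
    }
}
```

A low-reputation account leaving five-star reviews then has no effect on anyone whose threshold excludes it, which is the point of letting people set up their own filters.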

Any problem in this world can be solved given the willpower to do so. I for one am not just going to roll over and accept defeat on the question of improving crate discoverability and naming.


#141

It is an actual problem. It’s starting on a small scale (people grabbing 20 crate names they like), but it’s inevitable that someone will run a script and grab 1000 or 10000 names. The site allows this, but more importantly the policy allows it, so once someone squats crates.io big time, they will have the right to argue it’s all A-OK. Applying an anti-squatting policy retroactively is a bigger ask than setting one beforehand.

To put it another way: if crates.io doesn’t have a squatting policy, the first person with a script will set the policy for you.


#142

Earlier in the thread, someone pointed out a few people abusing the system by reserving a bunch of names and asking to be contacted if anyone wanted to use those names. This thread’s gotten too long for me to find it, but it’s somewhere here.