Crates.io squatting

I understand that an exact policy for reclaiming names, and its enforcement, is extra work that the crates-io team may not want to do.

But I’d appreciate it if you could at least officially declare squatting unwanted. The current document gives the impression that you don’t care at all, and that squatting is therefore condoned.

If you changed the docs to “please don’t squat (we may take action against worst offenders)” it would at least discourage squatting a bit.

Currently we have a tragedy of the commons: people who refrain from squatting see good crate names taken, and the more squatting is done, the more counter-squatting feels necessary.

11 Likes

Personally I think the better alternative is for the tooling (cargo, rustup, etc) to support alternative, curated registries in addition to crates.io.

8 Likes

I wouldn't think it needs that much of a system. If there is a good feedback loop where people can report bad squatters and they get dealt with, there likely won't be a lot of people who will continue doing it.

Or, if the current approach is, we'll wait until the problem goes exponential and then we'll deal with it, I guess I'm just saying this is already making our lived experiences worse and dealing with it sooner seems like a good idea.

1 Like

Okay, so it’s basically that nobody wants to be involved enough in management to prevent these types of problems. Which is totally understandable, because they’d be tough decisions at times and would inevitably generate pushback. But yeah, as others have said, I worry that crates.io will gradually become a place where you can’t find a free name, and/or the first page of results for many searches will be empty crates. It’s not like this is an issue that we made up or that is unique to Rust: npm, for example, has recognized the problem (though I have no idea if their countermeasures are successful).

IMO, a policy of “we won’t set a policy until something is obviously egregious” is rather meaningless. For me, claiming 100 crates with the tagline “contact me” but providing fake or missing contact info, and declining to answer when contacted via reddit and github, is egregious. Not for you. Maybe it would take 1000 crates to tip the scales, I dunno. I don’t mean this as a criticism! It’s just a fact that if there is no explicit policy, then it’ll always be easy to move the goalposts a little further and never do anything. But that seems to be the goal anyway, so, fine.

2 Likes

I wouldn't mind taking on some of this responsibility. It sounds like you might, too? If that's what it will take to move this issue forward, where do I sign up?

1 Like

Could we at least get paths or prefix reservation, e.g. so that Carl can reserve any name starting with tokio? (If necessary using another separator, e.g. tokio/zmq.)

There are at least two reasons for this:

  • Project authors get free reservation of sub-project names
  • Users get the assurance that any path starting with this prefix belongs to the same project, and isn't part of some other project entirely, or a trojan

7 Likes

I agree with this. It shouldn't be just squatters: there should also be a way to report crates that can be proven abandoned. That will make it easier for people to search for crates instead of ending up on abandoned ones. This can be extended to squatted crates as well.

I'm up for it as well.

If people truly want to see the end of squatting, then perhaps there needs to be a higher bar for allocation of names (but without preventing publishing).

For example, users are allowed to register a prefix with their handle (like dhardy) and can then publish under that (dhardy/thingy), and only after this can apply for the use of a top-level name (thingy).

3 Likes

And then what? So, it will be permitted to take over an "abandoned" crate and replace it with a new version that is different from the original? That doesn't sound useful.

A better alternative might be that a user can only reserve a limited number of top-level name prefixes (say 5), but can create unlimited crates beneath their reserved top-level prefixes. Then, if a user or organization wants additional top-level prefixes, they can be charged a fee, with the cost increasing with the number of top-level crates/prefixes. So, 1-5 top-level prefixes are free. 6-10 is $5.00 per year each, 11-20 $10…0 per year each, 21-50 $20.00 each, 51-100 $50.00 each, 101-500 $100.00 each, 501+ $1000.00 each. If sufficient money is raised from this, it can be used to support the ecosystem. The exact fee schedule would need to be determined and adjusted with some predictable formula. All existing crates would be grandfathered in for some time period without cost, but unused crates (by some definition of unused) would eventually be subject to the fee schedule. Grandfathered, in-use crates would never have to pay a fee and would not count against allotments for fee-schedule purposes for new crates.

I’m not buying it. Why not just disallow crates with no code, or with the same code as an existing crate?

Because that would be trivially worked around.

Oh yeah? How do you trivially work around it?

People shouldn’t be allowed to republish, say, hexchat-plugin, under a new name with no code changes. I think we can all agree on that.

Note that crates with only comment/doc changes would also be blocked.

This also goes for the standard hello world examples that come with Rust.

We can even choose to ignore strings, although that would cause issues with inline non-Rust code.
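To make the idea concrete, here is a rough sketch of what "same code, ignoring comments and optionally strings" could mean. Everything in it is an assumption for illustration: `normalize` and `same_code` are hypothetical names, not anything crates.io implements, and the scanner is deliberately naive (it does not handle raw strings, char literals such as `'"'`, or nested block comments, and it drops all whitespace).

```rust
/// Internal scanner states for the naive pass below.
enum State {
    Code,
    Str,
    LineComment,
    BlockComment,
}

/// Reduce Rust source to a rough "code only" fingerprint: comments and
/// whitespace are dropped, and string contents are blanked when
/// `ignore_strings` is true. Hypothetical helper, naive by design.
fn normalize(source: &str, ignore_strings: bool) -> String {
    let mut out = String::new();
    let mut state = State::Code;
    let mut chars = source.chars().peekable();

    while let Some(c) = chars.next() {
        match state {
            State::Code => match c {
                '/' if chars.peek() == Some(&'/') => {
                    chars.next();
                    state = State::LineComment;
                }
                '/' if chars.peek() == Some(&'*') => {
                    chars.next();
                    state = State::BlockComment;
                }
                '"' => {
                    out.push('"');
                    state = State::Str;
                }
                c if c.is_whitespace() => {}
                c => out.push(c),
            },
            State::Str => match c {
                // Consume the escaped character so that \" does not end the string.
                '\\' => {
                    let escaped = chars.next();
                    if !ignore_strings {
                        out.push('\\');
                        if let Some(e) = escaped {
                            out.push(e);
                        }
                    }
                }
                '"' => {
                    out.push('"');
                    state = State::Code;
                }
                c => {
                    if !ignore_strings {
                        out.push(c);
                    }
                }
            },
            State::LineComment => {
                if c == '\n' {
                    state = State::Code;
                }
            }
            State::BlockComment => {
                if c == '*' && chars.peek() == Some(&'/') {
                    chars.next();
                    state = State::Code;
                }
            }
        }
    }
    out
}

/// Two sources count as "the same code" if their fingerprints match.
fn same_code(a: &str, b: &str, ignore_strings: bool) -> bool {
    normalize(a, ignore_strings) == normalize(b, ignore_strings)
}
```

With this, two copies of a hello world that differ only in comments and layout compare equal, and with `ignore_strings` set they compare equal even if the printed message differs. Adding any extra statement changes the fingerprint, so a check like this only catches the laziest republishing.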

The current scripts can just do cargo new && cd && cargo publish, a lot easier than generating code.

Details of a policy are hard to nail down (e.g. a squatter can generate random code; even if you require the code to compile, it can still contain random strings, comments, and randomly inserted or removed valid statements).

OTOH I think people should be able to publish a project at a very early stage, to avoid the name being squatted just before it’s published (I know it wasn’t a big deal for fail failure crate, but I’d be quite disappointed).

Malicious squatting is a form of spam, so I’m afraid it’s going to be incredibly hard to define in the form of rigid rules (for real-world spam, rule-based filters are insufficient, so they’re augmented with statistical filters).

Uhmm, it is a tradeoff. A few years from now we might end up with a huge pile of abandoned crates, which means that most names will be taken and we will have to resort to namespacing.

The only downside is that users might get confused and use a new crate thinking it's the old one (especially for more common names), but we can come up with a way of "deprecating a crate" (and amending the "no delete policy" to allow deletes for abandoned crates).

Let’s enumerate kinds of problems that we’re potentially dealing with.

  1. Malicious squatting — someone takes names without intention of using them for their own projects (e.g. to annoy people, block someone’s project, make crates-io less useful, or to have collection of names with hope of some future control or profit).

  2. Malware/typosquatting — taking popular or confusingly similar name with goal of tricking people into installing the code. e.g. tokyo-tls instead of tokio-tls.

  3. Spam — a garbage crate that is abusing readme or metadata to get crates-io users to visit/buy something, or some misguided SEO, rather than providing useful Rust code.

  4. Reserved in good faith — someone may hope to work on some project in future. They may eventually publish some code, or they may never get to it.

    4a: Reserved as a defense against typosquatting.

  5. Abandoned trivial projects — there’s a chunk of abandoned crates on crates-io that appear to be someone’s “hello world” or some first attempt at making a crate. These are usually unfinished crates v0.0.1 that someone published as a test and never came back.

  6. Abandoned real projects — crates that were useful, had users, but are no longer maintained. Possibly bitrotten.
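Of these, item 2 is probably the easiest to check mechanically. A minimal sketch, assuming a plain Levenshtein edit distance is a reasonable stand-in for "confusingly similar": `edit_distance` and `looks_like_typosquat` are hypothetical helpers, and the threshold is a made-up parameter, not an existing crates.io rule.

```rust
/// Classic Levenshtein edit distance (insert, delete, substitute), computed
/// with a single rolling row over Unicode scalar values.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut row: Vec<usize> = (0..=b.len()).collect();

    for (i, &ca) in a.iter().enumerate() {
        // `diagonal` holds the previous row's value at column j.
        let mut diagonal = row[0];
        row[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitute = diagonal + usize::from(ca != cb);
            diagonal = row[j + 1];
            row[j + 1] = substitute.min(row[j] + 1).min(diagonal + 1);
        }
    }
    row[b.len()]
}

/// Hypothetical check: return the first existing name that is different from
/// `candidate` but within `max_distance` edits of it.
fn looks_like_typosquat<'a>(
    candidate: &str,
    existing: &[&'a str],
    max_distance: usize,
) -> Option<&'a str> {
    existing
        .iter()
        .copied()
        .find(|&name| name != candidate && edit_distance(candidate, name) <= max_distance)
}
```

For the example above, `tokyo-tls` is one substitution away from `tokio-tls`, so a threshold of 1 would flag it. Note that a defensive reservation (item 4a) looks exactly the same to a check like this, so it can only surface candidates for a human to review.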

13 Likes

That's what namespacing is about. The problem is you end up having a redundant namespace, especially for organisations (e.g. uuid/uuid). If you make it optional, there is a bit of inconsistency.

You could link up the organization/namespace with a default (sub)crate, making “uuid” a “symlink” to “uuid/uuid”.

Further allowing e.g. “uuid/sys” as, well, that…
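Roughly what I have in mind, as a sketch only: `CratePath` and `resolve` are hypothetical names, the `/` separator is just the one used in this thread, and nothing like this exists in cargo or crates.io today.

```rust
/// A crate path in the hypothetical `namespace/name` scheme discussed here.
#[derive(Debug, PartialEq, Eq)]
struct CratePath {
    namespace: String,
    name: String,
}

/// Resolve a user-supplied name. A bare name such as `uuid` is treated as a
/// "symlink" to the namespace's default crate `uuid/uuid`; a full path such
/// as `uuid/sys` is taken as written. Empty or nested parts are rejected.
fn resolve(input: &str) -> Option<CratePath> {
    let (namespace, name) = match input.split_once('/') {
        Some((namespace, name)) => (namespace, name),
        None => (input, input),
    };
    if namespace.is_empty() || name.is_empty() || name.contains('/') {
        return None;
    }
    Some(CratePath {
        namespace: namespace.to_string(),
        name: name.to_string(),
    })
}
```

So `resolve("uuid")` and `resolve("uuid/uuid")` give the same result, while `resolve("uuid/sys")` points at a separate crate inside the same namespace.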

1 Like

Better the inconsistency than no namespacing.

For example, allow both rand and rand/core (and rand/chacha etc.); I feel the confusion is outweighed by the advantages.

1 Like

I don't see why that shouldn't be permitted.

Why?

I just don't see this as being as clear-cut as you seem to. You're saying, "We can all agree that...", but I don't think that is necessarily the case, and it can't be taken as a given.

1 Like