Crates.io squatting

Wouldn't it be: rand, core/rand, chacha/rand? So whoever registers a top-level name, for example tokio, would then be the only person or organization able to create crates beneath it, such as tokio/net, tokio/disk, tokio/fs, etc.
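To make the ownership rule above concrete, here is a minimal sketch. Everything in it is hypothetical: crates.io has no such feature, and `Registry`, `register_top_level`, and `can_publish` are names invented for illustration, with `/` assumed as the separator.

```rust
use std::collections::HashMap;

/// Hypothetical registry in which each top-level name has exactly one owner.
/// This is an illustration only, not how crates.io works.
#[derive(Default)]
pub struct Registry {
    /// Maps a top-level name (e.g. "tokio") to the owning account.
    owners: HashMap<String, String>,
}

impl Registry {
    /// Claims a top-level name. Returns false if another account owns it.
    pub fn register_top_level(&mut self, name: &str, account: &str) -> bool {
        match self.owners.get(name) {
            Some(owner) => owner.as_str() == account,
            None => {
                self.owners.insert(name.to_string(), account.to_string());
                true
            }
        }
    }

    /// A path such as "tokio/net" may be published only by the owner
    /// of its first segment ("tokio").
    pub fn can_publish(&self, path: &str, account: &str) -> bool {
        let top = path.split('/').next().unwrap_or("");
        self.owners
            .get(top)
            .map_or(false, |owner| owner.as_str() == account)
    }
}
```

Under these assumptions, once one account has registered tokio, only that account can publish tokio/net, and nobody can publish under an unregistered top-level name.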

While namespaces do provide one way out of this problem, the policy has always been that namespacing would introduce problems of its own. Since improving on the squatting problem does not require namespacing, I feel that discussion of namespacing is basically off-topic for this thread, which is about squatting. If you want to pursue some variant of namespacing, please do so in a separate thread.

4 Likes

I think before you say what is and what is not an appropriate solution to the problem you must first clearly define the problem. To me, “Squatting” isn’t clearly defined yet. That needs to happen first before you can have a reasonable discussion about what is and what is not the appropriate solution to the problem.

2 Likes

Full disclosure: I myself have one crate in the fourth category. Someone on reddit suggested a time-limited reservation system: release version 0.1 within X months or you lose the name. Of course, this doesn’t prevent problem types 1-3 because you could just put in random code, but it may provide motivation for people reserving names in “good faith”.
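The time-limited reservation idea could be sketched roughly like this. The `Reservation` type, the day-count representation, and the `grace_days` parameter are all assumptions made for illustration; the suggestion above only says "X months".

```rust
/// Hypothetical record of a reserved crate name.
pub struct Reservation {
    /// Day on which the name was reserved (days since an arbitrary epoch).
    pub reserved_on_day: u64,
    /// Day on which version 0.1 was published, if it ever was.
    pub first_release_day: Option<u64>,
}

/// Returns true if the name should be released back to the pool:
/// no 0.1 release exists and more than `grace_days` have passed.
pub fn is_expired(r: &Reservation, today: u64, grace_days: u64) -> bool {
    match r.first_release_day {
        // Any release keeps the name in this simple sketch.
        Some(_) => false,
        None => today.saturating_sub(r.reserved_on_day) > grace_days,
    }
}
```

As noted above, a check like this says nothing about whether the released code is meaningful, so it only nudges people who reserved in good faith.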

I’m also not sure where things like @carllerche squatting tokio-* fall in your taxonomy. If @withoutboats is correct that some of them are reserved without the intention of writing a crate, I don’t understand the motivation behind that.

I totally understand the motivation. It is a symptom of the fact that crates.io does not support name-spacing. Because of this, if you want to prevent closely named crates from trading on the "brand" of your crate, you have to reserve variations and sub-names yourself so that others can't use the name recognition of "tokio" to push unrelated or malicious crates.

1 Like

I get that part. But it’s bad for users. If I have an idea for doing something involving tokio and compression, I might search for “tokio deflate”. As of today there are two results: squatted tokio-deflate and a crate involving websockets that seems irrelevant. The existence of this squatted crate gives a user no indication of whether tokio-deflate is being worked on, has any intention of being worked on, is feasible at all, etc.

There’s a separate question of whether @carllerche has a “right” to the tokio-* prefix at all, but let’s not discuss namespacing here; it’s even more of a dead end than a squatting policy.

Yeah, I don’t think namespaces are the solution. They help in one case, but the rest is “meh”:

  1. If arbitrary namespaces are allowed, they don’t solve malicious squatting at all. Instead of squatting crate names, the battle is now for good namespace names.
    If namespacing is “outsourced” to GitHub (or DNS or someone else) that solves the problem only in the sense of making it someone else’s problem and squatting happening on someone else’s site first.

  2. Typosquatting is partially solved, but still possible, so tokyo/tls can be done. tokio/tsl is prevented.
    If namespaces are based on usernames, typosquatting may get worse, since it’s harder to remember the spelling of a username than of a project name (e.g. the winapi crate).

  3. Namespaces don’t do anything for spam (I’m assuming spammers want traffic or SEO, so they can still keyword-stuff their namespace names and drive traffic to namespaced garbage).

  4. People may still want to reserve namespaces in case their personal project grows big, or to have something nicer (project-name/project-name rather than my-silly-username123/project-name).

  5. It would help a lot for trivial/toy projects that would be dormant under someone’s personal namespace.

  6. Not very helpful for abandoned real projects. It’s still a good name(space) taken.

2 Likes

But what if the "No Name-Spacing" policy is inadvertently encouraging and proliferating "squatting"? Then doesn't it need to be discussed in reference to "squatting"? I would think so. Separating the two issues seems fraught with problems, because it is my contention that the "No Name-Spacing" policy feeds the "Squatting" beast.

3 Likes

I would call this a BIG IMPROVEMENT(tm). For example, spot the typos that might be typo-squatted maliciously:

tokio
tokio::foo
tokio::bar
tokio::foobar
tokio::bar::foo::barbar
tokio::some::foo::barbar
tokyo::some::foo::barbar
tokio::some::bar::foo
tokio::tls
tokio::tsl
tokio::hdparm
tokeio::hdparm
tokio::dhparm

Now, how hard is it to verify you haven't inadvertently included a typo crate name when tokio:: is reserved to one owner vs tokio- being open to anyone or anything (as below)?

tokio
tokio-foo
toki-bar
tokio-foobar
tokio-bar-foo-barbar
tokio-some-foo-barbar
tokyo-some-foo-barbar
tokio-some-bar-foo
tokio-tls
tokio-tsl
tokio-hdparm
tokeio-hdparm
tokio-dhparm

With a quick scan of the above lists, where tokio:: is reserved to one person/organization and tokio- is open to the world, how sure are you that you haven't made a typo that will give you a typo-squatted crate?

My contention is that in the first case it is easy to scan the list for outliers (and it can even reasonably be linted against), whereas in the second it is somewhat of an ordeal to verify.
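A lint along those lines might look like the sketch below. It assumes the hypothetical `tokio::`-style names from the first list and flags any name whose top-level segment is within a small edit distance of the reserved name without being equal to it. The function names and the `max_distance` threshold are illustrative, not an existing tool.

```rust
/// Classic dynamic-programming Levenshtein edit distance.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between a[..i-1] and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Returns the names whose top-level segment is close to, but not equal to,
/// `reserved`. Names under the reserved namespace itself are never flagged,
/// since only its owner could have published them.
pub fn suspicious<'a>(names: &[&'a str], reserved: &str, max_distance: usize) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| {
            let top = name.split("::").next().unwrap_or("");
            let d = edit_distance(top, reserved);
            d > 0 && d <= max_distance
        })
        .collect()
}
```

With the first list as input and "tokio" as the reserved name, only the tokyo:: and tokeio:: entries are flagged. Typos such as tokio::tsl are not, because under the proposed scheme only the owner of tokio could have published that name. With flat tokio- names there is no owner boundary for the lint to rely on.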

1 Like

I would contend that they do. They force "spammers" to register a top-level name (which can be limited as I've described in an earlier comment through an increasing fee-schedule) which can then be easily revoked/banned if they are found to be "spamming" and meanwhile, their "spam" isn't associated with a legitimate crate "brand".

NOTE: Because the first few "Top-Level" names are free to anyone, and you can create any number of sub-name-spaces under a top-level name-space yourself, legitimate users will likely never need to pay a dime.
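As a rough illustration of such a fee schedule: the number of free names, the base fee, and the doubling rule below are all placeholders picked for the example, since no concrete numbers have been proposed here.

```rust
/// Hypothetical fee for registering an account's n-th top-level name
/// (1-based). The first `free_names` registrations cost nothing; after
/// that the fee starts at `base_fee` and doubles with each further name.
/// The doubling rule is an assumption chosen for illustration.
pub fn registration_fee(n: u32, free_names: u32, base_fee: u64) -> u64 {
    if n <= free_names {
        return 0;
    }
    // 0 for the first paid name, 1 for the second, and so on.
    let paid_index = n - free_names - 1;
    base_fee.saturating_mul(2u64.saturating_pow(paid_index))
}
```

With three free names and a base fee of 10 units, the fourth name costs 10, the fifth 20, and the sixth 40, while someone who only ever needs a handful of top-level names pays nothing.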

Yes, and there should be nothing wrong with that, within reason. Hence the possibility of an increasing fee schedule for more top-level names registered to the same person/organization.

This problem will always exist. At some point, it will likely become necessary to define "abandoned" and a policy for reclaiming the name-space.

Making accounts paid would dramatically lower ROI on spam, but that's the payment part helping, not the namespace part helping. You can still ban all crates by a spammer by banning the account (and limit account registration as much as namespace registration), even if that account name is not part of the crate name.

I assume deletion of garbage spam crates is uncontroversial, so even if spammers take good names/namespaces, these could be reclaimed. That's not a big worry from the perspective of squatting, as long as the crates.io team is willing to act on this.

That's my point.

OK, under name-spacing, you would name your crate: "oneofyournamespaces::subnamespace::subnamespace::tokiocompression" (or something like that). You wouldn't need to worry about "searching for an available name that was appropriate". You'd just create it.

1 Like

That ignores the financial incentive of being able to attach yourself to the good-will of existing well-known crates. If I'm trying to spam, it's much more valuable if I can create a crate called "some-well-known-crate-my-spam" than "my-spam".

I feel like many cannot see these issues clearly because they are honest and trustworthy people who really can't put themselves in the shoes of this sort of malicious person. I grew up around a lot of malicious, dishonest people, and I've been privy to some pretty shady conversations at the highest level of business, and let me tell you, the nonsense that goes on. Honest people just don't think like these kinds of crappy people. It's really hard to imagine how malicious they can be.

I’d argue all arguments suggest federation and moderation (you can’t build a moderation team for a centralized service) are the way to go.

I'm not sure that stating a conclusion without supporting arguments and examples is an argument in any meaningful sense. I don't think it is useful to state opinions as facts rather than making a contention and explaining, with examples, why one believes it is correct. How many comments just state opinions as facts and leave it at that? Is that useful? I'm not sure that it is.

users: “we have a squatting problem”

moderators: “we can’t do anything about it” (probably because there aren’t enough moderators and it’d be too expensive to have them)

me: “we need independently moderated crate repositories so as to offload the moderators and provide what the users want”

3 Likes

Have we sufficiently defined the outlines of the problem? Perhaps, but I for one am not 100% clear on where the line falls between permitted activities and those we want to prohibit, or why the line should be drawn exactly there.

OK, so this is somewhat of an economic problem. Is there an economic/financial aspect to a possible solution? Something like only permitting X number of top-level names to be registered by a single organization/person without increasing monetary costs, and, through appropriate name-spacing, making it more difficult to gain financial benefits by glomming onto the brands of well-known crates.

Yes, that also is a possibility, but, is there a way to solve the problem for crates.io (or at least somewhat alleviate it) independent of the existence of externally available, curated repositories?

I've presented some arguments and examples of why I believe name-spacing and a progressive fee-schedule could be a reasonable solution to the problem.

Can we collect the ‘reasonable’ strategies for undoing crate squatting once it has happened?

So far I see:

  • Decay: if there has been a lack of activity on a project, its name becomes available for challenge.
  • Community: add a button to crates.io that says ‘This is a bad crate’; once enough people vote, someone looks at it.
  • Case by case: there is an email address like squatting@crates.io where people can send complaints, and they get looked at.

All of these have their upsides and downsides. Decay could hit the wrong projects. Community could become an incentive for witch hunting or the like, and case by case could be expensive in personnel and lack transparency.

Decay could potentially be the least personnel-intensive and most transparent, plus there are other community-driven package repositories that do it this way. A button on crates.io could also help with packages that are malicious in other ways. Case by case could be the most accurate, but due to a potential lack of transparency it is likely the least satisfying for the community.
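To show how the first two strategies could feed into the third, here is a small triage sketch. The `CrateStatus` fields, the `Action` variants, and both thresholds are made up for illustration; real values would need to be decided by whoever runs the registry.

```rust
/// What the registry would do next with a crate name (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Action {
    Keep,
    /// Decay: the name becomes available for challenge.
    OpenForChallenge,
    /// Community: enough "bad crate" votes, so a human takes a look.
    SendToReview,
}

/// Hypothetical per-crate signals.
pub struct CrateStatus {
    pub days_since_last_activity: u32,
    pub bad_crate_votes: u32,
}

/// Votes take priority over decay so that reported crates always reach
/// a person rather than being handled automatically.
pub fn triage(c: &CrateStatus, decay_days: u32, vote_threshold: u32) -> Action {
    if c.bad_crate_votes >= vote_threshold {
        Action::SendToReview
    } else if c.days_since_last_activity >= decay_days {
        Action::OpenForChallenge
    } else {
        Action::Keep
    }
}
```

Neither automated path deletes anything on its own in this sketch. Both only produce a candidate for a challenge or a review, which is one way to limit the "decay hits the wrong project" and "witch hunt" risks mentioned above.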

2 Likes