[Pre-RFC] DNS domains as package namespaces

While it's not likely that the domains will lapse, it is definitely something that needs to be considered as an inevitability.

I think the exact details of the policy for guarding against lapsed domains for major projects would make for a great section in the eventual RFC, if this thread ever gets to the point where an RFC gets written. It may involve some combination of associating namespaces with a set of crates.io users / teams allowed to push to those namespaces, a non-GitHub source of truth for crates.io teams, and/or requiring 2FA for namespaced crates.

However, right now, this thread is a pre-RFC because I don't know whether the Cargo or crates.io teams are OK with any form of namespaces. Any brain power spent pondering what to do if tokio.rs expires is wasted if I can't convince the relevant authorities that the packages xml and example.com/xml should be allowed to coexist.

The problem isn't the concept of namespaces but solving the very problems being raised in this thread. I've already called this out before. Most of these concerns were included in Child Thread: Survey of registry namespace designs for Cargo and Crates.io which I was expecting your reply to cover.

I'd also recommend posting relevant content on the relevant threads from that previous summary and focus your points on your incremental differences so people can more easily identify and talk to what is new you are bringing to the table rather than having to sift it out of long, detailed posts. Even knowing content is present in your post, it is hard to find.

Dismissing it as "small" I think says more about you than the problem.

Being "optional" is an escape hatch proportionate with the expected value. We would be rolling out a major feature to address a problem but those without a domain name would not be able to participate.

Relying on hosting services has a couple of problems: (1) those domains are mutable because orgs / usernames are mutable, and (2) migrating hosting services is a breaking change.

While I am for small, incremental RFCs, we need to consider what behavior we will cause with our actions and if that is right. If our answer is to push people to github.io and that is an option with significant problems, then we need to take ownership of that and ensure there is a viable option with a paved path.

Yes, they are seemingly in conflict with each other. Calling it unreasonable is not accurate and is like saying "memory safety without garbage collection" is unreasonable. We shouldn't shy away from solving apparent contradictions.

As I discussed at Child Thread: Survey of registry namespace designs for Cargo and Crates.io, there may be ways to solve the transfer / rename case. Or maybe we see a strong justification for why we should flex on one of these. Or maybe we go with a completely different solution, like organizational tags, that cover most of the use cases without stepping into this quagmire.

Sources? In Rust I would say that is not true: if I see burntsushi or another famous Rust library author I definitely trust it more. I will often also look at what other crates an author has written and what their popularity is as a contributing factor to trustworthiness.

Disclaimer: I am on the crates.io team, but this comment is my personal thoughts. I need more information and the team needs to discuss before making any official decisions.

When I look at namespace proposals, the way I approach it is trying to determine if the problems the proposal solves are substantially greater than the problems the proposal creates (because every solution involves tradeoffs). I'm not yet sure of my thoughts on this particular proposal; I'm open to being convinced either way, and I would like some more information.

It sounds like this is the primary problem this proposal is aiming to solve; please correct me if I've misunderstood. You cite some examples:

Could you elaborate on why you would rather have namespaces as proposed here, over publishing these crates to crates.io as something like jmillikin-fuse, which is possible today?

The details of how lapsed domains are handled by this solution are going to influence my assessment of this proposal. Having two crates example.com/foo and example.com/bar potentially be completely unrelated because at the time of crate creation, the domain was owned by different entities seems difficult to communicate and ripe for confusion.

Maven Central (Java's package registry) also uses DNS for namespace verification. Could you summarize how and why your proposal differs from what Maven Central does?

The problem isn't the concept of namespaces but solving the very problems being raised in this thread. I've already called this out before. Most of these concerns were included in Child Thread: Survey of registry namespace designs for Cargo and Crates.io which I was expecting your reply to cover.

I believe my reply did cover the concerns from that link. I was careful to reference it as I was writing the different sections. If there's anything you feel was not covered adequately then I'd be willing to either go into more detail or provide examples of options that would still match this proposal.

Note that the reason I am posting this as a pre-RFC and not a fully detailed RFC is because I specifically want an answer to the question of whether namespaces -- that is, multiple packages with the same crate name but different package names -- are one of the designs that the Cargo / crates.io teams are willing to accept.

The last time I brought this topic up was in response to an email to the team, when I asked if they would be willing to have namespaces, and they said yes -- then on the thread, said no. I don't want to spend two months putting together a fully-designed RFC with all edges covered if that time is likely to be wasted.

I'd also recommend posting relevant content on the relevant threads from that previous summary and focus your points on your incremental differences so people can more easily identify and talk to what is new you are bringing to the table rather than having to sift it out of long, detailed posts. Even knowing content is present in your post, it is hard to find.

I think it would be impolite for me to post unsolicited off-topic proposals on someone else's thread. If someone wants to discuss how to better communicate the organizational trust / reputation of a crate, then my topic (namespaces) would be off-topic. Ditto if someone wants to talk about something like serde and serde-derive.

Dismissing it as "small" I think says more about you than the problem.

I'm not dismissing a monetary cost as "small", but when compared to the other costs involved in publishing Rust packages on crates.io it seems strange to focus on a fee that is an order of magnitude lower than any of the other costs involved (a non-smartphone computer, electric power, internet access).

Being "optional" is an escape hatch proportionate with the expected value. We would be rolling out a major feature to address a problem but those without a domain name would not be able to participate.

Being optional means that it is purely additive -- there would be nobody worse off if the registry supports namespaces. A person who is satisfied with the global namespace would continue to have access to it in exactly the same way they do today, but people who are currently unable to use the global namespace would become able to participate in the crates.io ecosystem.

To use a real-world analogy, the addition of a ramp to a building does not disadvantage people who are able to use the stairs.

Relying on hosting services has a couple of problems: (1) those domains are mutable because orgs / usernames are mutable, and (2) migrating hosting services is a breaking change.

All meaningful identifiers are mutable, and the requirements state that the identifiers must be meaningful, thus any design is forced to contend with some level of mutability.

The advantage of a domain name is that the name itself is not mutable, which means that there is little motivation to advocate for being able to rename a package. This is an advantage over (for example) usernames, which often change along with an individual's legal name and are therefore much less stable.

While I am for small, incremental RFCs, we need to consider what behavior we will cause with our actions and if that is right. If our answer is to push people to github.io and that is an option with significant problems, then we need to take ownership of that and ensure there is a viable option with a paved path.

I don't have any particular preference for github.io other than to note that crates.io currently depends on GitHub for login, and therefore it's an example of a free hosting service that every single crates.io user is currently guaranteed to have access to.

There are other options for free identities rooted in the domain hierarchy, ranging from straightforward to bureaucratic:

  • Straightforward: When non-GitHub crates.io usernames become available they could be used to issue {username}.users.crates.io namespaces. This may or may not be feasible depending on username mutability, but it would at least be under the complete control of the crates.io admins.
  • Bureaucratic: Assign each year a color, each day an animal, and packages within a day a food from a sequential list. I won't be able to publish john-millikin.com/fuse, but I could publish yellow-elk-bagel.ns.crates.io/fuse, which is good enough. There's various schemes for turning unique numeric digits into memorable phrases, any of them could slot in.
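
To make the bureaucratic option concrete, here is a minimal sketch of how such names could be generated. The word lists, the wrap-around indexing, and the function name `generated_namespace` are all invented for illustration; only the `color-animal-food.ns.crates.io` shape comes from the post above.

```rust
/// Hypothetical word lists. A real registry would need far longer ones:
/// at least 366 animals, and as many foods as namespaces issued per day.
const COLORS: [&str; 4] = ["red", "yellow", "green", "blue"];
const ANIMALS: [&str; 4] = ["elk", "otter", "heron", "lynx"];
const FOODS: [&str; 4] = ["bagel", "noodle", "mango", "waffle"];

/// Builds a namespace such as `yellow-elk-bagel.ns.crates.io` from the
/// year, the 0-based day of the year, and the 0-based sequence number of
/// the namespace issued on that day.
///
/// Indexing wraps around with `%` only to keep the sketch short; with
/// lists this small, wrapping would produce collisions in practice.
fn generated_namespace(year: usize, day_of_year: usize, seq: usize) -> String {
    let color = COLORS[year % COLORS.len()];
    let animal = ANIMALS[day_of_year % ANIMALS.len()];
    let food = FOODS[seq % FOODS.len()];
    format!("{}-{}-{}.ns.crates.io", color, animal, food)
}
```

Because the inputs are a date and a counter that only the registry controls, the resulting name never needs to change, which is the property the scheme is after.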

Yes, they are seemingly in conflict with each other. Calling it unreasonable is not accurate and is like saying "memory safety without garbage collection" is unreasonable. We shouldn't shy away from solving apparent contradictions.

I think there's a difference between requirements that are difficult to reconcile, and requirements that structurally cannot be reconciled. The requirements as stated cannot be simultaneously satisfied. There is no possible identity that is immutable, human-meaningful, accurately represents the legal identity of the author, and does not change when ownership of the package is transferred.

As I discussed at Child Thread: Survey of registry namespace designs for Cargo and Crates.io, there may be ways to solve the transfer / rename case. Or maybe we see a strong justification for why we should flex on one of these. Or maybe we go with a completely different solution, like organizational tags, that cover most of the use cases without stepping into this quagmire.

I think one of the challenges I have with the threads you linked is that they seem to have an implicit assumption that the primary purpose of namespaces is reputational. There's a lot of discussion of ways to provide an honest reputation signal -- tagging, verification badges, reserved prefixes -- that aren't namespaces and so therefore don't have any of the properties that I'm looking for.

Meanwhile a lot of the possible designs for namespaces don't provide any reputation signal, so they don't help with things like deciding which packages are part of the Serde or Tokio organization and which are independent productions.

In particular the requirements for a large corporate entity that wants to put their name on their packages as a form of branding are completely different from the requirements for an individual user who just wants to be able to publish their packages to the registry and thereby participate in the ecosystem.

It sounds like [multiple packages with the same crate name] is the primary problem this proposal is aiming to solve; please correct me if I've misunderstood. You cite some examples:

Yep, that's correct.

Could you elaborate on why you would rather have namespaces as proposed here, over publishing these crates to crates.io as something like jmillikin-fuse, which is possible today?

My package is an implementation of FUSE, and therefore it is called fuse. If I had forked the FUSE protocol into my own personal variant, then I would publish the library as jmillikin_fuse.

Note that if I had done so, and I had published jmillikin_fuse, then I would not want to prevent someone else from publishing their take on that idea. They should also be able to publish their jmillikin_fuse library if they wanted it to do something different than mine.

In other words, I do not view the current crates.io package names as conveying information about the identity of the author.

There are many crates (random 3 from a quick search: google_maps, amazon-spapi, microsoft-directx) where the format is {organization}-* but the author has no relationship with those organizations. I think this is fine and normal, because package names do not currently contain information about the author.

The details of how lapsed domains are handled by this solution are going to influence my assessment of this proposal. Having two crates example.com/foo and example.com/bar potentially be completely unrelated because at the time of crate creation, the domain was owned by different entities seems difficult to communicate and ripe for confusion.

My own personal instinct is to be very strict about this. If I had authority to dictate the rules, they would be:

  • Namespaces are crates.io entities, like crates. There is a database table of namespaces.
    • Each namespace has a set of owners authorized to make administrative changes.
    • They also have a set of users authorized to publish new crates.
    • Finally, they have a set of users authorized to publish updates to existing crates if and only if those users are also an owner of the crate in question.
  • When a package is published, new or update, the domain is queried to verify that the current owners of the crates.io namespace retain control over the domain.
  • Publishing a package to a namespace requires 2FA.

If a domain expires or otherwise transfers control then the new owners will not be able to publish crates to that namespace. They would need to request intervention from the crates.io admins, which would be a similar process to someone asking to take over a crate.

However, I also recognize that my instincts are in complete opposition to how crates.io currently behaves and what the community expects regarding friction in the publishing process. Hence compromises such as verifying the domain only on crate creation.
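
The strict rules listed above can be sketched as a single authorization check. This is only an illustration under stated assumptions: the type and field names (`Namespace`, `Checks`, `authorize`, and so on) are hypothetical, nothing here reflects how crates.io is implemented, and the domain query itself is reduced to a boolean that the registry would have to establish by some other means.

```rust
use std::collections::HashSet;

/// Hypothetical row in the namespace table.
#[allow(dead_code)] // not every field is read in this sketch
struct Namespace {
    domain: String,
    /// Users allowed to make administrative changes.
    owners: HashSet<String>,
    /// Users allowed to publish new crates into the namespace.
    creators: HashSet<String>,
    /// Users allowed to publish updates, provided they also own the crate.
    updaters: HashSet<String>,
}

/// What the publisher is trying to do.
enum Publish<'a> {
    NewCrate,
    /// An update to an existing crate, together with that crate's owners.
    Update { crate_owners: &'a HashSet<String> },
}

/// Facts the registry would have to establish at publish time.
struct Checks {
    /// The request was authenticated with a second factor.
    two_factor: bool,
    /// The domain still proves control by the namespace's current owners.
    domain_verified: bool,
}

#[derive(Debug, PartialEq)]
enum Denied {
    MissingTwoFactor,
    DomainNotVerified,
    NotAuthorized,
}

/// Applies the rules in order: 2FA, domain verification, then membership.
fn authorize(
    ns: &Namespace,
    user: &str,
    action: Publish<'_>,
    checks: &Checks,
) -> Result<(), Denied> {
    if !checks.two_factor {
        return Err(Denied::MissingTwoFactor);
    }
    if !checks.domain_verified {
        return Err(Denied::DomainNotVerified);
    }
    let allowed = match action {
        Publish::NewCrate => ns.creators.contains(user),
        // Updates need both namespace permission and crate ownership.
        Publish::Update { crate_owners } => {
            ns.updaters.contains(user) && crate_owners.contains(user)
        }
    };
    if allowed {
        Ok(())
    } else {
        Err(Denied::NotAuthorized)
    }
}
```

In this model, a lapsed domain shows up as `domain_verified == false`, so every publish is rejected until the crates.io admins intervene. The compromise of verifying only on crate creation would amount to skipping that check for `Publish::Update`.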

Maven Central (Java's package registry) also uses DNS for namespace verification. Could you summarize how and why your proposal differs from what Maven Central does?

My understanding is that:

  • Maven has fewer concerns about mutable usernames on third-party hosting. As noted in the thread they allow io.github.* group IDs, along with several other hosting providers with mutable usernames.
  • They're willing to have higher friction on the publishing path, which changes the threat model regarding domain verification/expiration.
    • Cargo authentication is just a bearer token, and if I understand correctly then crates.io has no way to upload a GPG pubkey or enroll a YubiKey or anything that can be used to mitigate an account takeover.
  • Java culture prefers large libraries. Packages like org.apache.commons.compress would be split into ten or more Cargo packages, all of which would be by different authors.
    • The expectations around maintenance and reputation are different than in an ecosystem with lots of small packages like Rust, Go, or JavaScript.
  • Java existed before Maven Central and there's well-established ways to move .jar files around without Maven, so they don't have the same absolute ecosystem dominance as crates.io does.
    • If someone doesn't publish their .jar on Maven Central then it doesn't really matter; directly fetching from arbitrary hosting is well-supported.

As for the minor technical differences (TXT records vs HTTPS, com.example.www vs www.example.com) I don't think there's much difference.

Sources? In Rust I would say that is not true:

Sorry, I phrased that badly. In Go there's so many modules published that it's nearly impossible to remember who writes what. And there's no explicit publishing step, so a skilled developer's ten-year magnum opus will be listed right next to their weekend hackathon. You can't necessarily draw accurate conclusions about the quality of a codebase based on who wrote it.

Also, the well-known personalities in the Go community tend to be known more for real-world accomplishments rather than for libraries. For example Brad Fitzpatrick founded LiveJournal and Ken Thompson co-invented UTF-8. There isn't really the "Ooh, this FLAC decoder was written by DarkDragon19, I know it'll be well-implemented" culture of individual branding.

Okay, that might be the case (I don't know Go, so I can't refute that). But Rust currently works differently than that.

Is there any reason not to be able to use an already published crate as a namespace (parent)? That way we don't need a separate naming system and in many cases this is also what you want (e.g. tokio-util could be at tokio@util [1]). In that case namespaces would be more like child-crates (possibly even allowing parent@child@subchild if desired).

This would avoid issues like "I have had my crate published for years now but now can't use the namespace with the same name to put other crates in".


Random thought I had to post (I have no idea if this would actually be a good system):

What if a crate author could simply choose a namespace, even if it is already used by someone else, then store a unique number (or a hash) in Cargo.lock [2] that clearly identifies which one should be used? It would be set on first use, with cargo warning loudly if you are adding a crate from a namespace with the same human-readable name, and with an escape hatch if you really want to use the other one.

Example: Given two namespaces called tokio, one with hash/unique identifier "AAAA" and the other with "BBBB":

  • tokio@tracing: defaults to the first one that was published (or uses the namespace in Cargo.lock)
  • tokio@tracing: {version = "*", namespace_hash = "BBBB"}: Select which one to actually use, making it possible to use them but with some friction.
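
The selection rule in this example can be sketched as follows. Everything here is hypothetical: `NamespaceEntry`, `published_order`, and `resolve` are names made up for the sketch, and Cargo has no such lookup today.

```rust
/// One registered namespace. `id` is the unique hash from the example
/// ("AAAA", "BBBB"); a lower `published_order` means it was registered
/// earlier.
struct NamespaceEntry {
    name: &'static str,
    id: &'static str,
    published_order: u32,
}

/// Picks the namespace a dependency refers to.
///
/// `pinned` is the hash given via `namespace_hash` or recorded in
/// Cargo.lock, if any. Without a pin, the earliest-published namespace
/// with that name wins.
fn resolve<'a>(
    entries: &'a [NamespaceEntry],
    name: &str,
    pinned: Option<&str>,
) -> Option<&'a NamespaceEntry> {
    let mut candidates = entries.iter().filter(|e| e.name == name);
    match pinned {
        Some(id) => candidates.find(|e| e.id == id),
        None => candidates.min_by_key(|e| e.published_order),
    }
}
```

The friction lives in the `pinned` argument: the default path never needs it, while choosing the second `tokio` requires writing the hash out explicitly.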

  1. Imports would probably stay tokio-util, not using the namespace at all.

  2. Not always committed to git, I know.

Is there any reason not to be able to use an already published crate as a namespace (parent)? That way we don't need a separate naming system and in many cases this is also what you want (e.g. tokio-util could be at tokio@util). In that case namespaces would be more like child-crates (possibly even allowing parent@child@subchild if desired).

I think I covered this use case in an earlier post; what you're describing here is something like a reserved prefix. A user might have a project mycoolproject, and want to reserve the mycoolproject-* prefix for use by their crates. It's sort of like a v0.0.0 placeholder but it applies automatically.

In other words the purpose of a reserved prefix is to restrict the set of names other people can use. You'd get those names all to yourself.

Namespaces are a bit different, the goal is to expand the set of names people can use. Namespaces allow lots of different people to publish crates even if some of those crates are different approaches to solving the same problem.

Random thought I had to post (I have no idea if this would actually be a good system):

What if a crate author could simply choose a namespace, even if it is already used by someone else, then store a unique number (or a hash) in Cargo.lock that clearly identifies which one should be used? It would be set on first use, with cargo warning loudly if you are adding a crate from a namespace with the same human-readable name, and with an escape hatch if you really want to use the other one.

I think this would go against some of the security goals that the Cargo and crates.io teams have written about.

One of the concerns about allowing namespaces is they might be used to impersonate popular crates, for example I could publish john-millikin.com/tokio and try to trick people into using it.

If anyone can publish a crate with any name and the only way to tell the difference is to examine the hash in Cargo.lock then it would be really hard to notice when someone is trying to pull a trick.

What is a namespace if not a reserved prefix? I was primarily referring to why use a domain and not an existing crate. In your Cranelift example cranelift.dev/simplejit you used a domain, even though the project already has a "parent" crate that could be used, thus becoming cranelift/simplejit, without having separate ownership/naming sources like domains.

On your serde example there already is a main crate serde, that can be used as the namespace, possibly with a serde/unofficial/foo, thus still having everything bundled under serde.

And on your fuse/sane/sleigh example: Still possible. The main downside is that your (arbitrarily chosen) namespace must not conflict with crate names (and will mean you can't publish a crate named like your namespace in the future unless it is put under a different namespace).

The effect is the same, except that each existing crate automatically gets a namespace and there are no problems with domain expiry.

Technically nothing would prevent us from having both crate names and domain names as namespaces, as long as crate names cannot contain a '.' and you can't use the TLD itself:

foo
example.com/foo
serde/foo
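
A sketch of how those three forms could be told apart, assuming the rule stated above (crate names never contain a '.', so a namespace with a dot must be a domain). The `PackageRef` type and `classify` function are hypothetical, only a single namespace level is handled, and no real domain validation is attempted.

```rust
#[derive(Debug, PartialEq)]
enum PackageRef<'a> {
    /// `foo`: a package in today's global namespace.
    Global { name: &'a str },
    /// `example.com/foo`: the namespace is a DNS domain.
    Domain { domain: &'a str, name: &'a str },
    /// `serde/foo`: the namespace is an existing crate.
    CrateParent { parent: &'a str, name: &'a str },
}

/// Classifies a dependency name; returns `None` for anything this
/// sketch does not recognize (empty parts, nested namespaces, ...).
fn classify(spec: &str) -> Option<PackageRef<'_>> {
    // A crate name, or a crate used as a namespace, has no '.' or '/'.
    let is_crate_name = |s: &str| !s.is_empty() && !s.contains('.') && !s.contains('/');
    match spec.split_once('/') {
        None if is_crate_name(spec) => Some(PackageRef::Global { name: spec }),
        Some((ns, name)) if is_crate_name(name) => {
            if ns.contains('.') {
                // A dot can only appear in a domain, never in a crate name.
                Some(PackageRef::Domain { domain: ns, name })
            } else if is_crate_name(ns) {
                Some(PackageRef::CrateParent { parent: ns, name })
            } else {
                None
            }
        }
        _ => None,
    }
}
```

Since a bare TLD such as `com` contains no dot, `com/foo` would be read as the crate `com` acting as a namespace, which is why excluding bare TLDs keeps the two kinds unambiguous.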

I'm not following this argument. If you published jmillikin/fuse with namespaces would you be ok with others not being able to publish their jmillikin/fuse? To me it's exactly the same. If you then argue that I could publish my own skifire13/fuse, why can't I just publish skifire13_fuse?

The way I see it the main motivation for namespaces is to reserve the namespace for my own (or my organization) packages, while preventing others from creating crates in it.


ps: I didn't see this mentioned earlier but note that you can publish a jmillikin_fuse package containing a library called fuse (which will be the name used to import it in Rust code).

What is a namespace if not a reserved prefix?

A prefix is part of the crate name, so it's passed along to rustc. The serde-derive package has the crate name serde_derive.

A namespace is part of the package name but is not part of the crate name, it gets stripped off and is not visible to rustc. The example.com/xml package would have the crate name xml.
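
The distinction can be sketched as a function from package name to crate name. The hyphen-to-underscore step matches what Cargo already does for a package's default library name; stripping the namespace is the behavior proposed in this thread, not something Cargo does today, and `crate_name` is a name made up for the sketch.

```rust
/// Derives the crate name seen by rustc from a package name that may
/// carry a namespace such as `example.com/`.
fn crate_name(package: &str) -> String {
    // Proposed behavior: drop everything up to and including the last
    // '/', so the namespace never reaches rustc.
    let bare = package.rsplit('/').next().unwrap_or(package);
    // Existing Cargo behavior: hyphens become underscores.
    bare.replace('-', "_")
}
```

With this, `serde-derive` maps to `serde_derive` (the prefix survives), while `example.com/xml` maps to plain `xml` (the namespace does not).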

And on your fuse/sane/sleigh example: Still possible. The main downside is that your (arbitrarily chosen) namespace must not conflict with crate names

The main downside of using a prefix when it's not justified by membership in a larger project is that the name would become unclear. A package named example.com/xml is clearly an XML library, but a package named somecoolproject-xml is something related to somecoolproject.

I'm not following this argument. If you published jmillikin/fuse with namespaces would you be ok with others not being able to publish their jmillikin/fuse? To me it's exactly the same. If you then argue that I could publish my own skifire13/fuse, why can't I just publish skifire13_fuse?

If I were writing a FUSE mount server and needed a FUSE library I would look into skifire13/fuse as a possible candidate, but would probably not look at skifire13_fuse because I don't know what sort of technology skifire13 is.

edit: For example, there is a program called Ghidra that has its own dialect of XML, and the library I wrote to parse it is ghidra-xml. It's not a general-purpose XML library, it's specific to the Ghidra dialect.

This is a very strict and specific interpretation of the crate name.

I don't see why your own implementation would count as the fuse. Isn't the reference implementation by Linux kernel folks more deserving of being the fuse? If your implementation isn't a fork or a drop-in replacement for the reference FUSE implementation, why would it be the fuse and not your take on it, like jmillikin_fuse? (or something like jmfuse)

If the crate needs to meet some criteria to deserve using a certain name, it seems almost like flipping the namespace the other way: there is some fuse that's the true FUSE project/protocol/api/spec, and there are conforming implementations of it. So it should rather be a fuse.org/jmillikin where the FUSE project controls the namespace and decides that your implementation is a valid fuse, not a personal variant, while not-true-FUSE libraries would have to pick another namespace, such as fuse-plus-plus.fork/jmillikin.

This doesn't seem to be a problem on crates.io. serde_json is accepted as being for the real JSON, despite not being published as the json crate. There's the http crate, but also httparse, hyper, reqwest, and a bunch of others that are arguably an implementation of HTTP, but they don't all have to be called */http.

I don't see why your own implementation would count as the fuse. Isn't the reference implementation by Linux kernel folks more deserving of being the fuse? If your implementation isn't a fork or a drop-in replacement for the reference FUSE implementation, why would it be the fuse and not your take on it, like jmillikin_fuse? (or something like jmfuse)

I'm having trouble parsing this, sorry. I don't think my implementation would be counted as the reference implementation, that's why it would be under my namespace and not kernel.org.

The namespace represents the identity of the author, the crate name represents its purpose. If a library's purpose is to implement the FUSE protocol then fuse is a good name for it.

This doesn't seem to be a problem on crates.io. serde_json is accepted as being for the real JSON, despite not being published as the json crate. There's the http crate, but also httparse, hyper, reqwest, and a bunch of others that are arguably an implementation of HTTP, but they don't all have to be called */http.

It would sure help when trying to figure out which one to use, though. I've mostly been getting by with the json crate, but I've never been able to figure out which HTTP library is the standard so I've just used Go whenever I need to do a project that involves HTTP.

Never meant that. I do mean a separate part that clearly is not part of the crate name (e.g. separated by / or @ in Cargo.toml). Nor does it (have to) appear in the imports (see ps from SkyFire13).

When I said prefix I meant from the viewpoint of dependencies in Cargo.toml, where it is a prefix to a string. Behind the scenes, the crate can certainly be named without the prefix/namespace, as long as that doesn't result in naming conflicts.

Note that there is an already approved RFC for using packages as namespaces, but it serves a slightly different role where the packages truly compose into one API. There will be impedance mismatches if used for organizational namespacing. This is covered in the namespacing thread I linked earlier.

Can't we allow them to set the library name to "opt out" of namespacing? You could then have a crate parent and a crate parent::child with library name child that just gets imported as child, not as parent::child.

Heck maybe even require opt-in xor opt-out so whether the package is included in the namespace is explicit.

Note that you can do this today (you can publish a package abc_xyz that exposes a crate xyz), though it tends to confuse users.
