URI crates

There was a proposal to allow one slash in crate names as a form of namespacing: Pre-RFC: Packages as Optional Namespaces. I'm proposing a URI as a valid way to refer to a crate.

[package]
name = "jresig_shibuya"
uri = "www.jresig.com/shibuya"

[dependencies]
b = { package = "www.qux.com/foo/bar", version = "1.0.0" }

Advantages Over Previous Proposals

URIs can contain a domain (either simple or complex, like q.hol.es) and multiple slashes. The previous proposals only allow one slash in the crate name.

Another advantage is that the URI domain doesn't conflict with flat crate names. Once a crate reference contains a dot, it is clearly a URI reference to a crate. The other proposal does, however, also suggest using @ to disambiguate between a namespace and a flat crate name.
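To make the disambiguation rule concrete, here is a minimal sketch of how a tool might tell the two forms apart. It assumes only what is stated above (a reference containing a dot is a URI) plus the fact that flat crates.io names cannot contain dots. The `CrateRef` type and `classify` function are hypothetical names for illustration, not part of Cargo.

```rust
/// How a dependency reference is interpreted under the rule sketched above.
#[derive(Debug, PartialEq, Eq)]
pub enum CrateRef<'a> {
    /// A plain crates.io name such as `serde`.
    Flat(&'a str),
    /// A URI-style name, split at the first slash.
    Uri { domain: &'a str, path: &'a str },
}

/// Classifies a reference: anything containing a dot is treated as a URI
/// (assumption taken from the proposal text, not from Cargo itself).
pub fn classify(reference: &str) -> CrateRef<'_> {
    if !reference.contains('.') {
        return CrateRef::Flat(reference);
    }
    match reference.split_once('/') {
        Some((domain, path)) => CrateRef::Uri { domain, path },
        // A bare domain with no path component.
        None => CrateRef::Uri { domain: reference, path: "" },
    }
}
```

Under this sketch, `classify("www.qux.com/foo/bar")` yields the domain `www.qux.com` and the path `foo/bar`, while `classify("serde")` stays a flat name.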

Security

The URIs and their domains belong to crates.io. So if you are wondering whether a URI points to an external resource, or whether it can expire: it does not and it cannot. It's not an "HTTP" request.

To clarify: URIs do not involve HTTP requests to external resources. No domain registrar like Google Domains is involved either.

Domain Owners

crates.io may limit a URI domain to a set of users. The proposal does not cover that fully, but crates.io users should be able to have one or more domains and share them with other users as well.
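As a rough illustration of what "limiting a domain to a set of users" could look like on the registry side, here is a small sketch. Everything in it is an assumption: the `DomainRegistry` type, its method names, and the first-come claim rule are hypothetical, and the proposal does not specify any of them.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical registry-side table: which crates.io users may publish under a domain.
#[derive(Default)]
pub struct DomainRegistry {
    owners: HashMap<String, HashSet<String>>,
}

impl DomainRegistry {
    /// Claims an unowned domain for `user`; fails if someone already holds it.
    /// (First-come claiming is an assumption made for this sketch.)
    pub fn claim(&mut self, domain: &str, user: &str) -> Result<(), String> {
        if self.owners.contains_key(domain) {
            return Err(format!("domain `{}` is already claimed", domain));
        }
        self.owners
            .entry(domain.to_string())
            .or_default()
            .insert(user.to_string());
        Ok(())
    }

    /// Lets an existing owner share the domain with another user.
    pub fn share(&mut self, domain: &str, owner: &str, new_user: &str) -> Result<(), String> {
        match self.owners.get_mut(domain) {
            Some(users) if users.contains(owner) => {
                users.insert(new_user.to_string());
                Ok(())
            }
            _ => Err(format!("`{}` does not own `{}`", owner, domain)),
        }
    }

    /// Checks only the domain part (before the first slash) of the crate URI.
    pub fn may_publish(&self, uri: &str, user: &str) -> bool {
        let domain = uri.split('/').next().unwrap_or(uri);
        self.owners
            .get(domain)
            .map_or(false, |users| users.contains(user))
    }
}
```

In this sketch, a user who claimed `www.qux.com` could share it with a collaborator, after which either of them could publish `www.qux.com/foo/bar`, while anyone else would be rejected.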

For context, my thoughts on a previous thread regarding this topic, specifically regarding my views on URLs in import statements.

At this point, anyway, the URIs can be verified for consistency.

But is it annoying to have to state which domain a crate belongs to?

What if a domain name expires? That's bad enough, but what if someone else comes in and does something malicious? How is it proposed to solve this issue?

Well, I covered that in the topic:

Remember, that's not an "HTTP" request

It works like XML namespace URIs. In fact, you still have to add the crate URI to Cargo.toml.

The URI is a structural name for the crate.

Something that might also suffice is to support URI-named crates solely in crates.io. Using URIs in code could be unnecessary if one could do this:

[dependencies]
"www.qux.com/foo/bar" = { name = "b", version = "1.0.0" }

This should be easy, since name in Cargo.toml is already supported. There's no new syntax involving string literals.


Added this to the topic.

In reality, these are two entirely separate concerns that ought not to be mixed:

  1. There's a notion of namespaces in code. These correspond to crates and modules. There was an RFC that I think makes an important improvement over the status quo here by allowing one namespace to be broken into multiple crates, thus decoupling the logical design of a system from its physical one (the set of artefacts corresponding to that logical design).

  2. Orthogonally, there's a desire to have a notion of namespacing in the artefact repository. That ought to be completely separate from the code and only used by cargo in crate discovery! The mapping between a namespaced crate from crates.io to my code is only needed in Cargo.toml.

I fail to see why people repeatedly come up with this bad design of wanting to mangle these two things together. What happens if the ownership changes, for example? Do I now need to make breaking changes in my APIs?!

I do think the current restriction of a single flat namespace in crates.io is a severe limitation on usability that ought to be removed. I should be able to publish my colour-related crate yigal100/azure without conflicting with, say, microsoft/azure. But I do not want my code to be littered with my own name (especially since it is a compatibility hazard), nor do I want it littered with Microsoft's.

I want a future where I can use standard APIs but choose which vendor implementation best suits me in a single line of configuration; OpenTelemetry and ODBC are examples. I don't want to guess ridiculous names that are harder to discover for common needs, like error handling and logging. But again, this is all restricted to the domain of crate discovery and does not belong in my code.

Oh, yeah, I kind of lost the point. Other related proposals started better at this point. So ideally, if only one namespace is supported in crates.io, the crate's flat name shouldn't be changed; the namespace should be in another field. If it's a URI, similarly, the crate should still have its flat name, but the URI should be stored in another field in Cargo.toml to make it discoverable.

The advantage of URIs is that you can have a domain and multiple subpaths. I think something like qux/foo/bar can still be supported, though.

I reckon the full identifier of the crate in crates.io should be a separate field produced by crates.io as a result of publishing my crate.

Consider also the case of, say, Microsoft from above. Say they have hundreds or thousands of crates related to their microsoft/azure namespace. Say they want to sell the business or just reorganise their namespaces for whatever reason. Would they need to update thousands of Cargo.toml files and republish all of these crates with new versions?

That's a pain for them and for their users. Instead, they should be able to reassign those crates to another namespace merely via the website or an API.
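A minimal sketch of that idea, assuming the registry keeps the namespace as metadata next to a flat name that is fixed at publish time. The `Registry` and `CrateRecord` types, the numeric ids, and the `newco/cloud` namespace are all hypothetical and only serve to show that a reassignment need not touch any published crate.

```rust
use std::collections::HashMap;

/// Hypothetical registry record: the flat name is fixed at publish time,
/// while the namespace is registry metadata that can change later.
pub struct CrateRecord {
    pub flat_name: String,
    pub namespace: String,
}

#[derive(Default)]
pub struct Registry {
    // Keyed by a stable crate id that never changes.
    crates: HashMap<u64, CrateRecord>,
}

impl Registry {
    /// Records a crate under a namespace.
    pub fn publish(&mut self, id: u64, namespace: &str, flat_name: &str) {
        self.crates.insert(
            id,
            CrateRecord {
                flat_name: flat_name.to_string(),
                namespace: namespace.to_string(),
            },
        );
    }

    /// Moves every crate in `from` to `to`; returns how many were moved.
    /// No crate is republished and no flat name changes.
    pub fn reassign_namespace(&mut self, from: &str, to: &str) -> usize {
        let mut moved = 0;
        for record in self.crates.values_mut().filter(|r| r.namespace == from) {
            record.namespace = to.to_string();
            moved += 1;
        }
        moved
    }

    /// The full identifier as the registry would currently report it.
    pub fn full_identifier(&self, id: u64) -> Option<String> {
        self.crates
            .get(&id)
            .map(|r| format!("{}/{}", r.namespace, r.flat_name))
    }
}
```

Because lookups go through the stable id, the full identifier reported by the registry changes after a reassignment while the flat name used in code stays the same.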

That is not how I interpreted that sentence. I merely took it as "we're using it for validation, but it's still hosted on crates.io". So if it's merely being used as an identifier, I don't see what this gives us over having "trivial" namespaces, where it's abc/def.

If there's no validation of ownership, what stops me from registering the crate rust-lang.org/rustup?

Independent of the rest of the argument, the current syntax for naming a library crate differently than the package is

b = { package = "www.qux.com/foo/bar", version = "1.0.0" }

Personally, I think it was probably a minor mistake to allow conflating the (cargo) package name with the (source) library name. The majority case will of course want the package/brand name and the library name to match, and want the package in use to be evident from the code that uses the library. But I imagine that if package names could be arbitrary and the crate name used in code were always stated separately in the manifest, there would be less vocal desire for a formal namespacing scheme. (There'd still be desire for some sort of subpackage scheme indicating shared ownership, etc., though, of course; those benefits are more concrete.)

I don't see anything in your proposal that prevents an arbitrary person from registering an arbitrary package for a domain they don't own.

Anything like this would have to include some kind of domain verification to have any value at all; otherwise it's just "please allow some punctuation characters in package names".

(I don't think we should do this even with domain verification, to be clear, but I think doing it without domain verification would make things worse, in terms of spoofing.)

Domain (like q.com and q.hol.es) and multiple slashes.

Right, there should be domain verification. The path (that comes after the first /) should be ignored.

I forgot the syntax. That's what I mean, I think.

Any reason?

How would that work, though? I feel like most proposals for domains-as-namespaces I've seen gloss over the details of this, when it's arguably the most important aspect to get right.

Say I wanted to publish seventeencups.net/http to crates.io:

  • How do I prove to crates.io that I am the owner of seventeencups.net?
    • WHOIS wouldn't work, as most domain registrars provide a service that anonymizes the info.
    • I've seen some services require a URL to be accessible (e.g. some sort of marker file uploaded to the web host), but that means you have to have web hosting set up.
  • What happens if my domain expires, or I decide to move to a different domain?
    • If my package remains as is, could someone register my old domain and make it look like they own my packages?
    • If my packages move, what happens to the Cargo.toml files that specify the old URL?

Domains should work pretty much like NPM namespaces, GitHub users or GitHub organizations. The domains belong to crates.io, so there's no concept of expiration. If I'm understanding you right, you're confusing URIs with HTTP URIs to external resources?

I think maybe people in this thread aren't all on the same page about what 'domain validation' would entail?

Let me put it another way:

In order to publish a package called seventeencups.net/http, would I need to be able to prove that I am the person who owns the seventeencups.net domain name (via a domain registrar like Google Domains)?

If so, then consideration needs to be given to what happens if I lose control of that domain name (intentionally or unintentionally).

If not, then I agree with the other comments in the thread that this doesn't really add anything over existing namespace proposals (e.g. seventeencups/http), and potentially could make the situation worse with regards to spoofing crates - what's to stop me grabbing the google.com namespace for myself?

No, you wouldn't use any kind of domain registrar.

The domain verification was just to establish which crates.io user a domain belongs to. But it may not be necessary.

As someone only reading this discussion so far, I want to emphasize that I find this topic very confusing to read, i.e. very hard to follow.

I believe it’s of utmost importance that the proposal should be very clear what is or isn’t proposed w.r.t. “domain verification”, and then answers can focus on that proposal. Or if there’s – like – 2 alternatives, the proposal should be very much extremely clear what exactly these 2 alternatives to be discussed are, and give them labels to refer to in subsequent discussions.[1] We don’t need everyone who is commenting to make up their own mind about what could or could not be meant.

The current discussion not only has everyone making up their own handful of interpretations; anyone who answers a comment also relies on their own interpretation of which hypothetical setting that comment assumes. So some things are said about one hypothetical setting but ignored by the person answering, because they didn't catch what the assumed setting was, and so on.


  1. But one proposal is probably better than multiple alternatives, and focusing on one does not rule out switching to a new one further down or in a new thread, at a later time. ↩︎
