Pre-RFC: JWTs for private Cargo registry authentication

arlosi · April 18, 2023, 4:33pm

Summary

Cargo's current system of having the registry token sent verbatim in the Authorization header is too flexible and makes it too easy to build insecure registries. This RFC restricts the form that the registry token can take to either JWTs or asymmetric tokens. For compatibility, the restrictions only apply to authenticated private registries using RFC#3139.

Motivation

The Cargo team is concerned about stabilizing RFC#3139 as-is, since it allows private registry implementations to send any value in the Authorization header, which can lead to an insecure registry.

RFC#3231 defines a completely new authentication scheme using asymmetric tokens that is significantly more secure but is difficult to integrate into existing authentication systems. To support these existing authentication systems, Cargo should also support short-lived tokens. In particular, Cargo should be compatible with systems like GitHub OIDC to allow CI builds to publish crates to a private registry without using shared secrets.

As a compromise between allowing arbitrary tokens and requiring asymmetric tokens, this RFC proposes allowing JWTs to be sent in the Authorization header in addition to asymmetric tokens. JWTs are used by multiple large identity providers and can have short expiration times that make them less dangerous if leaked.

Guide-level explanation

Tokens for private Cargo registries must be either asymmetric tokens as defined by RFC#3231, or JWTs (JSON Web Tokens).

Cargo will validate that the token for a registry is a JWT by inspecting the header portion of the token. The expiration claim must be set so Cargo can validate that the token is not expired and has a validity period of less than XX days.

Reference-level explanation

Cargo will restrict the forms of tokens used by authenticated private registries (as defined by RFC#3139) to either JWTs (JSON Web Tokens) or asymmetric tokens.

The validation of the token will be performed when Cargo detects an authenticated private registry by either the presence of auth-required: true in config.json or the registry sending an HTTP 401 when accessing config.json (for sparse registries).
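
As a rough illustration of the detection rule above, the following Python sketch treats a registry as authenticated when either condition holds. The function name and its parameters are hypothetical and exist only for this example; this is not Cargo's code.

```python
from typing import Optional


def registry_requires_auth(status: int, config: Optional[dict]) -> bool:
    """Return True if the registry should be treated as authenticated.

    `status` is the HTTP status returned when fetching config.json and
    `config` is the parsed config.json, or None if it could not be read.
    Both parameters are assumptions made for this sketch.
    """
    # A sparse registry that answers 401 for config.json needs auth.
    if status == 401:
        return True
    # Otherwise look for an explicit `auth-required: true` in config.json.
    return config is not None and config.get("auth-required") is True
```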

If the registry is configured to use asymmetric tokens as defined by RFC#3231, the request can continue. Otherwise Cargo will validate the token as a JWT.

To validate a JWT, Cargo first removes the Bearer prefix from the token. The remainder of the token will be parsed as either a JWS (JSON Web Signature) or a JWE (JSON Web Encryption).

For JWS, the token must be of the form: [header].[payload].[signature]. Cargo will decode the header portion as a JSON object and validate the following:

The typ is JWT
The alg must not be none.

Cargo will validate the payload portion as follows:

The exp claim must be set to a date not more than XX (subject to bikeshedding) days in the future.
The nbf claim (if present) is set to a date in the past.
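
The JWS checks listed above could be sketched in Python roughly as follows. This is a minimal illustration, not Cargo's implementation. The function name is hypothetical, `MAX_VALIDITY_DAYS = 30` is only a placeholder for the undecided "XX", and treating a missing `alg` the same as `none` (and comparing it case-insensitively) is an assumption of this sketch.

```python
import base64
import json
import time
from typing import Optional

MAX_VALIDITY_DAYS = 30  # placeholder for the RFC's undecided "XX"


def _b64url_json(segment: str) -> dict:
    """Decode an unpadded base64url segment and parse it as a JSON object."""
    padded = segment + "=" * (-len(segment) % 4)
    value = json.loads(base64.urlsafe_b64decode(padded))
    if not isinstance(value, dict):
        raise ValueError("segment is not a JSON object")
    return value


def check_jws_token(authorization: str, now: Optional[float] = None) -> None:
    """Raise ValueError if the token fails the structural checks.

    The signature segment is never inspected or verified.
    """
    now = time.time() if now is None else now
    token = authorization
    if token.startswith("Bearer "):
        token = token[len("Bearer "):]
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected [header].[payload].[signature]")
    header = _b64url_json(parts[0])
    payload = _b64url_json(parts[1])

    if header.get("typ") != "JWT":
        raise ValueError("typ must be JWT")
    # Assumption: a missing alg is rejected just like alg=none.
    if str(header.get("alg", "none")).lower() == "none":
        raise ValueError("alg must not be none")

    exp = payload.get("exp")
    if not isinstance(exp, (int, float)):
        raise ValueError("exp claim must be set")
    if exp <= now:
        raise ValueError("token is expired")
    if exp > now + MAX_VALIDITY_DAYS * 86400:
        raise ValueError("exp is too far in the future")

    nbf = payload.get("nbf")
    if nbf is not None and nbf > now:
        raise ValueError("nbf is in the future")
```

Because the third segment is never examined, any string in that position passes these checks.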

For JWE, the token must be of the form: [header].[key].[iv].[ciphertext].[tag]

Cargo will decode the header portion as a JSON object and validate the following:

The typ is JWT
The alg must not be none
The exp must be set to a date not more than XX (subject to bikeshedding) days in the future. This claim must be replicated into the header so that it can be decoded by Cargo.

Cargo does not perform any cryptographic validation of the token.
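
For the JWE case, only the header is readable without the decryption key, which is why the proposal asks for `exp` to be replicated there. A minimal Python sketch of that header-only check follows. The function name is hypothetical, `MAX_VALIDITY_DAYS = 30` stands in for the undecided "XX", and rejecting an already-expired or missing `alg` value is an assumption of this sketch.

```python
import base64
import json

MAX_VALIDITY_DAYS = 30  # placeholder for the RFC's undecided "XX"


def check_jwe_header(token: str, now: float) -> dict:
    """Structurally check a five-part JWE and return its decoded header.

    Raises ValueError on failure. Nothing is decrypted or authenticated.
    """
    parts = token.split(".")
    if len(parts) != 5:
        raise ValueError("expected [header].[key].[iv].[ciphertext].[tag]")
    raw = parts[0]
    header = json.loads(base64.urlsafe_b64decode(raw + "=" * (-len(raw) % 4)))
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    if header.get("typ") != "JWT":
        raise ValueError("typ must be JWT")
    if str(header.get("alg", "none")).lower() == "none":
        raise ValueError("alg must not be none")
    # exp is read from the header because the payload is encrypted.
    exp = header.get("exp")
    if not isinstance(exp, (int, float)):
        raise ValueError("exp must be replicated into the header")
    if not (now < exp <= now + MAX_VALIDITY_DAYS * 86400):
        raise ValueError("exp is in the past or too far in the future")
    return header
```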

Drawbacks

This restricts the tokens that Cargo will allow for private registries. Registries that have existing authentication systems that are not based on JWTs will need to either migrate to JWTs or asymmetric tokens.

Requiring short expiration times means users will need to rotate tokens frequently or use a credential provider to generate them.

Rationale and alternatives

This proposal is fundamentally a compromise between allowing the token to be any value, and requiring registries to use asymmetric tokens.

Alternatives considered include:

Stabilize sending any token to a private registry. We can stabilize a supported asymmetric scheme when it's ready. Pro: fast and every registry happy. Con: registries can do insecure things.
When RFC#3231 is ready, stabilize RFC#3139 but require RFC#3231 asymmetric tokens in order to use it. Pro: does not allow insecure registry implementations. Con: large registry providers unhappy, not great support for GitHub OIDC.
Redesign RFC#3231 to use JWTs and be compatible with GitHub OIDC. Con: not a small project.
Require that tokens for private registries come from a credential provider. Pro: ensures long-lived tokens are more securely stored. Con: doesn't address the registry storing the tokens insecurely.

Prior art

NuGet, NPM, Python, and Maven all allow long-lived tokens to be used. NuGet and Python both support credential providers to generate short-lived tokens automatically.

Unresolved questions

How many days is reasonable for expiration?

Should token formats other than JWT be allowed?

Future possibilities

Since this design encourages short-lived tokens, users will need to be able to easily generate them. For CI pipelines, there is usually a token already available. However, for developer machines, users will want to set up a credential process that can generate the short lived tokens from their identity provider. The credential process feature is currently unstable and could be extended to better support generating short-lived tokens.

sfackler · April 18, 2023, 4:53pm

Sending any token to a registry is already stable.

"This is a breaking change" sure seems like a drawback to me that I do not see mentioned.

arlosi · April 18, 2023, 6:10pm

Clarified to "Stabilize sending any token to a private registry".

The proposed restrictions only apply to authenticated private registries using currently unstable RFC#3139 (-Z registry-auth).

scottarc · April 18, 2023, 7:02pm

Why JWTs (which include numerous security and operational risks in their design) instead of a secure token design?

Disclaimer: I designed PASETO, which is one such alternative, but I am not going to evangelize any particular design here. I just want to caution against JWT.

bjorn3 · April 18, 2023, 7:06pm

What about registries that use Paseto (JWT with less footguns), Biscuit (has very powerful offline token attenuation) or any other token method that is better than JWT?

bascule · April 18, 2023, 7:15pm

The motivation for this seems to be short-lived tokens.

For that, any token format that supports offline attenuation, such as Biscuits or Macaroons, can do better because the client can add a new expiration time (e.g. 5 seconds in the future) and send a new credential with every request.
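
To make "offline attenuation" concrete, here is a heavily simplified Python sketch of the HMAC chaining idea behind Macaroons. It is not the real Macaroon or Biscuit wire format. The function names, the tuple-based token, and the single `expires=<unix seconds>` caveat are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import List, Tuple

Token = Tuple[List[bytes], bytes]  # (identifier followed by caveats, signature)


def mint(root_key: bytes, identifier: bytes) -> Token:
    """Server side: issue a base token bound to the server's root key."""
    sig = hmac.new(root_key, identifier, hashlib.sha256).digest()
    return [identifier], sig


def attenuate(token: Token, caveat: bytes) -> Token:
    """Client side: add a restriction without knowing the root key.

    The old signature becomes the HMAC key for the new one, so the
    caveat cannot later be removed or altered by whoever sees the token.
    """
    parts, sig = token
    return parts + [caveat], hmac.new(sig, caveat, hashlib.sha256).digest()


def verify(root_key: bytes, token: Token, now: int) -> bool:
    """Server side: recompute the chain and enforce every caveat."""
    parts, sig = token
    expected = hmac.new(root_key, parts[0], hashlib.sha256).digest()
    for caveat in parts[1:]:
        expected = hmac.new(expected, caveat, hashlib.sha256).digest()
        key, _, value = caveat.partition(b"=")
        # This sketch only understands one caveat type.
        if key != b"expires" or not value.isdigit() or now > int(value):
            return False
    return hmac.compare_digest(expected, sig)
```

A client holding a long-lived base token could call `attenuate(base, b"expires=...")` with a timestamp a few seconds ahead before each request, and send only the attenuated token over the wire.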

That isn't to say I think mandating any of these formats is necessarily a good idea, just that I don't buy the argument for mandating JWT.

Eh2406 · April 18, 2023, 7:31pm

I would strongly prefer mandating PASETO. My intention with RFC#3231, was to make it the only supported format. Unfortunately, it would be very nice to support GitHub OIDC, a format based on JWT.

arlosi · April 18, 2023, 7:53pm

This proposal does allow the use of PASETO (RFC#3231) tokens (they're referred to as "asymmetric tokens" in the document).

str4d · April 18, 2023, 7:53pm

RFC#3231 asymmetric tokens use PASETO in the v3.public format. So where this pre-RFC says:

it is mandating that either (minimally constrained) JWTs, or (a specific instance of) PASETO be used. So by my reading, it is not the case that this pre-RFC prohibits registries that use PASETO (and registries that use some other PASETO would likely already be close to a position of becoming compatible with RFC#3231); however, it would prohibit registries from using Biscuits or other token methods.

(edit: temporally collided with @arlosi's reply )

str4d · April 18, 2023, 8:02pm

To me, the question is whether this pre-RFC satisfies its own motivation:

The proposed JWT validation is insufficient to prevent JWTs from being used to construct an insecure registry.

For example, by only checking that alg must not be none, it remains possible for an asymmetric JWT to be converted to a symmetric JWT (e.g. changing alg from RS256 to HS256 and then using the asymmetric public key as the symmetric key), and depending on the registry's JWT implementation, this may be accepted.
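
The RS256-to-HS256 confusion described above can be sketched in Python using only the standard library. Everything here is illustrative: `naive_verify` is a deliberately flawed, hypothetical verifier that trusts the `alg` value in the header, and the PEM text in the usage note is a placeholder rather than a real key.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def naive_verify(token: str, key: bytes) -> bool:
    """Flawed on purpose: picks the algorithm from the attacker-controlled header."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    signing_input = f"{header_b64}.{payload_b64}".encode()
    if header.get("alg") == "HS256":
        # The server's RSA *public* key ends up being used as an HMAC secret.
        expected = hmac.new(key, signing_input, hashlib.sha256).digest()
        return hmac.compare_digest(expected, b64url_decode(sig_b64))
    # The RS256 branch is omitted; it is not needed to show the problem.
    return False


def forge(public_key_pem: bytes, claims: dict) -> str:
    """Attacker side: sign with HS256 using the publicly known key bytes."""
    header = b64url(json.dumps({"typ": "JWT", "alg": "HS256"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(public_key_pem, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"
```

With `pem = b"-----BEGIN PUBLIC KEY-----\n(placeholder)\n-----END PUBLIC KEY-----\n"`, the call `naive_verify(forge(pem, {"sub": "attacker"}), pem)` returns `True`. A forged token like this also has `typ` set to JWT and an `alg` other than none, so the header checks proposed in the pre-RFC would not catch it. A verifier that pins the expected algorithm for each key, instead of reading it from the header, is not affected by this particular trick.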

That being said, regardless of whether the pre-RFC is accepted, I think the proposed validation rules for JWTs are a good idea for cargo to enforce when JWTs are detected as the token protocol being used, and would be a positive improvement on RFC#3139. That would require weakening the "tokens are treated as opaque" wording, but I think that would be worthwhile if JWTs are accepted at all.

Eh2406 · April 18, 2023, 8:02pm

It should probably also be mentioned in the RFC, but once this pattern has been established it is fairly easy for follow-up RFC's to say "cargo will now allow the FOO format as long as it has the following checks" or "cargo will no longer insist on one of the checks for an existing format, because it is inhibiting an important use case and not critical for security". This RFC should not be implied to suggest that these are all the formats we will ever allow cargo to pass through.

Eh2406 · April 18, 2023, 8:04pm

I am very much interested in having experts chime in on other things we should check for. What could cargo do to make sure that registries are not vulnerable to this attack?

josh · April 19, 2023, 12:09am

If you can convince large providers like GitHub to support a better format for their tokens (doesn't have to remove the old one, just add a new parallel one with a better format), that would help substantially. We want GitHub OIDC and similar to work, which means we have to support JWT in addition to a better format like PASETO.

Geal · April 19, 2023, 7:25am

Supporting Github OIDC is a worthy goal, it will help make CI safer. But I do not see how supporting it mandates the use of JWT for the registry API. OIDC tokens are meant for the initial authentication and are exchanged for an access token, they are not meant for usage as the API access token. It's even indicated in the Github OIDC doc.

The OIDC spec indicates that the OIDC token is a JWT, but there is no requirement on the access token's format. Even the OAuth2 RFCs do not specify the token content: it can be an opaque string or any other format.

To make Cargo compatible with Github OIDC, you should have a separate RFC indicating that the registries must support authentication with OIDC and have to deliver an API token that Cargo can use in exchange. This is a very different requirement from the API token format.

On the token format(disclaimer: I'm the Biscuit token author, hi!):

Cargo's current system of having the registry token sent verbatim in the Authorization header is too flexible and makes it too easy to build insecure registries

Could you elaborate on that? Which vulnerabilities are you envisioning? As a strawman, a registry that uses random strings as a token could work very well and make implementation simple, I don't see why we would restrict that common use case.

What would make sense here, is not mandating JWT, but instead a RFC that says "if you are using JWTs, here is what you should do". It would have to be more precise in the validation requirements, mention identity providers, etc. As an example, I went in depth on alg claim validation here. Or you can refer to RFC 8725.

JWTs are well known but even now, they are full of footguns, so introducing them in a new system that you control should be done carefully. Here it's even more dangerous because it mandates their use in systems that you do not control nor audit. So considering tokens like PASETO or Biscuit should be worth the effort here, because they are built on the knowledge we got from deploying JWTs, and will prevent entire classes of vulnerabilities.

Eh2406 · April 19, 2023, 4:39pm

The linked document recommends a two phase system. The user converts a GitHub OIDC token into a specific token by calling a dedicated endpoint, then uses the specific token for the actual publish request. You're not the only one to recommend a two-phase system, all production users of GitHub OIDC do. What is the advantage of the two-phase system over a one phase system? Put it differently why can't I "just" include the GitHub OIDC token with the publish request?

So far the only answer we have received for why the two phase system is better is "path dependence": the work of integrating with existing code is lessened by having a dedicated endpoint instead of adding a whole new form of identity. In conversations so far, the Cargo Team and the Infra Team have been unconvinced by "because the spec says so" or "because it's easier for most implementers". If you know why the two-phase system is recommended, I would personally be happy for cargo to say "we do not support the JWT nor a one phase system".

Geal · April 19, 2023, 6:16pm

That's a very interesting detail of the workflow, thanks for giving me the opportunity to explain it.
A lot of people have tried to do that, and then specs and best practices moved away from that. If you use the ID token as access token:

technically wrong because the ID token's audience is the OpenID client, not the resource server. In some cases, they are in the same service, but often not. Like, the OpenID client could be in the identity provider, outside of the hot path of the API, with different security or scaling guarantees
the github OIDC claims have nothing to do with scopes that would be relevant for a crate registry
the purpose of the ID token is to transmit identifying information. Depending on the OpenID provider there could be a lot of personal information in claims, like email, phone number or gender. That information would then be available in the API client, and (1) this is none of its business (technically the client receives the ID token but it should not keep it for long), and (2) if the token leaks, that's personal information in the wind, while an access token would have less impact. The Github OIDC does not include identifying information like that, but I can guarantee you that once you have OIDC set up for a system, life finds a way to extend it. Because suddenly you could link your private registry with your company's identity provider to manage who publishes crates, etc
the ID token is not tied to the client. If it is used for API authorization, then leaks, it can then be used with any API client, you can't leverage recent security techniques like proof of possession and sender constraints
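
The audience point in the first item can be illustrated with a small Python sketch of the check a resource server would apply to the `aud` claim, which in a JWT may be a single string or a list of strings. The function name and the audience values used in the note below are made up for this example.

```python
def audience_matches(claims: dict, expected_audience: str) -> bool:
    """Return True only if the token was issued for this recipient."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected_audience
    if isinstance(aud, list):
        return expected_audience in aud
    # A token without an audience is not accepted in this sketch.
    return False
```

An ID token issued with `{"aud": "oidc-client"}` fails `audience_matches(claims, "https://registry.example")`, which is the mismatch that appears when an ID token is presented directly to an API.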

At this point, you realize that, as with every authn/authz or cryptographic system, you can either:

follow the spec, do the safe and expected solution that will not need to be scrutinized
try to litigate your way around it and find a safe subset you can use, then heavily justify it, audit it, reevaluate the threat model with every new feature or connection, and keep up with any new vulnerability that would already be addressed by the safe solution

If the Cargo and Infra team are not convinced and want to use the OIDC token as access token, while fully understanding the associated risks they would incur right now and in the future when the system grows (again, OIDC tends to breed infrastructure), then fine. But it must never be a requirement for other registries which would not want to make the same security tradeoffs, and could be set up in a safer way.

So here, we can have one RFC that says a registry should support OIDC, so it can be used safely from CI, and make no assumption on the access token that will be used. Let's leave that to the registry to decide. And if we want to make sure registries work safely with JWT, let's not mandate JWT usage for every one of them, but provide best practices they would have to follow if they want to support JWTs.

arlosi · April 19, 2023, 6:26pm

Honestly, this pre-RFC was difficult to write because it's fundamentally a compromise.

Multiple registry providers want to be able to use authenticated private registries as implemented in RFC#3139, but we're blocked on stabilization over what token formats should be allowed. I don't have any specific love for JWT, and I'm aware that there are so many ways to get it wrong. It happens to be a common denominator among large identity providers, and the proposal in this pre-RFC was the compromise that the Cargo team wanted to try.

As a strawman, a registry that uses random strings as a token could work very well and make implementation simple, I don't see why we would restrict that common use case.

I agree with Geal here. As an example, crates.io currently uses random strings for tokens. If Cargo continues to treat the token as an opaque string, then the registry server can decide what tokens are allowed. This would enable PASETO, Biscuit, JWT, random string, or some new token format in the future.

Based on the feedback so far, I feel like we should not continue with this pre-RFC as written. It's clear nobody really likes JWTs, and since Cargo isn't validating the signature, it would be possible to make a fake JWT-like token that could pass Cargo's check anyway.

I really appreciate all the responses and I'll continue working towards making authenticated private registries available on stable.

Eh2406 · April 20, 2023, 9:49pm

An RFC that does not require code to be implemented in Cargo is just a best practices document. Cargo documentation will only get followed in as much as Cargo has code to check it. A corollary of Hyrum's Law. If we have an easy way to tell Cargo "I'm using a format you don't know about please don't do any checks" then we should expect (approximately) everyone to ignore our checks and do whatever they want.

Indeed randomly generated tokens stored in a database is a secure implementation. As mentioned crates.io uses it. They've only once had to reset everyone's tokens for a security issue. In fact, one of the strategy's best attributes is that by observing the randomly generated token you have no idea what security model the registry is using. This also means that Cargo has no way of distinguishing whether the token is a short-lived randomly generated token or the string "admin". If Cargo allows any random string then we are also allowing 'if you use the password "admin" then you are allowed in'. I wish this was a strawman argument, but hardcoded passwords regularly get large organizations owned.
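
For readers unfamiliar with the pattern, a registry that issues random tokens and stores them server side might look roughly like the Python sketch below. The `TokenStore` class is hypothetical and keeps everything in memory; storing only a SHA-256 digest of each token is a design choice of this sketch, not a description of how crates.io or any other registry works.

```python
import hashlib
import secrets
from typing import Dict, Optional


class TokenStore:
    """In-memory stand-in for a registry's token table."""

    def __init__(self) -> None:
        self._users_by_digest: Dict[str, str] = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user: str) -> str:
        # 32 random bytes, URL-safe encoded; the token carries no structure.
        token = secrets.token_urlsafe(32)
        self._users_by_digest[self._digest(token)] = user
        return token

    def lookup(self, token: str) -> Optional[str]:
        return self._users_by_digest.get(self._digest(token))

    def revoke_all(self) -> None:
        self._users_by_digest.clear()
```

From the client's side, a token returned by `issue` and the literal string "admin" are both just opaque strings; only the server's lookup tells them apart.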

There are two consistent and defendable positions cargo could take, unfortunately neither of them are satisfying.

Cargo will not inhibit you from using any format you want. The advantage is that any registry can choose to use whatever latest greatest technology they want and that it is what most other package managers do. The disadvantage is that some registries, that didn't realize they had to take security seriously, will have hardcoded passwords and other ridiculously insecure things. We as a community will need to budget time for the second order impacts of people attempting to steal credentials, as is common with other package managers.
Cargo picks a token format and only allows interactions with that format. We would probably pick asymmetric tokens using PASETO. The advantage is that this is a well-designed format with several important mitigations baked in. The big disadvantage is that we are not compatible with other formats, either new more secure formats, or formats that are already hardened. This means existing multiformat package registries will either not support Cargo, or put a lot of work into implementing a bespoke format, or do incredibly ugly things to hide their existing tokens in our asymmetric tokens.

This pre-RFC was an attempt at a compromise. Cargo will not allow "literally any token format"; instead it will only allow a list of token formats where we know how to check that some attempt was made to use them correctly. The list of formats can be added to as people request them. The ones that were requested on zulip were JWS (notably used by Azure), JWE (notably used by CodeArtifact), and compatibility with GitHub OIDC (admittedly in the nonstandard one phase use).

By the way, if a compromise is not found Cargo is going to end up in one of the two "consistent positions" described above. Which one entirely depends on whether @arlosi or I are more entirely fed up and exhausted of this conversation. Based on his post and how I feel... Neither of us have much energy left.

So what would actually be useful feedback:

Given that you're allowing format FOO, I would recommend adding such and such a requirement to prevent this kind of misuse.
I like the compromise but would like to see format BAR, here is a user, here is how that format could be checked, and I'm willing to submit (or at least review) the code adding support.
One of the use cases this compromise was intended to support should not be supported, and here is a new reason why. (one phase GitHub OIDC?)
There is some implementation detail that could be done better. (For example: can we specify which format cargo should expect the token to be in.)
Here is a different compromise.

CAD97 · April 21, 2023, 2:30am

One option could be to do it via config. The first time talking to a registry and on some expiry term, cargo asks what token formats the registry allows. Cargo notes this, and if a nonopaque token format is chosen, does whatever the interaction pattern is for that format and includes whatever misuse protection is deemed reasonable.

An "opaque" strategy where a simple string token is passed along with every access is permitted, but isn't default; it must be chosen as permitted by the registry. The default for custom registries which don't report on the "what token format" API is http or ssh auth, as stable today.

Cratesio is the privileged default registry, there's no intent on changing that, and cratesio packages cannot depend on packages from alternative registries.

The community will have to mitigate attempts at hijacking cratesio tokens, but the security of alternative registries' tokens is mostly just a concern of those alternative registries, without a way of leaking back into the primary OSS community (except via misplaced sentiments that should more properly be directed at the registry instead, and should be counterable by showing how cratesio isn't (as) vulnerable to whatever attack). Because cratesio uses its own unique token, it's inherently shielded from credential stuffing attacks.

arlosi · April 21, 2023, 7:54pm

It's possible we could use the WWW-Authenticate header for this. We can't use config.json for private registries, as that's a catch-22 (you need to be authenticated to fetch the config).

However, I'm still not sure how this is better than the strategy of the server rejecting the request with an appropriate error message if the token format is unsupported.

This form of auth is only stable today for public registries. I'd like to make it available for private registries as well. It's implemented on nightly, but stabilization is blocked on deciding whether we should mandate a specific token format as @Eh2406 stated above.

I believe it's completely reasonable for a registry server (such as crates.io) to mandate a token format such as PASETO. That decision can be made on the server side. Crates.io can simply decide to reject non-PASETO tokens at some point.

No other package manager I've seen mandates a specific token format client side (NuGet, NPM, Maven, Gradle, pip, nor Ruby). For example, PyPI uses macaroons for its tokens, but pip isn't enforcing that all registries do so. If PyPI wants to change to a different format they can do so without client-side changes and making older clients unusable. The token is an opaque string.

As we've already agreed that random strings are a secure implementation, it appears that the only remaining major concern is hard-coded default credentials.

If a registry server has the egregious security flaw of hard-coding credentials, it likely has other major issues that would not be solved by mandating PASETO as a token format. It could have a web interface that allows login with hard-coded credentials. Or it could hard code an "admin" PASETO public key, then commit the corresponding private key into the repo.

While mandating PASETO does make it more difficult to hard-code credentials, I don't think it's worth the downside of making it much harder to integrate Cargo into an existing (non-PASETO) authorization system.

I completely agree with this. Mandating asymmetric tokens will effectively prevent our multiformat package registry (Azure Artifacts) from implementing Cargo support. Implementing the format isn't practical when we already have a hardened token format used by all other products. Our security team will not approve the hack of hiding our existing tokens inside Cargo's asymmetric tokens. Other registries might continue doing incredibly ugly solutions such as embedding the token in the user agent string.

If we want to increase the security of the Rust crate ecosystem, I think our time would be better spent on improving crates.io with features like 2FA, PASETO tokens, restricted scopes, and GitHub OIDC publishing -- not forcing a specific token format on private registries.
