An idea to mitigate attacks through malicious crates

The problem

I assume that most readers here are aware of the recent flatmap-stream attack on the Node ecosystem. If you’re not, here’s the short version: a developer who had taken over maintenance of a widely used npm library turned out to be malicious and injected malicious code, which flew under the radar and was installed as a dependency in gazillions of applications.

The Rust ecosystem is vulnerable to the exact same kind of attack: one could imagine a malicious developer taking over, say, itertools or some other popular crate through social engineering, and using it to attack Servo users, for instance.

I believe that there are several ways to mitigate such attacks, at several levels, and I would like to suggest a purely technical security approach, based on permissions.

Permissions

I would like to add the ability to tag crates by what they can do, and have cargo (or perhaps crates.io?) verify these tags when updating dependencies.

For instance, let us assume that we can differentiate the following crates by static analysis:

  • crates that could perform I/O (including any crate that contains unsafe code, or crates that depend on crates that could perform I/O);
  • crates that never perform I/O.

(I know that we would need to special-case things such as logging or build-time I/O, but please bear with me for the moment.)

A crate InMemory that never performs any I/O is declared in Cargo.toml:

[package]
...
permissions = []

If we write a crate that depends on InMemory, then when cargo build builds a version of InMemory for the first time, it runs static analysis to ensure that neither InMemory nor its dependencies contain unsafe code or call into std::fs or std::process.

If this invariant is violated, compilation fails.
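A minimal sketch of what this check could look like, assuming a deliberately naive textual scan (a real tool would have to work on the compiler’s resolved view of the code, since paths can be renamed or re-exported):

fn required_permissions(source: &str) -> Vec<&'static str> {
    // Any of these constructs forces the "io" permission.
    let needs_io = ["unsafe", "std::fs", "std::process"]
        .iter()
        .any(|needle| source.contains(needle));
    if needs_io { vec!["io"] } else { vec![] }
}

fn main() {
    let src = r#"
        use std::fs::File;
        pub fn load() -> std::io::Result<File> { File::open("data.bin") }
    "#;
    // This source touches std::fs, so the crate must declare permissions = ["io"].
    assert_eq!(required_permissions(src), vec!["io"]);
}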

Conversely, a crate OnDisk which does perform I/O is declared in Cargo.toml:

[package]
...
permissions = ["io"]

We do not need to check anything for this crate, but using it taints its dependents.

We now develop a crate MyCrate that depends on InMemory and OnDisk:

[package]
permissions = ["io"] # Since we depend on a package that requires io

[dependencies]
ondisk = { version = "*", permissions = ["io"] }
inmemory = { version = "*", permissions = [] }
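The rule cargo would enforce is that a crate’s declared permissions cover the union of its dependencies’ permissions. A minimal sketch of that check, with plain slices standing in for cargo’s real metadata types (all names here are made up for illustration):

fn check_permissions(declared: &[&str], deps: &[(&str, &[&str])]) -> Result<(), String> {
    for (dep, dep_perms) in deps {
        for perm in *dep_perms {
            if !declared.contains(perm) {
                return Err(format!(
                    "dependency `{dep}` requires permission `{perm}`, which this crate does not declare"
                ));
            }
        }
    }
    Ok(())
}

fn main() {
    // mycrate declares ["io"], matching the Cargo.toml above.
    let deps: &[(&str, &[&str])] = &[("ondisk", &["io"]), ("inmemory", &[])];
    assert!(check_permissions(&["io"], deps).is_ok());
    // Dropping "io" from mycrate's declaration would fail the build.
    assert!(check_permissions(&[], deps).is_err());
}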

What we gain

If the static analysis is sound, we have safely decoupled code that can perform I/O from code that can’t. In practice, this provides a form of (minimal) crate-level type system.

In particular, if inmemory is ever updated to a version that performs I/O, the developer of mycrate will find out about it, as mycrate will stop building due to a now-invalid dependency on inmemory.

Limitations

This is by no means a complete solution to malicious code. However, I believe that it would already mitigate the simple cases.

The example above uses I/O as a permission. It is, of course, possible to think of other permissions.

We probably want to allow specific releases of some crates to drop some permissions after passing some kind of audit (e.g. log performs console I/O, which sounds fair enough). To be discussed.

Precedents

About 20 years ago, the MMM web browser used this mechanism to guarantee that OCaml extensions could do no harm (for some definition of harm). I have not heard of any problem caused by this policy, although I admit I haven’t attempted to follow this closely.


I’m definitely on board with the general principle: most crates shouldn’t be doing I/O, and many malicious attacks have to do I/O in order to do any real damage, so restricting which crates can do I/O is a good strategy.

But I don’t think it has to start as a cargo feature. This feels like something which should be developed and experimented with as a separate tool before we can be sure this actually works well enough in practice to be worth putting in cargo. Are there any limitations of cargo that prevent us from developing a separate tool like this today?


@Ixrec Good point. I think that (if successful) it should end as a cargo feature, but you are right that it doesn’t need to start there.


As a member of the cargo team, I am a big +1 on experimentation out of tree! IMO, give it a try, find the pain points, and talk to us about how to make things less painful. If there is something we are doing that makes it hard for the community to experiment with this kind of functionality, then we should fix it.

Related topic:

There’s a subset of “crates that do no IO”: those where all functions are const fn and all trait impls are const impl. My goal is to improve const eval to the point where this is true for many, if not most crates, and we’re just left with a bunch of “does io” and “does ffi” crates.

If a crate goes from being “const” to not, that’s a breaking change already, and will some day be caught by semverver.
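For instance, a crate in that “const” subset might look like this (a sketch assuming today’s const fn support for loops); being callable in a const context is itself evidence that no I/O can happen:

pub const fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

// Evaluated entirely at compile time; an I/O-performing function
// could never appear in this position.
const G: u64 = gcd(48, 18);

fn main() {
    assert_eq!(G, 6);
}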


What counts as I/O?

Would a crate with generic T: Read or T: Write count as I/O?

Does using Read/Write with &[u8] / &mut [u8] count as I/O?
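To make the question concrete, here is the borderline case as code: a generic function whose I/O behavior depends entirely on the caller’s choice of reader, so analysing the defining crate alone cannot classify it (checksum is a made-up example):

use std::io::Read;

// With a TcpStream this does network I/O; with &[u8] it only
// reads memory already in the process.
fn checksum<R: Read>(mut r: R) -> std::io::Result<u64> {
    let mut sum = 0u64;
    let mut buf = [0u8; 4096];
    loop {
        let n = r.read(&mut buf)?;
        if n == 0 {
            break;
        }
        sum += buf[..n].iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    Ok(sum)
}

fn main() -> std::io::Result<()> {
    // Read is implemented for &[u8], so this call performs no I/O at all.
    let in_memory: &[u8] = b"hello";
    assert_eq!(checksum(in_memory)?, 532);
    Ok(())
}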


What counts as I/O?

I would say that, as a first version, anything that either contains unsafe code or calls/imports:

  • std::fs; or
  • std::process; or
  • std::net.

edit: Added std::net at @Soni's suggestion.

TCP/UDP/std::net?

You are right, std::net should be added.

I think this is more or less the same idea I proposed here:

My observation was that there's a common element uniting everything with effectful behavior (or otherwise relying on ambient authority, which is itself an effectful behavior): unsafe.

What I was ultimately suggesting was something where unsafe blocks are tied to something like cargo features, so feature labels like this could get applied, and crates consuming other crates could opt in to any unsafe behaviors.

It'd end up looking a lot like this proposal, except it'd rely on something very close to the existing cargo features mechanism, and it'd be useful with any crate, including `std`.


Glad to see I'm not the only one who thought of this!

I'm not sure what you mean. For me, unsafe, by definition, can hide an effect or an extra-functional property; is this what you have in mind?

My proposal above is just one example of how it can be represented. If crate features are better than my ad-hoc examples, I have no problem with switching to crate features. I was also thinking of making this a fake dependency, which might be a tad ugly.

You would need to introduce a mechanism of crate feature inheritance, right?

How so?

I don’t like this.

Why not base it on rust/std features + unsafe? We already have a registry of std features in the compiler.

I’ll list various ideas that might reduce the risk with less blowback than crate capabilities. In most cases, these should improve security more, by addressing more likely attack vectors.

Any cross-crate testing infrastructure helps. In particular, trait integration tests would create cross-crate testing that increases the chances that tests fail when a crate starts doing strange things. I’d expect other lightweight tools for design by contract and formal verification to help too.

There are cargo namespace designs based on maintainer identity that address basically all worries about namespaces while reducing this risk: any time a malicious developer socially engineers access to an abandoned but popular crate, the original developer has good odds of making them move the crate to their own namespace, which dependent crates would observe as more than a version update.

I also suspect the trend towards micro-crates increases the risk, because micro-crates have greater odds of being abandoned. “Moderate-sized collection” crates like itertools seem less likely to face abandonment. In many cases such collections make no sense, but when they do make sense, they should be pursued within reason.

Instead of crate-level capabilities, we need a “capability-oriented std” that provides a type-system layer for capabilities and uses it for a capability-based I/O system. It might work with CloudABI, AppArmor, etc., but regardless, it would push Rust developers into building clean capability-oriented crates, which makes security and auditing easier.
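To illustrate the flavor of such an API (a sketch; the Dir type and its methods are hypothetical, in the spirit of capability-based designs): instead of relying on the ambient authority of std::fs::File::open, a function can only touch files reachable from a directory handle it was explicitly given.

use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical capability: a handle to a single directory subtree.
pub struct Dir {
    root: PathBuf,
}

impl Dir {
    // The only place ambient authority is exercised; everything
    // downstream must receive a Dir explicitly.
    pub fn open_ambient(root: impl Into<PathBuf>) -> Dir {
        Dir { root: root.into() }
    }

    pub fn open(&self, rel: impl AsRef<Path>) -> io::Result<File> {
        // A real implementation would reject absolute and `..` paths here.
        File::open(self.root.join(rel))
    }
}

// This function can read Cargo.toml only because it was handed the capability.
fn load_manifest(dir: &Dir) -> io::Result<File> {
    dir.open("Cargo.toml")
}

fn main() -> io::Result<()> {
    let cwd = Dir::open_ambient(".");
    let _manifest = load_manifest(&cwd)?;
    Ok(())
}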

The “unsafe blocks in unsafe fn” RFC would help against such attacks too.

While all these ideas sound like they go in the same direction, most of them seem to address different threat models.

If I read you correctly, this would require moving to a type-and-effects system, right? I feel that this would be much more powerful, but also much more complicated, than crate-level capabilities. Also, this would require language-level buy-in, and therefore a much broader consensus.

What we have discussed so far can be implemented pretty easily with an external tool.

How about: features (the #![feature(...)] kind) and dependencies.

The compiler already has the ability to check (and disallow) features, and the ability to lint and disallow unsafe code.

This means we just need to let cargo interface with the compiler and allow controlling dependencies.

Something like:

my_dep = { allow_dependencies = { foo = { allow_features = [] } }, allow_features = [] }

(The exact syntax is a detail; the point is per-dependency whitelists.)

@Soni Yeah, that may be a good idea.

I need to brush up on how features work in the stdlib (I don’t think it’s entirely the same as in crates, right?), but this looks like a basis that could work.

I’m trying to wrap my head around what would be required:

  • patching a few stdlib modules to place them behind the feature capability_io (see the sketch below);
  • patching the compiler to automatically place the feature capability_unsafe (or capability_io again?) on all functions and data structures that use unsafe blocks outside of the standard library and on all functions and data structures declared unsafe anywhere.

(also, probably some work on error messages)
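For the first item, the consumer side could reuse today’s unstable-feature opt-in syntax; note that capability_io is entirely hypothetical here:

// Hypothetical: no such feature gate exists today.
#![feature(capability_io)]

// Under this scheme, importing std::fs without the opt-in above
// would be a compile error.
use std::fs::File;

fn main() -> std::io::Result<()> {
    let _f = File::open("Cargo.toml")?;
    Ok(())
}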

Would this be sufficient? Could we replace the compiler patch with a custom compiler driver to the same effect? Perhaps an extension of cargo-geiger?

@bascule Is this also the kind of idea you had in mind?

Yes, unsafe behaviors are, in particular, the ones you are trying to apply permissions to in your proposal.

This already exists: features can be represented as paths through the dependency hierarchy, a la foo/bar/baz/featurename

What I was proposing was a generally useful mechanism that would be consumed by std the same way crate authors would consume it.

The only difference I suggested was allowing std to whitelist certain unsafe blocks, e.g. heap allocations.

I’d like a pile of (preferably true) random numbers from OsRng. It works in different ways on different platforms; I can tell that (IIRC) on macOS and GNU/Linux it reads from /dev/(u)random. How will my crate be classified if I don’t print to screen, don’t log to a file, don’t do networking, but generate lots of (P)RNs?
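For reference, the use case in question looks roughly like this (using the real rand crate API, assuming version 0.8); the calling crate opens no files and no sockets itself, and the entropy read happens further down the stack, behind unsafe/FFI:

use rand::rngs::OsRng;
use rand::RngCore;

fn main() {
    // No console, file, or network I/O in this crate; the OS entropy
    // source (e.g. /dev/urandom on Linux) is read inside OsRng's dependencies.
    let mut buf = [0u8; 16];
    OsRng.fill_bytes(&mut buf);
    let _random_bytes = buf;
}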

I think ideally the OS RNG would be a lang item exposed through std. There's been some discussion about that on the rand crate: `OsRng` crate? (or optional PRNG dependencies) · Issue #648 · rust-random/rand