Glob imports can shadow built-in types

ckaran · August 6, 2021, 1:25pm

You're probably right about this. I was thinking that we could do something similar to std::alloc so that users could choose which crate would be used globally for bignum arithmetic by the compiler, but the more I think about this, the worse the idea sounds to me. Better to just use a crate and all the normal stuff that rust already has and already supports.

I still want (i|u)(8|16|32|64|128|size) + f(32|64) to be reserved by rust though; the fact that it's possible to redefine them is a security hole waiting to happen...

Edit

I just realized that when this topic was split from the original, part of what I was finally convinced of in the original topic was lost. So, to summarize my current thinking from my last post in the other topic, I no longer think that reserving (i|u|f)(8|16|32|64|128|size) as keywords is a good idea. I hope that makes things clearer.

Tom-Phinney · August 6, 2021, 2:02pm

Why are you not asking to also reserve f128, f16 and fb16? RISC-V has had a stable draft proposal for f128 (quad IEEE floating point – the Q extension) for a long time. f16 and fb16 are primarily storage formats, usually employing f32 registers and ALU hardware within a CPU.

See prior IRLO discussion of f128, f16 and fb16 here and (somewhat) here.

zackw · August 6, 2021, 2:11pm

I had that thought too. It might not be a bad plan down the road but I would want to get some experience with bignum APIs in ordinary crates before baking one into the compiler. I have the impression that that hasn't happened yet.

That was my first reaction as well, but after thinking about it some more I don't think it's actually a problem. If you define a private struct i32 (or a variable, or whatever) in your own code it only affects you. If you export pub struct i32 from your crate it only affects clients that do use weird::i32 (in which case they know what they're doing) or use weird::* which is already discouraged, and the consequences are no worse than getting the core Option or Box shadowed unexpectedly.

I can even think of legitimate uses. use checked_integer_overflow::prelude::* anyone?

ckaran · August 6, 2021, 3:07pm

Sure, sounds good to me! The only reason I did the regex I did was because that's what's currently supported in stable rust, but you're right about the other formats as well (it could even be useful for the rust-gpu project).

Yes and no. Yes, it can be useful. No, I don't think it's a good idea.

Have you ever done something like use foo::prelude::*? What if buried deep within it was this redefinition, and you never noticed? The issue is one of programmer confusion, not compiler confusion. The compiler will always know what type you actually mean, and maybe with IDE support, so will the programmer, but I'm willing to bet that in 99% of all code reviews if anyone sees i8 they'll make the assumption that the type is the language primitive and not a newly defined type that happens to have the same name.

The problem with this is when you're writing security conscious code. I know that you can use std::primitive to prevent this kind of problem when it's done by accident (playground):

pub use std::primitive::bool as bool;

pub struct bool {}

fn main() {
    println!("Hello, world!");
}

But that won't work if someone is sufficiently determined to be malicious (playground):

pub use std::primitive::bool as bool;

fn main() {
    #[allow(non_camel_case_types)]
    #[derive(Debug)]
    pub struct bool {foo: String}
    let surprise = bool{foo: "surprise".to_string()};

    println!("Hello, world! {:?}", surprise);
}

(This isn't a particularly devious or clever method of hiding this, but if someone was really clever, they might bury it within layers and layers of proc macros such that the redefinition is only truly visible after complete expansion)

This means that a) every security conscious programmer needs to know that this is an issue, and b) they need to use the complete path for each type that they are using, each and every time they use it. That's not particularly ergonomic...

mathstuf · August 6, 2021, 4:20pm

Isn't banning glob importing good enough here? Sure, I could see it also being hidden in a sea of other imports, but that seems like something clippy can help out with, no?

Sure, if you're going to go all "belts and suspenders", yes. Being absolutely pedantic is never going to be "ergonomic". One could say the same thing about .unwrap() and all the other functions that can panic instead of encoding safety guarantees in their type signatures directly. But these APIs exist and, IMO, if you want to ban them, that's on tooling like clippy (if not clippy itself) to provide.

Tom-Phinney · August 6, 2021, 5:01pm

In that case you might also want to reserve prefixes for posits, which look likely to be a useful alternative storage format for floating-point values, particuarly for ML applications. For that I would propose fp… names, such as fp8, fp16, fp32.

Note that storing floating-point values in posit format in memory does not imply that their processing occurs within a posit ALU with a posit quire. In such a case posits serve simply as a compression format for IEEE 754 floating-point values near ±1.0, reducing demands on cache and other memory and their associated energy usage.

ckaran · August 6, 2021, 7:00pm

I'm not sufficiently well-versed in the art of malicious proc macros to prove this one way or another, but it may be possible to generate a macro that, in addition to doing something useful, has a malicious payload (redefining a primitive type in some manner) which isn't visible to simple analysis (grep won't work, only cargo expand + grep will work). Does clippy analyze code after macro expansion? Are there other ways of hiding that are even more arcane? No idea... and I don't want to find out the hard way!

True, but in this case it can be ever so slightly more ergonomic!

Yes, you're right. For complete safety, you'd need to rewrite your code so that every variable is typed. For security-conscious applications that really want to verify that the types of the objects that they're working with are what they expect, this may be the only way. However, I'm not trying to solve all security problems at once, I'm looking for a fast, simple, 80% solution that turns certain potentially nasty pitfalls into compile-time errors.

Hmmm... are there any tools out there that can rewrite rust code to include the type of each variable? E.g. turn let foo = 1u8 into let foo: u8 = 1u8? If you could run the tool before a code review (or even as a part of the commit process), then you'd know what the types (maybe even the lifetimes) of all variables are, which could help with debugging and security review.

scottmcm · August 6, 2021, 7:02pm

What's your threat model here? This is by far the least of your problem if you have untrusted people writing unaudited code in your program.

And really, it's not u8 and such that I'd be worried about, because custom literals don't exist, so a different type will usually not compile. vec! or similar is far more scary, but I think it's hopefully obvious that we shouldn't have String and Vec and ... all be keywords too.

jjpe · August 6, 2021, 7:33pm

Not really no. Sometimes I use something like use MyEnum::* so that I can match on that enum's variants without needing to specify the enum in every arm. This can be useful to keep things readable when the match patterns themselves are long.

If glob imports are banned, this would cease to work as well. Perhaps use * could be banned, but banning all glob imports seems undesirable.

In addition to the above, if glob imports are banned, a malicious macro could always do something like use MaliciousType as u8 for example.

Of course to avoid detection that MaliciousType would then have to provide the entire API of u8 in order to prevent compile errors. And providing such an API, while doable, isn't exactly trivial, especially if everything is expected to compile. And even then there's the issue of literals, as mentioned above.

ckaran · August 6, 2021, 7:57pm

Neat, I'd never heard of them before!

As for reserving a name for them... Honestly, I'm right on the fence for this one. The issue is that there are many number systems out there that are all useful, so picking and choosing which will be sufficiently important to reserve a namespace for them is difficult. Right at this moment I'm ever-so-slightly against adding a new namespace for fp only because it seems to be fairly new, and I don't know if it'll be the next big thing, or if it'll be something that everyone forgets about in a few years, but which rust will then have to continue to support ad infinitum. Like I said though, I'm only mildly leaning this way, and will follow whatever anyone else suggests...

ckaran · August 6, 2021, 8:07pm

Not exactly a 'threat model', more of a 'work model' and 'mental model'. I rely on the compiler to catch a lot of errors for me, e.g., all of the safety guarantees that rust has that C/C++/etc. don't have. One of the things I rely on is knowing that u8 is definitely the primitive, and not something else that is unrelated, which also means that when I'm reviewing code I tend to make assumptions about what I'm looking at when I see u8. cargo expand will help you catch a great deal of nastiness, but you need to be aware that this is a possibility... which I wasn't until this entire topic came up. Basically, it feels like something that would work well in the underhanded rust code competition.

mathstuf · August 6, 2021, 9:12pm

Well, sure. There is that crate that does stdout stuff in its exported proc_macro, but at that point, I think you're better off with something like crev than automated detection tools. So, yes, it can make u8 for you, but I don't think it can interfere with:

takes_u8(1) will always be std::primitive::u8 since the literals don't care about the local scope (or you get a type mismatch since there's no implicit conversion to user types from {integer} AFAIK
takes_u8(1u8) will as well since the suffixes are not user-influenced either
pub fn my_u8_func(t: u8) will have its callers be very confused unless the proc_macro also "infect" call sites (internal callers can be confused though)
names are "scoped", so if you accidentally use u8 as a variable name, it's not going to care about types or modules named u8 available

I guess I'm just coming up short on how big of an impact a malicious proc_macro can be without someone noticing it in a test suite or regular API usage. But maybe I'm just not creative enough here.

Sounds like rust-analyzer or some IDE integrations that are available.

Sure, but is that at the module level or inside of an impl or fn definition? Even if the scope level isn't that interesting, I think clippy can distinguish between "import of variant names" from "import all symbols under a module", but I'm not 100% sure about that.

mbrubeck · August 6, 2021, 10:18pm

As a first step, I would suggest that either rustc or clippy should have a lint that warns by default when a glob import shadows any name from the built-in types or the prelude.

CAD97 · August 6, 2021, 11:14pm

If you're really worried about prelude items (and primitives) being shadowed, you can just use them explicitly.

If you use std::primitive::u8;, then a glob import cannot bring a u8 type into scope.

The base semantics for glob imports is that the prelude glob is in scope, then explicit globs override the prelude (with two globs importing the same name being ), and then named imports.

If you use std::{primitive::*, prelude::v1::*};, then any glob import that attempts to shadow a prelude or primitive name will cause an ambiguous glob import error (when you use the name).

This doesn't prevent creating a type that clashes with prelude/primitive, but it does stop glob imports from being the cause of confusion.

ckaran · August 9, 2021, 2:27pm

Taken together, these may be enough. Ideally, there'd be some switch within rustc that would do the rewriting as external tools may have bugs that result in them believing the type is different than rustc thinks it is, and the real issue is someone tricking the compiler, not some external tool. So, how hard would it be to add in a new switch to rustc?

I'll probably start doing that with my own code as a stop-gap measure, but I'd really like to see a rewrite switch added to rustc. The issue is that I know I'm not smart enough to spot every possible trick that can be done to obscure the types being used. The ones I suggested earlier are the ones that I, as a mediocre rust programmer, can think of. I've seen some of the code that people on rust team have written on their personal projects, and I know it's beyond my ability to do a serious security review. At least if I had access to the type information I'd know that something was off if someone comes up with a clever way of substituting their own type for some primitive...

I like crev, but it isn't a substitute for doing your own review. The idea of rewriting to include type information is that it makes it easier for us ordinary programmers to review code and make sure it's doing what it should be doing, especially when we're under a time crunch and need to quickly review large piles of code.

mathstuf · August 9, 2021, 3:02pm

I think that "being under a time crunch" is mutually exclusive with quality code reviews (at least based on my experience). Even if all the types were right, the hard work is in understanding the code, of which the types are (IME) a small portion of it. I'd re-evaluate your priorities if you end up having to pull in code on a deadline without the appropriate time to evaluate it against your quality requirements.

ckaran · August 9, 2021, 3:28pm

I agree with what you're saying, but you are implicitly assuming all of the following:

The team is large enough and experienced enough that they can review all of the code and all of the dependencies sufficiently well. When you have a mix of experience levels, this isn't guaranteed.
That management is supportive of the level of effort to perform this kind of review.
Programmers don't make mistakes; I may think I know the type of an instance, but having the compiler tell it to me clearly means I don't have to guess.

Finally, just to be clear, I want this topic to stay focused on the title: 'Glob imports can shadow built-in types'. While I agree with your earlier statements, it can very easily lead us down the rabbit hole of what should be done with regards to proper software engineering, etc. Let's not go there, and instead focus exclusively on topic's title in this thread.

Given that, I want to bring things back to my earlier question: does anyone know what the level of effort would be to add in a new compiler switch to rewrite code with the type information for each variable explicitly written out?

scottmcm · August 9, 2021, 5:27pm

Probably a fair bit, as IIRC type inference is done on desugared code. And, of course, there are things -- like closures -- with types which cannot be named. As well as RPIT values of non-pub types whose names are inaccessible.

I disagree that they "why" is unimportant. For example, if the goal is code review, can it be done in an IDE that shows type hints instead of needing a fully-type-annotated version of the source code?

(Though really, for review of untrusted code, you can't get types, since type inference needs to run build scripts and proc macros and such, at which point it's too late if it's malicious...)

ckaran · August 9, 2021, 5:52pm

I was afraid of that... how does the error handling code work? That is, if I make a mistake with my code like:

fn main() {
    let x: isize = 1u8;
}

I get:

error[E0308]: mismatched types
 --> src/main.rs:2:20
  |
2 |     let x: isize = 1u8;
  |            -----   ^^^ expected `isize`, found `u8`
  |            |
  |            expected due to this
  |
help: change the type of the numeric literal from `u8` to `isize`
  |
2 |     let x: isize = 1isize;
  |                    ^^^^^^

This has all the necessary information to do the rewrite, so if there's a way of hijacking it, it may be sufficient for a proof of concept.

I agree that this is incomplete at best, and that there are always going to be ways of subverting the compiler; it wasn't meant to be 100% perfect. It was just meant as another tool in our belts, kind of like how logging, unit testing, fuzzing, code reviews, etc., aren't perfect, but are helpful.

As for the build scripts, etc., you're right, there is no way to ensure that you won't be hit by a sufficiently motivated and skillful adversary. I'm not even sure if building within a VM is sufficient as there may be ways of detecting if you're building within a VM and behave 'nicely' in those circumstances. All I'm suggesting is making it a little easier on the defenders, and a little harder on the attackers. Make them work for that exploit!

mcy · August 9, 2021, 7:24pm

People have already said the equivalent, but I'm going to just state the obvious: if you are writing secure software and pulling from crates.io in any way at build time, you have already lost and your software is, by definition, not secure. (Having a Cargo.lock that checks hashes is... a workaround, but a very silly one. Just maintain a third_party and don't think about it too hard)

Supply chain attacks, people!

Topic		Replies	Views
Minimal acceptable integer size - i32min, u8min, etc language design	30	2735	November 14, 2020
Problems with floats language design	12	2506	December 24, 2019
Pre-RFC: Rust and Large File Support (LFS) libs	18	4580	March 25, 2019
pre-Proposal: Math Working Group community	13	2083	February 4, 2021
(Pre-?)Pre-RFC: Range-restricting wrappers for floating-point types language design	13	3941	March 25, 2019

Glob imports can shadow built-in types

Edit

Related topics