Add `thiserror::Error` proc macro to `std::prelude`

For field-less enums you can get the discriminant (via an `as` cast) in a const context, so the discriminant needs to be assigned before the linker runs. So this would need to be restricted to enums with fields, or gated behind an annotation.
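A minimal illustration of that constraint (the enum and constant names here are made up for the example): the `as` cast on a field-less enum is usable in `const` evaluation, which runs long before linking, so the discriminant value cannot be chosen at link time.

```rust
// Field-less enum: discriminants default to 0, 1, ...
enum Errno {
    NotFound,
    PermissionDenied,
}

// Const evaluation forces the discriminant to be a compile-time constant.
const CODE: u8 = Errno::PermissionDenied as u8;

fn main() {
    assert_eq!(CODE, 1);
}
```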

This could block the optimization in cases like this:

```rust
enum Error1 { A, B }
enum Error2 { C, D }
enum Error { Error1(Error1), Error2(Error2) }
```

Since the discriminants for the two errors can overlap.
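A small check of the overlap claim, reusing the names from the snippet above: both inner enums start their discriminants at 0, so the tag value alone cannot tell their variants apart.

```rust
enum Error1 { A, B }
enum Error2 { C, D }

fn main() {
    // Error1::A and Error2::C both get discriminant 0, so a flattened
    // outer enum could not distinguish them by tag value alone.
    assert_eq!(Error1::A as u8, Error2::C as u8);
}
```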

Alternatively, the logical discriminant returned by `as` and the in-memory representation of the discriminant could be decoupled (not sure if that is already allowed for `repr(Rust)` enums).

For avoiding the "one big crate-wide error enum" problem, I think it might be more helpful to land support for pattern types so functions can indicate which variants of the error they might return. I'm envisioning something like this:

```rust
#[derive(Debug)]
enum Error {
    A,
    B,
    OnlyFromFoo,
    OnlyFromBar,
}
impl Display for Error { .. }
impl core::error::Error for Error { .. }

fn foo() -> Result<(), Error @ (A | B | OnlyFromFoo)> { .. }
fn bar() -> Result<(), Error @ (A | B | OnlyFromBar)> { .. }
```

In my experience, there's a lot of overlap between the errors that different functions in the same crate can return, so without pattern types you end up duplicating a lot of code to include the same variants in a lot of error types.

Not sure what the state of pattern types is, I've seen some RFCs and some threads here, but I haven't kept a close track on how much they're proposing to implement or what the proposed syntaxes are.

3 Likes

Since the layout of Rust enums is unspecified, and the default discriminant is 64 bits wide, I think one possible approach to enum flattening would be randomization. The first variant of an enum could be assigned a random offset in `usize` rather than 0. Since enums usually have only a few variants (a handful, almost never hundreds or thousands), the probability of discriminant-range intersection for different enums would be about the same as the probability of collision for the lowest discriminant. By the birthday paradox, that's sqrt(2^{-64}) = 2^{-32}, which is vanishingly small. It's not impossible, of course, but it's small enough that almost all enums would have different discriminant ranges and could be flattened, without compromising separate compilation.

One downside would be the possibility of random performance regressions if the discriminants happen to collide for a given build. But since flattening matters only for very specific pairs of enums, I would expect it to mostly not matter in practice (no more than any other random compiler optimization). Another downside is that flattening would be significantly rarer on 32-bit or 16-bit systems. On 32-bit systems the probability of collision is 2^{-16}, which will likely happen reasonably often in reasonably large programs (though the probability of collision for nested enums, where it actually matters, may still be small). On 16-bit systems collisions are quite likely, at 2^{-8}.
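The estimates above can be sketched numerically. `collision_probability` is a made-up helper using the standard birthday-paradox approximation p ≈ 1 - exp(-n^2 / (2 * 2^w)) for n randomly offset enums in a w-bit discriminant space:

```rust
// Approximate probability that at least two of `n` uniformly random
// starting discriminants collide in a `width`-bit space.
fn collision_probability(n: f64, width: i32) -> f64 {
    1.0 - (-(n * n) / (2.0 * 2f64.powi(width))).exp()
}

fn main() {
    // Two enums in a 64-bit space: vanishingly small.
    let tiny = collision_probability(2.0, 64);
    // A couple dozen enums in a 16-bit space: collisions show up quickly.
    let likely = collision_probability(20.0, 16);
    assert!(tiny < 1e-15);
    assert!(likely > 0.002);
    println!("64-bit: {tiny:e}, 16-bit: {likely:.4}");
}
```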

It may be possible to devise smarter heuristics for discriminant offsets, which would make collisions even more rare in practice.

3 Likes

I assume that we could also have a per-compilation prefix. For example, if the discriminant is 64 bits, each compilation unit would randomly get one 32-bit prefix and would be able to have up to 2^32 unique discriminant values in the suffix. If a compilation unit has more than 2^32 enum variants, it could generate a second random prefix, and so on.
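A hypothetical sketch of that prefix/suffix packing (all names invented here; this is not a real rustc mechanism):

```rust
// Pack a per-compilation-unit prefix and a local variant id into one
// 64-bit discriminant.
fn discriminant(prefix: u32, suffix: u32) -> u64 {
    ((prefix as u64) << 32) | suffix as u64
}

fn main() {
    let prefix = 0xDEAD_BEEF; // in reality: randomly chosen per CU
    // Variants in the same compilation unit share the prefix...
    assert_eq!(discriminant(prefix, 0) >> 32, discriminant(prefix, 1) >> 32);
    // ...but differ in the suffix, so the full discriminants stay unique.
    assert_ne!(discriminant(prefix, 0), discriminant(prefix, 1));
}
```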

And ideally it would be possible to use some linker trick (including when dynamic linking) to guarantee that the prefixes are unique, using the same mechanism as code relocation.

1 Like

Doing it randomly would break reproducible builds. Doing it based on the StableCrateId (truncated to 32 bits) is not enough to prevent collisions: you only need 65536 crates to get a 50% chance of a collision for a 32-bit prefix. You would need to use at least the full 64-bit StableCrateId; only that is guaranteed to be collision-free (rustc already emits an error in case of a conflict).

Would this matter in practice? I assume a given binary will not often have 65536 dependencies, will it?

Or is the benefit only there if the discriminants are guaranteed to be unique?

2900 crates (which is reasonably realistic) already gives a collision chance of 0.1%, which is way too high for rustc. Every time rustc updates or you update a dependency, you get another chance at a collision, so I think the chance of a 2900-crate project getting a collision somewhere in its lifetime, and thus getting a (temporary) random performance regression, is close to 100%.
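The 0.1% figure can be checked with an exact birthday computation, assuming uniformly random 32-bit prefixes:

```rust
// Exact birthday probability: P(collision) = 1 - prod_{k=0}^{n-1} (1 - k/2^bits).
fn birthday_collision(n: u32, bits: i32) -> f64 {
    let space = 2f64.powi(bits);
    let mut p_none = 1.0;
    for k in 0..n {
        p_none *= 1.0 - k as f64 / space;
    }
    1.0 - p_none
}

fn main() {
    let p = birthday_collision(2900, 32);
    // Roughly one in a thousand for 2900 random 32-bit prefixes.
    assert!(p > 0.0009 && p < 0.0011);
    println!("{p:.5}");
}
```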

2 Likes

Would having a 48-bit prefix and a 16-bit suffix (which increases the chance that a compilation unit requests more than one prefix) help?

And IIRC function addresses effectively use only 48 of the 64 bits, so would it be possible to trick the linker into using relocations, if needed, to guarantee that prefixes are unique, as if they were function addresses? Maybe this would require computing the discriminant at runtime, as `&some_function_with_global_unique_adress_guaranteed_by_the_linker + unique_id_in_the_crate`?
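A purely illustrative sketch of the runtime idea, assuming nothing about what the linker actually guarantees (all names invented): a function's address, which the linker/loader keeps unique per symbol, serves as a base, and a crate-local id is added on top.

```rust
// Dummy function whose address stands in for a linker-unique anchor.
fn discriminant_anchor() {}

// Hypothetical runtime discriminant: anchor address plus crate-local id.
fn discriminant(local_id: usize) -> usize {
    // A fn item coerces to a fn pointer, which `as` can cast to usize.
    (discriminant_anchor as usize) + local_id
}

fn main() {
    // Distinct local ids give distinct discriminants for the same anchor.
    assert_ne!(discriminant(0), discriminant(1));
}
```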

Probably.

Linker tricks unfortunately don't work across dylibs. There is no way to guarantee that the dynamic linker deduplicates symbols. This is also why generic statics are not supported.

In practical terms it's 8 bits; for example, this enum takes 2 bytes:

```rust
enum X {
    A(u8),
    B(u8),
}
```

Using a random 64-bit discriminant would bloat this enum to 9 bytes (unaligned) or 16 bytes (aligned).
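Both size claims can be checked with `size_of`; `Wide` below is a strawman approximating an always-64-bit tag (not how rustc would actually lay it out):

```rust
use std::mem::size_of;

enum X {
    A(u8),
    B(u8),
}

// Strawman layout with a 64-bit tag, for size comparison only.
struct Wide {
    tag: u64,
    payload: u8,
}

fn main() {
    // One tag byte plus one payload byte.
    assert_eq!(size_of::<X>(), 2);
    // u64 alignment pads the 9 meaningful bytes out to 16.
    assert_eq!(size_of::<Wide>(), 16);
}
```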

1 Like

I think it could arguably be seeded by the same seed as `-Z randomize-layout` to address those issues.

`TypeId` and symbol mangling already have the requirement of globally unique types and symbols across the program. The longstanding issue "Collisions in `type_id`" was closed, so this provides a path to have globally unique error codes as well (I guess?).

I think it would be great to have some built-in way to check whether a perf regression happened due to randomness. That way devs can assert it in `main` and just tweak something to fix it. I fear that not having any way to be sure will not please any certification process.

It was closed, but not fixed; rather, it was split into a number of different issues (which are still open because there are still problems that need fixing). The relevant one is probably "`type_id` is not sufficiently collision-resistant".

2 Likes