Global Registration (a kind of pre-rfc)

In the past, there has been discussion about a feature that allows a global slice of elements to be built from parts, spread around a crate or spanning over multiple crates. (see From "life before main" to "common life in main" for past discussion)

The testing devex team has a need for this, and I'd like to start experimenting with it. The following is a proposal I wrote for the issue "`#[distributed_slice]` aka a method to enumerate tests" (Issue #3 in rust-lang/testing-devex-team on GitHub), which has a little more discussion.

Problem Description

#[test] is magical. Using the attribute, functions spread all over a crate can be "collected" in such a way that they can be iterated over. However, the ability to create such a collection system, or global registration system, turns out to be useful elsewhere, not just for #[test].

The following are just a few examples of some large crates using alternative collection systems (inventory, linkme based on ctor) for one reason or another:

Additionally, one can imagine webservers registering routes this way, although I found nobody doing that at the time of writing.

In almost all the examples above, global registration is an opt-in feature behind a cargo feature flag. Existing solutions are a bit of a hack and have limited platform support.

`inventory` in particular, which is based on `ctor` and which most crates mentioned above use, is only regularly tested on Windows, macOS and Linux, and use on embedded targets is complicated. On embedded targets you must manually ensure that all global constructors are called, or a runtime like [`cortex-m-rt`](https://crates.io/crates/cortex-m-rt) must do so.
However, `linkme` also had 3 cases in 2023 where it broke with linker errors or was missing platform support.

It seems that library authors are wary of including registration systems in their libraries. I conjecture that this is because random breakage due to a bug in a downstream crate, or limited platform support, is painful and limiting. Bevy has discussed exactly this, citing limited or no wasm support.

Specifically, the testing-devex team is working on libtest-next. It was proposed by Ed Page (and in in-person conversations) that we should make #[test] less magical so Rust can fully support custom test frameworks. This plan was explicitly endorsed by the libs team.

Custom test frameworks are useful for all kinds of purposes, like test fixtures. Importantly, they are essential for testing on #![no_std]. The only way to do that currently is using #![feature(custom_test_frameworks)]. It was discussed (in person) that this is also useful for Rust for Linux.

In summary:

  • #[test] is a magical registration system which cannot be used for any other purpose than tests.
  • Libraries do seem to have a need for registration systems.
  • Crates offering registration systems by using the linker are in use, but it seems platform support and fragility are issues for downstream crates.
  • To advance the state of testing in the language, having access to a better supported registration system is desired.

Existing solutions

Linkme

Because this pattern is so useful, there are libraries available in the ecosystem that try to emulate this behavior. Primarily, there's linkme's distributed_slice macro, by David Tolnay. As the crate's name implies, this works by (ab)using the linker.

linkme has had issues in the past because it was broken on various platforms. Indeed, it has some platform-specific code, though most platforms are now supported. The crate works by creating a linker section for each distributed slice and placing all elements of that slice in this section. Based on special start and end symbols that are placed at the start and end of this section, the program can figure out at runtime how large the slice has become and reconstruct it using some unsafe code.
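To make the last step concrete, here is a minimal sketch of rebuilding a slice from a pair of boundary pointers. It is not linkme's actual code: the `SECTION` array is only a stand-in for a linker section so that the example runs on any platform, and `slice_from_bounds` and `collected` are made-up names.

```rust
// Stand-in for the linker section. With linkme the elements would be separate
// statics that the linker places next to each other; a plain array is used
// here as an assumption so the sketch runs anywhere.
static SECTION: [u32; 3] = [10, 20, 30];

/// Rebuilds a slice from start and end pointers, which is what a scheme based
/// on start/end symbols has to do at runtime.
///
/// # Safety
/// `start` and `end` must delimit one contiguous, initialized run of `T` that
/// lives for the whole program, with `start <= end`. `T` must not be zero-sized.
unsafe fn slice_from_bounds<T: 'static>(start: *const T, end: *const T) -> &'static [T] {
    // The distance between the two pointers, in units of `T`, is the length.
    let len = unsafe { end.offset_from(start) } as usize;
    unsafe { std::slice::from_raw_parts(start, len) }
}

/// Returns everything "collected" into the stand-in section.
fn collected() -> &'static [u32] {
    // In a real linker-based setup these two pointers would come from the
    // start and end symbols instead of from `as_ptr_range`.
    let bounds = SECTION.as_ptr_range();
    // SAFETY: both pointers come from the same static array.
    unsafe { slice_from_bounds(bounds.start, bounds.end) }
}

fn main() {
    for element in collected() {
        println!("{element}");
    }
}
```

The unsafe part is exactly the assumption that everything between the two symbols is a valid, contiguous run of elements, which is what breaks when a platform's linker lays the section out differently.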

Inventory

An alternative approach, also written by David Tolnay, is inventory, based on ctor. Using ctor you can define "global constructors": entries in a special linker section that, on various platforms, are executed before main is called. The name and semantics of these sections change per platform, and using them users can execute code before main.

This is wildly unsafe, as std is not yet initialized. ctor's README.md on GitHub starts with a large warning to be very careful not to call std functions and to use libc functions instead if you must. In inventory, these ctors each execute a little bit of code that registers some element in a global linked list before main starts.
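The linked-list part of this can be sketched without any constructors. The following is an illustration only, not inventory's implementation: `Node`, `register` and `registered_names` are made-up names, and the explicit `register` calls in `main` stand in for what the ctors would do before main.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// One registered element. Nodes are statics, so no allocation is needed.
pub struct Node {
    name: &'static str,
    next: AtomicPtr<Node>,
}

impl Node {
    pub const fn new(name: &'static str) -> Self {
        Node { name, next: AtomicPtr::new(ptr::null_mut()) }
    }
}

/// Head of the global list. It is empty until something registers.
static HEAD: AtomicPtr<Node> = AtomicPtr::new(ptr::null_mut());

/// Pushes `node` onto the front of the list.
/// Each node must be registered at most once, otherwise the list gets a cycle.
pub fn register(node: &'static Node) {
    let node_ptr = node as *const Node as *mut Node;
    let mut head = HEAD.load(Ordering::Acquire);
    loop {
        // Point the new node at the current head, then try to become the head.
        node.next.store(head, Ordering::Relaxed);
        match HEAD.compare_exchange_weak(head, node_ptr, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => return,
            Err(current) => head = current,
        }
    }
}

/// Walks the list and returns the registered names, most recent first.
pub fn registered_names() -> Vec<&'static str> {
    let mut names = Vec::new();
    let mut cursor = HEAD.load(Ordering::Acquire);
    // SAFETY: every non-null pointer in the list came from a `&'static Node`.
    while let Some(node) = unsafe { cursor.as_ref() } {
        names.push(node.name);
        cursor = node.next.load(Ordering::Acquire);
    }
    names
}

static FIRST: Node = Node::new("first");
static SECOND: Node = Node::new("second");

fn main() {
    // With ctor, these calls would run before main instead of inside it.
    register(&FIRST);
    register(&SECOND);
    println!("{:?}", registered_names());
}
```

Note that the order of the list depends on the order in which the registration code runs, which a ctor-based system does not control.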

#[test]

#[test] is unique, in that it does not involve the linker at all. Instead, the compiler collects all the marked elements and generates a slice containing all elements from throughout the crate.

Note: this is also what Custom Test Frameworks does.

This can be both an advantage and a disadvantage.

Advantages:

  1. It's super stable. It is guaranteed to work on any platform
  2. If something goes wrong, you don't get a nasty linker error, but a nice compiler error
  3. Because it works on any platform, it indeed could support custom test frameworks on #![no_std], a part of the reason why we'd want a global registration system.
  4. It might be possible to support during const evaluation, though comments on a recent RFC by Mara show that this can also be undesirable, as it means that all crates need to be considered together during const evaluation.

Disadvantages:

  1. Building a slice at compile time simply does not support registering elements loaded through a dynamic library (though there might be some ways around that: TODO). Rust's story for dynamic libraries isn't great anyway, but this would add another major blocker.
  2. For the same reason, it might make hot-patching binaries harder. There were some proposals for hot-patching floating around, and a compile-time slice would make future implementations of them harder. Actually, that goes for this entire feature, whether supported through the linker or the compiler.

Possible alternative solutions

Keep things as-is

Having this feature only supported through downstream crates.

Providing linkme's distributed_slice or inventory as part of the compiler

Either of these methods would have limitations in platform support, but if they were also tested as part of the compiler we might be able to guarantee some sort of stability. I'm especially wary of the ctor based approach, but maybe linkme isn't so bad. It seems to support many platforms, and even has a test of it running on cortex-m #![no_std]. It does require a modified linker script listing the sections used for the distributed slices. Theoretically the compiler could automate those additions to the linker script.

However, it's unclear whether linkme supports WASM, and based on my own testing I don't think it does. I'm unsure what would be required to start supporting that.

Ignoring dynamic libraries: providing distributed slices like #[test]

This is the approach https://github.com/rust-lang/rfcs/pull/3632 takes. Its reasoning is that comparable existing systems don't support dynamic libraries either: for example, global_allocator doesn't work with dylibs. Indeed, tests also don't work across dylibs. However, that's never a concern as tests are usually crate-local and always statically compiled with the binary they're testing.

Ed Page also has an opinion about this, and thinks we shouldn't worry too much about dynamic linking right now, though we should check with Bevy whether it'd be beneficial for them.

Personally, I do think we should keep in mind that dynamic linking exists, and we should make sure we don't pick a design whose only possible implementation is one that does not support dylibs at all.

A hybrid approach: a proposal to move forward

I think there is a hybrid approach we can take. One that does not completely rule out dylibs, but might initially not support them while still meeting most people's needs.

The name "distributed slice" might not be very accurate. With global registration, the ordering of elements is not important, and essentially deterministically random. It's more like a distributed set of elements actually, where the index of elements in the slice is essentially useless.

Instead, I propose to expose a registration system as an opaque type that implements IntoIterator, just like std::env::Args. Initially, we can choose to not even expose a len method, as the length might depend on the number of dynamic libraries loaded. The implementation could then be a slice, or a linked list, or a collection of slices linked together (one per dylib?). Crucially, in this way we expose the minimal useful API for global registration and leave our options for implementation details completely open, such that we can change the internals at any point in the future.
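As a rough sketch of what such an opaque type could look like: everything below (`Registry`, `Iter`, `routes`, and the idea of one chunk per library) is a hypothetical illustration, not a proposed API. The chunks are written out by hand here, where the compiler or linker would assemble them in a real implementation.

```rust
/// Hypothetical opaque handle to everything registered in one registry.
/// It deliberately offers no `len` and no indexing.
pub struct Registry<T: 'static> {
    // Implementation detail that callers cannot observe: a collection of
    // slices, for example one per library.
    chunks: &'static [&'static [T]],
}

/// Iterator over all registered elements, chunk by chunk.
pub struct Iter<T: 'static> {
    remaining_chunks: std::slice::Iter<'static, &'static [T]>,
    current: std::slice::Iter<'static, T>,
}

impl<T: 'static> Iterator for Iter<T> {
    type Item = &'static T;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(item) = self.current.next() {
                return Some(item);
            }
            // The current chunk is exhausted: move on, or stop if none are left.
            let chunk: &'static [T] = *self.remaining_chunks.next()?;
            self.current = chunk.iter();
        }
    }
}

impl<T: 'static> IntoIterator for Registry<T> {
    type Item = &'static T;
    type IntoIter = Iter<T>;

    fn into_iter(self) -> Iter<T> {
        let empty: &'static [T] = &[];
        Iter { remaining_chunks: self.chunks.iter(), current: empty.iter() }
    }
}

// Hand-written stand-ins for what would be collected automatically.
static FROM_CRATE_A: [&str; 2] = ["route /", "route /about"];
static FROM_CRATE_B: [&str; 1] = ["route /api"];
static ALL_CHUNKS: [&[&str]; 2] = [&FROM_CRATE_A, &FROM_CRATE_B];

/// Hypothetical accessor for a registry of routes.
pub fn routes() -> Registry<&'static str> {
    Registry { chunks: &ALL_CHUNKS }
}

fn main() {
    for route in routes() {
        println!("{route}");
    }
    // A caller that needs the count can still get it by iterating.
    let count = routes().into_iter().count();
    println!("{count} routes registered");
}
```

Because callers only see the iterator, the chunked layout could later be replaced by a single slice or a linked list without breaking anyone.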

The only downside of this approach that I could find so far is that iterators are not const-safe (yet). Whether we want to support iteration over globally registered elements in const context is questionable (as highlighted above, const evaluation would then depend on all the crates that are being compiled and might register elements), but this approach would rule that feature out. I believe that's acceptable, especially now for experimentation, and given that linker-based approaches wouldn't support that use case either.

We should also make sure that we only implement traits for this opaque type that stay compatible with slices, so we're free to expose a slice in the future if we want to.

I'd like to experiment with that approach, to see if it meets enough people's needs. If not we can consider one of the other approaches highlighted. I propose calling the feature global_registration, not distributed_slice to be more generic.

I have never used these systems in Rust directly myself, but I have used similar things in C++ (static constructor based). There these are fraught with foot guns, even for static libraries.

In particular, the linker can leave out object files from static libraries that aren't referenced from anywhere (unless you pass a specific whole-archive flag to GNU ld). How will the proposed solution work when building staticlibs for inclusion in non-Rust programs? What is to stop the linker from just skipping some of the objects here too?

I personally suspect we'll end up going the fully static compile-time approach, with the assumption that dynamically loaded modules will need a different kind of dynamic registration mechanism. For instance, modules might need to provide some kind of metadata, or modules might have to make a call to register themselves.

In part, I think this is the case because otherwise we'd probably need memory allocation for this. Or do you see any obvious implementation that wouldn't need to allocate memory in the background for this?

That said, I don't see too much harm in initially making this work via an Iterator. Worst case, if someone needs the length, they can iterate it twice, or use collect. We should keep an eye out for how people use this, whether they're running into the limitation often, and whether anyone has concrete plans to extend this to dynamic libraries.

How will this work in staticlibs integrated into C/C++ projects given that the linker is free to discard unreferenced objects / sections? And that it is a foot gun in C++ to do static constructor based registration already. How will this work across multiple rust staticlibs linked into the same C/C++ project?

I don't think we should do this via static constructor-based registration.

I expect that it wouldn't, for the same reason that some Rust features don't work as expected today if you compile Rust into a static library and then link it into other Rust code as a generic static library.

I don't actually know what a linker today would do with a linkme-style approach when working with staticlibs, so that could be just as footgunny. And since in a staticlib there is never any main or _start under Rust's control that gets called, that doesn't sound like it would work either.

I believe it would be very unfortunate if whatever approach is taken made it impossible to use certain crates that rely on this, just because you are building a staticlib as you are slowly converting your existing C++ code base to Rust.

One potential solution: rustc collects all the instances of this at compile time, statically constructs a slice, and provides an iterator for that slice. If you use rustc to compile a staticlib, rustc would have no problem providing such a slice/iterator for everything in that staticlib.

If you're building a staticlib that needs to integrate with a C or C++ project that's also using a mechanism like this, the C or C++ project is presumably instead using linker sections or constructors or some other such approach, and you'd need Rust to integrate with that, which is probably best done via a crate. (Though, such a crate might be able to use this as a starting point or building block.)

Regarding the dynamic libraries support, how hard would it be to make it work with Rust dylibs (i.e. with crate-type = "dylib")?

In my opinion, the approach with slices constructed statically by the Rust compiler would be the most straightforward one.

But I think it could also be quite useful to allow registration code to get the position of the registered entry in the resulting "distributed" slice. Imagine a structured logging system which writes only a message ID and the serialized logged data to the log file; you could then reconstruct the message (and formatting options) by simply indexing the associated "distributed" slice embedded into the resulting binary. For this to work, the registration "function" should evaluate at compile time to the absolute position in the slice accumulated by the compiler, and we should be able to do this registration inside code, not just on statics.

In code it could look like this:

```rust
// `struct_logger` crate.

// Global statics must be slices.
// `&str` is used for simplicity, it also could contain
// logging level and formatting information
#[global_static]
pub static LOG_MESSAGES: &[&str] = &[];

// library crate (after macro expansion)

// `global_register` is an intrinsic pseudo-macro
// which always evaluates to `usize` at compile time.
// First argument must reference `#[global_static]` item
// and type of the second argument should be item type of
// the global slice.
const MESSAGE_ID: usize = global_register!(
    struct_logger::LOG_MESSAGES,
    "my log message",
);
struct_logger::write_log_message(MESSAGE_ID, extra_args);
```

If it's the only logging message in the whole dependency tree, LOG_MESSAGES will evaluate to &["my log message"] and MESSAGE_ID will be equal to 0.

Such a feature would also be useful for registering custom error codes in crates like getrandom.

While this is an interesting idea (and reminds me of the defmt crate), I don't quite see how to ensure stability in the slice between compilations (seems useful for logging). Unless we go a step further and have ordered slices for types implementing Ord, but what about equal entries then? I don't think doing deduplication is correct in the general case. Some further thought would be needed there to flesh this out.

I don't think you should rely on de-duplication and stability of the slice in such a setup. The idea is that you would generate a log translation file for each version of your application (effectively a dump of the accumulated slice) or would use the application itself to interpret logs.

The fully-static approach, unlike linkme, is the compiler building an extra global, anonymous array, which is either referenced (by the std API that collects such things) or it isn’t. It’s the same as if you had defined such a static yourself, and it is correct for the linker to drop it if it isn’t referenced.

One interesting question is whether the collected set is per-crate or shared across all crates. #[test] is per-crate. linkme is shared due to how it’s implemented (and there are certainly use cases that depend on that). There are use cases for both, but per-crate is easier to implement.

I was assuming it would be possible to choose, as there are use cases for both. For example, typetag and inventory need to be global to work. Tests want to be per crate.

This is absolutely my instinct as well. It trivially works cross-platform, and not involving linker tricks is a huge bonus.

Seems at least plausible to provide it even if some fancier dynamic linker-aware system existed in the future, because providing the simple, safe, and reliable version is useful.

Maybe just intentionally randomize it? Like we have -Z layout-seed for layout randomization, we could do something similar for the slice order.

I would actually prefer some guarantee on this for Ord types. It would be useful to be able to do a binary search over the array. On embedded it is certainly too expensive to copy to RAM. And there are others who would prefer to avoid it if possible.

The problem is that the final order will not be known until the full slice is assembled, which means that the compiler would have to fix all IDs retroactively. Alternatively, we could use an additional indirection layer, but it will be neither pretty nor efficient.

Plus, any sort of ordering or de-duplication would inevitably involve something like procedural macros, and that is relatively heavy machinery to involve by default. At this point, it's probably better to imagine an even more general solution: something like global stateful proc macros which can be queried/mutated by downstream crates and finalized into a static with delayed linking. Such a feature would allow us to build more complex static types, e.g. static hash tables (imagine a PHF crate with items addable in downstream crates).

If we want to support getting a unique index back, then we could give each registered thing a globally unique key (e.g. crate::module::identifier or crate::module::file::line or similar), and sort the slice by that.
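A small sketch of that idea, with sorting done at runtime to stand in for what the compiler would do while assembling the slice. `Key`, `Entry`, the `entry!` macro, `sorted_registry` and `index_of` are all hypothetical names, and the key here is just module path plus line number.

```rust
/// Where an element was registered. Used as the globally unique sort key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Key {
    module: &'static str,
    line: u32,
}

/// A registered element together with its key.
pub struct Entry {
    key: Key,
    message: &'static str,
}

/// Hypothetical helper that captures the registration site as the key.
macro_rules! entry {
    ($message:expr) => {
        Entry {
            key: Key { module: module_path!(), line: line!() },
            message: $message,
        }
    };
}

/// Stand-in for the compiler step: order all collected entries by key, so
/// that the result does not depend on the order in which they were collected.
fn sorted_registry(mut entries: Vec<Entry>) -> Vec<Entry> {
    entries.sort_by_key(|e| e.key);
    entries
}

/// Looks up the index that a given key ended up at.
fn index_of(registry: &[Entry], key: Key) -> Option<usize> {
    registry.binary_search_by_key(&key, |e| e.key).ok()
}

fn main() {
    // The entries are collected in some arbitrary order and sorted afterwards.
    let registry = sorted_registry(vec![
        entry!("disk almost full"),
        entry!("connection lost"),
    ]);
    for (index, e) in registry.iter().enumerate() {
        println!("{index}: {:?} {}", e.key, e.message);
        assert_eq!(index_of(&registry, e.key), Some(index));
    }
}
```

The indices are only stable as long as no registration is added or removed anywhere in the dependency tree, which matches the earlier concern about stability between compilations.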

My first instinct is that we shouldn't do this though.

That's definitely an interesting use-case. I've been thinking that these would often be collecting things like &'static dyn MyTraits where sorting doesn't really make sense in the first place.

We should perhaps start with doing an inventory not just of existing solutions (linkme, inventory, ctor-based things) but also of what they are used for, and of what things that don't currently use them could use this instead.

This could help decide things such as global/per-crate/both, unsorted/sorted/generic-global-proc-macro-that-can-do-distributed-phf.

One use case for doing a search in the slice is if you register deserialisation handlers for dynamic traits. One solution for this is to key on a user provided key, which you put in the stream and use to look up the deserialiser at runtime. The registry being sorted allows good complexity without having to resort to copying to RAM.
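Roughly, such a lookup could look like the sketch below. The `HANDLERS` array is written by hand and already sorted by key, standing in for a registry that the compiler would keep sorted; `Registration`, `Handler`, `deserialize` and the two reader functions are made-up names, and the handlers return a `String` only to keep the example short.

```rust
/// Deserializer for one concrete type. It returns a description of the value
/// instead of a trait object, purely to keep the sketch small.
type Handler = fn(&str) -> String;

/// One registered (key, handler) pair.
struct Registration {
    key: &'static str,
    handler: Handler,
}

fn read_circle(payload: &str) -> String {
    format!("Circle(radius = {payload})")
}

fn read_square(payload: &str) -> String {
    format!("Square(side = {payload})")
}

// Stand-in for a distributed slice that is guaranteed to be sorted by key.
// It lives in read-only static data, so nothing has to be copied to RAM.
static HANDLERS: [Registration; 2] = [
    Registration { key: "circle", handler: read_circle },
    Registration { key: "square", handler: read_square },
];

/// Finds the handler for `key` with a binary search and runs it on `payload`.
pub fn deserialize(key: &str, payload: &str) -> Option<String> {
    let index = HANDLERS.binary_search_by_key(&key, |r| r.key).ok()?;
    Some((HANDLERS[index].handler)(payload))
}

fn main() {
    for key in ["square", "circle", "triangle"] {
        match deserialize(key, "2") {
            Some(value) => println!("{key} -> {value}"),
            None => println!("{key} -> no handler registered"),
        }
    }
}
```

Without an ordering guarantee, the same lookup needs either a linear scan or a map built at startup.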

I haven't looked into how exactly typetag does this. It uses inventory, which is ctor based. But perhaps @dtolnay (the author of both of these, of course) could give some insight into what features he would need to be able to do this based on a distributed slice. I imagine if you could do a binary search over a sorted distributed slice, then the need to build a runtime structure would go away.

I myself have some experience with this sort of thing in C++ (again based on static constructors), where we associated each dynamic serialised type with a UUID and used Boost serialization (shudder).

What I gather from all of your comments is the following:

  1. People seem to agree that global ctor-based systems are a bad idea. I agree, let's at least initially not do that.
  2. Initially only doing things statically is probably fine. The only thing I'm strongly in favor of is not restricting ourselves so much that we can never change it to work dynamically.
  3. If you want to support some kind of FFI (interacting with C), you can always link items together like a linked list instead of an array, and then you don't need allocation.
  4. Getting the location of an item of the distributed slice when you insert it is interesting, but I don't think this is really something you'll want in practice. I've seen someone write code like this:
```rust
#[global_static]
pub static LOG_MESSAGES: &[&str] = &[];

const MESSAGE_ID: usize = global_register!(
    struct_logger::LOG_MESSAGES,
    "my log message",
);
```

but instead, you can just write this:

```rust
#[global_static]
pub static LOG_MESSAGES: &[&str] = &[];

static MSG: &str = "my log message";
global_register!(
    struct_logger::LOG_MESSAGES,
    MSG,
);
```

And now MSG is your local handle to this string. Maybe I'm not seeing something, but getting an index doesn't seem so useful to me.

  5. Using something like -Z layout-seed sounds like a great idea. We can't really ensure the ordering. I imagine it's kind of hard to evaluate the ordering at compile time (not having traits and all), and we can't really do it anyway, because we don't know how many elements might be added in the future. Semi-randomizing it seems like a way better idea.
  6. I agree, finding use cases for this is what we want to do, to figure out what people actually need. I've listed quite a few use cases already, and I guess we can figure out whether a static compiler-generated version of this works for each of those. Whether we want to expose things like len is something that's easy to add afterwards.
  7. About whether this can be used inside a crate or also outside: I think that is solved by visibility. You can do both. If the slice is pub(crate) you can't use it across crates, and if it's pub you can.

Is there anyone's comment I haven't replied to with this? Let me know, I'm happy to do so.
