Global Registration (a kind of pre-rfc)

The idea is to have log entries like { msg_id: 42, timestamp: 12345678 } instead of { msg: "my big big log message repeated constantly", timestamp: 12345678 }. IIUC, you propose using a string slice pointer as the msg_id. That would work, but most applications can probably use 16-bit IDs, while pointer-based IDs would require 64-bit IDs on 64-bit CPUs.

I mean, the index is supposed to be a usize anyway (if we even have it); that's what makes sense. usizes are pointer-sized, so why not just use a reference?

I'd still like to be able to have something sorted in the binary, but I guess without const traits that can run very late in codegen it would be difficult. And there is also the parallel backend to consider when codegen-units isn't 1.

It would open up a lot of interesting possibilities if it could be done.

Now a question that arises is why @dtolnay did not use linkme for typetag but rather a ctor solution. Is the ctor solution inherently more powerful? What are we missing out on by forgoing it?

usize can be shortened to uN with compile-time asserts to protect against potential overflow (target type could be selected by tweaking logging crate features), which should be possible to do because ID evaluation is done at compile time.
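To make the narrowing idea concrete, here is a minimal sketch, assuming the message table is visible as a constant at compile time. The names `LOG_MESSAGES`, `MsgId`, `LogEntry` and `message` are illustrative only, and the table is written by hand here, whereas in the proposal it would be collected by the compiler or linker.

```rust
// Hypothetical message table, written by hand for this sketch. In the
// proposal the contents would be collected from registrations instead.
const LOG_MESSAGES: &[&str] = &[
    "connection opened",
    "connection closed",
    "my big big log message repeated constantly",
];

// Hypothetical narrow ID type; a crate feature could switch it to u8 or u32.
type MsgId = u16;

// Evaluated at compile time: the build fails if the table has more
// entries than MsgId can index.
const _: () = assert!(LOG_MESSAGES.len() <= MsgId::MAX as usize + 1);

#[derive(Debug, PartialEq)]
struct LogEntry {
    msg_id: MsgId,
    timestamp: u64,
}

/// Looks up the message text for an entry when the log is decoded.
fn message(entry: &LogEntry) -> &'static str {
    LOG_MESSAGES[entry.msg_id as usize]
}
```

With `MsgId = u16` each entry stores a 2-byte ID rather than a pointer-sized value, and the assertion catches overflow before the program ever runs.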

Actually, not strictly. It's only per-crate because #[test] implies #[cfg(test)] and the test items are only registered in the test binary crate. It's only ever used from a single crate in a compilation tree.

The implementation may only work in a single crate, but it would behave exactly the same were it to be substituted with an implementation that worked across multiple crates.

I think it's better to spell this as

#[global_static]
pub static LOG_MESSAGES: &[&str];

as then it's more obvious the value isn't &[] but instead determined in some other way.

Also, if you don't want to require the static implementation, it's probably better to not make it look like it has a slice type. Annoying as it might be, it's probably better to spell it more like const LOG_MESSAGES: impl IntoIterator<Item=&str>. Although both thread_local! and async are sugar that wrap the apparent output type, so it isn't a given.

Does this make MSG the actual item in the slice, or is the item in the slice a different item (with a different address) than MSG? If the former, then it should be an attribute on the item, imo, because it needs to prohibit using custom linker attributes on the item, which could mess with item placement.

Theoretically nothing, because you can stick your "constructor" functions in a slice and then invoke them yourself. I think typetag uses ctor for two main reasons: using ctor or linkme have slightly different platform support and behavior, and ctor came before linkme.
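As a rough illustration of "stick your constructor functions in a slice and invoke them yourself": the sketch below uses a hand-written slice as a stand-in for one that linkme or a language feature would collect. `INIT_FUNCTIONS`, `run_initializers` and the two init functions are made-up names for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many initializers have run, so the effect is observable.
static INITIALIZED: AtomicUsize = AtomicUsize::new(0);

fn init_logging() {
    INITIALIZED.fetch_add(1, Ordering::SeqCst);
}

fn init_plugins() {
    INITIALIZED.fetch_add(1, Ordering::SeqCst);
}

// Hand-written stand-in for a collected slice of "constructor" functions.
static INIT_FUNCTIONS: &[fn()] = &[init_logging, init_plugins];

/// Runs every registered initializer and returns how many were invoked.
/// Meant to be called explicitly, for example at the top of `main`.
fn run_initializers() -> usize {
    for init in INIT_FUNCTIONS {
        init();
    }
    INIT_FUNCTIONS.len()
}
```

The difference from ctor is only who triggers the calls: here nothing runs until the program calls `run_initializers()` itself.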

Also, I think ctor has better support for dylibs, since ctors in a dylib get run when the dylib is loaded.


I think the maximally functional implementation would be a statically collected slice per crate and a statically collected slice of those slices emitted with the bundled staticlib or binary, with dylib initialization functions atomically linking in their slice-of-slice-of-items.
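A minimal sketch of the slice-of-slices shape, with both levels written by hand (in the real thing the per-crate slices and the outer slice would be emitted by the toolchain). The names `CRATE_A_ITEMS`, `CRATE_B_ITEMS`, `ALL_ITEMS` and `all_items` are assumptions for illustration, and the dylib linking step is not shown here.

```rust
// Stand-ins for the statically collected slice of each crate.
static CRATE_A_ITEMS: [&str; 2] = ["a::first", "a::second"];
static CRATE_B_ITEMS: [&str; 1] = ["b::only"];

// Stand-in for the slice of slices emitted with the final binary or staticlib.
static ALL_ITEMS: &[&[&str]] = &[&CRATE_A_ITEMS, &CRATE_B_ITEMS];

/// Iterates over every registered item across all per-crate slices.
fn all_items() -> impl Iterator<Item = &'static str> {
    ALL_ITEMS.iter().flat_map(|slice| slice.iter().copied())
}
```

Consumers only see one flat iterator, so they would not need to care how many crates contributed.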

Hi! I agree with you, the syntax I gave is not quite right. I've actually been very careful not to use any syntax so far, to limit the bikeshedding. That was something I copied from another user to demonstrate. However, I agree that your alternatives make more sense.

MSG is a reference to a str, so I'd say the reference becomes an actual item in the slice here and the str itself is local here. But I don't think that matters a lot in the end, as long as you can iterate over them all. In the end, the static gets some address, and we can use that address and that address also exists in the distributed slice. If you really want to, global_register could return an address, so it's really only stored in the slice. But imo an address is still better than an index (since indices are hard when considering any linked-like model).
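To show what I mean by using the address as the ID, here is a small sketch. It assumes the registered statics are plain `&str` statics and fakes the distributed slice by hand as `REGISTRY`; `msg_id` and `lookup` are hypothetical helpers, not part of any proposed API.

```rust
// Two distinct statics; each has its own fixed address in the binary.
static MSG_OPENED: &str = "connection opened";
static MSG_CLOSED: &str = "connection closed";

// Hand-written stand-in for the distributed slice: it holds references
// to the registered statics, so the same addresses appear in it.
static REGISTRY: &[&&str] = &[&MSG_OPENED, &MSG_CLOSED];

/// Hypothetical ID: the address of the registered static itself.
fn msg_id(msg: &'static &'static str) -> usize {
    msg as *const &'static str as usize
}

/// Recovers the message text from the slice by comparing addresses.
fn lookup(id: usize, registry: &[&'static &'static str]) -> Option<&'static str> {
    registry
        .iter()
        .copied()
        .find(|m| msg_id(*m) == id)
        .map(|m| *m)
}
```

No index has to be assigned anywhere: `msg_id(&MSG_CLOSED)` is whatever address the static ended up at, and `lookup` finds it again in the slice.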

Finally, those are my thoughts as well: the most flexible and functional option is a static slice, possibly linked to the static slices of other dylibs when necessary.

The proposed feature deviates significantly from established principles and violates a fundamental invariant of Rust. Therefore, it is not advisable to proceed with this approach.

Interestingly, this proposal appears to be unnecessarily complex and stems from an attempt to over-integrate the feature into Rustc and Cargo.

The primary reason for requiring "life before main" is to prevent end users from accessing "main" within a framework.

An alternative solution is to develop a testing library with appropriate APIs, allowing developers to create their own test runner executable that loads the desired tests.
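A rough sketch of what such a library API could look like, with the tests listed explicitly instead of being discovered. Everything here (`TestCase`, `Report`, `run_tests`, `selected_tests`) is a hypothetical name invented for this example, not an existing crate's API.

```rust
/// One test known to the hypothetical testing library.
struct TestCase {
    name: &'static str,
    run: fn() -> Result<(), String>,
}

/// Summary produced by the hypothetical runner.
#[derive(Debug, PartialEq)]
struct Report {
    passed: usize,
    failed: Vec<&'static str>,
}

/// Runs the given tests in order and records which ones failed.
fn run_tests(tests: &[TestCase]) -> Report {
    let mut report = Report { passed: 0, failed: Vec::new() };
    for test in tests {
        match (test.run)() {
            Ok(()) => report.passed += 1,
            Err(_) => report.failed.push(test.name),
        }
    }
    report
}

fn addition_works() -> Result<(), String> {
    if 1 + 1 == 2 { Ok(()) } else { Err("1 + 1 != 2".to_string()) }
}

fn always_fails() -> Result<(), String> {
    Err("intentional failure".to_string())
}

/// The explicit list a hand-written runner's `main` would pass to `run_tests`.
fn selected_tests() -> Vec<TestCase> {
    vec![
        TestCase { name: "addition_works", run: addition_works },
        TestCase { name: "always_fails", run: always_fails },
    ]
}
```

The runner executable's `main` would call `run_tests(&selected_tests())` and decide for itself how to report the result; the cost is that every test has to be listed by hand.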

I don't see #[bench] mentioned, but I believe custom benchmark frameworks could also make use of this kind of registration -- e.g. divan currently uses pre-main hacks too.

What fundamental invariant?

No life before main.

It's fine for me to call this in main. But being able to iterate over types, defined in different crates, would enable a lot.

If this is not called in main explicitly, is it a zero cost abstraction? Would users pay startup time for this, without using it?

Perhaps this feature can run entirely at compile time, with no runtime component to run before main.

This wouldn't cover any of the other usecases mentioned in the first post.

If a register is just a linked list of entries (with each entry being in static memory), dynamic linking would be as simple as setting up a constructor to join the two linked lists. No memory allocation is required.

Compile time is, by definition, the runtime of the compiler. The distinction you make is pretty meaningless in this context. I run the compiler on my dev machine to produce the runtime program. Likewise, I execute a test runner on my dev machine to exercise the code. As long as that is possible, I gain nothing from executing the test code as part of the compiler process instead of a separate test runner process.

See above. There is no such thing as a magical life before main. That cost happens regardless of how you call it.

There are really only two options: Either the cost is explicit - the programmer calls initialisation code explicitly in main - or the compiler adds that initialisation code implicitly "before main" or, in other words, on process start-up. That latter approach is against the spirit of Rust and is fraught with peril due to concerns regarding ordering.

The only argument for this feature is "but I don't have access to main". To which the better design approach is, we should give the programmer the ability to define main themselves and have control of their dependencies and the order of initialisation.

That's true! That would work. I wouldn't want to use a constructor for that, but yes, you could link the lists together.

IIRC, that’s similar to how you’d install a TSR or other interrupt routine on old DOS systems: You read the current value and store it somewhere you control to use as a continuation and then overwrite the global with the address of your routine. If you don’t handle the event, it falls through to the previous registrant.

In this case, you’d have something like a static Option<&'static Registry<T>> that starts as None, and Registry<T> would have a corresponding link field for the next crate’s registry.
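Something along these lines, as a sketch only: since a plain static can't be overwritten safely, this version swaps the Option<&'static Registry<T>> for an AtomicPtr, and `Registry`, `RegistryList`, `register` and `for_each` are names I made up. Each node lives in static memory, so linking one in allocates nothing.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// One crate's (or dylib's) statically allocated registry node.
struct Registry<T: 'static> {
    items: &'static [T],
    // Link to the previously registered node; null until linked in.
    next: AtomicPtr<Registry<T>>,
}

impl<T: 'static> Registry<T> {
    const fn new(items: &'static [T]) -> Self {
        Registry { items, next: AtomicPtr::new(ptr::null_mut()) }
    }
}

/// The global head pointer; starts out empty (null).
struct RegistryList<T: 'static> {
    head: AtomicPtr<Registry<T>>,
}

impl<T: 'static> RegistryList<T> {
    const fn new() -> Self {
        RegistryList { head: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Pushes `node` onto the front of the chain without allocating.
    /// Each node must be registered at most once, or the chain becomes cyclic.
    fn register(&self, node: &'static Registry<T>) {
        let node_ptr = node as *const Registry<T> as *mut Registry<T>;
        let mut current = self.head.load(Ordering::Acquire);
        loop {
            // Remember the previous head, then try to install ourselves.
            node.next.store(current, Ordering::Relaxed);
            match self.head.compare_exchange_weak(
                current,
                node_ptr,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return,
                Err(actual) => current = actual,
            }
        }
    }

    /// Visits every item of every linked registry, newest registry first.
    fn for_each(&self, mut f: impl FnMut(&'static T)) {
        let mut current = self.head.load(Ordering::Acquire);
        // SAFETY: every non-null pointer in the chain was derived from a
        // `&'static Registry<T>` passed to `register`.
        while let Some(node) = unsafe { current.as_ref() } {
            node.items.iter().for_each(&mut f);
            current = node.next.load(Ordering::Acquire);
        }
    }
}
```

Each crate would declare its own `static` node with `Registry::new(&[...])` and hand it to `register` once, either from a dylib initializer or from an explicit call in main.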

That sounds straightforward enough that you might be able to convince a modern linker to do it during fixup, but I don’t have any real experience in this area.

I don't know how feasible it is, but an addendum to this is that it could simply be a compile error to not call the initialization code explicitly in main. Sort of an "unused variables" for these particular global variables.

I do NOT agree. For me, this feature is about enabling typetag-like use cases, not about calling something implicitly or explicitly, before or after main. Today, typetag is a hack which does not work on wasm. I want to be able to iterate over types, because this enables a few use cases. In other languages this is enabled by reflection, but in Rust one has to combine several workarounds. If one has to call something explicitly in main to activate this, that's fine for me.

It seems that there has been a misunderstanding regarding my previous response.

What you are seeking is the addition of compile-time introspection to the language. That's a separate topic altogether.
This feature should be incorporated into the language itself, and I believe there was a recent announcement concerning a working group or GSoC project that will explore this possibility.

Proposing this as a collection of workarounds, particularly as a special case of a "life before main" calculation, is not a well-advised long-term vision for Rust.

To reiterate, my previous response addressed your performance query, and as I mentioned, there is no cost-free solution for this. Either a large amount of code is executed during program startup (in which case, this should be made explicit in congruence with Rust's design principles), or if a proper built-in solution is implemented, the compiler would precompute various data structures containing metadata, resulting in an increase in the binary size. Therefore, introspection cannot be achieved without incurring some cost, which is why C++ provides the option to disable its RTTI (Run-Time Type Information), a common practice among its users.

Consequently, if introspection is integrated into the language, its design must incorporate explicit control mechanisms that align with Rust's core principles.
