I have always been curious how Once works. I imagine something like the following pseudo-code:
If !inited
Init()
Return
Really nothing special here. But since Rust makes this part of its std lib, I am wondering whether it does some magic to optimize the init check away after the first init().
I mean, modifying the function pointer to point straight to a return after the first init run, so that later calls do not check the init state again. Or, even better, eliminating the call altogether?
Of course runtime modification is dangerous, but shouldn't it be safe for Rust's low-level library to do?
No, it's not about optimization, but about thread safety. That's why it's in std::sync. If you didn't need thread safety, it would indeed be (almost) as easy as your pseudo-code.
The thing that Once promises: If multiple threads try to init at the same time, only (exactly) one of them should actually do it and the others should wait until the initialization has completed.
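To illustrate that promise, here is a minimal sketch using std::sync::Once. The names INIT, INIT_RUNS and race_to_init are made up for this example; the counter exists only so we can observe how often the closure ran.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;
use std::thread;

// Hypothetical names, chosen only for this illustration.
static INIT: Once = Once::new();
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Spawns `n` threads that all race to initialize, then returns how many
/// times the initialization closure actually ran.
fn race_to_init(n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            thread::spawn(|| {
                // Exactly one thread runs the closure; the others block
                // here until that run has finished.
                INIT.call_once(|| {
                    INIT_RUNS.fetch_add(1, Ordering::Relaxed);
                });
                // Once call_once has returned, initialization is complete.
                assert!(INIT.is_completed());
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    INIT_RUNS.load(Ordering::Relaxed)
}

fn main() {
    println!("init ran {} time(s)", race_to_init(8));
}
```

No matter how many threads race, the closure should run exactly once.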
Well okay, perhaps in a sense it's about optimization, too, but nothing like the suggestions you've made. If you consider thread safety, then you could still follow the pseudo-code you've suggested, if a Mutex is incorporated into the process. But that adds a lot more overhead to the init check, even after the value is already initialized. So compared to a naive mutex-based implementation, it's a question of optimization; once initialization has already happened, later initialization checks shouldn't involve more than a single read from an atomic flag.
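As a rough sketch of that idea (not how std actually implements Once, just a simplified stand-in I'm calling SimpleOnce), you can put an atomic flag in front of the mutex so that the lock is only taken while initialization is still pending. This version assumes the init closure does not panic; a panic would poison the mutex.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for `Once`. This is NOT the std
/// implementation, only an illustration of "flag first, lock second".
struct SimpleOnce {
    done: AtomicBool,
    lock: Mutex<()>,
}

impl SimpleOnce {
    const fn new() -> Self {
        SimpleOnce {
            done: AtomicBool::new(false),
            lock: Mutex::new(()),
        }
    }

    fn call_once(&self, init: impl FnOnce()) {
        // Fast path: after initialization this is a single atomic load.
        if self.done.load(Ordering::Acquire) {
            return;
        }
        // Slow path: serialize the threads that arrived before `done` was set.
        let _guard = self.lock.lock().unwrap();
        // Re-check under the lock; another thread may have finished meanwhile.
        if !self.done.load(Ordering::Acquire) {
            init();
            self.done.store(true, Ordering::Release);
        }
    }
}

/// Calls `call_once` `calls` times and reports how often `init` ran.
fn count_runs(calls: usize) -> usize {
    let once = SimpleOnce::new();
    let runs = AtomicUsize::new(0);
    for _ in 0..calls {
        once.call_once(|| {
            runs.fetch_add(1, Ordering::Relaxed);
        });
    }
    runs.load(Ordering::Relaxed)
}

fn main() {
    println!("init ran {} time(s)", count_runs(1_000_000));
}
```

A naive version would take the lock on every call; here every call after the first returns right after the flag check.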
Thread safety is not the issue here. I am referring to the stage after initialization is done.
After it is done, we still check whether it is inited, even though we already know that it is.
I am just saying that init once, check a million times is kind of a waste of computing power.
Your suggestions included ideas like "modify the function pointer", but the (fast path, successful) initialization checking code is generally inlined. There is no function pointer at run time; reading a function pointer and calling it would be more expensive than checking a simple status flag value.
"Eliminate the call altogether", not sure how that should work. Maybe in a dynamic runtime system with JIT compilation, one could re-write the assembly so that initialization checks for global singleton values that will never become de-initialized again can be eliminated. But not all usages of Once involve global singleton values, and Rust isn't doing a dynamic runtime with JIT compilation, anyway.
Good to know. This makes sense.
I was just wondering whether this is still too much waste, no matter how fast it is. It really should be zero cost!
In real life, after I have furnished my apartment, I don't want to check whether I have a bed in my bedroom or not. Every time I walk into my bedroom, I would close my eyes and fall on the bed.
This is exactly what I am talking about.
Based on what you said, dynamic languages could be even more efficient, since they can modify themselves dynamically.
I am just asking whether Rust could do something like that, in its own low-level, highly optimized codebase, of course.
On a side note, I am wondering about the following scenario:
I have an async app. In my Main.cs, I can initialize all my readonly static variables; then all my futures and threads can start and use the static variables freely and safely.
It seems Rust does not support this scenario at all? If my app is async, do I HAVE to assume all static variables live in a multi-threaded environment and must use LazyLock to initialize them?
But I think in the case of Once that optimization makes no sense, because the data and the flag would probably end up in the same cache line / page, so you would touch that memory even without reading the flag.
You can do absolutely anything, just write your own unsafe abstractions. For that kind of thing, Introduction - The Rustonomicon is compulsory reading.
Also, I advise optimizing only after measuring the real effect; premature optimization is the root of all evil. You can easily do that stuff with unsafe, so just compare it to the Once* version. Then, if there is a performance benefit, you may find yourself thinking of an optimization.
By the way, why do you need to rely on statics? Why not pass &'static T to your futures? static_cell - Rust allows you to get a &'static mut T once; you can then initialize it, convert it to shared references, and share those between futures.
I don't see how it can be more expensive. &'static T is a pointer. A static variable is also accessed via a pointer. The only overhead is passing that pointer around instead of having the linker resolve the address. So it is basically zero.
Again, please do not make premature optimizations; they consume your time and other people's, and cost code clarity, without any performance benefit.
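Here is a minimal sketch of the "pass &'static T around" idea using only std. Note the substitutions: it uses Box::leak instead of the static_cell crate to obtain the &'static reference, and plain threads instead of futures so that it runs without an async runtime. Config and run_workers are hypothetical names for this example.

```rust
use std::thread;

/// Hypothetical read-only configuration shared by all workers.
struct Config {
    name: String,
    workers: usize,
}

/// Builds the config once, then hands a `&'static Config` to every thread.
fn run_workers() -> Vec<String> {
    // Box::leak yields `&'static mut Config`; binding it as `&'static Config`
    // turns it into a shared reference that is freely copyable.
    let config: &'static Config = Box::leak(Box::new(Config {
        name: "demo".to_string(),
        workers: 3,
    }));

    let handles: Vec<_> = (0..config.workers)
        .map(|id| {
            // `config` is just a pointer: no static lookup, no init check.
            thread::spawn(move || format!("{}-{}", config.name, id))
        })
        .collect();

    // Join in spawn order so the results are deterministic.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    for line in run_workers() {
        println!("{line}");
    }
}
```

The leaked allocation is never freed, which is usually acceptable for data that must live for the whole program anyway.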
Indirection through a function pointer is not free either. In fact, in most scenarios it's overwhelmingly likely to be vastly less efficient than a simple, 100% predicted, extremely local branch. Both because it precludes inlining (which is an incredibly important optimization) and because function pointers are terrible from the perspective of a modern (i.e. this millennium) CPU.
Instead of rewriting assembly, one can do page fault shenanigans to initialize a value once without using atomics. But this requires using userfaultfd (Linux-specific), making the initialization code async-signal-safe, a helper thread, or some other complications.
Rust assumes that statics will be accessed by multiple threads concurrently. So what you described is not supported with statics. Rust is pessimistic because it is designed to support multithreading safely in all scenarios, and it cannot trust the programmer to avoid accessing a static variable at the wrong time.
If the cost of the OnceLock "is_initialized" check is unacceptable (however small), a practical compromise is to use static variables with OnceLock to share read-only data that can be initialized up front. But instead of requiring a check every time the variables are accessed, long-running tasks/threads can obtain a reference from the static OnceLock variable when they start or at opportune times, and then reuse the reference freely without "is_initialized" checks from that point onward. The amortized cost of the check would then approach zero. The same approach can be used when sharing data with Arc, such that the reference count is only incremented once per task/thread, or at least a very small number of times relative to other work performed.
But you really should think carefully about the true cost of a single atomic load before worrying about trying to optimize for this.
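The compromise described above could look roughly like this. It is a sketch under assumptions: TABLE, worker and run are made-up names, and plain threads stand in for async tasks. The data is initialized up front, and each worker pays for the OnceLock check once at startup, then reuses the plain reference in its hot loop.

```rust
use std::sync::OnceLock;
use std::thread;

// Hypothetical shared read-only data.
static TABLE: OnceLock<Vec<u64>> = OnceLock::new();

/// Long-running worker: one initialization check, then plain reference use.
fn worker(iterations: usize) -> u64 {
    // The only place this worker touches the OnceLock.
    let table: &'static Vec<u64> = TABLE
        .get()
        .expect("TABLE must be initialized before workers start");

    let mut sum = 0;
    for i in 0..iterations {
        // Ordinary indexing through a reference; no "is_initialized" check.
        sum += table[i % table.len()];
    }
    sum
}

/// Initializes the data up front, then starts the workers.
fn run(workers: usize, iterations: usize) -> u64 {
    TABLE.get_or_init(|| vec![1, 2, 3, 4]);

    let handles: Vec<_> = (0..workers)
        .map(|_| thread::spawn(move || worker(iterations)))
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("total = {}", run(2, 8));
}
```

With a million iterations per worker, that is still one check per worker rather than a million.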