Did Rust ever consider cactus stacks?

At that point it is already too late to switch the stack. On top of that, it is way too heavy. And with a true segmented stack, it is rather common to exhaust a segment.

I understand what you're saying, and I agree that if it requires a massive runtime or OS to actually work, then it's inappropriate for Rust. What I'm not yet convinced of is that a runtime or OS is actually required. I'll fully admit that I'm at the very edge of my knowledge here, so it may not be possible without one, but I'm not yet convinced that it isn't.

Why is it too late to switch the stack at that point?

Also, do you have any performance numbers to point to for this being too heavy? I'm genuinely curious (and now I really, really want the stack equivalent of GlobalAlloc so that I can run experiments!)

Because there may already be existing references to the current stack frame, which means you can't move it. But if the stack is too small, you would have to move the current stack frame to the new stack segment. These two requirements conflict.
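To illustrate the "existing references" half of the conflict, here is a small sketch (the function names are made up for illustration). A callee receives a reference into its caller's frame; at runtime that reference is just an address, and nothing records where such addresses are stored, so a runtime could not copy the caller's frame to a bigger segment without leaving the callee's pointer dangling.

```rust
/// The callee holds `slot`, which is a plain address pointing into the
/// caller's stack frame. If the caller's frame were relocated while this
/// runs, `slot` would silently point at stale memory.
fn sum_into(slot: &mut u64, values: &[u64]) {
    for v in values {
        *slot += v;
    }
}

/// Returns the computed total and whether the local's address stayed put.
fn caller() -> (u64, bool) {
    let mut total = 0u64; // lives in the caller's frame
    let addr_before = &total as *const u64 as usize;
    sum_into(&mut total, &[1, 2, 3]); // callee writes through the reference
    let addr_after = &total as *const u64 as usize;
    (total, addr_before == addr_after)
}
```

Rust's borrow rules already assume a local keeps its address for as long as it is borrowed, which is why growing the stack has to happen by adding new space rather than by moving frames that already exist.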

But... isn't part of the point of cactus stacks to solve that issue? The current stack frame won't be moved; you jump to a new segment that is large enough to handle the allocation request. Or do you mean that you don't know a priori how much stack space you need (you don't know how large your frame is going to be), a different thread/task allocates the next page, and then you can't grow the current frame?
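As a toy model of that idea (an illustrative data structure, not how any real runtime lays out its stacks), each segment below remembers its parent. When a frame does not fit, a child segment is created and the existing frames stay exactly where they are. Because several segments can share a parent, the result is a tree, which is where the "cactus" name comes from.

```rust
/// One stack segment in the toy model.
struct Segment {
    parent: Option<usize>, // index of the segment this one branched from
    capacity: usize,       // bytes available in this segment
    used: usize,           // bytes currently occupied by frames
}

struct CactusStack {
    segments: Vec<Segment>,
}

impl CactusStack {
    fn new(root_capacity: usize) -> Self {
        CactusStack {
            segments: vec![Segment { parent: None, capacity: root_capacity, used: 0 }],
        }
    }

    /// Push a frame of `size` bytes on top of segment `seg`. If it does not
    /// fit, a new child segment of at least `default_capacity` bytes is
    /// created instead; frames already in `seg` are never moved.
    /// Returns the index of the segment that holds the new frame.
    fn push_frame(&mut self, seg: usize, size: usize, default_capacity: usize) -> usize {
        let s = &mut self.segments[seg];
        if s.capacity - s.used >= size {
            s.used += size;
            return seg;
        }
        let capacity = default_capacity.max(size);
        self.segments.push(Segment { parent: Some(seg), capacity, used: size });
        self.segments.len() - 1
    }

    /// Number of segments from `seg` back to the root (inclusive).
    fn depth(&self, mut seg: usize) -> usize {
        let mut d = 1;
        while let Some(p) = self.segments[seg].parent {
            d += 1;
            seg = p;
        }
        d
    }
}
```

For example, with a 64-byte root segment holding a 48-byte frame, two further 32-byte frames pushed from the root each get their own child segment, both hanging off segment 0, as would happen if two tasks forked from the same parent frame. Popping frames and freeing segments is left out to keep the sketch short.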

I'm sorry if I'm being a little dense, I'm really trying to find the precise problem to fully understand what the limitations are. Once we know they completely we can look for solutions.

I was talking about the case where SIGSEGV is used to grow the stack. A SIGSEGV wouldn't happen immediately when allocating the stack frame (at which point switching stack segments is fine), but only once the frame is actually used (at which point switching stack segments is not fine).

If the first thing you write to the stack frame right after you reserve it is at the appropriate end of the frame, it will generate a page fault (which does not necessarily result in SIGSEGV). At this point you can allocate the frame elsewhere instead and write the same thing there.
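The ordering argument can be simulated without real page faults. In the sketch below (all names are hypothetical), a "fault" is just a bounds check against a guard limit, and the "handler" is an `else` branch. The point it shows is that if the first access goes to the far (lowest) end of the new frame, the fault fires before anything has been written into the frame, so placing the frame in another segment moves nothing. Real fault handling is OS-specific and is not modeled here.

```rust
/// A simulated stack segment: `sp` is the current stack pointer (stacks
/// grow downward) and every address below `limit` is treated as an
/// unmapped guard region.
struct Segment {
    limit: usize,
    sp: usize,
}

impl Segment {
    /// Stand-in for a memory access: `false` means "this would fault".
    fn touch(&self, addr: usize) -> bool {
        addr >= self.limit
    }
}

/// Reserve a frame of `size` bytes. The first access is to the lowest
/// address of the new frame, so a fault happens before any byte of the
/// frame is written; nothing can point into it yet, so it is safe to put
/// the frame in `fresh` instead. Returns (frame base, used `fresh`?).
fn reserve_frame(cur: &mut Segment, fresh: &mut Segment, size: usize) -> (usize, bool) {
    let base = cur.sp.saturating_sub(size);
    if cur.touch(base) {
        cur.sp = base;
        (base, false)
    } else {
        // Simulated fault handler: redo the reservation in another segment.
        let base = fresh.sp.saturating_sub(size);
        assert!(fresh.touch(base), "fresh segment is too small");
        fresh.sp = base;
        (base, true)
    }
}
```

With the current segment's stack pointer at 1100 and its guard starting below 1000, a first 64-byte frame lands at 1036, and a second one would cross the guard, so it is placed in the fresh segment while the current segment's stack pointer stays at 1036.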

To answer the original question, I remember one mention of cactus stacks being considered:

https://www.mail-archive.com/rust-dev@mozilla.org/msg04361.html

But as I wasn't privy to those "discussions going on in the background", presumably only members of the core team from around that time could say more about the details.


Thank you for the link @glaebhoerl. It looks like segmented stacks were also discussed and eventually abandoned.

OK, switching topics slightly: is it possible for Rust to support the stack equivalent of GlobalAlloc? Or are stacks so inherently low-level that the idea is impossible? Basically, I want to see if it's possible to experiment with other stack strategies in a way that the compiler can use. Even though the decision was made in the past to abandon other stack strategies, I wonder if newer chips could benefit from revisiting this. The only way to know for certain is to write up a new strategy and then test its performance against the original one, which is why I ask.
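For comparison, the real `GlobalAlloc` trait centers on `unsafe fn alloc(&self, layout: Layout) -> *mut u8` and `unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout)`. A stack counterpart might look something like the hypothetical `StackSegmentAlloc` trait below. Nothing like it exists in Rust today, and the trait alone does nothing: the hard part would be getting the compiler (or a `__morestack`-style hook) to call it. `HeapSegments` is just one illustrative strategy, and the 16-byte alignment is an assumption.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::cell::Cell;

/// Hypothetical hook: a pluggable source of stack segments.
pub trait StackSegmentAlloc {
    /// Hand out a segment of at least `min_size` bytes.
    fn grow(&self, min_size: usize) -> Option<(*mut u8, usize)>;

    /// Return a segment obtained from `grow`.
    ///
    /// # Safety
    /// `base` and `size` must come from a prior `grow` call on `self`,
    /// and no live frame may still be using the segment.
    unsafe fn release(&self, base: *mut u8, size: usize);
}

/// One possible strategy: heap-allocated segments with a default size.
pub struct HeapSegments {
    pub segment_size: usize,
    pub live: Cell<usize>, // segments currently handed out
}

const SEGMENT_ALIGN: usize = 16; // assumed stack alignment

impl StackSegmentAlloc for HeapSegments {
    fn grow(&self, min_size: usize) -> Option<(*mut u8, usize)> {
        let size = self.segment_size.max(min_size);
        if size == 0 {
            return None; // `alloc` does not accept zero-sized layouts
        }
        let layout = Layout::from_size_align(size, SEGMENT_ALIGN).ok()?;
        // SAFETY: `layout` has a non-zero size.
        let ptr = unsafe { alloc(layout) };
        if ptr.is_null() {
            return None;
        }
        self.live.set(self.live.get() + 1);
        Some((ptr, size))
    }

    unsafe fn release(&self, base: *mut u8, size: usize) {
        let layout = Layout::from_size_align(size, SEGMENT_ALIGN)
            .expect("size came from `grow`, so the layout is valid");
        // SAFETY: upheld by the caller (see the trait docs).
        unsafe { dealloc(base, layout) };
        self.live.set(self.live.get() - 1);
    }
}
```

Swapping in a different implementation (a pooled allocator, a doubling policy, mmap'd segments with guard pages) would then be the kind of experiment described above, assuming the compiler side existed.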

I think the core problem is not that alternative stack strategies aren't fast enough, it's that using a strategy other than the traditional one has unwanted implications for other parts of Rust.

For example, using a guard page to trigger the creation of a new stack segment is "fine" for pure Rust code that understands how to grow the stack, but what happens when you use FFI? The C (or other) code on the other side of your FFI call does not understand Rust's (hypothetical) stack model and expects to be given a normal-sized stack to operate on, not a smaller but growable one. As a result, it will probably run out of stack and trigger the page fault, which will most likely have to abort the program, since unwinding across FFI calls is generally unsound.

That's obviously very undesirable so the natural conclusion is to grow the stack to a "large enough" size prior to making any FFI calls. But this means that FFI necessarily triggers memory allocation!
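A simulated version of "grow before every FFI call" makes the cost visible. In this sketch, `SegmentedStack` and the 1 MiB `FFI_STACK_BYTES` budget are assumptions, and the closure stands in for the foreign call. Whenever the current segment has less than the budget left, the call site must obtain a big segment first, which is an allocation on a path that in C would cost nothing.

```rust
/// Assumed amount of stack that foreign code expects to find.
const FFI_STACK_BYTES: usize = 1 << 20; // 1 MiB, an illustrative figure

/// Simulated state of a thread running on small growable segments.
struct SegmentedStack {
    remaining: usize,      // bytes left in the current segment
    segment_allocs: usize, // big segments allocated for foreign calls
}

impl SegmentedStack {
    /// Run `f` (standing in for a foreign call) with enough stack.
    fn call_foreign<R>(&mut self, f: impl FnOnce() -> R) -> R {
        if self.remaining < FFI_STACK_BYTES {
            // The FFI call has to allocate memory before it can even start.
            self.segment_allocs += 1;
            let saved = self.remaining;
            self.remaining = FFI_STACK_BYTES; // "switch" to the big segment
            let r = f();
            self.remaining = saved; // switch back once the call returns
            r
        } else {
            f()
        }
    }
}
```

A thread with 8 KiB left pays one segment allocation per such call, while one that already has 2 MiB of headroom pays nothing. For what it's worth, the `stacker` crate's `maybe_grow` follows a similar check-then-switch pattern for pure Rust code.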

Part of the appeal of Rust is that it's just native code. There's no VM, no JIT, no GC, no interpreter. rustc can just spit out plain old .o files that you link with the rest of your codebase. You have all the same OS level tools like debuggers and profilers that you do in C and C++. The magic is at compile time; there's no magic at run time.

Using something like cactus stacks adds magic and complexity to the runtime. How do we teach platform tools to walk the stack through Rust code? Does cross-language Rust/C++ LTO still work? Can you still incrementally adopt Rust if there is non-trivial work happening at every FFI point?


Ah, C, my old friend and nemesis. Would #[repr(Interoperable_2024)] be of help here, assuming it met its goals and all of the foreign code used it?

I think I need to try to talk my boss into making #[repr(Interoperable_2024)] part of my work objectives...

That "and" is doing a lot of work :laughing:

I mean yeah, if "all foreign code" is using an ABI we've devised that allows this kind of thing, then sure, there's no need for C-ABI-based FFI. But that's extremely unlikely to happen, and it's not really a rationale that makes sense when debating a change to Rust's entire stack model.

Is this just a thought experiment? I.e., could Rust support a target where some kind of segmented stack is the normal platform behavior? It seems plausible to me that this would be possible.

Or are there specific problems in Rust today that you're trying to solve?

I know... :sob: But someone has to do it, the C ABI is no longer useful for modern purposes, and it looks like I'm that someone. The trick is that (just like every other functioning human being on this planet) I have other work that I have to get done in addition to this, which is why I'm not driving it forwards as fast as possible.

Part thought experiment and part wish to solve some of the irritating problems we seem to be left with from settling on the C ABI as our be-all, end-all ABI. I'll put it into the list of things that we'd like to have in a new ABI, and see if I can carve out some time from work to push on it some more.

I wish I could get paid to do this work, I never have enough time to work on Rust stuff properly...


I think without a set of well justified problems that this (or a similar feature) would solve, this seems like the wrong approach to take for an ABI designed for language interoperability.

Every ABI feature is going to cost you: in time spent debating and designing it with whatever implementors you have on board, in engineering resources spent building and supporting it (multiplied by the number of implementors), and potentially in runtime performance, because what's specified isn't the optimal representation for some language (or it used to be, but we figured out a better way and now the legacy one is enshrined forever).

The higher the cost, in all senses, of the ABI, the fewer implementations you're going to have and the less useful it will be.

I'm not even sure this is an ABI thing, as the compiler needs to be aware of it when compiling the entire program, not just annotated FFI functions. I guess the compiler could insert code to grow the stack when going from your ABI to regular-ABI functions, but then you're right back to the "FFI code mandates allocation" problem.

part wish to solve some of the irritating problems we seem to be left with from settling on the C ABI as our be-all, end-all ABI.

Can you elaborate on these?

Note that the OS already uses page faults to lazily commit pages, including reserved but uncommitted stack (at least on Windows; Linux uses the terms differently here). I don't think this would be that expensive with basic hysteresis.
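One simple form of hysteresis is to keep a freed segment around instead of returning it immediately. Without that, a loop that repeatedly calls a function sitting right at a segment boundary allocates and frees a segment on every iteration (often called the "hot split" problem). The `SegmentCache` below is a hypothetical single-slot cache, shown only to illustrate the effect; a real runtime would tune how many segments to keep and when to give them back.

```rust
/// Single-slot segment cache: the most recently freed segment is kept
/// so that the next boundary crossing can reuse it.
struct SegmentCache {
    cached: Option<Vec<u8>>, // spare segment kept after shrinking
    allocations: usize,      // fresh allocations actually performed
}

impl SegmentCache {
    fn acquire(&mut self, size: usize) -> Vec<u8> {
        match self.cached.take() {
            // Reuse the spare segment if it is big enough...
            Some(seg) if seg.len() >= size => seg,
            // ...otherwise (dropping a too-small spare) allocate a new one.
            _ => {
                self.allocations += 1;
                vec![0; size]
            }
        }
    }

    fn release(&mut self, seg: Vec<u8>) {
        // Keep the segment instead of freeing it right away.
        self.cached = Some(seg);
    }
}
```

A thousand call/return pairs across the same boundary then cost a single allocation instead of a thousand.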

Well, stack behavior is on the LLVM side of the rustc-LLVM divide (not counting async "stacks", which are implemented in rustc). So if you implement some custom stack behavior in LLVM, it should be straightforward to get it to work in Clang as well; Rust/C++ LTO will still work, and even without LTO, you'll get fast FFI between Rust and any C/C++ code you compile yourself using your patched Clang. That said, it will still be incompatible with any code using the standard ABI. And the alternate Rust backends (Cranelift, GCC) would need their own implementations if you want to use them.

LLVM's segmented stack support, which powered Rust's old use of segmented stacks, does still exist, and it wouldn't be hard to hook it back up to rustc. It's pretty simple; it mostly just adds code to the beginning of every function to check the stack pointer against a limit and call a runtime function __morestack if it needs more space. __morestack has a 'canonical' implementation in libgcc, on platforms that use libgcc, since this stuff was originally written for the GCC Go frontend. But you can define your own __morestack symbol which takes priority over the libgcc one.
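The shape of that scheme can be simulated in plain Rust. In the sketch below, `with_frame` plays the role of the compiler-inserted prologue and its slow path plays the role of `__morestack`. The real check compares the stack pointer against a per-thread limit in machine code; here a thread-local byte counter stands in for both, and the 8 KiB default segment size is an assumption.

```rust
use std::cell::Cell;

const MIN_SEGMENT: usize = 8 * 1024; // assumed default segment size

thread_local! {
    // Bytes left in the current (simulated) stack segment.
    static REMAINING: Cell<usize> = Cell::new(MIN_SEGMENT);
    // How many times the slow path (the `__morestack` analogue) ran.
    static MORESTACK_CALLS: Cell<usize> = Cell::new(0);
}

/// Models the function prologue: check whether `frame` bytes fit and
/// take the slow path only when they do not.
fn with_frame<R>(frame: usize, body: impl FnOnce() -> R) -> R {
    let left = REMAINING.with(|r| r.get());
    if left >= frame {
        // Fast path: just "bump the stack pointer".
        REMAINING.with(|r| r.set(left - frame));
        let out = body();
        REMAINING.with(|r| r.set(left));
        out
    } else {
        // Slow path: switch to a new segment of at least `frame` bytes,
        // run the body there, then return to the old segment.
        MORESTACK_CALLS.with(|c| c.set(c.get() + 1));
        let seg = frame.max(MIN_SEGMENT);
        REMAINING.with(|r| r.set(seg - frame));
        let out = body();
        REMAINING.with(|r| r.set(left));
        out
    }
}

/// A recursive function whose every call declares a 1 KiB frame.
fn depth(n: u32) -> u32 {
    with_frame(1024, || if n == 0 { 0 } else { 1 + depth(n - 1) })
}
```

With these numbers, `depth(20)` makes 21 nested calls. The first eight fit in the initial segment, and the slow path then runs twice, once per new segment. Every call still pays for the check, which is the overhead mentioned in the next paragraph.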

This scheme is... somewhat flexible. An implementation of __morestack can choose how much space to allocate, what allocator to use, how to cache allocations, etc. But of course there are limits. For example, you can't relocate already-allocated stack frames (though that's partly ruled out by Rust's semantics already). And the explicit check adds overhead; if you want to experiment with using fault handlers instead, it would require changes to LLVM.

You're 100% correct on all your points about this; what I meant to say was that I plan on proposing the idea of cactus stacks to that group. It will be debated, and if it turns out that no-one likes it, it will go away. I'm well aware that I'm not the emperor of the universe, and I'm sorry that I implied that I was! :wink:

It's actually a bit of both a compiler and an ABI problem. The compiler is the one doing the growing of the stack, but if you want FFI to work, every compiler and platform is going to need to be aware of it. Either that or you'll need to make all foreign interface calls more akin to RPC calls, and run the foreign code in another process entirely. And while that will work, it will definitely be much slower than anyone would like. Better to either have everyone agree on a common ABI that allows stacks to grow such that different languages interweave their stacks together in the same address space, or just avoid doing so entirely (or only use cactus stacks when you can guarantee that you only use code compiled by rustc).

While I see your argument, I feel it is treating the stack as a special case. By the same argument, should Rust on Linux only allow you to allocate memory in page size blocks, as that is all the OS gives out? It seems we decided providing an allocator in std (not core) is something every program will want.

Rust has support for custom allocators and no-std, so that you can allocate pages directly from the OS if you wish. When you choose to use a global allocator, it defaults to libc's, because that's the common interface provided by most OSes. Even abstractions built on top of that like String are pretty thin and straightforward (it's just an exponentially-growing contiguous buffer, not a rope or refcounted object).
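As a concrete example of that hook, a custom global allocator can be as small as a wrapper around `std::alloc::System` (which forwards to the platform allocator, e.g. libc's `malloc` on Unix). The `Counting` type below is illustrative; it counts allocations and otherwise changes nothing.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests seen so far.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Forwards everything to the system allocator, counting allocations.
struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds `GlobalAlloc::alloc`'s contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was allocated by `System` via `alloc` above.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the program (Box, Vec, String, ...) now
// goes through `Counting`.
#[global_allocator]
static GLOBAL: Counting = Counting;
```

There is no comparable attribute for the call stack, which is what the earlier question about a "stack equivalent of GlobalAlloc" runs into.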
