[Pre-RFC] expose llvm's stack intrinsics


#1

#Proposal

Expose stacksave and stackrestore and create a custom intrinsic stackalloc that directly forwards to an alloca. Implementation and unit test at https://github.com/rust-lang/rust/pull/26059

#Motivation

I would like to be able to test out storing arrays with a size only known at runtime (VLA) on the stack. Also I want to test out owned Unsized types on the stack (a StackBox<T: ?Sized> that can’t be moved). This could be done directly in the compiler by allowing such types in normal let statements, but I believe there are too many open questions to even write this out as an rfc. I want to be able to try this out in a library and play around with it to have evidence to support a more complex proposal in the future.

#Alternatives

Compiler-support for stack allocated objects

Do the stacksave -> alloca -> stackrestore sequence directly inside the compiler instead of going through library types. This has a few advantages, such as the possibility of moving such an allocation out of a loop, and probably many others.

Use a global arena instead

Can be done in a lib. Basically a second stack for DST. We can invent entire new optimizations for this.
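
A rough sketch of what such a second stack could look like as a library type. `SecondStack` and its methods are made-up names for illustration, not an existing crate; `save`, `restore` and `alloc` only mirror the roles of stacksave, stackrestore and alloca on a separate fixed-size buffer.

```rust
/// Hypothetical fixed-capacity "second stack". All names are illustrative.
pub struct SecondStack {
    buf: Vec<u8>, // backing storage, allocated once up front
    top: usize,   // offset of the first free byte
}

impl SecondStack {
    pub fn with_capacity(cap: usize) -> Self {
        SecondStack { buf: vec![0; cap], top: 0 }
    }

    /// Plays the role of stacksave: remember the current top.
    pub fn save(&self) -> usize {
        self.top
    }

    /// Plays the role of stackrestore: release everything allocated since `mark`.
    pub fn restore(&mut self, mark: usize) {
        assert!(mark <= self.top, "marks must be restored in LIFO order");
        self.top = mark;
    }

    /// Plays the role of alloca: bump the top by `len` bytes,
    /// or return None when the fixed capacity is exhausted.
    pub fn alloc(&mut self, len: usize) -> Option<&mut [u8]> {
        if len > self.buf.len() - self.top {
            return None;
        }
        let start = self.top;
        self.top += len;
        Some(&mut self.buf[start..self.top])
    }

    /// Number of bytes currently in use.
    pub fn used(&self) -> usize {
        self.top
    }
}
```

Allocation here is a bounds check plus one addition, and a failed allocation is reported as `None` instead of overflowing. Alignment and typed (DST) values are left out on purpose.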

Does this have any disadvantages?

#Drawbacks

  • Modifying trans and typeck without using the changes in rustc
  • stack analysis becomes hard

#2

Would this go toward allowing DST variables?

Because if it did, that would be four different kinds of awesome (I checked).


#3

It would allow studying the effects of DST variables without adding too much pain in the compiler. I don’t think it’s the way it should be in the end, since the compiler must be able to reason about DST types. But we’ll see what the future brings.


#4

What are the benefits of this feature? Are there any actual performance or otherwise improvements that have been measured? I’ve always been against dynamic stack usage because it destroys all hope at easy stack analysis if it’s used.


#5

That’s what I want to evaluate. I could always fork rustc, but it’ll be hard to show my implementations to others if they have to build a custom rustc branch or run an untrusted executable that I upload.

Also, this is necessary for easily writing real-time applications in Rust, since you can’t compute a WCET (worst-case execution time) for a heap allocation. A stack allocation is always exactly one pointer addition or an abort.


#6

If you’re accepting stack overflow, you might as well just have a global arena that you use instead of actually using the stack.


#7

Edited first post to reflect your idea. If no one finds a disadvantage of this, I’ll create a crate for a DST-Stack and add a closed note to the topic.


#8

Disadvantages of the alternative:

  • slower - you need to do a TLS lookup rather than just subtracting the stack pointer - plus quite a bit more work if you want to be able to resize the arena rather than having a fixed limit like the stack

  • adds a fair bit of complexity in embedded/low-level environments where, e.g., TLS might not even work, unused space in the arena is likely to always take up real RAM, allocation (i.e. to resize the arena) can’t be done anywhere because it could fail, and code size may be important
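
To make the TLS point concrete, here is a minimal sketch of a per-thread arena using the standard `thread_local!` macro. `ARENA_SIZE`, `Arena` and `with_scratch` are hypothetical names and the 4096-byte limit is an arbitrary assumption; the point is only that every allocation has to go through `ARENA.with(...)` before it can bump the offset.

```rust
use std::cell::RefCell;

// Assumed fixed limit for the sketch; a real arena would pick its own size.
const ARENA_SIZE: usize = 4096;

struct Arena {
    buf: [u8; ARENA_SIZE],
    top: usize, // offset of the first free byte
}

thread_local! {
    // One arena per thread, reached through a TLS lookup.
    static ARENA: RefCell<Arena> = RefCell::new(Arena { buf: [0; ARENA_SIZE], top: 0 });
}

/// Runs `f` on `len` freshly reserved bytes and releases them afterwards.
/// Returns None if the arena does not have `len` bytes left.
/// Limitation of this sketch: calling `with_scratch` again from inside `f`
/// panics, because the RefCell is still mutably borrowed.
pub fn with_scratch<R>(len: usize, f: impl FnOnce(&mut [u8]) -> R) -> Option<R> {
    ARENA.with(|cell| {
        let mut arena = cell.borrow_mut();
        let start = arena.top;
        if len > ARENA_SIZE - start {
            return None;
        }
        let end = start + len;
        arena.top = end; // reserve
        let result = f(&mut arena.buf[start..end]);
        arena.top = start; // release
        Some(result)
    })
}
```

Compared with a plain stack allocation, each call pays for the thread-local access and the `RefCell` borrow check in addition to the pointer bump, and the whole buffer exists per thread whether it is used or not.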

On the flip side, as @cmr mentioned, allocating dynamically sized objects on the stack is usually a bad idea anyway (especially in embedded, where stacks are typically smaller and often unguarded), so maybe it doesn’t matter.


#9

As an additional downside, you now have another region of memory to move in/out of cache instead of using the stack which is already very likely to be in cache (at least if you’re accessing memory near the stack pointer)


#10

Yeah, I’d stay with a fixed size; otherwise there’s no gain over the heap. It’s basically a second stack.

Not really sure what you mean. The only difference between Stack-pointer and DST-Stack-Pointer would be that one has a dedicated register on many CPUs. I don’t see how TLS plays into this once you have the pointer.

While true, the second stack’s top will also very likely be in cache if you use it often…