@parched Thanks for elaborating
> Imagine the case where `0xdeadbeef` is the memory mapped flag a DMA uses to signal it has finished writing some memory and `do_something` reads that memory.
This is a very interesting example. The way I have been thinking of modeling DMA is that it would take ownership of the memory (`&mut [u8]`) it is writing to and won't return it until after the transmission is done (e.g. `read_volatile(&reg) & 1 == 1`). (This happens to map nicely to futures (`impl Future<Item = &mut [u8]>`).) More importantly, written this way the compiler would know that both memory addresses are related and thus, I think, would not run into the problem you mentioned. I would deem an API like this safe.
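To make the ownership idea concrete, here is a minimal sketch of such an API. The names (`Transfer`, `start_transfer`, `wait`) and the `DMA_DONE` static are hypothetical; on real hardware the flag would be a memory-mapped peripheral register rather than a plain static, and `start_transfer` would actually program the DMA engine.

```rust
use core::ptr::read_volatile;

// Stand-in for the DMA's memory-mapped "done" register (hypothetical;
// pre-set to "done" so this sketch runs on a host).
static DMA_DONE: u32 = 1;

// The transfer owns the buffer for its whole lifetime.
struct Transfer<'a> {
    buf: &'a mut [u8],
}

fn start_transfer(buf: &mut [u8]) -> Transfer<'_> {
    // A real driver would configure and start the DMA engine here.
    Transfer { buf }
}

impl<'a> Transfer<'a> {
    /// Busy-wait until the DMA signals completion, then hand the buffer back.
    fn wait(self) -> &'a mut [u8] {
        // Volatile read of the flag; the borrow checker prevents anyone
        // from touching `buf` before this returns.
        while unsafe { read_volatile(&DMA_DONE) } & 1 == 0 {}
        self.buf
    }
}

fn main() {
    let mut buf = [0u8; 4];
    let transfer = start_transfer(&mut buf);
    // `buf` is inaccessible here; it only comes back once the DMA is done.
    let buf = transfer.wait();
    println!("{}", buf.len());
}
```

Because the buffer is borrowed into `Transfer`, the compiler cannot reorder accesses to it past the `wait` call, which is exactly the "both addresses are related" property described above.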
In the way you have written the DMA transfer, the compiler has no way to know that `0xdeadbeef` and the memory accessed in `do_something` are related, and thus it misoptimizes the code. An atomic Acquire load is one (expensive, and not available on Cortex-M0 chips) way to fix this misoptimization. In any case, I would deem such an API `unsafe`, as the ownership over the memory is not (correctly) specified at all.
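For reference, this is what the Acquire-based fix looks like with standard atomics (using a spawned thread to play the role of the DMA engine; the names are illustrative, not from the discussion above):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// The Acquire load on `flag` orders the read of `data` after it: the
// compiler and CPU may not move the data read before the flag check.
fn wait_and_read(flag: &AtomicU32, data: &AtomicU32) -> u32 {
    while flag.load(Ordering::Acquire) & 1 == 0 {}
    data.load(Ordering::Relaxed)
}

fn main() {
    let flag = Arc::new(AtomicU32::new(0));
    let data = Arc::new(AtomicU32::new(0));

    let (f, d) = (Arc::clone(&flag), Arc::clone(&data));
    thread::spawn(move || {
        d.store(42, Ordering::Relaxed);
        // Release store: all prior writes become visible to anyone whose
        // Acquire load observes flag == 1.
        f.store(1, Ordering::Release);
    });

    println!("{}", wait_and_read(&flag, &data)); // guaranteed to see 42
}
```

The cost mentioned above comes from the ordering guarantees: on weakly-ordered CPUs the Acquire load compiles to extra barrier instructions, and Cortex-M0 lacks the atomic instructions some of these operations need.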
> It couldn't eliminate one of them because `delay_ms` should contain a barrier
If `delay_ms` uses a `while` loop of `read_volatile` calls, then, given how `get_button_*_states` are written, the compiler can still move / merge / eliminate one of them though, as you mentioned in your bullet number 2 ("if `read_xxx` is a volatile load ...").
But I suppose that you don't want `get_button_*_states` to be implemented as e.g. `*GPIO_PORT_A & 0x1 != 0`, but rather you want to mark this memory access as (1) or (2). Which one would it be in this case? Because the definitions still read the same to me: "don't change number" and "don't merge" seem to overlap, in particular.
> Imagine the 2 get button functions are part of some generic API you are implementing where it's possible that they are on different PORTs / word addresses. If the compiler can see they happen to be on the same word then it would be good if it could eliminate one of the loads.
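For concreteness, the scenario described is something like the following (names hypothetical): two generic getters that happen to read different bits of the same word. With plain, non-volatile loads the compiler is allowed to fold the two reads of `PORT` into one; with `read_volatile` it would not be.

```rust
// Simulated shared port register: bit 0 = button A, bit 1 = button B.
static PORT: u32 = 0b10;

fn get_button_a_state() -> bool {
    PORT & 0b01 != 0 // reads bit 0 of PORT
}

fn get_button_b_state() -> bool {
    PORT & 0b10 != 0 // reads bit 1 of the same word
}

fn main() {
    // Both calls read PORT; since the loads are non-volatile, the
    // optimizer may perform a single load and reuse it for both.
    println!("{} {}", get_button_a_state(), get_button_b_state());
}
```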
My impression is that this optimization would only make sense in very few cases, where it doesn't actually change the behavior of the program. I agree that it wouldn't be possible at all with `read_volatile`.
My gut tells me that having two attributes to achieve these rare (from my POV)
optimizations would be too complex for the average library author to get right
and may actually cause more misoptimization problems if the library author gets
them wrong. There's also the question of how much code size savings this brings
to the table.
Perhaps the rules for these attributes are actually simple and allow some other
nice optimizations; I don't know but I'd be happy to hear about that.
BTW, does LLVM have IR attributes for (1) and (2)? Because we are constrained to what LLVM offers for doing optimizations. Unless this is doable in MIR and doesn't involve tons of work to implement.