Pointers Are Complicated II, or: We need better language specs

The actual problems with a Miri for C have more to do with nondeterminism. C has lots of unsequenced evaluations in it, a sort of local parallelism in the semantics for no particularly good reason (it almost never translates to actual parallelism, but it licenses some reordering of function calls by the compiler). To properly catch UB in this situation, you have to analyze all interleavings (and even the non-interleaving versions, where causality is a partial order, not a total order). This is worst-case exponential, because UB can arise in any one of those interleavings.
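To make the local-parallelism point concrete, here is a small C sketch of my own (the functions and the standard citation are illustrative, not from the poster):

#include <stdio.h>

static int n = 0;
static int f(void) { n = n * 10 + 1; return 1; }
static int g(void) { n = n * 10 + 2; return 2; }

int main(void)
{
    /* The calls to f() and g() are indeterminately sequenced: either
       may run first, so n may end up as 12 or 21, and a checker must
       explore both orders. */
    int s = f() + g();

    /* Worse: the two modifications of i below are unsequenced, which
       is undefined behavior outright (C17 6.5p2). */
    int i = 0;
    int r = i++ + i++;

    printf("%d %d %d\n", s, r, n);
    return 0;
}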

Ah, I think perhaps the point of contention with the notion of loosely-defined behavior is that efficiently producing reliable diagnostics for erroneous behavior requires partitioning all constructs into those whose behavior is fully deterministic and those which are erroneous. I will certainly grant that in many cases it is more useful to have a program stick with fully deterministic defined behaviors, and thus be compatible with such diagnostic tools, than to have it select in unspecified or non-deterministic fashion from among a range of acceptable behaviors, which might run faster but would be incompatible with some of those tools. No single approach can be best for all applications, though. In some cases it may be easier for applications to prevent integer overflows from occurring, thus allowing them to be usefully trapped; in others it may be useful to allow overflows to occur in situations where multiple ways of processing the code might yield answers that differ in acceptable ways.

It may often be useful to have a program either offer assurances that it is not producing erroneous results as a result of integer overflow, or else simply indicate that it cannot offer such assurances because some integer overflows have occurred and it cannot prove that none of them could have caused results different from those of arithmetically-correct computations. Having the program report the exact circumstances where the first integer overflow occurred, even in computations whose output would otherwise have been ignored, might be useful for figuring out what went wrong. For many purposes, though, reducing the cost of overflow checking to the point that it could be enabled even in production code processing actual data would be more useful than diagnostics that determine whether any integer overflows could occur when the program is fed particular test data.
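One possible shape for such cheap, production-mode overflow reporting, sketched with the __builtin_add_overflow intrinsic that both gcc and clang provide (the sticky-flag design is just an illustration, not something proposed in the thread):

#include <stdbool.h>
#include <stdio.h>

static bool overflow_seen = false;   /* sticky "some overflow occurred" flag */

static int checked_add(int a, int b)
{
    int r;
    if (__builtin_add_overflow(a, b, &r))   /* returns true if a+b overflowed */
        overflow_seen = true;               /* record it; don't trap */
    return r;
}

int main(void)
{
    int total = checked_add(2000000000, 2000000000);
    if (overflow_seen)
        puts("cannot offer assurances: an integer overflow occurred");
    else
        printf("%d\n", total);
    return 0;
}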

Do you have confidence, however, that this way of learning is not happening at the expense of others, and is not overstepping the boundaries of their good will?

I normally attempt to limit myself to fields where I have more expertise. I came here from a blog post which discussed Undefined Behavior in C and LLVM, and the issues feel like they should be language-agnostic, but the moderators want discussions to be focused on Rust, which is well outside my comfort zone. I do recognize that Rust supports some constructs that should be more widely adopted in other languages; among other things, from what I can tell, Stacked Borrows in Rust works the way restrict should work in C.
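For readers who haven't used restrict, a minimal sketch of the promise it makes (my own example, not the poster's):

/* Within this call, storage reached through dst is promised not to be
   also reached through src, so the compiler may reorder or vectorize
   the loop freely. A Rust &mut reference carries a comparable, but
   compiler-checked, exclusivity guarantee under Stacked Borrows. */
void copy_n(int *restrict dst, const int *restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}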

My real interests lie in the embedded systems world, and if the effects of gcc, clang, and LLVM were limited to the high-performance-computing world, I'd be perfectly happy to ignore them. Unfortunately, the fact that such tools are freely distributable is allowing them to establish dominance in other fields like embedded and systems programming. There are many different abstraction models that languages or implementations could use that would all be suitable for many embedded or systems programming tasks, compatible with much of the corpus of existing programs written for those tasks, and yet still allow substantial optimization.

I've been doing embedded systems work in C for almost 30 years. If the proponents of clang and gcc were to make clear that their products are not intended to be suitable for embedded or systems use, I wouldn't care what they thought about the quality of embedded code. Since they're doing nothing to discourage their products' spread into the embedded systems area, however, I bristle rather strongly at the fact that they regard most of the existing corpus of embedded code, developed for commercial compilers, as "broken", and thus deliberately refuse to process it in the same efficient and useful fashion as those commercial compilers do. I'm sure I take this much too personally, but how should I feel about the situation?

If you're writing nonportable code with a commercial compiler (as you specifically claim as your motivation, with the C committee specifically allowing you to), why do you care what gcc or clang do? You've purposely decided to tell everyone you're writing nonportable code.

Additionally, at least clang/LLVM allow you to turn individual optimizations on and off. (I don't know about GCC.) That sounds like exactly what you want from a compiler, tbh: telling it what optimizations to use on a compilation unit. It'll take more work than using the default optimization profiles, ofc, but writing nonportable code (which you admit to doing) is a niche use case that necessarily involves configuration for whatever platform you're targeting.
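For what it's worth, both gcc and clang do accept some real flags that disable specific assumption-based optimizations, e.g. (illustrative, not exhaustive):

cc -O2 -fno-strict-aliasing -fwrapv -fno-delete-null-pointer-checks file.c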

Mod hat on: Why wouldn’t we? This is the Rust Internals forum.

Mod hat off: But Rust doesn’t need any of that, because it doesn’t have ubiquitous runtime reflection, and its borrow checker ensures that there are no overlapping mutable slices to worry about.


If you're writing nonportable code with a commercial compiler (as you specifically claim as your motivation, with the C committee specifically allowing you to), why do you care what gcc or clang do? You've purposely decided to tell everyone you're writing nonportable code.

A lot of such code will work with all general-purpose compilers when optimizations are disabled. While commercial compilers make a bona fide effort to support such "semi-portable" code efficiently, neither clang nor gcc does so. The support forums make clear that the maintainers of clang and gcc view such code as "broken", so the decision not to support it appears to be rooted in ideology rather than in any meaningful difficulty of supporting it, or any meaningful loss of optimizations that adding such support would entail.

Additionally, at least clang/LLVM allow you to turn individual optimizations on and off. (I don't know about GCC.) That sounds like exactly what you want from a compiler, tbh: telling it what optimizations to use on a compilation unit. It'll take more work than using the default optimization profiles, ofc, but writing nonportable code (which you admit to doing) is a niche use case that necessarily involves configuration for whatever platform you're targeting.

Embedded systems are hardly a "niche" case, though if clang and gcc were to make clear that they are not designed to be suitable for such purposes I wouldn't care how they handle other things. Further, as noted, many of the problematic constructs are supported universally by general-purpose compilers with optimizations disabled, and by most commercial compilers even when configured to produce reasonably efficient code.

So far as I can tell, there is no setting other than -O0 at which they will process such code reliably, nor any way of turning off the particular "optimizations" that are incompatible with the guarantees offered by many commercial compilers. For example, if x and y are two arrays of static duration, and a program compares some pointer p to x+n for some n, both clang and gcc will assume that there is no way an operation involving p might affect y. While it would be rare for this corner case to matter, (1) the Standard explicitly defines behavior in the event that a "just past" pointer for one object is compared to the address of an object that happens to immediately follow it, and a trustworthy compiler must refrain from optimizations that may alter program behavior in ways sanctioned by neither any standard nor its own documentation, and (2) in some embedded systems, it may be useful to have symbols mark the beginning and end of a region of storage whose size is computed at link time (e.g. all space that isn't used by anything else). While a project using such code would require linker configuration, it shouldn't require special syntax to work with any compiler that targets a suitably-configured linker.
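A sketch of the linker-symbol idiom described in point (2); the symbol names here are illustrative and would actually come from the project's linker script:

#include <stddef.h>

/* Defined by the linker script, not by any C definition; the size of
   the region between them is only known at link time. */
extern char __heap_start[];
extern char __heap_end[];     /* first byte past the region */

size_t heap_size(void)
{
    /* A strict reading of the Standard makes this subtraction UB,
       since the two symbols are not parts of one C object; it is
       exactly the sort of construct the paragraph above describes. */
    return (size_t)(__heap_end - __heap_start);
}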

An implementation could legitimately make the aforementioned assumption either with respect to if statements or the pointer-to-integer-to-pointer casts used in the original blog posts, if (1) it ensures that no two objects whose address is either exported or observed will ever be placed consecutively in storage, and (2) it documents that code it produces must be linked only with output from compilers that always lay things out in such fashion.

If clang and gcc offered a "low-level programming mode" that would apply some basic peephole optimizations and register-cache objects whose address is never exported nor observed, but refrain from fancier optimizations, and if they recommended against using higher optimization settings with low-level code, I'd be thrilled. As it is, though, so far as I can tell, clang is incapable of producing decent-quality code without enabling unsound "optimizations". GCC can sometimes produce decent code within functions even at -O0, if given enough help, but has no way of avoiding the generation of function prologue/epilogue code even for leaf functions, nor of avoiding redundant instructions for things like sign extension which a peephole optimizer could eliminate easily.

Mod hat on: Why wouldn’t we? This is the Rust Internals forum.

Is it unreasonable to expect that when a blog post that talks about optimization in C provides a discussion link, discussion of the applicable issues in C would be welcome at least within that particular linked thread, regardless of the general subject matter of the forum?

Mod hat off: But Rust doesn’t need any of that, because it doesn’t have ubiquitous runtime reflection, and its borrow checker ensures that there are no overlapping mutable slices to worry about.

The Rust semantics aren't as excessively loose as those in C, but even if the only operation one can perform with a "just past" pointer is to compare it for equality with other pointers, that can still be enough to trip up optimization. Clang assumes (presumably because LLVM does) that if X and Y are known not to alias, and a pointer p of unknown provenance compares equal to a pointer derived from X, then no accesses made via p will interact with Y, ignoring the possibility that p might have been formed from Y but coincidentally point "just past" X.

The important part is that Rust has more than one kind of pointer. References have strict aliasing requirements (but not Strict Aliasing Requirements in the C sense, because type punning is allowed even for them), while UnsafeCell and raw pointers do not.

Does LLVM recognize such distinctions, and recognize the formation and destruction of non-aliasing references from raw pointers as events which are sequenced relative to actions on the raw pointers that precede the formation of the references or follow their destruction? If LLVM can't recognize such distinctions, splitting a program into separately-built sections which would allow stronger or weaker aliasing assumptions could yield better code than would be possible if the entire program had to be processed with one aliasing model.

Yeah, we may need to have a separate talk about blogs using this forum as a de-facto comment section. Regardless, we are asking one last time for people to focus discussion here narrowly on the specifics of that one blog post or on related issues pertaining to the design and implementation of Rust. If you have an axe to grind against C compiler implementers, this is not the place to grind it, sorry.

This is the last time we're going to ask this; future off-topic posts will be removed and/or the thread will be closed.


Regardless, we are asking one last time for people to focus discussion here narrowly on the specifics of that one blog post or on related issues pertaining to the design and implementation of Rust.

Fair enough. I do think some of the issues in the blog post could be relevant to many languages, including Rust: for instance, the question of how to design a language spec so that, if a program's behavior is defined and meets requirements, applying any legitimate transform yields a program whose behavior is likewise defined and meets requirements.

I think the discussion got off on a bad foot because I hadn't recognized that RalfJung places a much higher value than I do on being able to have tools prove that the behavior of a program with a particular input data set is fully defined by the language spec, a task which would be greatly complicated by having a language spec allow some aspects of program behavior to be non-deterministic while other aspects remain defined. On the flip side, I place a much higher value on allowing programmers to avoid specifying program behavior more precisely than necessary to meet their requirements, even when doing so might impede the use of some automated verification tools. I was regarding as self-evident the notion that a good standard should not require programmers to write sub-optimal code in order to ensure compatibility with tools one may or may not need, while I think RalfJung was regarding as self-evident the principle that a good standard should not regard a program as correct unless an automated tool could prove that the standard unambiguously defines its behavior, at least when given a particular piece of test data.

Instead of trying to write language rules in such a fashion as to satisfy both objectives simultaneously (a "flying submarine"), I think it would be better to recognize a family of related dialects that can each satisfy whatever objectives best fit what needs to be done.


Great blog post! I enjoyed reading it 🙂

The one thing that confused me is that towards the middle you say:

However, LLVM IR (just like C) does not permit memory accesses through one-past-the-end pointers.

But then at the end you say:

This information needs to be extensive enough such that a hypothetical interpreter can use it to detect all the UB. Doing so makes it obvious that a pointer has provenance, since otherwise it is impossible to correctly check for out-of-bounds pointer arithmetic while also permitting one-past-the-end pointers.

But I thought you said that LLVM doesn't permit one-past-the-end pointers (which I assume can also be generalized to n-past-the-end pointers)? Can you explain more what you mean?

You can create a one-past-the-end pointer, and you can use it to perform address comparisons with other pointers, but you cannot access the memory it points to.


I see, so it is legal to create one-past-the-end pointers and perform some operations on them but it is illegal to dereference them.
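In code, the dividing line looks like this (standard C; the same rules are being described for LLVM above):

int a[4];
int *end = a + 4;                    /* OK: forming one-past-the-end */

void zero_all(void)
{
    for (int *p = a; p != end; p++)  /* OK: comparing against it */
        *p = 0;
    /* *end = 0;  <- would be UB: accessing through the one-past pointer */
}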


Also, it doesn't generalize. You can't even create pointers that are further displaced than one-past-the-end (in C/LLVM, and in Rust if you use the offset() / wrapping_offset() functions).

wrapping_offset is different: it has fewer requirements and is not unsafe. See its nightly docs, which recently changed after this issue (gotta link my own issues ^^).


You can create a one-past-the-end pointer, and you can use it to perform address comparisons with other pointers, but you cannot access the memory it points to.

With regard to the language that clang (and thus LLVM) actually seems to process, a one-past pointer may be compared for equality with a pointer that does not identify the start of a top-level object, and a pointer that identifies an object may be compared for equality with another pointer that identifies an object, but an equality comparison between a just-past pointer and a pointer that identifies the immediately-following object invokes UB. Situations where that would cause problems with real-world programs are rare except when doing certain kinds of embedded or systems programming, and a diagnostic implementation which uses "fat" pointers could reliably identify such situations and trap on them in code that doesn't use integer-to-pointer casts.

If language rules were to specify that only pointers to objects (as opposed to just-past pointers) may be converted to integers, that would avoid ambiguities such as those observed in RalfJung's blog post, since one of the pointer-to-integer conversions in his code example would invoke UB. Such a rule wouldn't cause problems for most real-world programs, and implementations that use fat pointers could trap when it is violated. As it is, however, clang (and I presume LLVM) is prone to treat code which casts two pointers to integers and compares the integers as though it were comparing the pointers themselves, which may cause it to treat such comparisons as invoking UB if one of the pointers happens to be a just-past pointer while the other points to an immediately-following top-level object.
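For context, the example from the blog post has roughly the following shape (a reconstruction from the discussion, not a verbatim quote):

#include <stdint.h>
#include <stdio.h>

char p[1], q[1] = {0};

int main(void)
{
    uintptr_t ip = (uintptr_t)(p + 1);  /* integer from a just-past pointer */
    uintptr_t iq = (uintptr_t)q;        /* integer from an object pointer */
    if (ip == iq) {
        /* If an optimizer replaces iq with ip here, the store below may
           be treated as relating to p rather than q, and the subsequent
           load of q[0] may then be assumed to still be 0. */
        *(char *)iq = 10;
        printf("%d\n", q[0]);
    }
    return 0;
}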

Maybe, but Rust can't do that. It already allows you to cast raw pointers to integers without being in an unsafe block, even if those pointers are one-past-the-end. If LLVM adopts such a rule, then the rustc frontend will have to find a workaround.

What would an LLVM-based Rust implementation do if given code equivalent to the following:

#include <stdint.h>
extern int x[],y[];
int test(int *p)
{
    uintptr_t px1 = (uintptr_t)(x+1);  /* integer from the just-past pointer x+1 */
    uintptr_t pp = (uintptr_t)p;
    int flag = (px1 == pp);            /* pure integer comparison */
    y[0] = 1;
    if (flag)
        *p = 2;     /* if y happens to directly follow x, p may identify y[0] */
    return y[0];    /* ...in which case this ought to return 2 */
}

The generated code for clang 11.0.0 is:

test:         # @test
    mov     eax, offset x+4
    mov     dword ptr [rip + y], 1
    cmp     rdi, rax
    je      .LBB0_1
    mov     eax, 1
    ret
.LBB0_1:
    mov     dword ptr [rip + x+4], 2
    mov     eax, 1
    ret

From what I understand, clang defers to LLVM for most optimizations, which would suggest that LLVM is responsible for inferring that flag can only be 1 in cases where p points just past the first element of x, and for concluding that there are no defined circumstances in which p could also identify y[0]. The only way I could see that as being justifiable would be if either the conversion of a just-past pointer to a uintptr_t doesn't yield a valid uintptr_t value, or an equality comparison between two valid uintptr_t values could result in UB. Note that unlike in RalfJung's original example, the pointer used in the expression *p = 2; has no semantic association with the pointer formed by indexing past the first element of x.

Numerous versions of clang have been released since a bug report was filed on this issue. I thus see no reason to believe that the maintainers of clang and LLVM don't regard this behavior as a correct optimization of the language they're seeking to process. Perhaps LLVM code compiled from equivalent Rust code would express things slightly differently in a way that does not yield that "optimization", but if so it would be interesting to know what difference in the LLVM code would cause the difference in behavior.

Here's a Rust version of your example, good enough to reliably miscompile on godbolt and play; it can also be run under Miri.

pub fn test(diff: isize, f: &dyn Fn(*mut u8) -> *mut u8) -> u8 {
    let mut x = [1u8];
    let mut y = [1u8];
    let p = f(&mut y[0]);
    let px1 = (&mut x[0] as *mut u8).wrapping_offset(diff) as usize;
    let pp = &mut y[0] as *mut u8 as usize;
    let flag = px1 == pp;
    y[0] = 1;
    if flag {
        unsafe { *p = 2 }
    }
    return y[0]
}

pub fn main() {
    let x = [1u8];
    let y = [1u8];
    let diff = y.as_ptr() as isize - x.as_ptr() as isize;
    println!("{}", test(diff, &|y| y));
}

Miri knows what's up here, though; it will give a UB error at the write through p, saying that it does not have permission to write where it did, namely on top of y, because the mutable borrow of y used to create p has expired due to the first write y[0] = 1. Rust's aliasing rules are strong enough to eliminate examples of this form.


However, this isn't really the operative element of your example. The core of the issue is that clang thinks that the flag is false, so you can put anything in the if statement and clang will just optimize it away - and this is not licensed by Rust's operational model. Here's a simpler example (godbolt, play) that will run fine with optimizations off, and should execute correctly according to Rust rules, but will be miscompiled to an infinite loop with LLVM:

pub fn test(diff: isize) -> bool {
    let mut x = [1u8];
    let mut y = [1u8];
    let px1 = (&mut x[0] as *mut u8).wrapping_offset(diff) as usize;
    let pp = &mut y[0] as *mut u8 as usize;
    return px1 == pp;
}

pub fn main() {
    let x = [1u8];
    let y = [1u8];
    let diff = y.as_ptr() as isize - x.as_ptr() as isize;
    if !test(diff) { loop {} }
}

I don't think that's a miscompilation. It looks like you're assuming that main's x and y are adjacent, and that test's x and y are adjacent. However, I don't think anything guarantees that - test(diff) can legally return true or false depending on the memory layout chosen by the implementation.


I'm assuming that for the purpose of getting a reliable miscompilation, but it would also be valid to "just happen" to guess a constant that matches the actual distance between the allocations. (In fact, it seems that Miri uses some randomized padding in its stack layouts, so the example still isn't reliable enough.) I think clang is doing mathematics along the lines of (int)x + n != (int)y whenever x and y are in different allocations and n is a constant, but this is obviously an unsound reasoning principle; one could just iterate over candidate n values to find the right one, because x and y must be at fixed locations.
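A sketch of that "iterate over candidate n values" argument in C (illustrative only; a real search would cover the whole address space and both directions):

#include <stdint.h>
#include <stdio.h>

int x[1], y[1];

int main(void)
{
    /* Every comparison below is pure integer arithmetic and therefore
       defined. If a compiler folded each one to "false" on the grounds
       that x and y are distinct allocations, the loop would report no
       match, yet y must live at *some* fixed offset from x. */
    for (uintptr_t n = 0; n < 4096; n++)
        if ((uintptr_t)x + n == (uintptr_t)y)
            printf("y sits %ju bytes above x\n", (uintmax_t)n);
    return 0;
}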

I don't think it's a miscompilation at all, though - I think the actual stack allocation is such that test(diff) == false (maybe due to a combination of inlining and re-ordering). Have you found a case where test(diff) == true leads to a contradiction?