Rust platform portability and maintenance


#1

We’ve got a creeping maintenance problem with Rust’s portability. It’s pretty vital to Rust’s future that it can run everywhere - fast, safe, and works everywhere is a powerful story - but the way the Rust codebase is organized for portability is starting to get quite unwieldy. std and libc in particular have grown a tangle of cfg attributes that get uglier every time somebody merges a new platform. How can we restructure our codebase to expand to every platform in the world? I’m hoping some hero steps in and shows us the way.


#2

Stop trying to repackage C, and instead embrace using C portability tools for managing C portability? Like this:

/// Returns the platform-specific value of errno
pub fn errno() -> i32 {
    #[cfg(any(target_os = "macos",
              target_os = "ios",
              target_os = "freebsd"))]
    unsafe fn errno_location() -> *const c_int {
        extern { fn __error() -> *const c_int; }
        __error()
    }
// and a whole bunch more platform-specific errno variants

seems unnecessary to me, when you could instead write:

pub fn errno() -> i32 {
    extern "C" { fn rust_errno() -> i32; }
    unsafe { rust_errno() }
}

and

#include <errno.h>
#include <stdint.h>

/* errno is a macro that expands to an lvalue, not a function */
int32_t rust_errno(void) {
  return errno;
}

lowering the total volume of code, making the libraries more portable, and gaining the ability to use portability constructs that C developers have been developing since forever (while avoiding the gotchas that have crept in interfacing to C code written under the assumption that it would only have C-language clients). I know this won’t work for everything, but this approach could help in a lot of places.


#3

Is there a good reason to manually maintain all the libc function definitions in src/liblibc/lib.rs etc. rather than using bindgen, perhaps even at build time? (One caveat: bindgen doesn’t support turning constants defined using #define into consts, but I don’t think that’s terribly difficult to fix.)


#4

the cfg attributes are just marginally better than preprocessor ifdefs.

So, a more readable and manageable solution might be to pack all the platform-specific stuff in its own git branch (one for each platform) that keeps merging in the master branch. The master branch has nothing inside the functions (or some kind of placeholder). This would mean that the master branch is not compilable. Without some new tools this would be unmanageable though.

Of course this won’t work since cfg attributes also exist for the choice of malloc. In the end a branch would be needed for every combination of malloc + platform + whatnot.

I noticed my idea won’t work while writing it down, but maybe it inspires someone to a better idea :slight_smile:
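
A less drastic variant of the same separation can stay inside a single branch: keep one module per platform and select it once with cfg, so the shared code carries no attributes at all. The sketch below is only an illustration; the imp module and path_separator function are names I made up, and a second axis such as the choice of malloc would need its own pair of modules.

```rust
// Hypothetical layout: one `imp` module per platform, selected once by `cfg`.
// Exactly one of the two modules below is compiled for any given target.
#[cfg(windows)]
mod imp {
    pub fn path_separator() -> char {
        '\\'
    }
}

#[cfg(not(windows))]
mod imp {
    pub fn path_separator() -> char {
        '/'
    }
}

/// Platform-independent entry point: callers never see a `cfg` attribute.
pub fn path_separator() -> char {
    imp::path_separator()
}
```

The shared function body is the "placeholder" from the branch idea, except that it still compiles on every platform.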


#5

You’d be surprised. I was working on a thing to translate the Win32 headers, and #define constants are such a massive pain. I tried to add in some heuristics for guessing what type they were supposed to be, but it didn’t really work. A lot of the time, they’re basically polymorphic, being used as multiple, different types.

What’s more, unless you want to throw away bindgen and rewrite it in C++ to bind to clang’s AST (bindgen is using cindex, which is basically just definitions), you can’t figure out if a given #define is an expression, let alone if it’s constant or not.

The only real recourse is to bind them all by hand… at which point, why were you modifying bindgen, again?


#6

Hmm, I admit I don’t have much experience with Clang’s API, although a quick look at LLDB indicates that its use is hardly elegant either (it writes all expressions to evaluate into a temporary file)… but I’m not sure what the issue is. CXTranslationUnit_DetailedPreprocessingRecord gives you macro definitions, though you only need the names; then you could generate code like

enum { value_of_MACRO_A = ((((((MACRO_A)))))) };

(for correctness, add as many parentheses as there are )s in the enum, I guess…) Append this to the translation unit and reparse. Ignore the diagnostics that correspond to definitions which are not valid constant expressions, and treat the rest like bindgen currently does enums. Wouldn’t that work? You still wouldn’t know the correct type, but that’s not the end of the world, as calls to C functions typically require a bunch of casting anyway.
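
The text-generation half of that idea could be sketched roughly as follows. This only builds the probe declaration for a given macro name; probe_declaration is a name invented for the illustration, and the actual reparsing through libclang is not shown.

```rust
/// Hypothetical helper: build the probe declaration for one macro name.
/// The six layers of parentheses mirror the example in the text; the
/// generated line would be appended to the translation unit before reparsing.
pub fn probe_declaration(macro_name: &str) -> String {
    format!("enum {{ value_of_{0} = (((((({0})))))) }};", macro_name)
}

/// Build the probe text for a whole list of macro names, one per line.
pub fn probe_source(macro_names: &[&str]) -> String {
    macro_names
        .iter()
        .map(|name| probe_declaration(name))
        .collect::<Vec<_>>()
        .join("\n")
}
```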

Or if you really wanted to be polymorphic, could always make them functions (probably a bad idea though)… fn SOME_CONSTANT<IntTy: FromPrimitive>() -> IntTy { <IntTy as FromPrimitive>::from_u32(12345).unwrap() }
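
FromPrimitive does not live in today's standard library, so here is a rough sketch of the same polymorphic-constant idea using std's TryFrom instead (my substitution, not what the line above uses). The name some_constant and the value 12345 are purely illustrative, and it returns an Option rather than unwrapping so that a type that is too narrow shows up as None.

```rust
use std::convert::TryFrom;

/// Hypothetical polymorphic "constant": yields 12345 in whatever integer
/// type the caller asks for, or None if the value does not fit that type.
pub fn some_constant<T: TryFrom<u32>>() -> Option<T> {
    T::try_from(12345u32).ok()
}
```

A caller picks the type with a turbofish, for example some_constant::<u16>().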


#7

Who says macros have to result in an integer? Or even a statically-computable value? Oh, and you can’t just rewrite them into Rust macros because they may not be hygienic.

Like I said, I don’t think it’s possible to automatically translate them. The best you can do is translate them by hand and get the generator to check that you haven’t missed any.
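
The checking half is the easy part. A minimal sketch, assuming the generator can hand over the list of macro names it saw in the headers (missing_bindings and both parameter names are made up for the example):

```rust
use std::collections::BTreeSet;

/// Hypothetical check: return every macro name found in the headers that has
/// no hand-written binding yet, in the order the headers listed them.
pub fn missing_bindings(header_macros: &[&str], hand_written: &[&str]) -> Vec<String> {
    let bound: BTreeSet<&str> = hand_written.iter().copied().collect();
    header_macros
        .iter()
        .copied()
        .filter(|name| !bound.contains(name))
        .map(String::from)
        .collect()
}
```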


#8

In case you do decide to split up your platform definitions based on platform into different libraries… cough cough.

Manually writing bindings for the Microsoft headers is a nightmare, but getting bindgen to be able to understand all those macros and output the correct code isn’t much better. I’ve generally been able to get away with using DWORD for most things and then special casing a few things. That said, if I really wanted a generic constant that had to be available at compile time, I could just export a Rust macro for it, but that’s just kinda ugly.
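
For the record, the macro route would look something like the sketch below. The macro name and its value are only an illustration; the point is that an unsuffixed literal takes its type from wherever it is used.

```rust
/// Hypothetical "generic constant": expands to an unsuffixed literal, so the
/// use site decides the integer type.
macro_rules! generic_read {
    () => {
        0x8000_0000
    };
}

/// Use the macro as a u32.
pub fn as_u32() -> u32 {
    generic_read!()
}

/// Use the very same macro as a u64.
pub fn as_u64() -> u64 {
    generic_read!()
}
```

Using it where an i32 is expected is rejected at compile time, because the literal is out of range for that type.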

@comex Using enums for macro values is a Bad Idea, mainly because some macros have overflowing values.


#9

@DanielKeep They don’t have to result in anything even vaguely resembling an expression, but if they don’t, that enum declaration will simply fail, generating diagnostics which are ignored. (Of course, that won’t help if you want to use a macro that isn’t a poor man’s enum, but such macros are far less common in most C libraries.)

@retep998 What do you mean by overflowing? Overflowing what?


#10

@comex The integer literals in a #define will sometimes be outside the bounds of the integer type used for enums. C enums are backed by int, which is a signed integer, and signed integer overflow is undefined behavior.


#11

Huh, now that’s one less thing I don’t know about the C standard. Looking at the spec, though, C++(11) does not require it, and even in C mode, given a larger value, both gcc and clang only warn with -Wpedantic, and proceed as you’d expect (sizeof enum becomes 8, value is preserved).


#12

#include <iostream>

enum Foo {
    Bar = 0xFFFFFFFF,
};
int main() {
    std::cout << Bar << std::endl;
}

This outputs -1. If I do sizeof I get 4. Even if I make it 0xFFFFFFFFFF it still is only 4 bytes. This is using MSVC, not gcc/clang.
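
Both results are consistent with the value simply being cut down to a 32-bit signed int. A small Rust sketch of that arithmetic, assuming truncation is what happens (I have not checked what the compiler does internally, and the function names are mine):

```rust
use std::convert::TryFrom;

/// Does the value fit in a 32-bit signed int without changing?
pub fn fits_in_int(value: u64) -> bool {
    i32::try_from(value).is_ok()
}

/// Assumed behavior: keep only the low 32 bits and reinterpret them as signed.
pub fn truncate_to_int(value: u64) -> i32 {
    value as i32
}
```

Under that assumption both 0xFFFFFFFF and 0xFFFFFFFFFF come out as -1, since their low 32 bits are all ones.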


#13

Good thing we are talking about libclang then ;p

When it comes to the C++ (as opposed to C) spec, I think MSVC is in the wrong there, as with so many other things. From C++11/14 specs:

For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can represent all the enumerator values defined in the enumeration. If no integral type can represent all the enumerator values, the enumeration is ill-formed. It is implementation-defined which integral type is used as the underlying type except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int. If the enumerator-list is empty, the underlying type is as if the enumeration had a single enumerator with value 0.

Of course, if you were worried about this you could just split each definition into two or more enumerators. I do seem to have a habit of getting lost in pedantry on this forum…
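
One way to read that splitting suggestion, sketched in Rust: cut the value into 16-bit pieces, each of which fits comfortably in an int, and reassemble them on the Rust side. The chunk size and both function names are my own assumptions for the illustration.

```rust
/// Hypothetical split: break a 64-bit value into four 16-bit pieces,
/// most significant first, so that each piece fits in a C int.
pub fn split_into_chunks(value: u64) -> [u16; 4] {
    [
        (value >> 48) as u16,
        (value >> 32) as u16,
        (value >> 16) as u16,
        value as u16,
    ]
}

/// Reassemble the original value from its four pieces.
pub fn join_chunks(chunks: [u16; 4]) -> u64 {
    chunks
        .iter()
        .fold(0u64, |acc, &chunk| (acc << 16) | u64::from(chunk))
}
```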