Non-negative integer types

There exist instances where the valid arguments to a function are -1 or 0 and up... an example being the timeout parameter in poll.

The goal for a rusty wrapper here would be to define the timeout argument as Option<u31>. Without a NonNegativeI32 type, this argument will always be larger than necessary, taking up 8 bytes where it could take up 4. In essence this is a similar argument to the one behind NonZeroI32.

NonNegativeI32 can be implemented in nightly like:

#[repr(transparent)]
#[rustc_layout_scalar_valid_range_start(0)]
#[rustc_layout_scalar_valid_range_end(2147483647)] // 2^31 - 1 (the range end is inclusive)
pub struct NonNegativeI32(i32);

But it should be available on stable, like NonZeroI32.

I've mimicked most of the NonZero types as a crate here (GitHub - JonathanWoollett-Light/non-negative), but I am currently encountering a compiler error.
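Until the niche attributes stabilize, the checked-construction half of the API can be written on stable today. A minimal sketch (note that without the nightly attributes above, Option<NonNegativeI32> gets no niche and stays 8 bytes):

```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub struct NonNegativeI32(i32);

impl NonNegativeI32 {
    /// Returns `None` for negative inputs, mirroring `NonZeroI32::new`.
    pub const fn new(n: i32) -> Option<Self> {
        if n >= 0 { Some(Self(n)) } else { None }
    }

    /// Returns the contained value as a plain `i32`.
    pub const fn get(self) -> i32 {
        self.0
    }
}

fn main() {
    assert!(NonNegativeI32::new(5).is_some());
    assert!(NonNegativeI32::new(-1).is_none());
    // No niche on stable, so the Option is twice the size of the payload:
    assert_eq!(std::mem::size_of::<Option<NonNegativeI32>>(), 8);
}
```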

6 Likes

NonNegativeI32 is arguably also NotHighBitSetU32, so it'd be nice to find a type name that unifies the two.

Of course, the general answer is to expose RefinedU32<MIN, MAX> and RefinedI32<MIN, MAX> and friends, rather than yet more special cases.

2 Likes

Sounds like u31 to me?

9 Likes

I think the ideal solution would be the full set of [u1, u2, u3, ...] and [i1, i2, i3, ...], but this seems like a more significant effort (having just u31 might seem inconsistent).

2 Likes

I'd love to have struct U<const BITS: u32> and stop using macros to define all the inherent methods for the individual types.

(Probably not practical right now, though.)

6 Likes

You can also make the “better Option<u32>” today using a biased representation: store as NonZeroU32, and add 1 to your “u31” or “non-negative i32”. It’s not as good because it only has one niche instead of billions, but it does work on stable.
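A minimal sketch of that biased encoding (the Timeout name and millisecond unit here are just illustrative):

```rust
use std::num::NonZeroU32;

/// Stores `value + 1` in a `NonZeroU32`, so `Option<Timeout>` can use the
/// zero niche and stay 4 bytes on stable Rust.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Timeout(NonZeroU32);

impl Timeout {
    /// Accepts the "u31" range 0..=i32::MAX; the +1 bias cannot overflow u32.
    pub fn new(millis: u32) -> Option<Self> {
        if millis <= i32::MAX as u32 {
            NonZeroU32::new(millis + 1).map(Self)
        } else {
            None
        }
    }

    /// Removes the bias to recover the original value.
    pub fn get(self) -> u32 {
        self.0.get() - 1
    }
}

fn main() {
    // The zero niche makes Option<Timeout> the same size as u32:
    assert_eq!(std::mem::size_of::<Option<Timeout>>(), 4);
    assert_eq!(Timeout::new(0).unwrap().get(), 0);
    assert!(Timeout::new(u32::MAX).is_none());
}
```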

I'm still in favor of ranged integers, which is probably the broadest solution to this problem. It inherently subsumes nonzero, nonnegative, and uX/iX.

10 Likes

I think i32 is the best type for such cases.

Am I weird for thinking that an intrinsic could work great here?

#[repr(transparent)]
#[niche(0100_1000)]
struct U(u8);

#[repr(transparent)]
#[niche(0000_0001)]
struct NonZeroU8(u8);

// today
#[repr(transparent)]
#[rustc_layout_scalar_valid_range_start(1)]
#[rustc_nonnull_optimization_guaranteed]
struct NonZeroU8(u8);

My students occasionally get so confused by the fact that these particular arguments / return values can be −1 but will never be any other negative number. "Should I write if system_call() < 0 or should I write if system_call() == -1?" they ask, and when I say it's a matter of style, pick one and stick to it, they're, like, but they don't do the same thing, surely one is right and the other is wrong????

We're not using Rust, but if Rust could accurately model a data type whose valid values are in $\{\, n \in \mathbb{Z} \mid -1 \le n < 2^{w-1} \,\}$, for some number of bits $w$, that would make this particular pedagogical headache go away.

7 Likes

There's a reason that Rust uses Result<T, io::Error> instead of errno, or (sometimes[1]) Option<ptr::NonNull<T>> instead of *mut T. It's definitely important to know when to Keep It Simple and avoid nonbeneficial complexity, but the other side of the coin is Primitive Obsession which is just as bad and similarly leads to incidental complexity. It's not just about Parsing, not Validating or Making Illegal States Unrepresentable; it's also about both communication and composability.

Which of these APIs are simpler to understand at a glance?

/// Waits for some event on a file descriptor.
///
/// Returns either a nonnegative value representing how many file descriptors
/// observed an event, or `-1` indicating a failure that set `errno`.
///
/// A negative `timeout` indicates that no timeout should be set, and
/// `poll` will block until an event occurs.
///
/// [errno]: mod@crate::errno
pub fn poll(fds: &mut [PollFd], timeout: i32) -> i32;

/// Waits for some event on a file descriptor,
/// returning how many file descriptors observed an event.
pub fn poll(fds: &mut [PollFd], timeout: Option<u31>) -> Result<u31, ErrnoSet>;

// the above would potentially be a way to wrap poll's semantics directly;
// but, based on my reading of the man page, it might be better exposed as
pub fn poll(fds: &mut [PollFd], timeout: Option<Duration>) -> usize {
    assert!(fds.len() <= c::RLIMIT_NOFILE);

    let mut tmo = c::timespec { tv_sec: 0, tv_nsec: 0 };
    let tmo_p = timeout.map_or(ptr::null_mut(), |timeout| {
        assert!(timeout.as_secs() <= i64::MAX as u64);
        tmo = c::timespec {
            tv_sec: timeout.as_secs() as i64,
            tv_nsec: timeout.subsec_nanos() as i64,
        };
        &mut tmo as *mut _
    });

    loop {
        let res = c::ppoll(fds.as_mut_ptr(), fds.len(), tmo_p, ptr::null_mut());
        if res < 0 {
            match c::errno() {
                c::EFAULT => unreachable!(),
                c::EINTR | c::EAGAIN => continue,
                c::EINVAL => unreachable!(),
                c::ENOMEM => panic_str_nounwind(PANIC_ENOMEM),
                    // or some way to respect the user's alloc_error_hook
                _ => unreachable!(),
            }
        }

        return res.try_into().unwrap();
    }
}
// because we can guarantee no failure other than allocation failure (fatal).

Using more expressive types communicates information about the interface in a structured manner. The caller is forced to acknowledge the fallibility of the function. The caller gives "some timeout" or "no timeout" and you don't have to describe in prose what a negative timeout means. You don't have some APIs which require using -1 and some which accept any negative value as representing none. You don't have to remember which functions return an error value directly and which only set errno. You don't have to document and remember if there are magic values other than INFTIM. You can wrap the function without having to re-document semantics added on top of primitives used just because primitives are easy.

Actually, reading the linked manpage, poll(2) is defined to accept any negative timeout value as an infinite timeout, not just -1.


  1. Due to API shortcomings, it's typically significantly easier to work with *mut T than Option<NonNull<T>>. For this reason, the current typical guideline is to use NonNull<T> "at rest" (in your structs) but to use *mut T for any working value. ↩︎

13 Likes

When you write a user-friendly API, it is better to use Option and Result, but in this case I don't see the reason to create over-engineered custom types. I think even Option<isize> (16 bytes) is not a problem - do you really have a case where the memory consumption of Optionals is a problem?

When I download a library with custom types (like "duration" etc.), every time I spend a lot of effort googling how to convert all this stuff to numeric types. A good example of terrible design is the std::chrono (or std::prono?) library - every time you use it you need to google. And compare it with "not ideologically sustained" simple functions like time(), QueryPerformanceCounter(), etc.

The opposite side of primitive obsession is over-engineered libraries which are very hard to use.

Notice that with respect to types, there are two principal ideologies: weak typing and strong typing.

C, which was designed with KISS in mind, is predominantly weakly typed.

Rust, on the other hand, is very strongly typed and tends to view this as a strength (and a lot of the safety of Rust builds on it).

Saying that it is better to just use i32 everywhere is a valid position, but it does go against Rust's usual ideology: "Make a new type to rule out API abuse at compile time."

4 Likes

Duration is a core type.

Also, unlike C++, Rust has rustdoc built into the standard toolchain. You don't need to fight with Google or documentation tooling; the docs are available locally via cargo doc and online via docs.rs.

It's always a tradeoff and a balancing act, but the default Rust style is biased towards using more and richer types to model the problem domain and categorically prevent certain classes of domain errors.

9 Likes

FWIW, I love std::chrono. Having the distinction between time_point and duration is incredibly helpful, and at $RealJob I've taken advantage of its flexibility to provide convenient types over different clocks. For example, exposing https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime via a chrono type so that subtraction works correctly where it should, and (also importantly) doesn't work if you subtract against other types with different epochs (like GetTickCount64 function (sysinfoapi.h) - Win32 apps | Microsoft Learn ).

1 Like

The only thing that's really wrong with Duration is the lack of a literal syntax like C++'s 3s. std::time and chrono have a couple more warts, but as an industry I think we've finally started to figure out time (the jury's out on dates, numbers, and text, on the other hand).

1 Like

What operations would a ranged integer support?

Check out deranged, which is my proof of concept implementation. As I wanted it to work on stable, I didn't implement Add or other similar traits. They can be implemented on nightly, however. In theory, if they were built-in, you'd only need a single type, with the compiler being able to choose the best backing type.
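For illustration, here is a const-generic sketch of the core idea - this is not deranged's actual API, just the rough shape a built-in ranged type might take:

```rust
/// A value statically known to lie in MIN..=MAX, checked at construction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RangedI32<const MIN: i32, const MAX: i32>(i32);

impl<const MIN: i32, const MAX: i32> RangedI32<MIN, MAX> {
    pub const fn new(value: i32) -> Option<Self> {
        if value >= MIN && value <= MAX {
            Some(Self(value))
        } else {
            None
        }
    }

    pub const fn get(self) -> i32 {
        self.0
    }
}

// "NonNegativeI32" then falls out as a mere alias:
pub type NonNegative = RangedI32<0, { i32::MAX }>;

fn main() {
    assert!(NonNegative::new(7).is_some());
    assert!(NonNegative::new(-1).is_none());
}
```

A true built-in version could additionally pick the narrowest backing type from the bounds, which user code can't express today.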

3 Likes

Have you considered adding expanding_… methods to the existing checked, unchecked, and saturating ones that return a type with bounds large enough to cover the entire output domain of the operation?
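To illustrate what such an expanding_… method could look like - a sketch with fixed bounds, since computing the output bounds generically needs generic_const_exprs; all names here are hypothetical:

```rust
/// A toy ranged integer for demonstration purposes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ranged<const MIN: i64, const MAX: i64>(i64);

impl<const MIN: i64, const MAX: i64> Ranged<MIN, MAX> {
    const fn new(v: i64) -> Option<Self> {
        if v >= MIN && v <= MAX { Some(Self(v)) } else { None }
    }
}

// An "expanding" add is infallible: the result bounds cover the whole
// output domain, so no overflow check is needed. With generic_const_exprs
// this would return Ranged<{ MIN_A + MIN_B }, { MAX_A + MAX_B }> generically.
fn expanding_add(a: Ranged<0, 100>, b: Ranged<0, 50>) -> Ranged<0, 150> {
    Ranged(a.0 + b.0) // cannot overflow: 0..=100 plus 0..=50 fits in 0..=150
}

fn main() {
    let a = Ranged::<0, 100>::new(100).unwrap();
    let b = Ranged::<0, 50>::new(50).unwrap();
    assert_eq!(expanding_add(a, b), Ranged::<0, 150>::new(150).unwrap());
}
```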

1 Like

This is my holy grail for a numeric type system. It'd be hard to represent in Rust today, but I'd love to just have Integer<MIN, MAX> with some sort of only-allowed-at-compile-time infinite-precision value for the bounds, where the compiler would just pick a reasonable backing store.

Then all the operators would just work without you ever needing to think about overflow behaviour, and questions about overflow would happen only at the end when you need to pass them to or store them in something with a restricted range, where you'd have the usual complement of into (for widening), try_into, saturating_into, wrapping_into, etc.

That way you'd never need to worry about whether a + b - c should really be a + (b - c) or b + (a - c) and stuff like that. And LLVM already knows to convert (x as u32 + y as u32) as u16 to x as u16 + y as u16 (in release where it's wrapping), so having the larger intermediate results would be zero-cost in the sense that if you didn't actually use the larger intermediate results then it wouldn't be more expensive than the normal way.
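That equivalence is easy to check directly: the low 16 bits of the wide sum always agree with the wrapping narrow sum, which is why a compiler may lower the widened form to the cheaper narrow add. A small self-contained check:

```rust
// Narrow wrapping add, as done in release mode.
fn narrow_then_add(x: u16, y: u16) -> u16 {
    x.wrapping_add(y)
}

// Widen, add exactly, then truncate back down.
fn widen_then_narrow(x: u16, y: u16) -> u16 {
    (x as u32 + y as u32) as u16
}

fn main() {
    // The two agree for every input, including the overflowing cases.
    for &(x, y) in &[(0u16, 0u16), (u16::MAX, 1), (40_000, 40_000)] {
        assert_eq!(narrow_then_add(x, y), widen_then_narrow(x, y));
    }
}
```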

(There are a couple things that would be awkward with such types, like trying to do a for (int i = 0; i < n; i = i + 1) loop, but that's OK because we already discourage that, and we could have Integer<MIN, MAX>: Step so that for i in 0..n works for these types without exposing the unsafe dances needed to implement that kind of thing.)

5 Likes