Non-negative integer types

There's a reason that Rust uses Result<T, io::Error> instead of errno, or (sometimes[1]) Option<ptr::NonNull<T>> instead of *mut T. It's definitely important to know when to Keep It Simple and avoid nonbeneficial complexity, but the other side of the coin is Primitive Obsession, which is just as bad and similarly leads to incidental complexity. It's not just about Parsing, not Validating or Making Illegal States Unrepresentable; it's also about both communication and composability.

Which of these APIs is simpler to understand at a glance?

/// Waits for some event on a file descriptor.
///
/// Returns either a nonnegative value representing how many file descriptors
/// observed an event, or `-1` indicating a failure that set `errno`.
///
/// A negative `timeout` indicates that no timeout should be set, and
/// `poll` will block until an event occurs.
///
/// [errno]: mod@crate::errno
pub fn poll(fds: &mut [PollFd], timeout: i32) -> i32;

/// Waits for some event on a file descriptor,
/// returning how many file descriptors observed an event.
pub fn poll(fds: &mut [PollFd], timeout: Option<u31>) -> Result<u31, ErrnoSet>;

// the above would potentially be a way to wrap poll's semantics directly;
// but, based on my reading of the man page, it might be better exposed as
pub fn poll(fds: &mut [PollFd], timeout: Option<Duration>) -> usize {
    assert!(fds.len() <= c::RLIMIT_NOFILE);

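    // Initialised up front: a closure cannot be the first to assign a captured local.
    let mut tmo = c::timespec { tv_sec: 0, tv_nsec: 0 };
    let tmo_p = timeout.map_or(ptr::null_mut(), |timeout| {
        assert!(timeout.as_secs() <= i64::MAX as u64);
        tmo = c::timespec {
            tv_sec: timeout.as_secs() as _,
            tv_nsec: timeout.subsec_nanos() as _,
        };
        &mut tmo as *mut _
    });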

    loop {
        let res = c::ppoll(fds.as_mut_ptr(), fds.len(), tmo_p, ptr::null_mut());
        if res < 0 {
            match c::errno() {
                c::EFAULT => unreachable!(),
                c::EINTR | c::EAGAIN => continue,
                c::EINVAL => unreachable!(),
                c::ENOMEM => panic_str_nounwind(PANIC_ENOMEM),
                    // or some way to respect the user's alloc_error_hook
                _ => unreachable!(),
            }
        }

        return res.try_into().unwrap();
    }
}
// because we can guarantee no failure other than allocation failure (fatal).

Using more expressive types communicates information about the interface in a structured manner. The caller is forced to acknowledge the fallibility of the function. The caller gives "some timeout" or "no timeout" and you don't have to describe in prose what a negative timeout means. You don't have some APIs which require using -1 and some which accept any negative value as representing none. You don't have to remember which functions return an error value directly and which only set errno. You don't have to document and remember if there are magic values other than INFTIM. You can wrap the function without having to re-document semantics added on top of primitives used just because primitives are easy.

Actually, reading the linked manpage, poll(2) is defined to accept any negative timeout value as an infinite timeout, not just -1.
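To make the sentinel handling concrete, here is a minimal sketch of the two conversions a thin wrapper would hide. The helper names `timeout_to_raw` and `ret_to_result` are made up for illustration, and `ErrnoSet` is just a stand-in unit type matching the signature above; none of this is from an existing crate.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Stand-in marker for "the call failed and errno was set" (assumed shape).
#[derive(Debug, PartialEq, Eq)]
pub struct ErrnoSet;

/// Hypothetical helper: encode an optional timeout as the raw millisecond
/// argument. `None` becomes -1 (any negative value means "block forever");
/// timeouts too long to represent are clamped to `i32::MAX` milliseconds.
pub fn timeout_to_raw(timeout: Option<Duration>) -> i32 {
    match timeout {
        None => -1,
        Some(d) => i32::try_from(d.as_millis()).unwrap_or(i32::MAX),
    }
}

/// Hypothetical helper: decode a raw return value in which any negative
/// number signals failure and a nonnegative number is a count.
pub fn ret_to_result(ret: i32) -> Result<u32, ErrnoSet> {
    u32::try_from(ret).map_err(|_| ErrnoSet)
}
```

The prose about what -1 means lives in exactly one place, and callers only ever see `Option` and `Result`.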


  1. Due to API shortcomings, it's typically significantly easier to work with *mut T than Option<NonNull<T>>. For this reason, the current typical guideline is to use NonNull<T> "at rest" (in your structs) but to use *mut T for any working value.

When you write a user-friendly API, it is better to use Option and Result, but in this case I don't see the reason to overengineer things with custom types. I think even Option<isize> (16 bytes) is not a problem. Do you really have a case where the memory consumption of Options is a problem?

When I download a library with custom types (like "duration" etc.), every time I spend a lot of time googling how to convert all this stuff to numeric types. A good example of terrible design is the std::chrono (or std::prono?) library: every time you use it, you need to google. Compare it with "not ideologically sustained" simple functions like time(), QueryPerformanceCounter(), etc.

The opposite side of primitive obsession is overengineered libraries that are very hard to use.

Notice that with respect to types, there are two principal ideologies: weak typing and strong typing.

C, which was designed with KISS in mind, is predominantly weakly typed.

Rust, on the other hand, is very strongly typed and tends to view this as a strength (and a lot of the safety of Rust builds on it).

Saying that it is better to just use i32 everywhere is a valid position, but it does go against Rust's usual ideology: "Make a new type to rule out API abuse at compile time."
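As a small illustration of that ideology, consider timeouts, where some C APIs take milliseconds and others take seconds. The sketch below uses made-up newtypes (`Millis`, `Seconds`) and a made-up function (`wait_for`); it is not an existing API, only a picture of what the compiler can rule out.

```rust
/// Hypothetical unit newtypes: both wrap a plain `u32` but are distinct types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Millis(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Seconds(pub u32);

impl From<Seconds> for Millis {
    /// Explicit, saturating unit conversion.
    fn from(s: Seconds) -> Self {
        Millis(s.0.saturating_mul(1000))
    }
}

/// Hypothetical API that only accepts milliseconds; returns the raw value
/// it would hand to the underlying call.
pub fn wait_for(timeout: Millis) -> u32 {
    timeout.0
}

// `wait_for(Seconds(2))` is a compile error. With a bare `u32` parameter the
// same mistake would compile and wait 2 ms instead of 2 s.
```

The conversion still exists, but the caller has to ask for it with `.into()`.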

Duration is a core type.

Also, unlike C++, Rust has rustdoc built into the standard toolchain. You don't need to fight with Google or documentation tooling; the docs are available locally via cargo doc and online via docs.rs.

It's always a tradeoff and a balancing act, but the default Rust style is biased towards using more and richer types to model the problem domain and categorically prevent certain classes of domain errors.

FWIW, I love std::chrono. Having the distinction between time_point and duration is incredibly helpful, and at $RealJob I've taken advantage of its flexibility to provide convenient types over different clocks. For example, exposing GetSystemTimePreciseAsFileTime (https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime) via a chrono type so that subtraction works correctly where it should, and (also importantly) doesn't work if you subtract against other types with different epochs (like GetTickCount64).
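Rust's std::time draws the same line: Instant plays the role of a time_point and Duration is the span between two of them. A minimal sketch (the function names are made up for illustration):

```rust
use std::time::{Duration, Instant};

/// Span between two points on the same monotonic clock.
pub fn elapsed_between(start: Instant, end: Instant) -> Duration {
    end - start
}

/// A point in time shifted by a span is again a point in time.
pub fn deadline_after(start: Instant, timeout: Duration) -> Instant {
    start + timeout
}

// Neither of these type-checks, which is the point:
//     let nonsense = start + end;             // Instant + Instant
//     let mixed = start - SystemTime::now();  // values from different clocks
```

Subtracting two Instants gives a Duration, and there is simply no operator for the combinations that would be meaningless.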

The only thing that's really wrong with Duration is the lack of a literal syntax like C++'s 3s. std::time and chrono have a couple more warts, but as an industry I think we've finally started to figure out time (the jury's out on dates, numbers, and text, on the other hand).

What operations would a ranged integer support?

Check out deranged, which is my proof of concept implementation. As I wanted it to work on stable, I didn't implement Add or other similar traits. They can be implemented on nightly, however. In theory, if they were built-in, you'd only need a single type, with the compiler being able to choose the best backing type.
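For readers who want a feel for the shape of such a type, here is a hand-rolled sketch using const generics. It is not deranged's API; the name `Ranged`, its methods, and the `U31` alias are assumptions made for this example, and it uses a single fixed `i64` backing type rather than picking one per range.

```rust
/// Hypothetical ranged integer: a value known to lie in `MIN..=MAX`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Ranged<const MIN: i64, const MAX: i64>(i64);

impl<const MIN: i64, const MAX: i64> Ranged<MIN, MAX> {
    /// Returns `None` if `value` falls outside the range.
    pub const fn new(value: i64) -> Option<Self> {
        if MIN <= value && value <= MAX {
            Some(Self(value))
        } else {
            None
        }
    }

    /// Returns the wrapped value.
    pub const fn get(self) -> i64 {
        self.0
    }

    /// Addition that stays inside the same range, or `None` if it would leave it.
    pub fn checked_add(self, rhs: Self) -> Option<Self> {
        self.0.checked_add(rhs.0).and_then(Self::new)
    }
}

/// The "u31" from earlier in the thread, expressed as a range.
pub type U31 = Ranged<0, { i32::MAX as i64 }>;
```

An `Add` impl whose output is `Ranged<{ MIN + MIN }, { MAX + MAX }>` is the part that needs arithmetic on const generic parameters, which stable Rust does not allow; that is why only same-range operations such as `checked_add` appear here.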

Have you considered adding expanding_… methods alongside the existing checked, unchecked, and saturating ones, returning a type with bounds large enough to cover the entire output domain of the operation?
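On today's primitive types, the closest analogue I can sketch is an operation that returns the next wider type. The trait and method name below are hypothetical (std has no `expanding_add`), and for a ranged type the output would be a type with wider bounds rather than a wider primitive.

```rust
/// Hypothetical extension trait: addition whose output type is wide enough
/// that the operation can never overflow.
pub trait ExpandingAdd {
    type Output;
    fn expanding_add(self, rhs: Self) -> Self::Output;
}

impl ExpandingAdd for u8 {
    type Output = u16;
    fn expanding_add(self, rhs: u8) -> u16 {
        // At most 255 + 255 = 510, which always fits in a u16.
        u16::from(self) + u16::from(rhs)
    }
}

impl ExpandingAdd for u16 {
    type Output = u32;
    fn expanding_add(self, rhs: u16) -> u32 {
        // At most 65_535 + 65_535, which always fits in a u32.
        u32::from(self) + u32::from(rhs)
    }
}
```

No overflow mode has to be chosen at the call site; the only decision left is how to narrow the result later, if at all.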

This is my holy grail for a numeric type system. It'd be hard to represent in Rust today, but I'd love to just have Integer<MIN, MAX> with some sort of only-allowed-at-compile-time infinite-precision value for the bounds, where the compiler would just pick a reasonable backing store.

Then all the operators would just work without you ever needing to think about overflow behaviour, and questions about overflow would happen only at the end when you need to pass them to or store them in something with a restricted range, where you'd have the usual complement of into (for widening), try_into, saturating_into, wrapping_into, etc.

That way you'd never need to worry about whether a + b - c should really be a + (b - c) or b + (a - c) and stuff like that. And LLVM already knows to convert (x as u32 + y as u32) as u16 to x as u16 + y as u16 (in release where it's wrapping), so having the larger intermediate results would be zero-cost in the sense that if you didn't actually use the larger intermediate results then it wouldn't be more expensive than the normal way.
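To show the ordering problem with fixed-width types, here is a small sketch. The three function names are made up; the "widened" version emulates by hand what an expanding integer type would do automatically, using `i16` as an intermediate that is large enough for any three `u8` inputs.

```rust
use std::convert::TryFrom;

/// (a + b) - c, checked at every step in u8.
pub fn add_then_sub(a: u8, b: u8, c: u8) -> Option<u8> {
    a.checked_add(b)?.checked_sub(c)
}

/// a + (b - c), checked at every step in u8.
pub fn sub_then_add(a: u8, b: u8, c: u8) -> Option<u8> {
    a.checked_add(b.checked_sub(c)?)
}

/// Compute in a type that cannot overflow for these inputs (the result lies
/// in -255..=510), and only check the range once at the end.
pub fn widened(a: u8, b: u8, c: u8) -> Option<u8> {
    let wide = i16::from(a) + i16::from(b) - i16::from(c);
    u8::try_from(wide).ok()
}
```

With a = 200, b = 100, c = 60 the first ordering fails on the intermediate 300 even though the final answer 240 fits; with a = 10, b = 5, c = 8 it is the second ordering that fails. The widened version succeeds in both cases and fails only when the true result does not fit in a `u8`.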

(There are a couple of things that would be awkward with such types, like trying to do a for (int i = 0; i < n; i = i + 1) loop, but that's ok because we already discourage that, and we could have Integer<MIN, MAX>: Step so that for i in 0..n works for these types without exposing the unsafe dances needed to implement that kind of thing.)

This is pretty much exactly what I'd like to see eventually. It would "just work" and would trivially have niche value optimization, which is a great benefit to top it off. If I could implement it in user code, I would, but it's simply not possible.

If you felt like taking on a(nother) project, I think the first plausible lang step here is actually figuring out how to have compile-time unbounded integers for use in const generics.

Right now you have to pick which limited-size integer type to use, and for a bunch of stuff there just isn't a good answer. It would be really cool to not have to choose for things that aren't runtime values anyway. And that'd be useful & stabilizable independently from the bigger project of ranged integers.

Proper ranged integers are on my list of things to do, but they're quite low priority.

I actually don't think it would be terribly difficult to have unbounded integers in const generics. The parser already handles any number of digits, and the compiler has a bigint type (as library code).

Do you have any example code that uses it?

That uses what?

Sorry, I should have been more clear. Have you written anything that uses deranged?

I have experimentally used it with time, but nothing that was ever committed or published.

I would add that a good example of the holy grail here would be Zig's primitive types, e.g. u1, u2, u3, etc., all of which can be efficiently packed if a struct is marked to do so.

In addition to the integer types above, arbitrary bit-width integers can be referenced by using an identifier of i or u followed by digits. For example, the identifier i7 refers to a signed 7-bit integer. The maximum allowed bit-width of an integer type is 65535.

I disagree, actually, because those are just modulo-a-power-of-two, and have all the same overflow problems. The point of my "holy grail" post is in things expanding to the extent necessary to never have overflow issues, but not more. And because of the careful bounds on the types, division is also infallible (no divide-by-zero) when the divisor's type has a lower bound above zero, but all those Zig types include 0.
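Std already has one small instance of the "lower bound above zero" idea: dividing a `u32` by a `NonZeroU32` is implemented directly and cannot panic. A minimal sketch (the function name `per_item` is made up):

```rust
use std::num::NonZeroU32;

/// Division with no divide-by-zero path: zero is not a representable
/// value of the divisor's type.
pub fn per_item(total: u32, count: NonZeroU32) -> u32 {
    total / count
}
```

The zero check moves to wherever the divisor is constructed (`NonZeroU32::new` returns an `Option`), which is the same "check once at the boundary" shape as the rest of the ranged-integer idea.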

We could add u1, u2, etc. to Rust relatively easily. After all, LLVM already supports i1 through i8388608 (although backends probably don't have great support for them, as seen when we added i128 & u128 to Rust).
