[Pre-RFC] Removing the panic from to_digit and from_digit

The panics in from_digit and to_digit are rather arbitrary:


pub fn from_digit(num: u32, radix: u32) -> Option<char> {
    if radix > 36 {
        panic!("from_digit: radix is too high (maximum 36)");
    }
    if num < radix {
        let num = num as u8;
        if num < 10 {
            Some((b'0' + num) as char)
        } else {
            Some((b'a' + num - 10) as char)
        }
    } else {
        None
    }
}
...
   fn to_digit(self, radix: u32) -> Option<u32> {
        if radix > 36 {
            panic!("to_digit: radix is too high (maximum 36)");
        }
        let val = match self {
          '0' ... '9' => self as u32 - '0' as u32,
          'a' ... 'z' => self as u32 - 'a' as u32 + 10,
          'A' ... 'Z' => self as u32 - 'A' as u32 + 10,
          _ => return None,
        };
        if val < radix { Some(val) }
        else { None }
    }

Both functions already return a None when the number above the radix, and should also return None rather than crashing the program when the radix is greater than 36. If a crash is still desired, the Option<char> can simply be unwrapped.

This change is very simple to make. Although both functions are stable, this panic is not encoded in their signatures, and nothing in the standard library relies on this hidden panic; in fact from_digit is always unwrapd, while to_digit is either unwrapped or matched a way compatible with the elimination of this panic (though an error message may need to be tweaked). A cursory search of GitHub suggests this is in fact the case everywhere else, with every invocation I could find either passing 2, 10, 16, or 36 as the radix, unwrapping the result, or manually checking whether the radix was greater than 36 and explicitly panicking if so.

I’m not sure what the timeline is for deprecating semantics, even those that don’t break compilation, but I imagine this would take another release cycle.

Since libcore is being stabilized right now, I’d offer that panicking in the core library should be minimized, since every possible panic reduces Rust’s applicability and usefulness to baremetal systems where safety is a concern.

3 Likes

Hm, they don't seem particularly arbitrary at all: they're panicking when a relatively constant parameter is out of a very simple range (i.e. easy to manually verify).

For things like this (where it cannot be encoded in the type signature), the documentation behaviour is important, and both of these are documented as panicking; or, more importantly, they are documented as returning None in specific circumstances.

While it's a better proxy than nothing at all, there's Rust code than just the standard library. I can totally imagine things relying on returning None meaning that the input number/char didn't match the base, and so getting a None in some other cases (if they're not vetting input enough) could be confusing/incorrect.

There's not really a timeline other than the normal cycles, because there's no way to deprecate these gracefully without just introducing new functions with a different name (if this were to happen, returning Result would be better than an Option).

I think that the proposed change is unnecessary and that in fact the existing code is better. An out of range base is almost certainly a coding error and should panic. An out of range character is erroneously formatted input and should return None. An out of range numeric value is probably also a coding error, but in this case it makes some sense to return None (although IMO a panic makes more sense). Of course, as huon notes, it would be better to return a Result, which would make the issue moot.

A far bigger problem is that the names are completely backwards. from_digit converts a number to a digit character (and returns None if the number cannot be converted because it’s not in range), and to_digit converts a digit character to a number (and returns None if the character isn’t a digit in the given base). The documentation for to_digit says

Converts a char to the corresponding digit

but this is totally confused about the relevant concepts and representations: characters represent digits; numbers are numbers. And it is only certain characters that can be converted, namely those which are “digit” characters of the given base … and they are converted to the number (not digit) that they represent. Consider that the character, taken from an input stream, necessarily represents a digit within its context, but once it is converted to a number it is simply a number that can be used without regard to where it came from.

The documentation for from_digit says

Converts a number to the character representing it

This is too is quite confused; what character represents the number 45637? Of course, only certain numbers can be converted … namely, those that result in “digits” (which are characters) of the given base. Numbers that are too large to be represented as a digit are carved up with the familiar mod/div loop into a sequence of digits (the set of characters that are the tokens of a radix notation for numbers, developed many centuries ago).

If this isn’t clear enough, consider what the inputs and outputs of functions called to_digits and from_digits would be.

And as a final authority, talk to your local Unicode expert … those folks have to keep their concepts straight, and certainly know what digits are.

So what really ought to be done is to come up with new functions with properly descriptive names (perhaps u32_to_digit and digit_to_u32) that return Result, and deprecate these.

5 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.