This proposal extends the dec and hex schemes up to radix 64. The common case, decimal, retains its cost of 1 sub and 2 tests. In bigger radices, numeric digits save one test and letters save even more. Without affecting those costs, it can handle 28 more digits. As a bonus, I don’t need mut. (This is not to be confused with internet base64 encoding, RFC 4648.)
Beyond 36 (unlike RFC 4648 and mathematicians) most software chose to put lowercase first. Beyond 62, various libraries let the caller supply 2 more “digits” of their choice. Some give at least a default:
Bash, to not collide with arithmetic, chose the unusual digits @, _. The others took only these last two digits from that RFC.
Base2n went with the URL variant -, _.
Basencode and convertBase have the normal +, /. I follow that.
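To make the ordering concrete, here is a minimal sketch of the 64-digit alphabet this leads to, written in the encoding direction (the analogue of char::from_digit). The names DIGITS_64 and digit_to_char are made up for illustration and are not part of the proposal or of std.

```rust
/// Hypothetical digit alphabet: 0-9, then lowercase, then uppercase,
/// then the last two digits '+' and '/' as in RFC 4648.
const DIGITS_64: &[u8; 64] =
    b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";

/// Hypothetical inverse of to_digit: the char for `value` in `radix`,
/// or None if the radix is outside 2..=64 or the value is not a digit there.
fn digit_to_char(value: u32, radix: u32) -> Option<char> {
    if (2..=64).contains(&radix) && value < radix {
        Some(DIGITS_64[value as usize] as char)
    } else {
        None
    }
}
```

With this table, digit 10 is 'a' in every radix above 10, digit 36 is 'A' once the radix exceeds 36, and '+' and '/' only appear for radices 63 and 64.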
N.b. For easy testability as a normal function, I have temporarily renamed self to selfie. I have fixed the bug of not checking the radix lower bound, but see below. For the same effort it took to add a FIXME, couldn’t we have gotten const then_some? Restricting such a simple function would never cause a regret.
const fn to_digit(selfie: char, radix: u32) -> Option<u32> {
    let digit =
        if radix <= 36 {
            assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
            match selfie {
                // If not a digit, a number greater than radix will be created.
                ..='9' => (selfie as u32).wrapping_sub('0' as u32),
                // Force the 6th bit to be set to ensure ascii is lower case.
                _ => (selfie as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
            }
        } else {
            assert!(radix <= 64, "to_digit: radix is too high (maximum 64)");
            match selfie {
                '+' => 62, // as in RFC 2045 & 4648
                '/' => 63,
                ..='9' => (selfie as u32).wrapping_sub('0' as u32),
                // Test these in Ascii order.
                ..='Z' => (selfie as u32).wrapping_sub('A' as u32).saturating_add(36),
                ..='z' => (selfie as u32).wrapping_sub('a' as u32).saturating_add(10),
                _ => return None
            }
        };
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}
The major caller, from_str_radix, checks the radix bounds. So I propose this more efficient panic-free alternative. If the almost doubling of code is an issue, the identical matches could be factored out. Btw., its checks have better error diagnostics when not in const context. It would be cool to make those accessible above.
const fn to_digit_unchecked(selfie: char, radix: u32) -> Option<u32> {
    let digit =
        if radix <= 36 {
            match selfie {
                // If not a digit, a number greater than radix will be created.
                ..='9' => (selfie as u32).wrapping_sub('0' as u32),
                // Force the 6th bit to be set to ensure ascii is lower case.
                _ => (selfie as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
            }
        } else {
            match selfie {
                '+' => 62, // as in RFC 2045 & 4648
                '/' => 63,
                ..='9' => (selfie as u32).wrapping_sub('0' as u32),
                // Test these in Ascii order.
                ..='Z' => (selfie as u32).wrapping_sub('A' as u32).saturating_add(36),
                ..='z' => (selfie as u32).wrapping_sub('a' as u32).saturating_add(10),
                _ => return None
            }
        };
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}

fn test() {
    for c in ' '..='~' {
        print!("'{c}'");
        for r in [2, 8, 10, 16, 36, 62, 64] {
            // Directly {:8?} is inconsistent on Option and nesting w/o to_string ignores width.
            print!(" {r}: {:8} {:8}",
                format_args!("{:?}", to_digit(c, r)).to_string(),
                format_args!("{:?}", to_digit_unchecked(c, r)).to_string());
        }
        println!();
    }
}
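
If the duplication bothers anyone, factoring out the identical matches could look roughly like this. It is only a sketch: low_digit and high_digit are hypothetical helper names, and I replaced the early return None with u32::MAX as an "invalid" marker so that both helpers return a plain u32. As far as I can tell that gives the same results for every radix, but I have only checked it on printable ASCII.

```rust
/// Hypothetical helper for radix <= 36: case-insensitive, as today.
/// Any char that is not 0-9, a-z or A-Z yields a value >= 36.
const fn low_digit(c: char) -> u32 {
    match c {
        ..='9' => (c as u32).wrapping_sub('0' as u32),
        // Setting bit 0x20 folds ASCII uppercase onto lowercase.
        _ => (c as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10),
    }
}

/// Hypothetical helper for radix 37..=64: lowercase first, then uppercase,
/// then '+' and '/'. Invalid chars yield a value far above 64.
const fn high_digit(c: char) -> u32 {
    match c {
        '+' => 62,
        '/' => 63,
        ..='9' => (c as u32).wrapping_sub('0' as u32),
        ..='Z' => (c as u32).wrapping_sub('A' as u32).saturating_add(36),
        ..='z' => (c as u32).wrapping_sub('a' as u32).saturating_add(10),
        _ => u32::MAX, // marker instead of an early `return None`
    }
}

/// Checked variant: panics on a radix outside 2..=64.
const fn to_digit(selfie: char, radix: u32) -> Option<u32> {
    let digit = if radix <= 36 {
        assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
        low_digit(selfie)
    } else {
        assert!(radix <= 64, "to_digit: radix is too high (maximum 64)");
        high_digit(selfie)
    };
    if digit < radix { Some(digit) } else { None }
}

/// Panic-free variant: the caller is assumed to have checked the radix.
const fn to_digit_unchecked(selfie: char, radix: u32) -> Option<u32> {
    let digit = if radix <= 36 { low_digit(selfie) } else { high_digit(selfie) };
    if digit < radix { Some(digit) } else { None }
}
```

The asserts stay inside the branch that is taken anyway, so the checked variant should not need an extra radix comparison compared to the unfactored version.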
Apart from that, only doc updates are needed and, for from_str_radix, minor panic string and 36 → 64 changes.
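
To give an idea of what callers would see afterwards, here is a toy stand-in for from_str_radix with the bound raised to 64. It is a sketch under my own assumptions, not the std implementation: parse_u64_radix and digit_value are hypothetical names, signs are not handled, and the digit lookup is a plain alphabet search instead of the branch-based to_digit, purely for brevity.

```rust
/// Digit alphabet assumed by this sketch: 0-9, a-z, A-Z, '+', '/'.
const DIGITS: &str = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";

/// Hypothetical lookup: the value of `c` in `radix`, or None.
/// Up to radix 36 letters are case-insensitive; above that, case matters.
fn digit_value(c: char, radix: u32) -> Option<u32> {
    let c = if radix <= 36 { c.to_ascii_lowercase() } else { c };
    // All alphabet chars are ASCII, so the byte index is the digit value.
    let value = DIGITS.find(c)? as u32;
    (value < radix).then_some(value)
}

/// Hypothetical unsigned parser; returns None on empty input,
/// on an invalid digit, or on overflow.
fn parse_u64_radix(src: &str, radix: u32) -> Option<u64> {
    assert!(
        (2..=64).contains(&radix),
        "parse_u64_radix: radix must lie in the range 2..=64"
    );
    if src.is_empty() {
        return None;
    }
    let mut acc: u64 = 0;
    for c in src.chars() {
        let digit = digit_value(c, radix)?;
        acc = acc.checked_mul(radix as u64)?.checked_add(digit as u64)?;
    }
    Some(acc)
}
```

For example, parse_u64_radix("FF", 16) and parse_u64_radix("ff", 16) both give Some(255), while parse_u64_radix("aA", 62) gives Some(656) because case starts to matter above radix 36.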