Generalized Base Syntax up to 36 or 64

Some languages, such as Shell, have a generalized base syntax, e.g. 17#123456789ABCDEFG or 36#IAMBASE36. Since Rust already has built-in parsing for bases up to 36, we might expose this as syntax. In line with 0b, 0o and 0x, I propose a prefix that carries the radix: 0r36rI_AM_BASE_36 == 51_612_192_435_282.
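The value claimed for the proposed literal can be checked with today's standard library. This is only a minimal sketch: the function name `base36_example` is made up for illustration, and the `0r36r…` syntax itself remains hypothetical.

```rust
/// Parses the thread's example with the existing standard-library parser.
/// `from_str_radix` does not accept `_` separators, so they are left out.
fn base36_example() -> u64 {
    u64::from_str_radix("IAMBASE36", 36).expect("valid base-36 digits")
}
```

Calling `base36_example()` yields 51_612_192_435_282, and because letters are case-insensitive up to base 36, "iambase36" parses to the same value.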

I stumbled across this need while thinking about a new rust_decimal::dec!() macro, where I want a way to represent these.

For extending up to base 64, see my response to @Quercus.

2 Likes

I'm not sure base 36 would be useful enough to be included in the language. Base64 would be useful, but it has incompatible characters.

2 Likes

Enhanced support for sexagesimal (base 60) might be useful, but it's probably a big thing to ask.

Maybe base 4 could be useful, with "q" standing for quarter.

0q12301230

Base 4 is twice as compact as binary (base 2) and half as compact as base 16.
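The compactness claim can be checked by counting digits. A small sketch, assuming the hypothetical `0q` literal would mean plain base 4; `digit_count` and `quaternary_example` are illustrative names, not existing APIs.

```rust
/// Number of digits needed to write `n` in `radix` (at least one digit).
fn digit_count(mut n: u32, radix: u32) -> usize {
    assert!(radix >= 2);
    let mut count = 1;
    while n >= radix {
        n /= radix;
        count += 1;
    }
    count
}

/// The base-4 example from the post, parsed with the existing std parser.
fn quaternary_example() -> u32 {
    u32::from_str_radix("12301230", 4).expect("valid base-4 digits")
}
```

For a full-width value such as `u32::MAX`, `digit_count` gives 32 digits in base 2, 16 in base 4 and 8 in base 16; the example above is the same number as 0x6C6C.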

What are some practical use cases of this, where you need to define numbers in the code in any base except 2, 8, 10 and 16? Even 8 is rather niche, I have never seen it used outside of Unix mode numbers.

I don't feel like generalised support for literals in weird bases is a feature that would carry its weight.

If you are thinking about making an RFC about this you need to show that there is a genuine need for the feature, and not just for a single project.

18 Likes

Base62 would be more useful (and would encompass base 60). For bases up to 36 one can be case-insensitive. Beyond that, one must decide whether lowercase or uppercase letters come first. Alas, there is no universal consensus on that. Wikipedia gives them in ASCII order. But most discussions around major languages, and the doc for from_str_radix, show lowercase before uppercase letters. So I had hoped it would use that order beyond 36. Instead (like many languages) it falls over after 36. If this flies, Rust would need to decide on an order, maybe inspired by what the mentioned doc implies! That would then be the default for from_str_radix (with maybe a companion function for the opposite order). That’s what Shell does:

$ declare -i a=36#a b=36#A c=37#a d=37#A e=64#a f=64#A; echo 36: $a $b / 37: $c $d / 64: $e $f
36: 10 10 / 37: 10 36 / 64: 10 36
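The digit ordering that this Shell output implies can be sketched in Rust as follows. `shell_digit` is a hypothetical helper, not part of the standard library, and it only mirrors the behaviour shown above plus Shell's '@' and '_' digits.

```rust
/// Digit value under the ordering the Shell output above shows:
/// 0-9, then a-z (10..=35), then A-Z (36..=61), then '@' = 62 and '_' = 63.
/// Up to base 36, uppercase and lowercase letters are interchangeable.
fn shell_digit(c: char, radix: u32) -> Option<u32> {
    let value = match c {
        '0'..='9' => c as u32 - '0' as u32,
        'a'..='z' => c as u32 - 'a' as u32 + 10,
        'A'..='Z' if radix <= 36 => c as u32 - 'A' as u32 + 10,
        'A'..='Z' => c as u32 - 'A' as u32 + 36,
        '@' => 62,
        '_' => 63,
        _ => return None,
    };
    // Reject digits that are too large for the requested base.
    (value < radix).then_some(value)
}
```

With this mapping, 'A' is 10 in base 36 but 36 in bases 37 and 64, matching the output line above.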

Base64 would of course be great. But now we have the mess that (unlike what the name suggests) it mostly uses a completely different order than hex, base36 and base62. Why are some standards hell-bent on making life hard? So these should certainly not have a prefix of 0r64r… (which should be 0-9a-zA-Z…), maybe 0s… for internet base Sixty four (A-Za-z0-9…).

In Thor’s day and age and up to the 15th century, base 12 was somewhat used. With 52 letters that would have given us the necessary 64 digits cleanly. But now, being two digits short, we have various sets of two random extra characters (['+', '/'], ['+', ','], ['@', '_'] or ['-', '_']). All of these variants collide with special characters somewhere. When generating, one must choose a variant. But parsers should ideally accept them all, i.e. three different input digits each for 62 and 63.

To have these annoying digits in a Rust number, one must mask them. This could be 0-based 0r64r1\+2\/3 == 0r64r"1+2/3" == 33042371 or internet A-based 0s1\+2\/3 == 0s"1+2/3" == 905670647 (mathematically transitive equals.)
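Both readings of "1+2/3" can be reproduced with an ordinary function today. A minimal sketch: `NATURAL`, `INTERNET` and `parse_base64_number` are illustrative names, and the alphabets assume '+' = 62 and '/' = 63 in both orders.

```rust
/// "Natural" order discussed in the thread: 0-9, a-z, A-Z, then '+' and '/'.
const NATURAL: &[u8; 64] =
    b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";
/// RFC 4648 ("internet") order: A-Z, a-z, 0-9, then '+' and '/'.
const INTERNET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Reads `text` as a big-endian base-64 number using the given digit alphabet.
/// Returns `None` on an unknown digit or on overflow of `u64`.
fn parse_base64_number(text: &str, alphabet: &[u8; 64]) -> Option<u64> {
    text.bytes().try_fold(0u64, |acc, b| {
        // Position in the alphabet is the digit value.
        let digit = alphabet.iter().position(|&a| a == b)? as u64;
        acc.checked_mul(64)?.checked_add(digit)
    })
}
```

`parse_base64_number("1+2/3", NATURAL)` gives 33042371 and `parse_base64_number("1+2/3", INTERNET)` gives 905670647, the two values quoted above.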

Or we create yet another variant, choosing the two extra digits from characters that won’t collide with anything, i.e. two of ['#', '$', '@'], where Shell precedent suggests '@' == 62. However we can’t have '_' == 63 as underscore is ignored in numbers. Instead we could have 0r64r1@2$3 == 33042371 and 0s1@2$3 == 905670647.

Octal is also seen for chars, though Rust has opted out of this tradition. One is more likely to find the od command installed, than the xd command.

Even if nobody has sufficiently compelling use cases, I’d still want to know: If you were to come across such numbers, which syntax would you be most comfortable with?

Base64 is different because it encodes [u8], not numbers. For instance AAAA and AAAAAAAA are different even though both are sequences of zeros: the first one means [0,0,0], the second [0,0,0,0,0,0].

1 Like

That can be said for any base. Hex is commonly used like that.

Hex is commonly used both ways; decimal is rarely if ever used like that. Base64 is exclusively used like that, from what I have seen.

However, I have yet to see any examples of where constants in weird bases would actually be useful in code. I don't see a point to this discussion without compelling use cases as a starting point.

12 Likes

My reaction would be "base 27, what on earth is wrong with you? Is this secretly an obfuscated code contest? Or is it a cry for help? Do you need someone to talk to?".

And I would likely reject any PR at work for using an obscure base. I would probably even reject any use of octal, unless it is for Unix modes, where it is customary. It has never come up, even though we do have some code in languages that I think support obscure bases (shell script, and maybe Python too?).

4 Likes

The following is offered because it may be interesting from a cultural perspective regarding number systems, but please, nobody take it as a suggestion on my part that Rust should enhance support for a septemvigesimal number system. :smiley:

3 Likes

A macro or function in a third-party crate, with a nice long name and no attempt at brevity. e.g. third_party_crate::base36!(...)

We'd need a lot more information and compelling use cases before considering first-party support for such literals. We don't add features that nobody has a use case for.

You mentioned that you "stumbled across this" while thinking about a dec! macro, but you don't mention what led you to think there was a need for it.

4 Likes

u32::from_str_radix is already const, so that covers all bases up to 36, except that the syntax is a bit longer.
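A compile-time constant in base 36 can therefore be written without new syntax. A sketch, assuming Rust 1.82 or newer (where `from_str_radix` became usable in const contexts); the constant name `I_AM_BASE_36` is illustrative.

```rust
/// Evaluated at compile time; an invalid digit would fail the build.
/// Assumes Rust 1.82 or newer, where `from_str_radix` is a `const fn`.
const I_AM_BASE_36: u64 = match u64::from_str_radix("IAMBASE36", 36) {
    Ok(value) => value,
    Err(_) => panic!("invalid base-36 literal"),
};
```

The `match` is used because `Result::unwrap` is not available in const contexts.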

If I had to come up with notation from scratch with no regard to history, I'd replace literals like 0b101 and 0x13ff with something like 101$2 and 13ff$g. This would cover all bases up to 35. Single-letter bases rather than 13ff$16 because the latter mixes base 16 with base 10! 101#2 is easier to read but that collides with raw identifiers (r#s).
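To make the suffix idea concrete, here is a sketch of how such text could be parsed at runtime. The `digits$base` notation is purely hypothetical, as is the helper name `parse_suffix_radix`; the base letter is read as a single base-36 digit, so `g` means 16 and `z` means 35.

```rust
/// Parses the hypothetical `digits$base` notation from the post, where the
/// base is a single base-36 digit (`2` = 2, `a` = 10, `g` = 16, `z` = 35).
fn parse_suffix_radix(text: &str) -> Option<u64> {
    let (digits, base) = text.split_once('$')?;
    let mut base_chars = base.chars();
    let radix = base_chars.next()?.to_digit(36)?;
    // Require exactly one base character and a base of at least 2.
    if base_chars.next().is_some() || radix < 2 {
        return None;
    }
    u64::from_str_radix(digits, radix).ok()
}
```

With this, `parse_suffix_radix("101$2")` is `Some(5)` and `parse_suffix_radix("13ff$g")` is `Some(0x13ff)`.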

1 Like

Why not try this:

b64"V2h5IG5vdCB0aGlzOg==".to_u128_le()

Maybe using a string with a prefix could be better.

Possibly one resolution for this thread is to just wait until the allocation/const interaction is figured out, and then a base64 const fn would also just work.

Or do it on b"V2h5IG5vdCB0aGlzOg==", which is already a &[u8; 20], so at const-time you could write something that returns a [u8; 13] from it, which doesn't require figuring out const alloc.

It does require figuring out const array length math, but for now you can solve that with just returning ArrayVec<u8, 20> and it's not a big deal that it's a bit wider than needed.
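A sketch of the byte-string approach: instead of ArrayVec, this variant sidesteps const array length math by letting the caller state the output length as a second const generic. `b64_value`, `decode_b64` and `DECODED` are hypothetical names, and the decoder is deliberately minimal (standard alphabet only, no strict validation of where padding appears).

```rust
/// Value of one RFC 4648 base64 digit; panics on anything else
/// (a build error when evaluated in a const).
const fn b64_value(c: u8) -> u32 {
    match c {
        b'A'..=b'Z' => (c - b'A') as u32,
        b'a'..=b'z' => (c - b'a') as u32 + 26,
        b'0'..=b'9' => (c - b'0') as u32 + 52,
        b'+' => 62,
        b'/' => 63,
        _ => panic!("invalid base64 digit"),
    }
}

/// Decodes padded base64 into a fixed-size array. The caller states the
/// output length `M`; a mismatch panics, which is a build error in a const.
const fn decode_b64<const N: usize, const M: usize>(input: &[u8; N]) -> [u8; M] {
    assert!(N % 4 == 0, "padded base64 length must be a multiple of 4");
    let mut out = [0u8; M];
    let mut written = 0;
    let mut i = 0;
    while i < N {
        // Count '=' padding in this group (this sketch does not verify
        // that padding only appears in the last group).
        let pad = (input[i + 2] == b'=') as u32 + (input[i + 3] == b'=') as u32;
        // Pack four 6-bit digits into 24 bits, treating padding as zero.
        let mut group = 0u32;
        let mut j = 0;
        while j < 4 {
            let c = input[i + j];
            group = (group << 6) | if c == b'=' { 0 } else { b64_value(c) };
            j += 1;
        }
        // Emit the 3 bytes of the group, minus one byte per padding char.
        let mut k = 0u32;
        while k < 3 - pad {
            assert!(written < M, "output array too small");
            out[written] = (group >> (16 - 8 * k)) as u8;
            written += 1;
            k += 1;
        }
        i += 4;
    }
    assert!(written == M, "output array too large");
    out
}

/// The thread's example, decoded entirely at compile time.
const DECODED: [u8; 13] = decode_b64(b"V2h5IG5vdCB0aGlzOg==");
```

Here `N` is inferred from the literal and `M` from the declared type of the constant, so a wrong length is rejected during compilation rather than at runtime.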

2 Likes

A problem with my proposal is that syntax like 123_i8 becomes impossible in bases where i is a digit.

Another misunderstanding of mine: natural base64 (adding letters beyond hex, and differentiating lowercase and uppercase beyond base36) is very different from internet-base64. Not only do the digit meanings differ, but so does the alignment. CPU numbers are usually little endian, but internet-base64 is big endian. :frowning:

@Quercus Fascinating article about human diversity! As soon as any of those languages have their digits in Unicode, they might want to use them in a Unicode friendly language like Rust :wink:

1 Like

Why not that? Because there's no use case for it; this entire thread has shown as much.

2 Likes

My impression as a newcomer is that evaluations of proposals for adding any new fundamental features to Rust will place emphasis on the degree to which such proposed features would enhance Rust's usefulness for working with machine-level aspects of programming. Accordingly, the best means of providing tools for working with culturally interesting number systems would be to develop and distribute new crates for that purpose, rather than to make fundamental changes to the language.

Please note that I visited the Rust Internals forum with the intention of merely looking around out of curiosity, and with the intention of not posting anything. However, I've found this discussion particularly interesting, and so departed from that original intention. For at least the foreseeable future, I'll probably just quietly drop in once in a while without "saying" anything. However, I do find the Rust Internals forum as a whole to be quite fascinating. :smiley: