- Feature Name: ascii?
- Start Date: 2023-12-01
- RFC PR: ???
- Rust Issue: rust-lang/rust#110998
This RFC builds off of ACP#179, which just proposed the ascii::Char type.
Summary
Add an Ascii type, representing a valid ASCII character (0x00-0x7F).
Add a'.', a"...", and ar#"..."# ASCII literals.
ASCII string slices are type &[Ascii],
and owned ASCII strings are type Vec<Ascii> (or Box<[Ascii]>).
ASCII string literals (a"") are type &'static [Ascii; N].
Motivation
See ACP#179.
Sometimes, you want to work with bytes that you know are valid ASCII, and
you want to avoid littering your code with unsafe from_utf8_unchecked
conversions, or .unwrap() calls.
- Avoiding "string".as_ascii().unwrap()
- TODO
Guide-level explanation
TODO
Reference-level explanation
Ascii Type
// core::ascii
#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
#[rustc_layout_scalar_valid_range_start(0)]
#[rustc_layout_scalar_valid_range_end(127)]
#[repr(transparent)]
pub struct Ascii(u8);
Guarantees
The Ascii type is guaranteed to have the same size, align, and ABI as u8.
The Ascii type is guaranteed to be in the range 0..=127 (0x00-0x7F).
Producing an Ascii whose value is in the range 128..=255 is undefined behavior.
The [Ascii] type is guaranteed to have the same layout/ABI as str and [u8].
The [Ascii] type is always valid UTF-8.
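The layout guarantees above can be approximated on stable Rust with a `#[repr(transparent)]` newtype. The following is only a sketch under stated assumptions, not the proposed `core::ascii` type: the stand-in cannot express the restricted value range to the compiler (the `rustc_layout_*` attributes are compiler-internal), so the invariant is upheld by the constructor alone, and `as_str` and `layout_matches_u8` are hypothetical helper names.

```rust
/// Stand-in for the proposed type. Invariant (not compiler-enforced here):
/// the wrapped byte is always in 0..=127.
#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
#[repr(transparent)]
pub struct Ascii(u8);

impl Ascii {
    /// The only safe way to build a value, so the invariant always holds.
    pub const fn new(byte: u8) -> Option<Ascii> {
        if byte < 128 {
            Some(Ascii(byte))
        } else {
            None
        }
    }
}

/// Hypothetical helper: view an ASCII slice as `&str` without copying.
pub fn as_str(s: &[Ascii]) -> &str {
    // SAFETY: `Ascii` is `repr(transparent)` over `u8`, so `[Ascii]` and `[u8]`
    // share a layout, and every element is < 128, which is valid UTF-8.
    unsafe {
        let bytes = std::slice::from_raw_parts(s.as_ptr().cast::<u8>(), s.len());
        std::str::from_utf8_unchecked(bytes)
    }
}

/// Hypothetical helper: checks the size and alignment guarantee against `u8`.
pub fn layout_matches_u8() -> bool {
    std::mem::size_of::<Ascii>() == std::mem::size_of::<u8>()
        && std::mem::align_of::<Ascii>() == std::mem::align_of::<u8>()
}
```

The pointer cast in `as_str` is sound only because of the same-layout and always-valid-UTF-8 guarantees, which is why they are stated as guarantees and not as implementation details.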
Matching
The compiler allows exhaustive matching on Ascii.
match ascii {
a'\0'..=a'\x7F' => println!("yay")
}
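For comparison, here is a sketch of a similar match on stable Rust today, where the scrutinee is a plain `u8`. The function name `classify` and its categories are hypothetical. The point is the last arm: with a `u8` the non-ASCII half of the range must be handled, whereas an Ascii scrutinee would not need it under this proposal.

```rust
/// Hypothetical classifier over a raw byte.
pub fn classify(byte: u8) -> &'static str {
    match byte {
        // C0 control characters and DEL.
        0x00..=0x1F | 0x7F => "control",
        // Space through tilde.
        0x20..=0x7E => "printable",
        // Required for exhaustiveness on `u8`; unnecessary for an ASCII-only type.
        0x80..=0xFF => "not ascii",
    }
}
```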
Conversions
Safe conversions from Ascii types to strings, chars, and bytes are provided.
These conversions are zero-cost.
- Ascii -> char
- Ascii -> u8
- [Ascii] -> str / [u8]
- [Ascii; N] -> [u8; N]
- Box<[Ascii]> -> Box<str> / Box<[u8]>
- Vec<Ascii> -> String / Vec<u8>
&mut [Ascii] -> &mut str / &mut [u8] is unsafe (just like str::as_bytes_mut).
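A rough sketch of why these conversions can be zero-cost, using a stable-Rust stand-in for the proposed type. All function and method names here (`to_u8`, `to_char`, `as_bytes`, `as_str`, `into_string`) are assumptions made for illustration; the actual method set is discussed in the linked issue.

```rust
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
#[repr(transparent)]
pub struct Ascii(u8); // invariant: the value is in 0..=127

impl Ascii {
    pub const fn new(byte: u8) -> Option<Ascii> {
        if byte < 128 { Some(Ascii(byte)) } else { None }
    }

    /// Ascii -> u8: just unwraps the field.
    pub const fn to_u8(self) -> u8 {
        self.0
    }

    /// Ascii -> char: every ASCII byte is a valid Unicode scalar value.
    pub const fn to_char(self) -> char {
        self.0 as char
    }
}

/// [Ascii] -> [u8]: a pointer cast, no copying.
pub fn as_bytes(s: &[Ascii]) -> &[u8] {
    // SAFETY: `Ascii` is `repr(transparent)` over `u8`.
    unsafe { std::slice::from_raw_parts(s.as_ptr().cast::<u8>(), s.len()) }
}

/// [Ascii] -> str: ASCII is always valid UTF-8, so no validation is needed.
pub fn as_str(s: &[Ascii]) -> &str {
    // SAFETY: every byte is < 128 by the `Ascii` invariant.
    unsafe { std::str::from_utf8_unchecked(as_bytes(s)) }
}

/// Vec<Ascii> -> String: reuses the existing allocation.
pub fn into_string(v: Vec<Ascii>) -> String {
    let mut v = std::mem::ManuallyDrop::new(v);
    // SAFETY: same element size and alignment, same pointer, length, and
    // capacity; the bytes are valid UTF-8 by the `Ascii` invariant.
    unsafe {
        let bytes = Vec::from_raw_parts(v.as_mut_ptr().cast::<u8>(), v.len(), v.capacity());
        String::from_utf8_unchecked(bytes)
    }
}
```

The mutable slice direction is left out on purpose: handing out `&mut [u8]` would let a caller write a byte >= 128 through it, which is why that conversion has to be unsafe.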
Checked and unchecked conversions from strings, chars, and bytes to Ascii
types are provided.
The checked conversions only incur the cost of an is_ascii check, and the
unchecked conversions are zero-cost (but unsafe).
- char -> Ascii
- u8 -> Ascii
- str / [u8] -> [Ascii]
- [u8; N] -> [Ascii; N]
- Box<str> / Box<[u8]> -> Box<[Ascii]>
- String / Vec<u8> -> Vec<Ascii>
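The checked and unchecked directions might look roughly like the sketch below, again on a stable-Rust stand-in. The names `from_u8`, `from_u8_unchecked`, `from_char`, `bytes_as_ascii`, and `str_as_ascii` are hypothetical and chosen only for this illustration.

```rust
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
#[repr(transparent)]
pub struct Ascii(u8); // invariant: the value is in 0..=127

impl Ascii {
    /// Checked u8 -> Ascii: a single comparison.
    pub const fn from_u8(byte: u8) -> Option<Ascii> {
        if byte < 128 { Some(Ascii(byte)) } else { None }
    }

    /// Unchecked u8 -> Ascii: zero-cost, but the caller carries the proof.
    ///
    /// # Safety
    /// `byte` must be in 0..=127.
    pub const unsafe fn from_u8_unchecked(byte: u8) -> Ascii {
        Ascii(byte)
    }

    /// Checked char -> Ascii.
    pub const fn from_char(c: char) -> Option<Ascii> {
        if (c as u32) < 128 { Some(Ascii(c as u8)) } else { None }
    }
}

/// Checked [u8] -> [Ascii]: one `is_ascii` scan, then a pointer cast.
pub fn bytes_as_ascii(bytes: &[u8]) -> Option<&[Ascii]> {
    if bytes.is_ascii() {
        // SAFETY: `Ascii` is `repr(transparent)` over `u8`, and every byte
        // was just checked to be < 128.
        Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<Ascii>(), bytes.len()) })
    } else {
        None
    }
}

/// Checked str -> [Ascii], built on the byte-slice version.
pub fn str_as_ascii(s: &str) -> Option<&[Ascii]> {
    bytes_as_ascii(s.as_bytes())
}
```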
Methods
https://github.com/rust-lang/rust/issues/110998#issuecomment-1836101837
Trait Impls
core::str::pattern::Pattern (proposed here).
Formatting
Ascii implements Debug. Behavior matches that of char, except with
\x hex escapes instead of unicode escapes for non-printable characters.
This is already implemented.
Ascii implements Display. Behavior matches that of char and str.
Ascii implements Octal, LowerHex, UpperHex, and Binary. Behavior
matches that of u8. ((Is this correct?))
Formatting for &[Ascii] is an unresolved question.
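To make the intended single-character formatting concrete, here is an approximate sketch on a stand-in type. It is an assumption that `std::ascii::escape_default` is close enough for illustration: it uses `\xNN` escapes for non-printable bytes, but it also escapes the double quote and writes NUL as `\x00`, so details may differ from the existing implementation. Only `LowerHex` is shown; the other radix traits would delegate to `u8` in the same way.

```rust
use std::fmt;

#[derive(Copy, Clone)]
pub struct Ascii(u8); // invariant: the value is in 0..=127

impl fmt::Debug for Ascii {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Quoted like `char`, with `\x` hex escapes instead of `\u{..}`.
        write!(f, "'{}'", std::ascii::escape_default(self.0))
    }
}

impl fmt::Display for Ascii {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegating to `char` keeps width, fill, and alignment flags working.
        fmt::Display::fmt(&(self.0 as char), f)
    }
}

impl fmt::LowerHex for Ascii {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same output as the underlying `u8`, including `#` and zero padding.
        fmt::LowerHex::fmt(&self.0, f)
    }
}
```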
Associated Constants
Associated constants are provided for all 128 ASCII characters.
This is currently implemented as an enum with 128 variants.
An enum-based design is still possible.
Additionally, MIN and MAX constants are provided (0x00 NUL and 0x7F DEL, respectively).
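A sketch of what the struct-based design with associated constants could look like. Only `MIN` and `MAX` come from the text above; the constant names `NULL`, `CAPITAL_A`, and `DELETE`, and the `describe` function, are hypothetical and shown for three of the 128 characters only.

```rust
#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Debug)]
pub struct Ascii(u8); // invariant: the value is in 0..=127

impl Ascii {
    pub const MIN: Ascii = Ascii(0x00); // NUL
    pub const MAX: Ascii = Ascii(0x7F); // DEL

    // Hypothetical per-character constants; the real names are not fixed here.
    pub const NULL: Ascii = Ascii(0x00);
    pub const CAPITAL_A: Ascii = Ascii(b'A');
    pub const DELETE: Ascii = Ascii(0x7F);

    pub const fn to_u8(self) -> u8 {
        self.0
    }
}

/// Associated constants of a type that derives `PartialEq` and `Eq`
/// can be used as patterns, much like enum variants.
pub fn describe(c: Ascii) -> &'static str {
    match c {
        Ascii::NULL => "NUL",
        Ascii::CAPITAL_A => "capital A",
        Ascii::DELETE => "DEL",
        _ => "other",
    }
}
```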
Ascii Literals
Three new literal types are added:
- ASCII Character: a'A' -> Ascii
- ASCII String: a"123456789" -> &'static [Ascii; N]
- Raw ASCII String: ar#"raw ascii literal "hi" \ :)"# -> &'static [Ascii; N]
a'.' and a"..." literals accept Quote and ASCII escape codes.
Raw string literals do not accept escape codes.
The following entries are added to the reference page on tokens:
| | Example | # sets | Characters | Escapes |
|---|---|---|---|---|
| ASCII character | a'H' | 0 | All ASCII | Quote & ASCII |
| ASCII string | a"hello" | 0 | All ASCII | Quote & ASCII |
| Raw ASCII string | ar#"hello"# | <256 | All ASCII | N/A |
Interaction with string-related macros:
- The concat! macro will accept these literals. They will be treated as regular chars/strings.
- The format_args! macro will not accept ASCII string literals as the format string.
Drawbacks
- More complexity
- ?
Rationale and alternatives
- No ASCII literals, just the Ascii type.
  - Means code is littered with "string".as_ascii().unwrap().
  - ASCII char literals can be replaced with the variants/constants.
  - The literals are shorter and nicer to look at.
  - Doesn't solve the ASCII string problem.
- AsciiStr and AsciiString dedicated types
  - bstr 0.1 had these but moved away from them because of conversion hell
    - https://github.com/rust-lang/libs-team/issues/179#issuecomment-1426922212
    - https://github.com/rust-lang/libs-team/issues/179#issuecomment-1527900570
- An ascii! macro instead of dedicated literals (proposed here)
  - Could work
  - C string literals had the same alternative.
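To give a feel for the macro alternative, here is one possible sketch on stable Rust. It is not the macro from the linked proposal: the `ascii!` macro shape, the `ascii_array` helper, and the stand-in `Ascii` type are all assumptions made for illustration. It takes a byte string literal and validates it during constant evaluation, so a non-ASCII byte becomes a compile error.

```rust
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
#[repr(transparent)]
pub struct Ascii(u8); // invariant: the value is in 0..=127

impl Ascii {
    pub const fn to_u8(self) -> u8 {
        self.0
    }
}

/// Hypothetical helper: converts a byte array, panicking on non-ASCII input.
/// When evaluated in a const context, the panic is a compile-time error.
pub const fn ascii_array<const N: usize>(bytes: &[u8; N]) -> [Ascii; N] {
    let mut out = [Ascii(0); N];
    let mut i = 0;
    while i < N {
        assert!(bytes[i] < 128, "non-ASCII byte in ascii! literal");
        out[i] = Ascii(bytes[i]);
        i += 1;
    }
    out
}

/// Hypothetical macro: `ascii!(b"...")` yields `&'static [Ascii; N]`.
macro_rules! ascii {
    ($lit:literal) => {{
        // Forcing a `const` item makes the validation happen at compile time.
        const ARRAY: [Ascii; $lit.len()] = ascii_array($lit);
        &ARRAY
    }};
}
```

For example, `ascii!(b"hello")` has type `&'static [Ascii; 5]`. Compared with dedicated literals, a macro like this needs no lexer changes, but it cannot provide a character form as compact as a'A', and it relies on the caller writing a byte string literal.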
Prior art
- Mentions of ASCII literals:
- ICU4x's AsciiByte
- Memes:
Unresolved questions
- Formatting for [Ascii] (Debug and Display). We likely want to specialize those impls to behave like str, but I'm not sure what the status/feasibility is on that.
- Use an enum instead of a struct? This is how it is currently implemented.
- AsciiStr type alias? (proposed here)
Future possibilities
- include_ascii! and other ASCII-specific versions of string macros (proposed here).