- Feature Name: ascii?
- Start Date: 2023-12-01
- RFC PR: ???
- Rust Issue: rust-lang/rust#110998
This RFC builds off of ACP#179, which just proposed the ascii::Char
type.
Summary
Add an `Ascii` type, representing a valid ASCII character (0x00-0x7F).
Add `a'.'`, `a"..."`, and `ar#"..."#` ASCII literals.
ASCII string slices are type `&[Ascii]`, and owned ASCII strings are type `Vec<Ascii>` (or `Box<[Ascii]>`).
ASCII string literals (`a""`) are type `&'static [Ascii; N]`.
Motivation
See ACP#179.
Sometimes, you want to work with bytes that you know are valid ASCII, and you want to avoid littering your code with unsafe `from_utf8_unchecked` conversions or `.unwrap()` calls.
- Avoiding `"string".as_ascii().unwrap()`
- TODO
Guide-level explanation
TODO
Reference-level explanation
Ascii Type
// core::ascii
#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
#[rustc_layout_scalar_valid_range_start(0)]
#[rustc_layout_scalar_valid_range_end(127)] // the end bound is inclusive: 0x7F is the last valid value
#[repr(transparent)]
pub struct Ascii(u8);
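The `rustc_layout_scalar_valid_range_*` attributes are compiler-internal and unavailable outside the standard library. As a minimal sketch, the same invariant could be modelled on stable Rust with a private field and a checked constructor. The method names `from_u8` and `to_u8` are illustrative assumptions, not part of this proposal, and this stand-in does not give the compiler the niche information that the real attributes would.

```rust
// Hypothetical stand-in for the proposed type, usable on stable Rust.
// The private field plus a checked constructor replaces the
// compiler-internal valid-range attributes.
#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash, Debug)]
#[repr(transparent)]
pub struct Ascii(u8);

impl Ascii {
    /// Returns `Some` only for bytes in 0x00..=0x7F.
    pub const fn from_u8(byte: u8) -> Option<Ascii> {
        if byte <= 0x7F {
            Some(Ascii(byte))
        } else {
            None
        }
    }

    /// Returns the underlying byte.
    pub const fn to_u8(self) -> u8 {
        self.0
    }
}
```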
Guarantees
The `Ascii` type is guaranteed to have the same size, alignment, and ABI as `u8`.
The `Ascii` type is guaranteed to be in the range `0..=127` (0x00-0x7F). Producing an `Ascii` with a value in the range `128..=255` is UB.
The `[Ascii]` type is guaranteed to have the same layout/ABI as `str` and `[u8]`.
The `[Ascii]` type is always valid UTF-8.
Matching
The compiler allows exhaustive matching on `Ascii`.
match ascii {
a'\0'..=a'\x7F' => println!("yay")
}
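For contrast, here is a sketch of what matching looks like today on stable Rust when the ASCII invariant is only known to the programmer. The `Ascii` newtype and the `classify` function are hypothetical stand-ins. Because the compiler only sees a `u8`, an extra arm for `0x80..=0xFF` is required; under this RFC that arm would be unnecessary.

```rust
#[derive(Copy, Clone)]
#[repr(transparent)]
pub struct Ascii(u8); // hypothetical stand-in; invariant: value <= 0x7F

impl Ascii {
    pub const fn from_u8(b: u8) -> Option<Ascii> {
        if b <= 0x7F {
            Some(Ascii(b))
        } else {
            None
        }
    }
}

/// Classifies a character by matching on its byte value.
pub fn classify(c: Ascii) -> &'static str {
    match c.0 {
        0x00..=0x1F | 0x7F => "control",
        b'0'..=b'9' => "digit",
        b'A'..=b'Z' | b'a'..=b'z' => "letter",
        0x20..=0x7E => "punctuation or space",
        // Required on stable because the compiler only sees a `u8`.
        // With the proposed type, the arms above would be exhaustive.
        0x80..=0xFF => unreachable!("Ascii invariant violated"),
    }
}
```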
Conversions
Safe conversions from `Ascii` types to strings, chars, and bytes are provided. These conversions are zero-cost.
- `Ascii` -> `char`
- `Ascii` -> `u8`
- `[Ascii]` -> `str` / `[u8]`
- `[Ascii; N]` -> `[u8; N]`
- `Box<[Ascii]>` -> `Box<str>` / `Box<[u8]>`
- `Vec<Ascii>` -> `String` / `Vec<u8>`

`&mut [Ascii]` -> `&mut str` / `&mut [u8]` is unsafe (just like `str::as_bytes_mut`).
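To illustrate why the outward conversions can be zero-cost, here is a sketch on stable Rust using a hypothetical `repr(transparent)` stand-in for `Ascii`. The free functions `as_bytes` and `as_str` are assumed names for illustration only; the point is that each conversion is a pointer cast with no validation pass.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct Ascii(u8); // hypothetical stand-in; invariant: value <= 0x7F

impl Ascii {
    pub const fn from_u8(b: u8) -> Option<Ascii> {
        if b <= 0x7F {
            Some(Ascii(b))
        } else {
            None
        }
    }

    pub const fn to_u8(self) -> u8 {
        self.0
    }

    pub const fn to_char(self) -> char {
        self.0 as char
    }
}

/// Views an ASCII slice as bytes without copying.
pub fn as_bytes(s: &[Ascii]) -> &[u8] {
    // SAFETY: `Ascii` is `repr(transparent)` over `u8`, so both slice
    // types have identical layout.
    unsafe { std::slice::from_raw_parts(s.as_ptr().cast::<u8>(), s.len()) }
}

/// Views an ASCII slice as a `str` without copying or re-validating.
pub fn as_str(s: &[Ascii]) -> &str {
    // SAFETY: every byte is <= 0x7F, and all-ASCII bytes are valid UTF-8.
    unsafe { std::str::from_utf8_unchecked(as_bytes(s)) }
}
```

The mutable direction is different: handing out `&mut [u8]` would let safe code write a byte above 0x7F, which is why that conversion has to be unsafe.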
Checked and unchecked conversions from strings, chars, and bytes to `Ascii` types are provided. The checked conversions only incur the cost of an `is_ascii` check, and the unchecked conversions are zero-cost (but unsafe).
- `char` -> `Ascii`
- `u8` -> `Ascii`
- `str` / `[u8]` -> `[Ascii]`
- `[u8; N]` -> `[Ascii; N]`
- `Box<str>` / `Box<[u8]>` -> `Box<[Ascii]>`
- `String` / `Vec<u8>` -> `Vec<Ascii>`
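The checked/unchecked split can be sketched on stable Rust as follows. The `Ascii` newtype and the function names (`from_bytes`, `from_bytes_unchecked`, `from_str`, `from_char`) are assumptions made for this example, not the proposed API surface. The checked path performs one `is_ascii` scan and then reuses the unchecked cast.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct Ascii(u8); // hypothetical stand-in; invariant: value <= 0x7F

/// Unchecked conversion: a pointer cast only.
///
/// # Safety
/// Every byte in `bytes` must be <= 0x7F.
pub unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &[Ascii] {
    // SAFETY: `Ascii` is `repr(transparent)` over `u8`; the caller
    // guarantees the value range.
    unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<Ascii>(), bytes.len()) }
}

/// Checked conversion: one `is_ascii` scan, then the same cast.
pub fn from_bytes(bytes: &[u8]) -> Option<&[Ascii]> {
    if bytes.is_ascii() {
        // SAFETY: all bytes were just validated as ASCII.
        Some(unsafe { from_bytes_unchecked(bytes) })
    } else {
        None
    }
}

/// Checked conversion from a string slice.
pub fn from_str(s: &str) -> Option<&[Ascii]> {
    from_bytes(s.as_bytes())
}

/// Checked conversion from a single `char`.
pub fn from_char(c: char) -> Option<Ascii> {
    if c.is_ascii() {
        Some(Ascii(c as u8))
    } else {
        None
    }
}
```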
Methods
https://github.com/rust-lang/rust/issues/110998#issuecomment-1836101837
Trait Impls
`core::str::pattern::Pattern` (proposed here).
Formatting
`Ascii` implements `Debug`. Behavior matches that of `char`, except with `\x` hex escapes instead of Unicode escapes for non-printable characters. This is already implemented.
`Ascii` implements `Display`. Behavior matches that of `char` and `str`.
`Ascii` implements `Octal`, `LowerHex`, `UpperHex`, and `Binary`. Behavior matches that of `u8`. ((Is this correct?))
Formatting for `&[Ascii]` is an unresolved question.
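The formatting rules above can be sketched on a hypothetical stand-in type. This is an illustration of the described behavior, not the standard library's implementation: `Debug` quotes the character like `char` does but uses a two-digit `\x` escape for control characters, `Display` defers to `char`, and `LowerHex` defers to `u8`.

```rust
use std::fmt;

#[derive(Copy, Clone)]
pub struct Ascii(u8); // hypothetical stand-in; invariant: value <= 0x7F

impl fmt::Debug for Ascii {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            b'\0' => f.write_str("'\\0'"),
            b'\t' => f.write_str("'\\t'"),
            b'\n' => f.write_str("'\\n'"),
            b'\r' => f.write_str("'\\r'"),
            b'\'' => f.write_str("'\\''"),
            b'\\' => f.write_str("'\\\\'"),
            // Remaining control characters get a two-digit hex escape,
            // where `char` would print a `\u{..}` escape.
            b if b.is_ascii_control() => write!(f, "'\\x{:02x}'", b),
            b => write!(f, "'{}'", b as char),
        }
    }
}

impl fmt::Display for Ascii {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegating to `char` keeps width and fill handling consistent.
        fmt::Display::fmt(&(self.0 as char), f)
    }
}

impl fmt::LowerHex for Ascii {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegating to `u8` keeps `#` and zero-padding flags consistent.
        fmt::LowerHex::fmt(&self.0, f)
    }
}
```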
Associated Constants
Associated constants are provided for all 128 ASCII characters. This is currently implemented as an enum with 128 variants. An enum-based design is still possible.
Additionally, `MIN` and `MAX` constants are provided (0x00 NUL and 0x7F DEL respectively).
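A sketch of how such constants might look on a struct-based design, using a hypothetical stand-in type. Only `MIN` and `MAX` are named in this RFC; the other constant names below are invented for illustration.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Ascii(u8); // hypothetical stand-in; invariant: value <= 0x7F

impl Ascii {
    /// 0x00 NUL, the smallest ASCII value.
    pub const MIN: Ascii = Ascii(0x00);
    /// 0x7F DEL, the largest ASCII value.
    pub const MAX: Ascii = Ascii(0x7F);

    // Illustrative per-character constants (names are assumptions).
    pub const NULL: Ascii = Ascii(0x00);
    pub const CAPITAL_A: Ascii = Ascii(0x41);
    pub const DELETE: Ascii = Ascii(0x7F);
}
```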
Ascii Literals
Three new literal types are added:
- ASCII Character: `a'A'` -> `Ascii`
- ASCII String: `a"123456789"` -> `&'static [Ascii; N]`
- Raw ASCII String: `ar#"raw ascii literal "hi" \ :)"#` -> `&'static [Ascii; N]`

`a'.'` and `a"..."` literals accept Quote and ASCII escape codes. Raw string literals do not accept escape codes.
The following entries are added to the reference page on tokens:
| | Example | # sets | Characters | Escapes |
|---|---|---|---|---|
| ASCII character | `a'H'` | 0 | All ASCII | Quote & ASCII |
| ASCII string | `a"hello"` | 0 | All ASCII | Quote & ASCII |
| Raw ASCII string | `ar#"hello"#` | <256 | All ASCII | N/A |
Interaction with string-related macros:
- The `concat!` macro will accept these literals. They will be treated as regular chars/strings.
- The `format_args!` macro will not accept ASCII string literals as the format string.
Drawbacks
- More complexity
- ?
Rationale and alternatives
- No ASCII literals, just the `Ascii` type.
  - Means code is littered with `"string".as_ascii().unwrap()`.
  - ASCII char literals can be replaced with the variants/constants.
    - The literals are shorter and nicer to look at.
    - Doesn't solve the ASCII string problem.
- `AsciiStr` and `AsciiString` dedicated types
  - `bstr` 0.1 had these but moved away from them because of conversion hell
    - https://github.com/rust-lang/libs-team/issues/179#issuecomment-1426922212
    - https://github.com/rust-lang/libs-team/issues/179#issuecomment-1527900570
- An `ascii!` macro instead of dedicated literals (proposed here)
  - Could work
  - C string literals had the same alternative.
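To make the macro alternative concrete, here is one way an `ascii!` macro could be sketched on stable Rust. Everything here is an assumption for illustration: the stand-in `Ascii` type, the helper `ascii_array`, and the choice to accept a byte string literal. Validation runs in a `const` item, so a non-ASCII byte such as `ascii!(b"\xFF")` is rejected at compile time.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct Ascii(u8); // hypothetical stand-in; invariant: value <= 0x7F

impl Ascii {
    pub const fn to_u8(self) -> u8 {
        self.0
    }
}

/// Converts a byte array to ASCII, panicking on a non-ASCII byte.
/// When evaluated in a `const` item, the panic becomes a compile error.
pub const fn ascii_array<const N: usize>(bytes: &[u8; N]) -> [Ascii; N] {
    let mut out = [Ascii(0); N];
    let mut i = 0;
    while i < N {
        assert!(bytes[i] <= 0x7F, "non-ASCII byte in ascii! literal");
        out[i] = Ascii(bytes[i]);
        i += 1;
    }
    out
}

/// Hypothetical `ascii!` macro: takes a byte string literal and yields
/// a `&'static [Ascii; N]`, validated during constant evaluation.
macro_rules! ascii {
    ($lit:literal) => {{
        const VALUE: [Ascii; $lit.len()] = ascii_array($lit);
        &VALUE
    }};
}
```

The result has the same type the proposed `a"..."` literal would have, at the cost of the extra `ascii!(b"...")` wrapper at every use site.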
Prior art
- Mentions of ASCII literals:
- ICU4x's AsciiByte
- Memes:
Unresolved questions
- Formatting for `[Ascii]` (Debug and Display). We likely want to specialize those impls to behave like `str`, but I'm not sure what the status/feasibility is on that.
- Use an enum instead of a struct? This is how it is currently implemented.
- `AsciiStr` type alias? (proposed here)
Future possibilities
- `include_ascii!` and other ASCII-specific versions of string macros (proposed here).