nul and null are two different things. null is a pointer to address 0, while nul is the name of the character with value 0.
That distinction is sometimes elided (people often say "null-terminated" rather than "nul-terminated"), but that's the distinction the API names are based on.
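To make the difference concrete, here is a minimal sketch (the helper names `is_null_pointer` and `is_nul_byte` are made up for illustration; only `std::ptr::null` and `is_null` come from the standard library). It shows that a pointer *to* a nul byte is not itself a null pointer:

```rust
use std::ptr;

/// Hypothetical helper: true if `p` is the null pointer,
/// i.e. the value that `std::ptr::null` produces.
fn is_null_pointer(p: *const u8) -> bool {
    p.is_null()
}

/// Hypothetical helper: true if `b` is the nul byte,
/// i.e. the character with value 0.
fn is_nul_byte(b: u8) -> bool {
    b == b'\0'
}

fn main() {
    // null: a pointer value that points at nothing.
    let p: *const u8 = ptr::null();
    assert!(is_null_pointer(p));

    // nul: an ordinary byte whose value happens to be 0.
    let data = *b"hi\0";
    assert!(is_nul_byte(data[2]));

    // A pointer to a nul byte is a perfectly valid, non-null pointer.
    assert!(!is_null_pointer(&data[2] as *const u8));
}
```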
IIRC "NUL" is the abbreviation blessed by ASCII/Unicode for the null character. I'm not sure how I feel regarding the proposal, but a part of me likes that we can differentiate between NUL (the character) and null (the pointer).
The similarity of the established terms is unfortunate, but the distinction is important. from_bytes_with_null would confuse me because in my mind "null" always means a null pointer.
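For anyone who hasn't used the API in question, a rough sketch of what the "nul" in `from_bytes_with_nul` refers to. The helper `is_valid_c_string` is made up for this post; the `CStr` methods are the real ones from `std::ffi`. The input has to end in exactly one nul byte and contain no others:

```rust
use std::ffi::CStr;

/// Hypothetical helper: true if `bytes` is accepted by
/// `CStr::from_bytes_with_nul`.
fn is_valid_c_string(bytes: &[u8]) -> bool {
    CStr::from_bytes_with_nul(bytes).is_ok()
}

fn main() {
    let c = CStr::from_bytes_with_nul(b"hello\0").expect("one trailing nul byte");

    // The terminator is not part of the logical contents...
    assert_eq!(c.to_bytes(), b"hello");
    // ...but it is still stored as the final byte.
    assert_eq!(c.to_bytes_with_nul(), b"hello\0");

    // Rejected: no terminating nul byte.
    assert!(!is_valid_c_string(b"hello"));
    // Rejected: a nul byte in the interior.
    assert!(!is_valid_c_string(b"he\0llo\0"));
}
```

No pointer is involved anywhere here, null or otherwise; the function only inspects byte values.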
I'm really not convinced it's worth it. The "did you mean?" hint for it should be nigh perfect if you try ptr::nul:
error[E0425]: cannot find function `nul` in module `std::ptr`
 --> src/main.rs:2:15
  |
2 |     std::ptr::nul();
  |               ^^^ help: a function with a similar name exists: `null`
or from_bytes_with_null:
error[E0599]: no function or associated item named `from_bytes_with_null` found for struct `CStr` in the current scope
 --> src/main.rs:3:11
  |
3 |     CStr::from_bytes_with_null();
  |           ^^^^^^^^^^^^^^^^^^^^
  |           |
  |           function or associated item not found in `CStr`
  |           help: there is an associated function with a similar name: `from_bytes_with_nul`
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
But that does mean that null is a perfectly correct way to refer to U+0000, not something that only means the null pointer. (Arguably the non-abbreviation is generally better, as elaborated in RFC 0580, rename-collections, in The Rust RFC Book, but I suppose abbreviations do make sense in method names, because print_with_line_feed is a bit excessive compared to print_with_lf.)
Yes, but the 0-byte at the end of a CStr is just that: a byte in a series of (otherwise non-0) bytes, not a UTF-8 encoding. Thus the naming follows the ASCII convention (or perhaps even preceding ASCII?):
I have been annoyed by people speaking of ‘NULL-terminated strings’ and writing char c = NULL; enough times to appreciate, for once, a programming language clearly distinguishing the pointer to nothing from code point zero. They are named differently because they are different things, and it’s about time people learned that.
(Relatedly, about the only thing I like about Go is that it named its Unicode scalar value type ‘rune’, in order to distance the programmer, if only just slightly, from the misconception that Unicode scalars are isomorphic to ‘characters’.)
Well, not quite. C0 and C1 control codes don’t officially have names in Unicode, only formal aliases; they had names in version 1.0, but those were withdrawn in version 1.1, and control character aliases were introduced only in version 6.1. The field you are pointing at in the UnicodeData.txt file is the ‘Unicode 1.0 name’ (property Unicode_1_Name, na1). This is what allowed the introduction of U+1F514 BELL in Unicode 6.0, despite U+0007 being known under that name in version 1.0.