Null consistency

We have ptr::null and ptr::is_null, but CStr::from_bytes_with_nul and "nul-terminated" strings.

I think it would be more consistent to replace (deprecate and reëxport) all instances of nul with null.
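
To make "deprecate and reëxport" concrete, here is a minimal sketch of the pattern using free functions. Both functions are hypothetical stand-ins defined only for illustration: std has no `from_bytes_with_null`, and the real `CStr::from_bytes_with_nul` is not deprecated.

```rust
use std::ffi::{CStr, FromBytesWithNulError};

// Hypothetical new spelling (NOT a real std function); it simply
// forwards to the existing std API.
pub fn from_bytes_with_null(bytes: &[u8]) -> Result<&CStr, FromBytesWithNulError> {
    CStr::from_bytes_with_nul(bytes)
}

// Hypothetical old spelling, kept as a deprecated alias so that existing
// callers keep compiling but get a warning pointing at the new name.
#[deprecated(note = "use `from_bytes_with_null` instead")]
pub fn from_bytes_with_nul(bytes: &[u8]) -> Result<&CStr, FromBytesWithNulError> {
    from_bytes_with_null(bytes)
}

fn main() {
    let new = from_bytes_with_null(b"hi\0").unwrap();
    assert_eq!(new.to_bytes(), b"hi");

    // Calling the alias still works; the lint is silenced here on purpose.
    #[allow(deprecated)]
    let old = from_bytes_with_nul(b"hi\0").unwrap();
    assert_eq!(old, new);
}
```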

What do you all think? If we have rough consensus here, I'll prepare a PR for T-libs-api FCP.

2 Likes

nul and null are two different things. null is a pointer to address 0, while nul is the name of the character with value 0.

That distinction is sometimes elided (people often say "null-terminated" rather than "nul-terminated"), but that's the distinction the API names are based on.
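
A small sketch of the two meanings side by side, using only existing std APIs (the string contents are arbitrary):

```rust
use std::ffi::CStr;
use std::ptr;

fn main() {
    // "null": a pointer value that points at nothing.
    let p: *const u8 = ptr::null();
    assert!(p.is_null());

    // "nul": the byte with value 0 that terminates a C string.
    let s = CStr::from_bytes_with_nul(b"hi\0").unwrap();
    assert_eq!(s.to_bytes(), b"hi");
    assert_eq!(s.to_bytes_with_nul(), b"hi\0");

    // A pointer to a nul-terminated string is not itself a null pointer.
    assert!(!s.as_ptr().is_null());
}
```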

22 Likes

IIRC "NUL" is the abbreviation blessed by ASCII/Unicode for the null character. I'm not sure how I feel regarding the proposal, but a part of me likes that we can differentiate between NUL (the character) and null (the pointer).

8 Likes

I think a doc alias should suffice, as the two aren't quite the same as already stated.
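
For reference, a doc alias is attached with `#[doc(alias = "...")]`, which makes rustdoc's search find the item under the alternative spelling without adding a second name to the API. A minimal sketch on a hypothetical wrapper function (`parse_c_bytes` is made up for this example; whether std would add such an alias is exactly what is being discussed):

```rust
use std::ffi::{CStr, FromBytesWithNulError};

/// Parses a byte slice that ends in a single nul byte.
///
/// Hypothetical wrapper for illustration: searching the generated docs
/// for "from_bytes_with_null" would also surface this item.
#[doc(alias = "from_bytes_with_null")]
pub fn parse_c_bytes(bytes: &[u8]) -> Result<&CStr, FromBytesWithNulError> {
    CStr::from_bytes_with_nul(bytes)
}

fn main() {
    // The alias only affects documentation search; callers still use the
    // real name.
    let s = parse_c_bytes(b"abc\0").unwrap();
    assert_eq!(s.to_bytes(), b"abc");
}
```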

1 Like

The similarity of the established terms is unfortunate, but the distinction is important. from_bytes_with_null would confuse me because in my mind "null" always means a null pointer.

2 Likes

I'm really not convinced it's worth it. The "did you mean?" hint for it should be nigh perfect if you try ptr::nul:

error[E0425]: cannot find function `nul` in module `std::ptr`
   --> src/main.rs:2:15
    |
2   |     std::ptr::nul();
    |               ^^^ help: a function with a similar name exists: `null`

or from_bytes_with_null:

error[E0599]: no function or associated item named `from_bytes_with_null` found for struct `CStr` in the current scope
 --> src/main.rs:3:11
  |
3 |     CStr::from_bytes_with_null();
  |           ^^^^^^^^^^^^^^^^^^^^
  |           |
  |           function or associated item not found in `CStr`
  |           help: there is an associated function with a similar name: `from_bytes_with_nul`

And that seems fine to me.


Note that, according to https://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt, the name of U+0000 is "NULL", not "NUL":

0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;
0002;<control>;Cc;0;BN;;;;;N;START OF TEXT;;;;

Though admittedly https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt does list NUL as an "abbreviation":

0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation

But that does mean that null is a perfectly correct way to refer to U+0000, not something that only means the null pointer. (Arguably the non-abbreviation is generally better, as elaborated in 0580-rename-collections in The Rust RFC Book, but I suppose abbreviations do make sense in method names, because print_with_line_feed is a bit excessive compared to print_with_lf.)
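
As a quick check of how U+0000 looks from Rust, a sketch using only std `char` methods:

```rust
fn main() {
    // '\0' is the char literal for U+0000.
    let c = '\0';
    assert_eq!(c as u32, 0);
    assert_eq!(c, '\u{0}');

    // It is an ASCII control character and takes one byte in UTF-8.
    assert!(c.is_ascii_control());
    assert_eq!(c.len_utf8(), 1);

    // That one byte is 0x00, the same value as the C string terminator.
    let mut buf = [0xFFu8; 4];
    let encoded = c.encode_utf8(&mut buf);
    assert_eq!(encoded.as_bytes(), &[0u8]);
}
```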

5 Likes

Yes, but the 0-byte at the end of a CStr is just that: a byte in a series of (otherwise non-0) bytes, not a UTF-8 encoding. Thus the naming follows the ASCII convention (or perhaps one that even predates ASCII?):

| #  | Abbr. | Description     | #  | Abbr. | Description               |
|----|-------|-----------------|----|-------|---------------------------|
| 0  | NUL   | Null            | 16 | DLE   | Data Link Escape          |
| 1  | SOH   | Start of Header | 17 | DC1   | Device Control 1          |
| 2  | STX   | Start of Text   | 18 | DC2   | Device Control 2          |
| 3  | ETX   | End of Text     | 19 | DC3   | Device Control 3          |
| 4  | EOT   | End of Transmission | 20 | DC4 | Device Control 4        |
| 5  | ENQ   | Enquiry         | 21 | NAK   | Negative Acknowledge      |
| 6  | ACK   | Acknowledge     | 22 | SYN   | Synchronize               |
| 7  | BEL   | Bell            | 23 | ETB   | End of Transmission Block |
| 8  | BS    | Backspace       | 24 | CAN   | Cancel                    |
| 9  | HT    | Horizontal Tab  | 25 | EM    | End of Medium             |
| 10 | LF    | Line Feed       | 26 | SUB   | Substitute                |
| 11 | VT    | Vertical Tab    | 27 | ESC   | Escape                    |
| 12 | FF    | Form Feed       | 28 | FS    | File Separator            |
| 13 | CR    | Carriage Return | 29 | GS    | Group Separator           |
| 14 | SO    | Shift Out       | 30 | RS    | Record Separator          |
| 15 | SI    | Shift In        | 31 | US    | Unit Separator            |

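
To illustrate the "one 0-byte after otherwise non-0 bytes" point, a short sketch with `CStr::from_bytes_with_nul` (the input byte strings are arbitrary examples):

```rust
use std::ffi::CStr;

fn main() {
    // ASCII NUL is code 0; as a byte literal it is written b'\0'.
    assert_eq!(b'\0', 0u8);

    // Exactly one trailing NUL: accepted, and the payload contains no 0 bytes.
    let s = CStr::from_bytes_with_nul(b"abc\0").unwrap();
    assert_eq!(s.to_bytes().len(), 3);
    assert!(!s.to_bytes().contains(&0));

    // An interior NUL is rejected.
    assert!(CStr::from_bytes_with_nul(b"a\0bc\0").is_err());

    // A missing terminator is rejected as well.
    assert!(CStr::from_bytes_with_nul(b"abc").is_err());
}
```
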
2 Likes