Improvements to AsciiExt


#1

While working on a pangram exercise, I found some improvements that would be useful in AsciiExt. I’m also looking for a first way to contribute to the language, so any guidance you can give me would be great.

Coming from a Python background, I usually deal with ASCII through the constants in the string module, such as string.ascii_letters, string.ascii_lowercase, or string.punctuation.
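For comparison, here is a minimal sketch of what Rust counterparts of those Python constants could look like. The constant names are illustrative only and are not part of std::ascii. PUNCTUATION holds the same 32 characters, in the same order, as Python's string.punctuation.

```rust
// Hypothetical Rust counterparts of Python's `string` module constants.
// These names are illustrative; they are not a std API.
const ASCII_LOWERCASE: &str = "abcdefghijklmnopqrstuvwxyz";
const ASCII_UPPERCASE: &str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// Lowercase followed by uppercase, matching Python's string.ascii_letters.
const ASCII_LETTERS: &str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const DIGITS: &str = "0123456789";
// The 32 ASCII punctuation characters, as in Python's string.punctuation.
const PUNCTUATION: &str = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
```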

Even though some of these constants have equivalents in std::ascii, some additional constants and functions would make it easier to work with ASCII. For example, for the pangram exercise, I implemented a custom trait:

trait MyAsciiExt {
    fn is_ascii_letter(&self) -> bool;
}

impl MyAsciiExt for char {
    // True only for 'A'..='Z' (65..=90) and 'a'..='z' (97..=122).
    fn is_ascii_letter(&self) -> bool {
        match *self {
            'A'..='Z' | 'a'..='z' => true,
            _ => false,
        }
    }
}
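A sketch of how such a trait might be used for the pangram check itself. The helper is_pangram is hypothetical (not from the original post), and the trait is repeated here so the block stands on its own.

```rust
trait MyAsciiExt {
    fn is_ascii_letter(&self) -> bool;
}

impl MyAsciiExt for char {
    // True only for 'A'..='Z' and 'a'..='z'.
    fn is_ascii_letter(&self) -> bool {
        matches!(*self, 'A'..='Z' | 'a'..='z')
    }
}

/// Hypothetical helper: true if `text` contains every letter from a to z
/// at least once, ignoring case and any non-letter characters.
fn is_pangram(text: &str) -> bool {
    let mut seen = [false; 26];
    for c in text.chars().filter(|c| c.is_ascii_letter()) {
        // Safe to cast: `c` is an ASCII letter, so it fits in a u8.
        let idx = (c.to_ascii_lowercase() as u8 - b'a') as usize;
        seen[idx] = true;
    }
    seen.iter().all(|&s| s)
}
```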

Additional methods, such as is_ascii_punctuation or is_ascii_whitespace, could be implemented the same way.

Does this make sense at all? Would you consider this a useful improvement to std::ascii?


#2

Sounds useful and appropriate for AsciiExt.


#3

I’d like to see AsciiExt provide is_ascii_ versions of the majority of C’s isxxx functions (from ctype.h). These should be no-brainers:

  • isalpha → is_ascii_alpha (or _letter, or _alphabetic)
  • isalnum → is_ascii_alphanum
  • isdigit → is_ascii_digit
  • isxdigit → is_ascii_hexdigit
  • isupper → is_ascii_uppercase
  • islower → is_ascii_lowercase
  • isgraph → is_ascii_graphic (0x21 … 0x7E)
  • ispunct → is_ascii_punct
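A minimal sketch of the mapping above as free functions on bytes, assuming the C-locale definitions from ctype.h. The function names are the proposed ones and are hypothetical here; ispunct is written the way C defines it (a graphic character that is not alphanumeric). Present-day Rust's u8 has inherent methods covering the same classes (is_ascii_alphabetic, is_ascii_hexdigit, is_ascii_punctuation, and so on), which makes a convenient cross-check.

```rust
// Hypothetical free functions using the proposed names; each follows the
// "C" locale definition of the corresponding ctype.h function.
fn is_ascii_alpha(b: u8) -> bool {
    matches!(b, b'A'..=b'Z' | b'a'..=b'z')
}
fn is_ascii_digit(b: u8) -> bool {
    matches!(b, b'0'..=b'9')
}
fn is_ascii_alphanum(b: u8) -> bool {
    is_ascii_alpha(b) || is_ascii_digit(b)
}
fn is_ascii_hexdigit(b: u8) -> bool {
    matches!(b, b'0'..=b'9' | b'A'..=b'F' | b'a'..=b'f')
}
fn is_ascii_uppercase(b: u8) -> bool {
    matches!(b, b'A'..=b'Z')
}
fn is_ascii_lowercase(b: u8) -> bool {
    matches!(b, b'a'..=b'z')
}
// isgraph: visible characters, 0x21 through 0x7E (space excluded).
fn is_ascii_graphic(b: u8) -> bool {
    matches!(b, 0x21..=0x7E)
}
// ispunct: any graphic character that is not a letter or digit.
fn is_ascii_punct(b: u8) -> bool {
    is_ascii_graphic(b) && !is_ascii_alphanum(b)
}
```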

iscntrl is rarely useful anymore, but its definition is unambiguous (0x00 … 0x1F, 0x7F) so I wouldn’t object to adding it.
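Because the definition is unambiguous, a hypothetical is_ascii_cntrl is a one-liner; the name follows the pattern of the list above and is not an existing API.

```rust
// Hypothetical `is_ascii_cntrl`: 0x00..=0x1F plus DEL (0x7F), as in C.
fn is_ascii_cntrl(b: u8) -> bool {
    matches!(b, 0x00..=0x1F | 0x7F)
}
```

One consequence worth noting: every byte in 0x00..=0x7F is exactly one of control, the space character 0x20, or graphic (0x21..=0x7E).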

isspace, isblank, and isprint are troublesome because there is so much variation in what people mean by “ASCII whitespace”; it seems to me that most “production” software is going to wind up wanting something customized for whatever file format they’re processing. However, leaving them out altogether might be a trip hazard for beginners. I dunno.
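To make the ambiguity concrete, here is a sketch of a few competing definitions, assuming the C "C"-locale behavior for isspace, isblank, and isprint. All function names are hypothetical. The last one takes a caller-supplied set, which is roughly what format-specific code tends to end up with.

```rust
// C's isspace in the "C" locale: space, \t, \n, \v (0x0B), \f (0x0C), \r.
fn c_isspace(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | 0x0B | 0x0C | b'\r')
}

// C's isblank in the "C" locale: only space and horizontal tab.
fn c_isblank(b: u8) -> bool {
    matches!(b, b' ' | b'\t')
}

// C's isprint: graphic characters plus space, 0x20..=0x7E.
fn c_isprint(b: u8) -> bool {
    matches!(b, 0x20..=0x7E)
}

// Format-specific alternative: the caller decides exactly which bytes count.
fn is_in_set(b: u8, set: &[u8]) -> bool {
    set.contains(&b)
}
```

For what it's worth, present-day Rust's u8::is_ascii_whitespace already disagrees with C's isspace: it leaves out the vertical tab (0x0B).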


#4

That looks almost exactly like what I wanted.

What do you suggest as my next step? Should this be defined in an RFC, or is a simple PR enough?

Regards,


#5

I was going to say it would be a backwards compatibility issue, but according to the docs AsciiExt has been stable since 1.0.0, and two new methods were added to it in 1.9.0. I thought adding methods to a trait was backwards incompatible, since any external implementors of the trait would then break because they lack the new methods.

Looking a little more into it, those methods were there from 1.0.0 but unstable, so they probably made it impossible to implement the trait externally on stable. Now that all the methods are stabilized, though, I’m not certain whether new methods can be added.

EDIT: I guess they could all be part of a new AsciiExtIs trait or something.
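A rough sketch of the separate-trait idea. AsciiExtIs is the name floated in the thread, and the two methods borrow names from the proposed list; none of this is a real std API. The char impl delegates to the u8 impl for ASCII characters and returns false otherwise.

```rust
// Hypothetical extension trait, separate from AsciiExt, so adding it
// cannot break existing AsciiExt implementors.
trait AsciiExtIs {
    fn is_ascii_alpha(&self) -> bool;
    fn is_ascii_punct(&self) -> bool;
}

impl AsciiExtIs for u8 {
    fn is_ascii_alpha(&self) -> bool {
        matches!(*self, b'A'..=b'Z' | b'a'..=b'z')
    }
    // Graphic (0x21..=0x7E) characters that are neither letters nor digits.
    fn is_ascii_punct(&self) -> bool {
        matches!(*self, 0x21..=0x2F | 0x3A..=0x40 | 0x5B..=0x60 | 0x7B..=0x7E)
    }
}

impl AsciiExtIs for char {
    // Non-ASCII chars are never ASCII letters; ASCII ones fit in a u8.
    fn is_ascii_alpha(&self) -> bool {
        self.is_ascii() && (*self as u8).is_ascii_alpha()
    }
    fn is_ascii_punct(&self) -> bool {
        self.is_ascii() && (*self as u8).is_ascii_punct()
    }
}
```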


#6

Even though it would be almost transparent to users, I wonder whether a separate AsciiExtIs trait would cause confusion about which one to use, since AsciiExt already has is_ascii.


#7

They could have default impls that panic, and there could be a lint warning you that the trait has new methods you should implement.
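A minimal sketch of the panicking-default idea, using a hypothetical stand-in trait rather than the real AsciiExt. An implementor written before the new method existed still compiles; it only fails at run time if the new method is actually called. The warning lint suggested above is not something this sketch can show.

```rust
// Hypothetical stand-in for a stable trait that later gains a method.
trait AsciiLike {
    // Original method that external implementors already provide.
    fn is_ascii(&self) -> bool;

    // Added later with a default body that panics, so existing impls
    // keep compiling but fail loudly if this method is called.
    fn is_ascii_punct(&self) -> bool {
        unimplemented!("is_ascii_punct is not implemented for this type")
    }
}

// An "external" implementor written before `is_ascii_punct` existed.
struct Legacy(u8);

impl AsciiLike for Legacy {
    fn is_ascii(&self) -> bool {
        self.0 < 0x80
    }
}
```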


#8

Hi!

I’d like to know whether these changes would require going through the RFC process if we take the approach described by @ker. Any guidance on this?

Best Regards,


#9

According to the stability for libraries RFC, as long as the functions you add to the AsciiExt trait have a default implementation they are fine to add.
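A sketch of what a non-breaking addition looks like, again with a hypothetical stand-in trait: the new method gets a default body written in terms of a method implementors already provide, so older impls pick it up for free.

```rust
// Hypothetical stand-in for a stable trait; `to_ascii_byte` is the
// original required method that external crates already implement.
trait AsciiLike {
    fn to_ascii_byte(&self) -> Option<u8>;

    // Newly added method with a default body built on the existing one,
    // so impls written before it existed keep compiling unchanged.
    fn is_ascii_letter(&self) -> bool {
        matches!(self.to_ascii_byte(), Some(b'A'..=b'Z') | Some(b'a'..=b'z'))
    }
}

// An "external" implementor that predates `is_ascii_letter`.
struct Byte(u8);

impl AsciiLike for Byte {
    fn to_ascii_byte(&self) -> Option<u8> {
        if self.0 < 0x80 { Some(self.0) } else { None }
    }
}
```

One caveat: a new method name can still collide with a same-named method from another trait that callers have in scope, so such additions are usually treated as minor rather than entirely risk-free changes.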

These changes seem trivial enough to me that you could consider just opening a PR; I’ve definitely seen similarly sized RFCs get closed (or accepted!) with the comment “just open a PR”.


#10

I implemented the is_ascii_* functions I mentioned above; see language issue #39658 and PR #39659.