Propose new operators to reserve / include for edition 2018


#81

Well, Haskell puts a backtick on both sides of a “word function” so that it can be used infix.

a `mod` b    -- same as (mod a b)

We don’t need to copy that style exactly, it’s just most likely to be familiar to at least some people if we do it that way. Backslash (TeX users) also got a mention already :wink:

Some day, maybe


#82

I’ve created a new comment in the github issue for infix operators. We should continue the discussion about this over there.


#83

I can’t believe I forgot to bring this up before, but has anyone proposed the -> operator for raw pointer “unsafe-deref-and-access”?

Even if we don’t want people to use raw pointers much, some people have to, that’s just a fact, and we might as well make their lives as nice as we can while they’re doing their work.
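For reference, this is roughly what raw-pointer field access looks like today. It is only a sketch: `Node` and `sum` are made-up names for illustration, and the `cur->value` spelling in the comment is the hypothetical syntax being proposed, not anything Rust accepts.

```rust
// Hypothetical list node used only for this illustration.
struct Node {
    value: i32,
    next: *mut Node,
}

/// Sums the values of a raw-pointer linked list.
///
/// # Safety
/// `head` must be null or point to a valid, null-terminated list.
unsafe fn sum(head: *const Node) -> i32 {
    let mut total = 0;
    let mut cur = head;
    while !cur.is_null() {
        unsafe {
            // Today: explicit deref plus parentheses.
            // Under the proposal this might read `cur->value` and `cur->next`.
            total += (*cur).value;
            cur = (*cur).next;
        }
    }
    total
}
```

The parentheses are needed because `.` binds tighter than the prefix `*`, which is the ergonomic cost the arrow is meant to remove.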


#84

Probably better to guide them to make the unsafe shell as small as possible, and use proper references outside.


#85

No, absolutely not better. You’re already admitting that the unsafe shell has to exist at all, so why wouldn’t you just make that portion of the code as easy to write as possible?


#86

Because it’s yet one more thing to learn and to explain to new people.

With this argument I can add specialized features for anything, because “it makes that portion of the code as easy to write as possible”.

There is a limited capital for additions in this space, and I’d much rather spend this capital on really common things, like the ? operator did.


#87

But by that argument almost this whole topic is stupid and we shouldn’t bother?

And since the -> operator would do the exact thing it does in C and C++, there’s very little danger there. For others who have never used those languages you just say “you deref a raw pointer with the arrow instead, so that you remember how unsafe raw pointers are”.

Alternatively we could make an UnsafeDeref op which uses . just like Deref does, and just forbid a type having both Deref and UnsafeDeref at the same time (similar to Copy and Drop). Alternately we could make normal traits be allowed to be implemented unsafely or constly despite the default definition (there’s an RFC for this).

But something to make raw pointers more ergonomic should be done.
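A rough sketch of what I mean by UnsafeDeref, written as an ordinary trait so it compiles today. Everything here is an assumption on my part: the trait name, the method name, and the impl are invented, and the actual proposal (having `.` pick this up automatically) would need compiler support that this sketch does not have.

```rust
/// Hypothetical trait, not part of std: mirrors `Deref`, but the call is unsafe.
trait UnsafeDeref {
    type Target;

    /// # Safety
    /// The underlying pointer must be non-null, aligned, and valid for reads
    /// for as long as the returned reference is used.
    unsafe fn unsafe_deref(&self) -> &Self::Target;
}

impl<T> UnsafeDeref for *const T {
    type Target = T;

    unsafe fn unsafe_deref(&self) -> &T {
        // `*self` is the raw pointer, `**self` is the pointee.
        unsafe { &**self }
    }
}
```

With this, `(*raw).x` becomes `raw.unsafe_deref().x` inside an unsafe block; the proposal would shorten that further to `raw.x`.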


#88

I wouldn’t say “stupid”, but I basically agreed (I won’t repeat all my arguments and previous comments here) that we “shouldn’t bother”. So yes, “don’t do any of it” is an existing point of view.


#89

I would agree that a strong argument for a lot (or even any) new operators being added to the language has not come forth.


#90

The fact that -> exists in C makes me think that * should have just been a postfix operator, so instead of a->b it’d have just naturally been a*.b (no need for parens like the current (*a).b). And that extends nicely to a***.b if you need it, unlike ->.


#91

I concur. Where are the statistics that show how much existing crates would become more readable if they had been coded with any of the proposed groups of operators?

In late 2017 I considered opening an RFC for multi-glyph lexemes for the wrapped-add/sub/mul and rotate-left/right traits and their assign variants. I discussed this somewhat in an earlier post. Fortunately caution prevailed; I grep’d both rustc and the crypto crates and concluded that only 5% of the uses of add/sub in that corpus needed wrapping arithmetic.

I felt that only 5% applicability in what was a heavily crypto-weighted corpus showed that my proposed operators occurred too infrequently in real code to justify complicating the language. Thus I abandoned my pre-RFC work. (However, I did gain a significant understanding of the internal structure of rustc in the process, so I do not consider it a wasted effort.)
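For anyone unfamiliar with the operations in question, here is how they are spelled today without any new lexemes. This is a minimal sketch: the `mix` functions and the multiplier constant are made up for illustration, while `wrapping_add`, `wrapping_mul`, `rotate_left`, and `std::num::Wrapping` are the existing standard library facilities.

```rust
use std::num::Wrapping;

// Method-call spelling: explicit about wrapping at every step.
fn mix(a: u32, b: u32) -> u32 {
    a.wrapping_add(b).rotate_left(7).wrapping_mul(0x9E37_79B9)
}

// Same computation using the `Wrapping` newtype, which makes the ordinary
// `+` and `*` operators wrap on overflow.
fn mix_with_wrapper(a: u32, b: u32) -> u32 {
    let sum = Wrapping(a) + Wrapping(b);
    let rotated = Wrapping(sum.0.rotate_left(7));
    (rotated * Wrapping(0x9E37_79B9)).0
}
```

Code that is dense with wrapping arithmetic can already opt into `Wrapping` locally, which weakens the case for dedicated operators further.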


#92

What this suggests to me is that trying to make a bunch of operators from the limited set of ASCII punctuation/non-AlphaNumeric characters is a fool’s errand and is the true root of the problem. That’s why (above) I argued that if Rust seriously wants to incorporate a lot more operators (of any significant amount) the discussion should start with how to enable within the entire ecosystem the use of Unicode Mathematical/Symbolic characters/operators and then go from there; otherwise, it will just be an endless soup of seemingly random combinations of `~!@#$%^&*()_-+={}[]:;’"\|,.<>?/ which isn’t all that useful and will create mostly unreadable code except for the few fully initiated into whatever DSL is being used at a given point.

My argument would be that if Rust isn’t willing or doesn’t consider it feasible to use Unicode “Operators” then just forget the whole “adding new operators to Rust” exercise and focus on, perhaps, the idea of in-fix method calling.


#93

Unicode operators are fancy to look at, but terrible to type.


#94

Agree. That’s why I mentioned all the caveats about how it might usefully work with editors and IDEs. For this reason alone, I don’t think it is workable, at least not until support for typing these kinds of characters in editors and IDEs is ubiquitous, easy, and generally desirable. In other words, [strike]NEVER[/strike] UNLIKELY.

EDIT: On the other hand, which is more important: easy typing of code or easy reading of code? It’s an important point to consider with respect to the idea of using Unicode Operators. I would argue the latter, so, perhaps, “easy” isn’t entirely necessary. Perhaps mostly ubiquitous and reasonably non-difficult entry of Unicode Operators is enough to tip the balance in favor of an idea like this.

EDIT: Does anyone think breaking out a separate thread to bike-shed this idea would have any value? For the main set of editors/IDEs currently used for Rust programming (i.e. those with significant usage), how hard (and how consistent) is it to insert Unicode operators? How about display? Assuming that most editors/IDEs that are used for Rust already have reasonable support for this, what about the rest of the ecosystem? Any appetite for this discussion?


#95

Optionally, we could have ascii versions of each unicode operator, and then rustfmt could have a flag to convert unicode to ascii or back (according to project style).
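To make that concrete, a toy sketch of the conversion. The operator pairs below are purely hypothetical (neither Rust nor rustfmt defines any such table or flag), and a real tool would have to work on tokens rather than raw text so that string literals, comments, and longer operators such as `<<=` are left alone.

```rust
/// Hypothetical ASCII/Unicode operator pairs, invented for this sketch.
const PAIRS: [(&str, &str); 3] = [("<=", "≤"), (">=", "≥"), ("!=", "≠")];

/// Naive text-level conversion from the ASCII spelling to the Unicode one.
fn to_unicode(src: &str) -> String {
    let mut out = src.to_string();
    for &(ascii, unicode) in PAIRS.iter() {
        out = out.replace(ascii, unicode);
    }
    out
}

/// Naive text-level conversion from the Unicode spelling back to ASCII.
fn to_ascii(src: &str) -> String {
    let mut out = src.to_string();
    for &(ascii, unicode) in PAIRS.iter() {
        out = out.replace(unicode, ascii);
    }
    out
}
```

As long as every Unicode operator has exactly one ASCII spelling, the two directions round-trip, which is what a project-wide style setting would need.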

But, unicode support or not, we’d still need an UnsafeDeref trait ;3


#96

A note on Unicode Symbols for operators:

  • in-console editors like vim/emacs, support is whatever your console’s is, unless there’s a plug-in for that (there almost certainly is). I’m sure vim/emacs users would figure out a workflow that’d fit them :stuck_out_tongue:
  • Xcode/macOS has the character picker at CMD-Option-Space, and you can find whatever Unicode Symbols in there if you know what to search for.
  • IntelliJ IDEA family of editors display the extended multilingual plane properly, but have no special support for entering characters not on your keyboard (other than regular intellisense autocompletion)
  • Visual Studio family basically depends on your system support for the characters.

Do note, though, that most monospace or “code” optimized fonts don’t have glyphs outside of ASCII, because programming doesn’t typically use them. The good ones will support accented characters as well so at least variable names can be local to the programmer if the language supports it.

I suspect there probably won’t be good support for entering Unicode in IDEs until there’s a need for it; Xcode has the macOS emoji picker because Swift likes to brag about emoji variable names. But if languages are waiting on IDE support for entering Unicode symbols, and IDEs are waiting on a need before putting effort into making an unnecessary process ergonomic, we’re in a deadlock.

The final bit to consider is that most Unicode math symbols are not going to look that great at a single column in a monospace font. So standardizing on column width becomes a lot harder, as you have to specify the font for which you’re optimizing your width, and query the font for the length of lines instead of just counting characters. (This can already be the case with strings where one codepoint ≠ one column.)
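A small illustration of why “just counting characters” is already slippery. The function name is made up, and note the limitation: the standard library can report bytes and code points, but not display columns, so the third measure (what a line-width limit actually cares about) is not computed here and would need font or terminal knowledge.

```rust
/// Returns (UTF-8 byte length, number of Unicode scalar values) for `s`.
/// Neither number is the display width in columns.
fn byte_len_and_scalar_count(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For `"a \u{22CF} b"` (that is, `a ⋏ b`) this gives 7 bytes and 5 code points, and for `"e\u{301}"` (an `e` followed by a combining acute accent) it gives 3 bytes and 2 code points, even though the latter is normally drawn in a single column.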

I’ve been meaning to test out how programming in a proportional font rather than a monospace one would look and feel for a while now, anyway. You can have a font with the good code requirements (distinct I and l, O and 0) without it being monospace, though monospace fonts with those properties are more common. We’re already guiding the rustfmt style towards block indenting over visual alignment anyway, and the only real reasons I see for monospace fonts are column alignment and the fact it’s always been done that way.

Sorry for the tangent. The major advantage of only allowing custom operators that aren’t ASCII is that people will (ab)use them less since they’re harder to write, and only really prefer using them where they actually help readability. Though custom operators also mean having to have some way to specify associativity/precedence, or just punting, saying everything needs to be bracketed, and making everyone upset, because my operator precedence is obvious to me.


#97

I would like to see serious support for Unicode. It’s nominally in the compiler, under a nightly flag, but the above discussion about “column width” and “accented characters” makes clear that Asian fonts such as Chinese – the mother tongue of about 1/5 of the world’s population – have not had much intellectual effort devoted to their support.

Chinese (Hanzi) characters are conventionally displayed as a monospaced font with each character occupying a 16x16 pixel grid. That’s wide enough to support most traditional mathematical single-character operators as well. Any system that purports to support non-European languages should be able to handle such operators.

I personally like the idea of using Unicode operator glyphs, including potentially-ambiguous ones such as ⋏ and ⋎ (which somewhat resemble ^ and v). For entry I would create a small palette of the Unicode math operators that were relevant to my particular need, then copy or drag them into the program text in my IDE. IDEs might eventually develop drop-down menu hierarchies that assisted in the selection of such Unicode operators based on the mathematical domain in which they occur, but such support is not essential in the short term.

I see no reason to support differing operator precedences or implied commutativity and associativity, rules which may not be widely understood by those reading the code. Programmers who employ Unicode operators can express their intent directly without such assistance, using parentheses where needed.

I do think that anyone who uses such extended Unicode operators should, for the foreseeable future, be required to add a set of definition comments at the root of the crate that explain the operators in a way that non-mathematician readers can understand or at least track down. For example,

/// **Unicode operators:**
/// ∀   For all [domain: universal quantification]
/// ∃   There exists [domain: existential quantification]
/// ℵ0  Alef null, the cardinality of the set of all natural numbers [domain: set theory]

All of these domains can be googled, with reasonable basic descriptive text found in Wikipedia.


#98

Would you (either of you, or anyone else for that matter) be interested in collaborating on an RFC or Pre-RFC for this? I’ve started one (nothing significant yet, just created it a few minutes ago) here: https://github.com/gbutler69/rfcs/blob/master/text/0000-unicode-operators.md


#99

I am willing to collaborate. I suspect that we need to start this by figuring out how we tell rustc that specific Unicode sequences are operators rather than operand identifiers. (E.g., precede the Unicode operator lexeme with a backtick.) Without that, how is rustc supposed to differentiate between operators and operands?

Related, do we support prefix operators as well, such as the ∀ and ∃ of my up-thread examples, or just infix operators?


#100

I would think that you could tell the compiler what specific unicode code-points are operators, just like, today, you tell the compiler that specific ASCII characters are operators. Today, Mathematical symbols are not part of identifiers, so, I don’t think this should be an issue.

I think we should strive (if possible) to support:

  • arity-1 methods/Traits as pre-fix and/or post-fix unary operators
  • arity-2 methods/Traits as in-fix binary operators
  • (possibly) arity-3 methods/Traits as in-fix ternary operators
  • I don’t think any effort should be put into arity-4+ methods as operators unless a compelling argument can be made for specific operators
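For comparison, the arity-1 and arity-2 cases in the list above already exist for the built-in ASCII operators via `std::ops`. A minimal sketch, with `V2` being a made-up type for illustration; how a Unicode operator would be tied to a trait is exactly the open question and is not shown here.

```rust
use std::ops::{Add, Neg};

// Hypothetical 2D vector type used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct V2 {
    x: i32,
    y: i32,
}

// Arity-1 trait: backs the prefix unary operator `-v`.
impl Neg for V2 {
    type Output = V2;
    fn neg(self) -> V2 {
        V2 { x: -self.x, y: -self.y }
    }
}

// Arity-2 trait: backs the infix binary operator `a + b`.
impl Add for V2 {
    type Output = V2;
    fn add(self, rhs: V2) -> V2 {
        V2 { x: self.x + rhs.x, y: self.y + rhs.y }
    }
}
```

There is no existing precedent in `std::ops` for post-fix or ternary operator traits, so those two cases would need new design work.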

I’ve started to put together a list of which Unicode Blocks we should legitimately consider for use as operators and I’ve come to the following preliminary conclusion:

  • blocks specific to Mathematical Operators
  • blocks specific to Arrows of various kinds
  • (perhaps) some blocks of other special punctuation and characters
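As a first approximation of “telling the compiler which code points are operators”, a block-range check along these lines might do. This is a sketch under assumptions: the function name is invented, and the choice of exactly these three blocks (Arrows, Mathematical Operators, Supplemental Mathematical Operators) is only a placeholder for whatever list the RFC ends up with.

```rust
/// Hypothetical classifier: is `c` in one of the Unicode blocks being
/// considered for operator use? The block selection is illustrative only.
fn is_candidate_operator(c: char) -> bool {
    matches!(
        c,
        '\u{2190}'..='\u{21FF}'     // Arrows
        | '\u{2200}'..='\u{22FF}'   // Mathematical Operators
        | '\u{2A00}'..='\u{2AFF}'   // Supplemental Mathematical Operators
    )
}
```

Because none of these ranges overlap with characters currently allowed in identifiers, a lexer could in principle separate operators from operands without any extra marker.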

I’ll have something more detailed in the coming days added to the WIP RFC. Please feel free to chime in on this.