Are bitwise operators that necessary?

I know this might be a controversial topic with possibly limited practical benefit, but I still feel it could be part of an interesting, broader discussion about the syntax of Rust and the legacy of C.

In many RFCs for extending the Rust language, concerns about possible syntax ambiguities are raised. The compiler is often said to be complex and full of special cases (I'll let the specialists confirm or deny that). One of the reasons, as far as I understand, is the use of the same tokens in multiple roles depending on the context. For example, & is used both for naming reference types, for taking a reference to a variable and for the binary bitwise conjunction, while | is used both for introducing a closure and for the binary bitwise disjunction. : is used for many things (type ascription, introducing bounds, loop labels, etc.). This diversity of meanings is known to make parsing and semantic interpretation more difficult. That's why, for example, it took C++ a long time to handle >> correctly when it closes two template argument lists rather than being an operator in its own right; before that, a whitespace was necessary between the chevrons to disambiguate.

Rust aims to match C/C++ for low-level operations, and we certainly need a way to express bitwise conjunction, disjunction, negation and the like. Now, in a modern and expressive language like Rust, I really doubt these kinds of operations are as common as they are in C (I should certainly get some figures to support my claim, but sorry, I have no appropriate tools these days to do such a search). Interestingly enough, bitwise negation uses ! in Rust rather than the usual ~, because pointers were far more common back when Rust still used sigils to represent them. So my question is: in the end, do bitwise operations really deserve sigils of their own when they could just be plain functions? I know that with traits we can overload such sigils to give them additional meanings, but I doubt it is really worth it, as they do not carry other usual semantics, contrary to mathematical operators (AFAIK, and contrary to C++, shift operators are not even used for streams and I/O).

3 Likes

Regardless of whether they do or not, since Rust is stable they cannot be removed, so any conclusion different from the status quo would have to wait for whatever language replaces Rust.

Rust 2?

Previous discussion of “Rust syntax makes it harder to parse for computers”: https://users.rust-lang.org/t/single-pass-languages-and-build-speed/8199.

The usefulness of these operators depends on the domain you are working in. They may not be needed very often in high-level development (but bitflags is still the third most downloaded crate on crates.io for some reason), but if you are working with something like hardware registers, then bitwise operators and shifts are bread and butter.

4 Likes

I’ve been using bitwise operators in C/C++ as well as in Rust, at least in the domains I was working in, so I’d miss them badly if they were gone. Removing them would not magically solve all parser ambiguities, and << and >> are a solved issue.

Bitwise operations are very useful in a systems language: you can use them in generic code (sometimes they are handy even in Python, which is far from being a systems language), and they are handy in other situations, such as bit flags and succinct data structures.

Replacing infix operators with regular functions is possible, but the resulting code is less readable.
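For example, compare the infix form with the equivalent calls to the std::ops trait methods (this little snippet is my own illustration of the readability argument, not something from the posts above):

use std::ops::{BitAnd, BitOr, Not};

fn main() {
    let (a, b, mask) = (0b1100_u8, 0b1010_u8, 0b0111_u8);

    // Infix operators...
    let x = (a | b) & !mask;
    // ...versus the same computation spelled out as trait method calls.
    let y = a.bitor(b).bitand(mask.not());

    assert_eq!(x, y);
    println!("{:08b}", x);
}

Both lines compute the same value, and the second form already exists today, since the operator traits expose bitand, bitor and not as ordinary methods.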

I like how Ada does it, with the infix operators xor, or, and, not. They are more readable than sigils, but a lot of people know the C-derived syntax for bitwise operators.

If I could have it my way, types that allow bitwise operations would be distinct from integers and the operations would be defined as methods, so that the “operator budget” could be spent elsewhere.

But that ship has sailed since Rust is stable, and I think, all in all, it's quite OK the way it is.
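A minimal sketch of that idea, assuming a hypothetical Bits32 newtype (the name and the API are made up here, just to illustrate the method-based design):

// Hypothetical: a bit-pattern type kept separate from the arithmetic integers.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Bits32(u32);

impl Bits32 {
    fn and(self, other: Bits32) -> Bits32 { Bits32(self.0 & other.0) }
    fn or(self, other: Bits32) -> Bits32 { Bits32(self.0 | other.0) }
    fn xor(self, other: Bits32) -> Bits32 { Bits32(self.0 ^ other.0) }
}

fn main() {
    let flags = Bits32(0b0101).or(Bits32(0b0010)).and(Bits32(0b0111));
    assert_eq!(flags, Bits32(0b0111));
    println!("{:?}", flags);
}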

3 Likes

Did you have any specific examples of ambiguity in mind? The syntax ambiguities I can recall arose mostly because people want certain features to have as little syntax as possible, or very closely resemble other existing features, which is going to lead to ambiguity no matter how perfect the existing feature set is. For instance, we want generators to look pretty similar to closures, so some of the syntax proposals for generators ended up being ambiguous with closure syntax. iirc the fact that | is also a bitwise operator wasn’t a significant part of those problems.

Personally, I feel the stronger argument (though still not that strong) against bitwise operators is that users familiar with boolean operations but not bitwise operations can get them mixed up, which I assume is part of the reason Python uses “and”, “or”, “not” for boolean logic. I might’ve argued for doing something similar in Rust if we were still pre-1.0.

2 Likes

We actually have an interesting survey from Python, which suggests that | and & are at least as frequent as some comparison operators in the standard library (>=, <=, | and & all fall within 10--20 uses per 10k LOC). The point of the linked PEP is that matrix multiplication is actually used much more than they are in relevant code, and deserves a new operator (and it happened: it's now @). I feel this kind of analysis is very valuable in language design.

6 Likes

Regardless of whether they do or not, since Rust is stable they cannot be removed, so any conclusion different from the status quo would have to wait for whatever language replaces Rust.

But that ship has sailed since Rust is stable, and I think, all in all, it's quite OK the way it is.

@steveklabnik @madmalik I'm certainly aware that there is very little chance this gets changed, at least within Rust 1.x. Still, it may not be totally impossible, given that, for example, deprecating the current trait object syntax is still being considered for the future. But there is a deeper reason for starting this conversation. I don't like the idea that past decisions should not be discussed just because it is too late to change them: one can only learn from one's mistakes if one acknowledges that they are mistakes. My perception is that Rust still receives too much influence from C and the C family. I know that similarity with C is an argument often raised out of fear that a syntax too unconventional might deter newcomers, but in the end Rust's syntax already diverges significantly from mainstream languages in several core aspects.

The usefulness of these operators depends on the domain you are working in. [ ... ] if you are working with something like hardware registers, then bitwise operators and shifts are bread and butter.

Bitwise operations are very useful in a systems language

@petrochenkov @leonardo This argument could be made for any domain: people may want custom operators for many other domains they are interested in. Only statistics about the actual use of bitwise operators (and a comparison with mathematical operators, for example, or with other common Rust functions) can give us an insight into their relative importance. Again, I should have gathered those figures, but I don't have the appropriate tools for now.

but bitflags is still the third most downloaded crate on crates.io for some reason

they are handy in other situations, such as bit flags and succinct data structures

I don't see the point. There is no reason why bitflags or succinct data structures should use the | operator rather than an ordinary method, or an operator like + if you really need brevity.

If I could have it my way, types that allow bitwise operations would be distinct from integers and the operations would be defined as methods, so that the "operator budget" could be spent elsewhere.

@madmalik I totally subscribe to this. In the end, the idea that bitwise and mathematical operations should apply to the same types is questionable. In C, char is nothing but an integral type. In many newer languages, including Rust, it is an independent data type, and I guess many people consider that this divergence from C makes more sense. The same could be said of fixed-size arrays of bits, which have no strong reason to be the same thing as integral types. Again, if a short syntax is really needed, the sigils + and * would then be available for bitwise disjunction and conjunction (and these are not unprecedented notations).
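To illustrate that suggestion, here is a minimal sketch that reuses + and * through the Add and Mul traits on a hypothetical Bits32 newtype (again, the type is made up purely for the example):

use std::ops::{Add, Mul};

#[derive(Copy, Clone, Debug, PartialEq)]
struct Bits32(u32);

// + as bitwise disjunction, as suggested above.
impl Add for Bits32 {
    type Output = Bits32;
    fn add(self, rhs: Bits32) -> Bits32 { Bits32(self.0 | rhs.0) }
}

// * as bitwise conjunction.
impl Mul for Bits32 {
    type Output = Bits32;
    fn mul(self, rhs: Bits32) -> Bits32 { Bits32(self.0 & rhs.0) }
}

fn main() {
    let mask = (Bits32(0b0101) + Bits32(0b0010)) * Bits32(0b0110);
    assert_eq!(mask, Bits32(0b0110));
}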

The syntax ambiguities I can recall arose mostly because people want certain features to have as little syntax as possible, or very closely resemble other existing features, which is going to lead to ambiguity no matter how perfect the existing feature set is.

@ixrec I was under a different impression. It seems to me that many language ambiguities arise because people reuse the same sigils, keywords or syntax patterns for unrelated simple features while it does not yet create ambiguities. At that point it seems like a good idea not to introduce yet another keyword, not to create an unorthodox combination of sigils, and not to use an unusual or long syntax pattern. But later on, after these features have stabilised, one naturally wants to extend some of them, and this is when friction occurs. Replacing bitwise operators could either reduce this kind of friction or free operators (like ^, << or >>) for more common patterns in Rust in the future.

We actually have an interesting survey from Python, which suggests that | and & are at least as frequent as some comparison operators in the standard library (>=, <=, | and & all fall within 10--20 uses per 10k LOC).

Thanks for the link @lifthrasiir. This is indeed very interesting, and I agree it would be valuable to have similar figures for Rust codebases (and not only for the standard library). Now, my analysis of this survey is a bit different from yours: <= and >= play a role very similar to < and >, so it's not surprising that one form is more frequent than the other. But if you compare the set of the 4 ordering operators with the set of the 6 bitwise operators, the respective frequencies are 113 and 43!

[...] or free operators (like ^, << or >>) for more common patterns in Rust in the future.

For library authors it'd be possible to (ab)use these for other purposes. But I'm not sure what the community consensus on this is.

I think Rust made two design mistakes in this regard.

Mistake #1: Mix “operator syntax” with “operator semantics”

For example, the operator traits are named after the mathematical operation they represent: the trait for + is called Add. However, say I want to implement string concatenation for my string type using +. Now I have to implement the Add trait, which conveys “addition”, but addition is commutative while string concatenation is not. Calling them OpPlus or even Op+ would have been “cleaner”. Anyhow, we can live with this.
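For illustration, a minimal sketch of what that looks like for a made-up MyStr newtype (nothing here is from std beyond the Add trait itself): to get + you have no choice but to implement a trait literally named Add, even though concatenation is not commutative.

use std::ops::Add;

struct MyStr(String);

// Concatenation via `+` requires implementing `Add`,
// whose name suggests a commutative addition.
impl Add for MyStr {
    type Output = MyStr;
    fn add(self, rhs: MyStr) -> MyStr {
        MyStr(self.0 + &rhs.0)
    }
}

fn main() {
    let s = MyStr("Hello".to_string()) + MyStr(" world!".to_string());
    println!("{}", s.0);
}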

Mistake #2: Implement operators for types where the meaning is overloaded

Consider the example above: would you implement + for your own string type? Rust std does not, because its meaning is overloaded, as explained above. However, it implements ^ to mean xor, when ^ can also mean exponentiation… This is not that bad, since lots of languages use ^ to mean xor and then go on to use e.g. ** for exponentiation instead.

However, the shift operators << and >> can mean logical shift left/right or arithmetic shift left/right. 90% of the time you don't care, since they produce the same result, but when they don't, you do care and you can't tell which one you are getting. This sucks.
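To make the point concrete, here is a small example of my own showing that the same bit pattern shifted with >> gives different results depending on the integer type:

fn main() {
    let signed: i32 = -8;              // bit pattern 0xFFFF_FFF8
    let unsigned = signed as u32;      // same bits, different type

    // >> is an arithmetic shift on signed integers (the sign bit is propagated)...
    assert_eq!(signed >> 1, -4);       // 0xFFFF_FFFC
    // ...and a logical shift on unsigned integers (zero is shifted in).
    assert_eq!(unsigned >> 1, 0x7FFF_FFFC);

    println!("same bits, different results");
}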

We can still live with this. For example, I have 4 functions in the bitwise crate: shift_arithmetic/logical_right/left that convert the integer to a signed/unsigned type of the same size, perform the shift, and then convert it back, because particularly in generic code, it is really hard to tell what << and >> are actually doing on integers at all.
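The conversion trick reads roughly like this for one concrete type (my own sketch of the approach just described, not the actual code of the bitwise crate):

// Logical (zero-filling) right shift on a signed integer:
// reinterpret as unsigned, shift, reinterpret back.
fn logical_shift_right_i32(x: i32, n: u32) -> i32 {
    ((x as u32) >> n) as i32
}

fn main() {
    // The built-in >> on i32 is arithmetic...
    assert_eq!(-8i32 >> 1, -4);
    // ...while the helper shifts a zero into the sign bit.
    assert_eq!(logical_shift_right_i32(-8, 1), 0x7FFF_FFFC_u32 as i32);
}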

So… in a nutshell, I actually care more about what these operators convey to the humans reading the program than about how much harder they make the Rust compiler to implement. When there aren't any ambiguities, everything is fine. When there might be some ambiguities, I prefer to eliminate them with a more explicit approach. Explicit approaches that convey the wrong information (Add conveying addition instead of OpPlus) are misleading, but that's how things are.

3 Likes

I would, because + is used for string concatenation in many popular programming languages (PHP is the only one I can think of offhand where it isn't), and thus I would expect programmers to immediately understand it and expect it to work that way.

And Rust does in fact use + for string concatenation.

fn main() {
    let s = "Hello".to_string();
    // `String + &str` goes through the `Add<&str>` impl for `String`.
    let s = s + " world!";
    println!("{}", s);
}

If you're using unsigned ints, then there's only one type of shift possible (logical). If you're using signed ints, you probably want arithmetic shift, and in the rare case that you want a logical shift, you can just cast to unsigned first. It's only really an issue in languages that don't have unsigned types and hence want to support "unsigned" operations on signed types.

FYI, PHP's choice comes from Perl, and there are more examples of languages that avoid + for concatenation (Lua's .., Visual Basic's &, ML's ^, Haskell's ++, and so on). I do agree that monoid concatenation is frequently equated with commutative addition in many programming languages.

For example, in C++, where it is considered a design mistake of the std::string type, mostly because users expect + to be commutative.

And if you are using type inference, you need to look up the type of the int to know what's going on. And if you are using generic ints (e.g. behind a trait), you don't really know.

It's trivial to live with + as concatenation or << and >> as bit shifts, but I don't think it was worth it. For example, Haskell solved the concatenation issue by just using ++ instead of + as the concatenation operator.

It sounds like the solution there is to have traits for signed and unsigned integers, so you can document your contract to callers.

Though I'm really curious what kind of code you're running into this problem with. It's hard to imagine it being a problem in practice. Do you have any examples?
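A minimal sketch of that trait idea, with made-up marker traits (neither UnsignedInt nor SignedInt exists in std; they are only here to show how a bound could document the shift semantics):

use std::ops::Shr;

// Hypothetical marker traits documenting the shift contract.
trait UnsignedInt: Shr<u32, Output = Self> + Copy {}
trait SignedInt: Shr<u32, Output = Self> + Copy {}

impl UnsignedInt for u32 {}
impl SignedInt for i32 {}

// The bound tells callers that this >> is a logical (zero-filling) shift.
fn logical_half<T: UnsignedInt>(x: T) -> T {
    x >> 1
}

fn main() {
    println!("{:x}", logical_half(0xFFFF_FFF8_u32)); // 7ffffffc
}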

Most bit manipulation algorithms do not care about which type of integer you have, and you want them to work on both. Even then, you still want one kind of shift or the other, depending on what you want to do. They are just not the same operation.

Any bitwise manipulation algorithm in which you just want to shift bits without preserving the sign bit. This basically means that you always want logical shifts when implementing those, independently of whether you are manipulating bits in signed or unsigned integers.


There are many ways to solve these issues, but << and >> aren't one of them. They change behavior depending on the type of integer you have, and when you don't know which type of integer you have (or want to abstract over that), they become "meaningless". Even if you defined them to always do arithmetic shifts, you would still need a way to do logical shifts when you need them... so you would need at least 4 operators :confused: or... just use 4 functions whose names say what they do, like shift_arithmetic_right, for example.

Writing down a long word interrupts my flow less than having to stop, go through multiple lines of let-based type inference to trace back the concrete integer type I am actually dealing with (or provoke a compiler error so that the compiler spits it out), and then figure out whether I want << directly, or a conversion, then <<, then a conversion back.

By whom?

mostly because users expect + to be commutative

Users mostly don't care about commutativity. They care about convenient string concatenation though, and + does exactly that.

1 Like

Do you have any examples of code like that? I was curious whether you had any real world examples, since this sounds like an anti-pattern to me.

If you find yourself wanting to do logical shifts on a signed int or arithmetic shifts on an unsigned int, it's a sign that you should rethink the design of your code.