Idea: "language warts" RFC repo


#21

This is a design choice. You want more flexibility; other people want code that is simpler to understand. Rust has chosen to tie its syntax to semantics, and that’s one of the ways to “tame” operator overloading in programming languages. Some designers give full operator freedom, as in Scala, and you end up with libraries full of hieroglyphic operators (or C++’s << for I/O); at the other extreme are languages without operator overloading at all. Rust has chosen to add operator overloading, but to tame it enough to avoid the perceived mistakes of C++’s operator overloading design (and usage).


#22

This is a design choice. You want more flexibility; other people want code that is simpler to understand. Rust has chosen to tie its syntax to semantics, and that’s one of the ways to “tame” operator overloading in programming languages.

Do you have a link to where this deliberate design choice was made? I’d like to read any rationale, because we have been trying to come up with semantics for PartialOrd for a while now so that the is_sorted RFC can make some progress, without much success. Any discussion that gave semantic meaning to PartialOrd's operators could help us a lot there (last discussion about this is here).

Also (playground):

fn main() {
    println!("{}", "1".to_string() + "2");  // prints: 12 instead of 3
}

looks more like concatenation than addition to me. So what does impl Add for T mean? The trait is called Add which implies addition, but in some cases it implements concatenation instead, and the trait documentation doesn’t really state anything about the trait semantics.

So sure, tying operator overloading to semantics is a possible direction for a programming language. But doing so without concisely stating the semantic meaning (e.g., can I rely on associativity when using Add to constrain a T?), then giving the operators different meanings depending on the type, and giving some operators meanings while leaving others with none, would be a pretty weird design direction to choose, if you ask me. So I am very skeptical that any of this was deliberate, but if it was, I’d like to know why it was decided to go this way.
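To make the point concrete, here is a small sketch (the helper name `sum3` is mine, purely for illustration) showing that a bare `T: Add` bound guarantees nothing about what `+` actually does:

```rust
use std::num::Wrapping;
use std::ops::Add;

// A generic helper that only assumes `T: Add`. The trait's documentation
// guarantees nothing about what `+` means, so callers cannot rely on
// ordinary arithmetic behaviour.
fn sum3<T: Add<Output = T>>(a: T, b: T, c: T) -> T {
    a + b + c
}

fn main() {
    // Plain integer addition:
    assert_eq!(sum3(1u32, 2, 3), 6);
    // Same bound, different semantics: modular (wrapping) arithmetic.
    // 200 + 100 + 0 = 300, and 300 mod 256 = 44.
    assert_eq!(sum3(Wrapping(200u8), Wrapping(100), Wrapping(0)), Wrapping(44));
}
```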


#23

looks more like concatenation than addition to me.

I’d argue that the design error is the overloading of Addition for concatenation, not that operators are tied to semantics.

Imo, push_str is the better solution for that use case, and I’d be happy to see this Add implementation deprecated.


#24

I’d argue that the design error is the overloading of Addition for concatenation, not that operators are tied to semantics.

How so? I think that the current String implementation of Add makes sense for someone who understands string addition as concatenation. Also, because the semantics of the Add trait are so loosely specified, this implementation does not violate any preconditions of Add, so it is valid. It feels like String is lying a little (*), yet many types implementing Add do, so this single deprecation wouldn’t really solve much.

Currently, Add is too loosely specified to be a useful generic constraint. This wouldn’t be a problem if that were the intent, but the fact that it comes with some semantic requirements on its implementations suggests otherwise. At the same time, it is too strongly specified to allow types for which + makes sense to implement it without kind of having to lie.

We could constrain Add to mean addition in such a way that it is useful as a generic constraint, but then a lot of implementations of Add would become invalid. For example, we could specify it to denote a monoid, yet that would still allow String to implement it for concatenation, while disallowing floats, because floating-point addition is not associative. And there lies the root of the problem: in everyday life and in math we are used to using the same operators to denote completely different things that work under completely different rules (e.g. + for integer and floating-point addition), depending on context, that is, on the types involved.
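The non-associativity of floating-point addition is easy to demonstrate with the classic 0.1/0.2/0.3 example (a sketch; any similar operands work):

```rust
fn main() {
    // Floating-point addition has an identity (0.0) but is not
    // associative, so `f64` would not qualify under a monoid-constrained
    // `Add`:
    let left = (0.1_f64 + 0.2) + 0.3;
    let right = 0.1_f64 + (0.2 + 0.3);
    // 0.6000000000000001 vs 0.6
    assert_ne!(left, right);
}
```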

So I think that Add should just have been Plus, used to overload +, period, without any semantics attached. That would make it upfront that using Plus as a generic constraint is pretty much meaningless. Each concrete implementation can then add whatever semantics it wants, and if someone wants to add and use semantics generically, they should use a different trait.

And Add is the simple case. PartialEq/PartialOrd tie the comparison operators to semantics that are currently not fully specified, and in very inflexible ways: e.g. < must return bool (which breaks for types for which < should return something else), a type can only implement PartialOrd once (meaning that floats, which have always had two different partial orders, must choose one, and there is no clear way to order them using the other relation), etc.
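For reference, this is the usual illustration of why floats only get a partial order: every comparison involving NaN is false, and `partial_cmp` surfaces the missing case explicitly (a sketch using the standard library as it exists today):

```rust
use std::cmp::Ordering;

fn main() {
    let nan = f64::NAN;
    // All comparison operators return `false` for NaN, which is why
    // floats implement `PartialOrd` but cannot implement `Ord`:
    assert!(!(nan < 1.0));
    assert!(!(nan > 1.0));
    assert!(!(nan == 1.0));
    // `partial_cmp` makes the missing ordering explicit:
    assert_eq!(nan.partial_cmp(&1.0), None);
    assert_eq!(1.0_f64.partial_cmp(&2.0), Some(Ordering::Less));
}
```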

@leonardo 's comments criticized C++, but in C++, orderings are a property of relations between types, not of the types themselves, which means that floats can implement two different ordering relations, and that’s it. Operator overloading of relations in C++20 is, sadly, more sound, useful, and convenient than in Rust.


(*) mainly because concatenation is an operation in its own right: + is used for String concatenation, but + for integers means something else (instead of integer concatenation, which is also a thing).


#25

If + gets deprecated I’ll ask ~ and ~= to be used, see below :slight_smile:

I was writing about the widely criticized C++ << used for I/O.

I prefer the D language for this: it uses ~ and ~= for string concatenation and appending, because I prefer + to be reserved for commutative operations. (Old Rust used the tilde for something else, though.)


#26

But it’s not actually string concatenation. It’s “copy the contents of a view into a string onto the back of a string buffer, possibly reallocating it”.

That may be nitpicky (and I don’t want to imply that I said something new to anyone here), but it is actually important in the context of a systems programming language. And there is certainly a lot of squinting involved in defining an operation between two different types as a monoid operation…


#27

I guess it’s a… monoid action? Is that a thing? Like a group action, but for monoids?

I’m going to call it a thing. It’s a thing now.


Edit: erp, nope, it’s not even that, because you can’t add &str + &str either
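For anyone following along, the asymmetry can be seen directly: the only relevant impl is `Add<&str> for String`, which consumes the left-hand buffer, and there is no impl between two `&str`s at all (a sketch):

```rust
fn main() {
    let a = String::from("foo");
    // `impl Add<&str> for String` takes the left operand by value:
    // the buffer is consumed (possibly reallocated) and handed back.
    let c = a + "bar";
    assert_eq!(c, "foobar");
    // `a` was moved above; using it here would not compile.
    // Nor is there an `Add` impl between two string slices:
    // let d = "foo" + "bar"; // error[E0369]: cannot add `&str` to `&str`
}
```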


#28

I was wondering whether the language/libs teams feel that it’s worth spending some time on fixing some of these warts? Especially in the case of names in the standard library, it’s relatively straightforward to have some slow deprecation schedule (introduce new name in 2018 edition, warn-deprecate in 2021 edition, remove in 2024 edition) that improves cohesion in the long run at the cost of some churn. I tend to be optimistic about cost/benefit on these sorts of things, but that may not be widely shared.


#29

Thanks for having a look and sharing your thoughts!

So if I understood you correctly, you propose to use () for indexing in addition to method calls and [] for generics?

There are multiple things at play here:

No language in existence has ever managed to make generics with <> work without painful hacks and workarounds. For some examples see

  • Rust’s ::<>,
  • Java’s syntax inconsistencies like foo.<Bar>(...) and confusing stuff like new <Integer>MyClass<String>("a", 3), or
  • C#'s approach of trying to parse stuff with unlimited look-ahead and then going back and rewriting the parse tree if the assumption turned out to be wrong.

If there was a way, I’m sure people would have found it by now. <> is “familiar”, but keep in mind that it was pretty much the least painful choice when C++ was created. I think a language designed from the ground up should not settle with “least painful” if there are better options.

I think it’s not a good proposal, as [] is almost universally used for indexing and slicing in other languages, and having buffer(10) will be quite confusing, as most of programmers will think about functions, not indexing.

Indexing can pretty much be considered a function that takes some number and returns an element. (Similarly with slicing.)

There is no useful distinction here, considering that [] does different things on [1,2,3] and &[1,2,3] to begin with; and with Index and IndexMut it’s not even true that [] is used for indexing or slicing – it can do whatever the author came up with.
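To illustrate that last point, `Index` really does impose no semantics; here is a deliberately absurd (hypothetical) implementation that compiles fine:

```rust
use std::ops::Index;

// A type whose `[]` is not indexing at all: it has no elements and no
// bounds, and "lookup" always yields the same value. `Index` permits
// this because it attaches no semantic contract to `[]`.
struct Always42;

impl Index<usize> for Always42 {
    type Output = u32;
    fn index(&self, _i: usize) -> &u32 {
        // `&42` is promoted to a `&'static u32`.
        &42
    }
}

fn main() {
    let a = Always42;
    assert_eq!(a[0], 42);
    assert_eq!(a[999], 42); // any "index" works
}
```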

On the other hand you have an example of a language that uses [] for generics and the gained consistency is staggering. It has none of the issues mentioned above. I can’t even remember anyone arguing that going back to <> would be a good idea – it just works, and the rule of "[] is for types, () is for values" is extremely intuitive for beginners.

Given that all the evidence is out there, I think it’s fair to say that [] for generics is superior on every measure except “similarity with C++ design decisions”.

If you are talking about self-less methods, then I kinda see the point, but I like the coherence which makes Foo.foo() and Foo::foo(bar) equivalent, i.e. . is a nice hint that self is implicitly used in a method call.

True, it’s a minor thing. :: just feels a bit heavy-weight considering how often it is used.

Closures could be made to look much closer to functions, but somehow aren’t.

Any concrete suggestions?

A good question! I think it’s hard to figure out the best approach, because I don’t really like the -> syntax for the result type of a function to begin with. I would have understood why -> was chosen if closures then adopted similar syntax like params -> body, but they didn’t.

The -> also moves the result type closer to the function body which (at least for me) caused some confusion when reading code. E. g. I wondered why something like Option<Foo> { something_that_returns_option() } didn’t result in something like Option<Option<Foo>> until I realized that the Option was part of the signature, not part of the body.

The unneeded naming explicitness makes code harder to write and a little to read.

I don’t think explicit naming is unneeded – especially not in a language where you can rename everything during use anyway. One of the core issues with abbreviations is that there is often only one non-abbreviated way to write things, but multiple ways to abbreviate things.

I personally don’t see a problem with it. And I would’ve passionately hated CamelCase for method and function names.

No, they could have used lowerCamelCase. One of the negative effects of snake_case is that it creates additional overhead for function names with more than one word, and therefore tilts the scale in favor of single-word names, even when a two- or three-word name would have been more expressive.

Choice of prefix or postfix mostly depends on how nice it will be to read. For example listed methods can be read as “iterate”, “iterate mutably” and “convert into iterator”, while iter_into() would’ve been confusing. Is it “iterate into something” or what?

I think consistency and ease of use is way more important. Many people, especially beginners rely on auto-completion to discover related methods if they encounter an unknown library, so similar functions should end up being grouped together.

And would’ve been a disaster for those who writes code. It’s quite important to see documentation when you work with source code, and no, IDE is not an answer.

I don’t think it would be much of a problem; it would even make it easier to have two windows side by side, one with the code you are currently working on, and one with the documentation (which you could then also read nicely formatted in a browser or some markdown reader, because it’s just bog-standard markdown).

Can you elaborate?

Commenting out a method without also commenting out its doc comment causes the compiler to error out, because it can’t find the declaration the doc comment belongs to.

How special syntax results in “abusion”?

Because macros require special syntax at the call-site (!) there is pretty much zero encouragement for macro authors to design macros in a way that minimizes/eliminates surprises. They can just say "but you knew what you got yourself into because of the !" and carry on.

I would have preferred it if the behavior of macros would have been the responsibility of the macro author, with the general rule “if the user needs to know that something is a macro, your macro is wrong”. The current situation just punishes reasonable macro authors that write macros with predictable behavior at the expense of authors that want to abuse the hell out of it.

I don’t see a connection between “over-used”

The lack of var-args (I get it, every language designer hates var-args) just led to plenty of macros that do nothing more than emulate var-args (look no further than vec!). I think this is not a good situation to be in.
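This is roughly what such macros boil down to; the `list!` name below is made up, but the pattern is the same one `vec!` uses, and the non-macro workaround (accepting a slice) is shown alongside:

```rust
// A minimal macro that does nothing but emulate var-args,
// in the same spirit as `vec!` (hypothetical name `list!`):
macro_rules! list {
    ($($x:expr),* $(,)?) => {{
        let mut v = Vec::new();
        $(v.push($x);)*
        v
    }};
}

fn main() {
    let v = list![1, 2, 3];
    assert_eq!(v, vec![1, 2, 3]);

    // The macro-free alternative is to pass a slice explicitly:
    fn sum(xs: &[i32]) -> i32 {
        xs.iter().sum()
    }
    assert_eq!(sum(&[1, 2, 3]), 6);
}
```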

How do you propose to implement it instead? Don’t forget about formatting string checks.

At the very least I would have expected println! and format! to allow referring to the interpolated values directly: println!("{name} is {height} tall") instead of println!("{} is {} tall", name, height). The current API is very unergonomic and error-prone.

Other languages even combine this with formatting instructions and it works well, e. g. println(f"{name%s} is {height%2.2f} tall")

Hope this clears some things up!


#30

In my experience, a strategy of “deprecate early, remove/fix promptly” generally has the lowest mid-to-long-term cost associated with it.

Of course promising compatibility forever sounds good and may also help adoption, but it’s not something that is sustainable in the long term.

In general, whenever you ask “is this bad enough to deprecate/remove/fix?”, you should go ahead and do it right now, because the cost/benefit equation will only get worse over time as more code gets written depending on the broken stuff.

I’d really like to see a strategy laid out that not only works on some tiny examples, but also works well in the worst case of “every single source file uses feature X, and here is our approach to migrate people safely and with a minimum of manual intervention to the better feature Y”.

I hope Rust devs can come up with some proposal that really enables people to fix/remove things in major releases of Rust, because the current speed at which things are added to the language is not sustainable in the long term.

The current state of Java is a very good example of how things turned rapidly from “nothing ever gets removed” to “we remove things at break-neck speed, because we simply can’t ship new features with all the existing dead weight” after version 8. Hopefully Rust can come up with a plan that avoids such a scenario.


#31

That’d be redundant… std::io::IoResult. I myself use it like io::Result, which I find clean.


#32

agreed… I found it glaring in my early days of Rust


#33

interesting thought


#34

Actually, closure syntax already uses ->. It doesn’t demarcate the body, but rather, it allows one to supply an optional type annotation:

let func = || -> i32 { 4 };

No, they could have used lowerCamelCase. One of the negative effects of snake_case is that it creates additional overhead for function names with more than one word, and therefore tilts the scale in favor of single-word names, even when a two- or three-word name would have been more expressive.

Yeuch. For the purposes of feeling “heavy,” I consider camel case to be many times more fearsome, because for certain letter combinations, every time I type them there’s a chance of it coming out DoubleUPpercase, which is one of the most brutally annoying mistakes to have to fix.

(Ironically, this almost never happens to me with underscores, even though they require shift on an American keyboard, because their location on the top row slows me down just enough to ensure that I release shift. I suspect that somebody with an Italian layout might have some choice words for me…)

I would have preferred it if the behavior of macros would have been the responsibility of the macro author, with the general rule “if the user needs to know that something is a macro, your macro is wrong”. The current situation just punishes reasonable macro authors that write macros with predictable behavior at the expense of authors that want to abuse the hell out of it.

IMO, “predictable behavior” is an impossibly high standard for the vast majority of macros. Quite honestly, I feel that even the standard library itself violates the principle of least surprise in a number of its macros by implicitly borrowing the arguments (see format_args!, write!…). I’d hate to think that any function could do this! (or worse, that any function call might lazily evaluate its arguments!)
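The implicit-borrowing surprise is easy to show: the call below looks like it takes its argument by value, yet the value survives it, which no ordinary function call with this shape could allow (a sketch):

```rust
fn main() {
    let s = String::from("hi");
    // `println!` looks like it takes `s` by value, but the macro
    // expansion implicitly borrows its arguments instead:
    println!("{}", s);
    // No move happened, so `s` is still usable afterwards.
    println!("{}", s);
    assert_eq!(s, "hi");
}
```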

Macros masquerading as regular functions just ain’t cool.

$ echo hi
hi
$ echo hi >stdout.log 2>stderr.log
$ time echo hi >stdout.log 2>stderr.log

real    0m0.000s
user    0m0.000s
sys     0m0.000s
$ wtf?

At the very least I would have expected that println! and format! would have allowed to directly refer to the interpolated values: println!("{name} is {height} tall") instead of println!("{} is {} tall", height, name). The current API is very inergonomic and error-prone.

Python finally got this just in the last year, and using it for the first time felt like Christmas morning. Once you’ve written in a language with proper string interpolation, you can’t understand how you ever lived without it.


#35

It’s amazing how it cuts down on complexity for beginners.

Instead of relegating generics to a special “this is advanced, we skip it until later” category, it allows a simple one-minute introduction for beginners, along the lines of:

Rust has a mandatory parameter list for values, and an optional parameter list for types. Sometimes the types can be derived from the values, and then the optional parameter list can be omitted.

[] would also make generics more consistent: you wouldn’t have to distinguish generics in types and expressions with different syntax anymore, because ::<> could go away.


#36

Also take a look at the D language: it uses () for type/const parameters and !() for instantiation (plus a special rule allowing just the bang when there’s only one argument).


#37

It’s probably my least favorite approach to working around the issue that [] has been used up by other things. :disappointed:


#38

Why don’t you like the D syntax solution? It’s sufficiently clean. My least favourite approach here is the C++ template syntax.

Perhaps you like Scala’s syntax a lot, but I don’t like Scala much, and I don’t think it’s a good idea to conflate function calls with array access. Even if in mathematics you can think of them as the same thing, in a systems language that often has to access memory at a low level, it’s good to have syntax that clearly tells the two things apart.

A lot of the supposed “Rust warts” in this thread aren’t warts in my opinion (Rust does have some warts; I have written posts about its problems since Rust 1.0, and there are many things I haven’t written about yet because I don’t think I have enough political power to change anything). Each one of them will need discussion, the good ones will need long discussion threads, and often the change will not be regarded as important enough to break compatibility.


#39

There are other ways to get rid of the turbofish besides moving away from <>, such as using different syntax for function calls and variable access. Rust is able to avoid the turbofish in types because the parser knows the difference between types and expressions, and types can’t contain comparisons. So if calls and variables had distinct syntax, the turbofish could be avoided in all cases, because function calls can’t contain comparisons and variable access can’t have generics.

But that would also be super-breaking, and add noise to the common cases, and turbofish can be avoided most of the time using inference anyway.
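For readers unfamiliar with the workaround: the turbofish is usually sidestepped by moving the type annotation to the binding (a sketch):

```rust
fn main() {
    let nums = ["1", "2", "3"];
    // With the turbofish:
    let a = nums.iter().map(|s| s.len()).collect::<Vec<usize>>();
    // Avoided via a type annotation on the binding instead:
    let b: Vec<usize> = nums.iter().map(|s| s.len()).collect();
    assert_eq!(a, b);
}
```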

The turbofish is still totally a wart, though.


#40

Why don’t you like the D syntax solution? It’s sufficiently clean.

D’s syntax is inconsistent between the declaration-site and the use-site (T vs. !T) and has scoping issues due to Algol-style syntax (T t) instead of Pascal-style syntax (t: T).

You can see the latter here:

T add(T)(T lhs, T rhs) {
    return lhs + rhs;
}

The result type T comes before it is actually declared. See Ceylon for a similar issue:

class Id<T>() {
  // Does the result type T refer to the class' <T> in scope,
  // or to the method's <T> that comes after it?
  T id<T>(T val) { ... }
}

Java tried to avoid exactly this with its approach:

<T> T id(T value) { ... }

which is also kinda bad, because declaration- and call-sites (foo.<Bar>(...)) cause confusion for users.

C# decided to go D’s/Ceylon’s route with T id<T>(T value) { ... }, having similar issues as those two.

Kotlin got it right originally with

fun id<T>(value: T): T { ... }

but they reverted back to

fun <T> id(value: T): T { ... }

due to an unfortunate design choice where their extension method syntax interacted badly when used with generic receivers:

fun R.id<R, T>(value: T): T { ... } // ugh, type parameters sandwiched between two uses

I don’t think it’s a good idea to conflate function calling with array access.

But that’s exactly what Rust does with its Index and IndexMut traits!

in a system language, that often has to access memory at low level, it’s good to have a syntax that clearly tells apart the two things

Then I fear Rust isn’t that language. :disappointed:

Rust kinda lives in the worst of both worlds, in which [] has no meaning you can rely on, but is used anyway, so [] cannot be used for better generics.

often the change will not be regarded as important enough to break compatibility


That’s why I’d love to hear a statement on how large changes between major versions of Rust are allowed to be. It would be a disappointment if they were comparable to changes in minor versions. If you have a hard limit on the size of fixes, broken things just keep piling up until the language collapses under its own complexity.