Implicit widening, polymorphic indexing, and similar ideas


#65

I find this idea very likable, although I guess it will be controversial (especially considering that it requires the Output of unary plus to be a generic parameter and not an associated type, as with the rest of the operators). Unary + is also sometimes used in C for readability:

int shifts[] = {-3, -2, -1, 0, +1, +2, +3};

and in C++ it can be overloaded (Boost Spirit is a notable example). So, in theory, unary plus could be useful for Rust even without being a “widening operator”.
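For illustration, a minimal sketch of what a widening unary plus could look like if its output were a generic parameter; the trait and method names here are made up:

trait Plus<Output>: Sized {
    // Hypothetical widening unary `+`. Unlike `std::ops::Neg`, whose
    // `Output` is an associated type, the output here is a generic
    // parameter, so a single `+x` could widen to several target types.
    fn plus(self) -> Output;
}

impl Plus<u32> for u16 {
    fn plus(self) -> u32 { self as u32 }
}
impl Plus<u64> for u16 {
    fn plus(self) -> u64 { self as u64 }
}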


#66

My other thought was introducing a unary ^ for it. It might be weird to have its unary function be unrelated to its binary function, but perhaps less so than using unary +. Introducing sugar should be backward-compatible, though, so we can wait to see if a sugar would even be desirable.

EDIT: I guess it really wouldn’t be any more weird than having * and & have very different meanings in their unary and binary forms.


#67

Some statistics of various conversions.


#68

I’d love to see even a limited version of implicit widening.

When interacting with C libraries I have to do a lot of casting between c_uint and usize, and this is unsafe due to potential overflows (and the lack of overflow safety in Rust is another big problem :()


#69

One of the things that has been considered in other places is that Rust should be usable in 16-bit environments, which makes a 32-bit lower bound on usize problematic.


#70

So, I ran an experiment to revive this discussion.

My goal was to replace one of the most popular numeric conversions in the Rust codebase - as usize - with three semantically different actions: widening (lossless conversion), truncation (potentially lossy conversion) and sign conversion (also potentially lossy) - and to analyse the results.
This was done with three simple traits - Widen, Truncate and ConvertSign:

trait Widen<Target>: Sized {
    // Lossless numeric conversion (equivalent to operator `as`)
    fn widen(self) -> Target;
}
trait Truncate<Target>: Sized {
    // Numeric truncation (equivalent to operator `as`)
    fn truncate(self) -> Target;
}
trait ConvertSign: Sized {
    type TargetSigned;
    type TargetUnsigned;
    // 2's complement sign conversion or no-op for signed numbers (equivalent to operator `as`)
    fn as_signed(self) -> Self::TargetSigned;
    // 2's complement sign conversion or no-op for unsigned numbers (equivalent to operator `as`)
    fn as_unsigned(self) -> Self::TargetUnsigned;
}

My branch with results can be found here:


and the diff between the branch and the upstream is

Some notes:

  1. Widen and Truncate really have to represent “weak” widening and truncation and not “strict” ones, i.e. both Widen<T> and Truncate<T> are implemented for the type T itself. This is for portability reasons, both between 64-bit and 32-bit machines and between different operating systems. (The first couple of commits in my branch show how it looks when WidenStrict and WidenWeak are separate traits; a minimal sketch of the “weak” impls follows these notes.)
    For the same reasons ConvertSign is a single trait and not two - AsSigned and AsUnsigned - as_unsigned() (resp. as_signed()) should be supported for unsigned (resp. signed) types too (but be a no-op, obviously).

  2. Widen and Truncate are identical to std::convert::Into in their form (see below).
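To illustrate note 1, here is a minimal sketch (not the actual code from my branch) of “weak” impls of the Widen trait above for the unsigned types, generated with a helper macro:

// Sketch only: “weak” widening means every type also widens to itself.
macro_rules! impl_widen {
    ($from:ty => $($to:ty),+) => {
        $(impl Widen<$to> for $from {
            fn widen(self) -> $to { self as $to }
        })+
    };
}

// usize is at least 16 bits wide, so u8 and u16 can always widen to it.
impl_widen!(u8 => u8, u16, u32, u64, usize);
impl_widen!(u16 => u16, u32, u64, usize);
impl_widen!(u32 => u32, u64);
impl_widen!(u64 => u64);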

Conclusions:

  1. In general I like the result - “harmless” conversions like widening, which you can pretty much throw anywhere without much thinking (but I’m still against making them implicit), are clearly separated from “suspicious” conversions, which require some thinking and analysis.
    Lossless conversions are much more common than lossy ones - the statistics in my experiment are definitely skewed here, because usize is a relatively wide type, but remember that as usize/isize/u64/i64 are the most popular conversions and they tend to be widening, while conversions to narrower types are much rarer.
    This means that the programmer’s attention can be re-targeted from conversions with operator as in general to only the small but potentially more problematic portion of them.
    Moreover, with dedicated, semantically loaded methods for certain conversions, the all-purpose operator as itself would be used more rarely and could be considered “raw and low level” and requiring more attention.

However, there are some (solvable) problems, diminishing the usefulness of the traits today:

  1. Default type parameters don’t drive type inference. This means that you can’t write the following:

    let a: u16 = 10; let b = c[a.widen()];

    and you have to give a type hint to widen() somehow.
    (In my branch the type hint is given with an additional trait method widen_, which is clearly a hack and shouldn’t be there in the final design.)
    With improved type inference based on default type parameters, type hints for widen() would almost never be needed.

  2. Type ascription is not implemented. Even without improved type inference you could supply a target type with type ascription, but without it there’s no easy and short way to do so. (Into has the same problem currently.)

    let a: u16 = 10; let b = c[a.widen(): usize];

  3. widen() is a better and safer way to perform conversions, but to compete with core language facilities like as it has to be really convenient to use.
    At a minimum the Widen trait should live in the prelude; besides that, the method call .widen() is quite long to type (although not as long as as usize) and might be shortened somehow.

  4. While widen is almost unquestionable, the other methods raise some questions - for example, how should lossy conversions be treated: with silent truncation (like operator as), with a panic (like arithmetic operations), or should the methods return Option?

All these notes, conclusions and problems led me to one alternative: don’t use a separate trait Widen for lossless numeric conversions, but use Into instead.

Pros:

  1. Into is already in the prelude and .into() is shorter than .widen()
  2. Into will probably be implemented for integer conversions anyway, because they are perfectly valid safe conversions, Into is idiomatic for such conversions, and generic code using Into can benefit from them.
  3. There’s a non-zero chance that Into and its friends from std::convert, being such basic traits, will get some short and convenient language sugar some day in the future.

Cons:

  1. .into() is not as semantically clear as .widen(), but it may be treated simply as a “safe and lossless type adjustment”, without the widening aspect.

So, here are some practical actions that I propose:

  1. Implement Into/From for lossless numeric conversions (a usage sketch follows this list).
  2. Postpone implementing the other mentioned traits (Truncate, ConvertSign) for some time; the “raw and low level” operator as can still be used for them.
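For illustration, this is what a use site would look like once such impls exist (assuming, e.g., an impl From<u32> for u64):

// Lossless conversion via From/Into instead of `as`:
fn total_len(sizes: &[u32]) -> u64 {
    let mut total: u64 = 0;
    for &s in sizes {
        total += u64::from(s); // u32 -> u64, lossless by construction
    }
    total
}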

#71

Strong +1 on explicitly indicating whether you’re widening or truncating (or sign converting) in general. as is a footgun.


#72

I think there should be a way to do all of these. One with silent truncation (like wrapping operations), one that only panics in debug builds (like standard operations), and one that returns an Option (like checked operations). If anyone wants to always panic, they can use the checked version and unwrap, like with other operations.
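A sketch of what that could look like for truncation, extending the Truncate trait from earlier in the thread (the method names here are hypothetical, mirroring the arithmetic operations):

trait Truncate<Target>: Sized {
    // Silent wrap-around, like `as` (cf. the wrapping_* operations).
    fn wrapping_truncate(self) -> Target;
    // Panics on actual value loss in debug builds (cf. ordinary arithmetic).
    fn truncate(self) -> Target;
    // Returns None on value loss (cf. the checked_* operations).
    fn checked_truncate(self) -> Option<Target>;
}

impl Truncate<u8> for u32 {
    fn wrapping_truncate(self) -> u8 { self as u8 }
    fn truncate(self) -> u8 {
        debug_assert!(self <= u8::max_value() as u32, "truncation lost value");
        self as u8
    }
    fn checked_truncate(self) -> Option<u8> {
        if self <= u8::max_value() as u32 { Some(self as u8) } else { None }
    }
}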


#73

In my experiment most of the potentially lossy conversions should report an error on actual truncation, except for several cases related to hashes, random numbers and serialization, i.e. the situation with conversions is very similar to that with arithmetic operations. This is why I think (debug-only?) panicking should be the default approach, and analogs of the wrapping_ops and checked_ops would cover the special cases.


#74

I feel that implicit widening should be added to Rust. Unlike integer overflows and narrowing conversions, widening conversions can never change the mathematical value of a variable. As such, they are widely recognized as safe and are implemented by widely used languages including C, C++, Java, and C#. Not only that, but the commonly used gcc and clang compilers do not even have options to warn about widening conversions, despite the presence of hundreds of optional warnings including ones for narrowing conversions. Think about what that says about widening conversions. Keep in mind that both these compilers are open source and frequently have new warnings contributed.

The lack of widening conversions makes some logic which is trivial to implement in common languages complex to implement in Rust. A particularly egregious class of examples involves comparing a usize to some fixed-width integer, such as u32 or u64. In such scenarios, it is unknown which of the two types is larger than the other. In C this is of no consequence because the smaller type will be implicitly widened to the larger type. But in Rust the programmer has to carefully write the code to handle both cases. Here are a couple examples:

Check if twice the value of a u64 fits into a usize. In C:

return x <= SIZE_MAX / 2;

In Rust:

x as usize as u64 == x && x as usize <= usize::max_value() / 2

Or, check if a usize value is less than a u32 constant. In C:

return x < LIMIT;

In Rust:

if x as u32 as usize == x {
	// Compare as u32 (for usize smaller than u32)
	(x as u32) < LIMIT
} else {
	// Compare as usize (for usize larger than u32)
	x < LIMIT as usize
}

As you can see, the Rust examples are far more complicated and error-prone. It should not be that way. Note that this type of limit checking is often used in security-sensitive code.

(Of course, if someone can provide simpler Rust code that is guaranteed to work on all Rust implementations, then please go ahead. Keep in mind that the size of a usize is machine-dependent.)

There have been a number of objections raised to implicit widening which I feel are misguided. These can be briefly summarized and rebutted as follows:

1.) “The problem should be solved by polymorphic indexing instead.” Actually, polymorphic indexing only solves a subset of the problems. The examples I gave above had nothing to do with indexing.

2.) “Integer conversions in C are confusing and unsafe.” While this premise is already questionable, it really only applies to “lossy” conversions where the mathematical value changes as a result of the conversion. I am only suggesting lossless, “widening” conversions.

3.) “Implicit widening conversions can ‘hide’ integer overflow bugs.” While this is arguably true in some cases, such bugs are actually caused by integer overflows, not by widening conversions. As such, these bugs can and do occur regardless of implicit widening conversions. Other means such as static analysis and runtime checking are vastly more useful for detecting integer overflow bugs. Helpfully, the latter has already been implemented in Rust; see this RFC: https://github.com/rust-lang/rfcs/blob/master/text/0560-integer-overflow.md.

4.) “For explicitness and type safety, it should be required to explicitly cast between integer types, just like any other type.” Actually, when working with integer types of unknown size, such as Rust’s usize, it may be impossible to know ahead of time which of the two types to cast to without making assumptions about the implementation. In addition, the fact that two variables might have different integer types is of no consequence when performing mathematical operations such as comparisons, unless there is a possibility of the mathematical result changing as a result of a type conversion. Only in that case is correctness in question, and only then is it reasonable to require an explicit cast. Finally, requiring explicit casts for widening conversions introduces more use of the ‘as’ operator, which can also perform narrowing conversions with the exact same syntax, thereby making it harder to find the casts which actually change mathematical results.

In summary, I think it is clear that implicit widening conversions would increase the usability of Rust and make it easier to write correct programs.

I posted to this thread as this was the only substantive discussion on the topic I could find. If anyone happens to know of a better place to post, then please let me know. I am aware I could post a full RFC and pull request, but I’m not sure I have time for that right now.


#75

I think your comment about (3) does not completely address the following scenario, which risks hiding integer overflow / wraparound bugs:

pub fn delete_from(offset: u64);

let index: u32;
let size: u32;

delete_from(index * size);

The problem with this code is that with simple widening, it will compute u32 * u32 with a u32 result, then widen that result to u64, which will pass the wrong number to the function delete_from (we can imagine this could result in something bad like loss of data, a memory safety issue, etc.).

Debug assertions for wraparound in unsigned multiplication only help you when the buggy cases are likely to show up during debug use and testing; they do not protect you from the corner cases that may happen later in a release build.
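(For comparison, with explicit conversions the fix for the example above is to widen the operands before multiplying, so that the arithmetic itself happens in u64:

// Widen first, then multiply: no u32 wrap-around is possible.
delete_from((index as u64) * (size as u64));

The point is that implicit widening of the result makes it easy to write the wrapping version without noticing.)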


#76
x <= (usize::max_value() as u64) / 2

It’s extremely unlikely that usize will ever be larger than u64, and I expect a lot of Rust code will break if it is, regardless of implicit widening. Some equivalent of uintmax_t seems like an appropriate solution if this is a real concern.

(x as u64) < (LIMIT as u64)

This should be optimized to an appropriate smaller comparison where possible.

Which is not to say your argument is invalid. But what happens if instead of comparing, you want to do arithmetic?

let size1: usize = ...;
let size2: u32 = ...; // maybe read from some binary format
let total_size = size1 + size2;

With implicit widening, this will work, but the type of total_size will depend on the size of usize. For one thing, this raises the question of what the result type should be if usize is the same size as u32. Suppose the answer is usize (either because of some rule that it always trumps, or because it was on the left): it will also be usize if usize is larger, so if the programmer only tests their code on 32-bit+ machines, the type of total_size will always be usize. So they may well write:

fn add(size1: usize, size2: u32) -> usize {
    size1 + size2
}

Which will suddenly fail to compile if usize is smaller. (Also, while compilation errors are the common case, there is some chance of generics being used in a way that doesn’t cause an error but leaves the program doing the wrong thing.)

Now, if the programmer wasn’t thinking of small usize then current Rust is arguably worse, because they will instead write size1 + (size2 as usize), which will just do the wrong thing. (I think making as silently truncate was a mistake.) But if they were, it’s easier to do the right thing with integer conversions if you have to make an explicit choice.

Of course, you can make similar mistakes that break 32-bit if you only test on 64-bit, which may impact the Rust community more in practice.

…In general, I feel like there has to be a more principled way to deal with unknown-size integer types than just letting random types in the program vary depending on the machine and hoping things compile. Like the way Rust generics are more principled than C++ templates. I’m not sure that more principled way exists in Rust today, though, or what it could be.


#77

This discussion fails to take Value Range Analysis into account. In many situations you can know at compile time that a value of a larger integer type will always fit inside a smaller integer type. The D language has a simplified version of this that helps avoid some dangerous casts; it’s a very useful feature. In Rust, if you don’t want implicit conversions, then allowing conversions with into() in such cases seems acceptable.


#78

In that case you can simply use the panicking conversion functions and hope that the optimizer figures it out.

Value Range Analysis, together with optionally allowing function calls depending on the analysis, would be a totally cool feature, but it needs a lot of thought to get right AND usable. Also, the panicking conversion function + optimizer approach is possible right now.
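A sketch of that “panicking conversion + optimizer” pattern as it can be written today (the helper name is made up):

// Assert the range, then truncate. In contexts where the optimizer can
// prove the assertion always holds, the check is compiled away.
fn to_u8(x: u32) -> u8 {
    assert!(x <= u8::max_value() as u32);
    x as u8
}

let value: u32 = 1000;
let byte = to_u8(value & 0xff); // masking bounds the range, so the
                                // assert can be optimized out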


#79

I believe there was discussion at some point of implementing comparisons for heterogeneous integer types, which would solve your examples without the dangers of implicit type conversion. Additionally, it would work in more cases. E.g., neither i64 nor u64 can be converted to the other without potential truncation, but a heterogeneous comparison operator would allow them to be correctly compared.
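For example, an i64/u64 comparison can be written correctly without losslessly converting either operand to the other (a sketch, not a std API):

use std::cmp::Ordering;

// Any negative i64 is smaller than every u64; otherwise both values
// fit in u64 and can be compared directly.
fn cmp_i64_u64(a: i64, b: u64) -> Ordering {
    if a < 0 {
        Ordering::Less
    } else {
        (a as u64).cmp(&b)
    }
}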


#80

Has there been any progress on this issue (implicit widening / polymorphic indexing)?

I like tupshin’s argument (near the top of this thread) that implicit conversions should be platform-dependent. In general I dislike writing code with usize and not knowing how big it is.

Can I make the suggestion that there be a min and max supported usize, configurable per project (perhaps defaulting to 32-64 bit for compatibility reasons, or 64 bit only), and that the allowed lossless conversions (whether by Into or implicit) be defined based on these restrictions? The advantages as I see it are:

  • programmers can use usize while knowing for sure what size (or possible sizes) it has
  • Into (and possibly implicit conversions) can be implemented appropriately (e.g. why can’t .into() convert from usize to u64 today?)
  • writing software which explicitly only supports 64-bit, or 32-bit, or maybe even 16-bit architectures is possible (note that increasing size is not always safe, e.g. if an array index gets serialised to a fixed-size binary stream); a sketch of how this can be approximated today follows this list
  • users/porters know which architectures existing software is designed to work on, and by changing the configured size range they can get compiler error messages wherever something needs to be addressed (e.g. in one project I convert lengths to u64 for binary output, but because .into() doesn’t support usize → u64 I use x as u64, which would not give an error message should usize be wider than 64 bits)
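For what it’s worth, the third point can already be approximated with a compile-time guard on the target’s pointer width, using the compile_error! macro:

// Fail the build on targets this crate was not designed for.
#[cfg(not(any(target_pointer_width = "32", target_pointer_width = "64")))]
compile_error!("this crate only supports 32-bit and 64-bit targets");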

#81

Admittedly not having read the entire thread, the main “gotcha” for me is that it’s unergonomic to throw in casts everywhere, and it clutters the code. It also makes it harder to distinguish the cases where a cast is losing information.

It would be much nicer to avoid writing either “as” or “into” if the conversion is possible without data loss. I’ll go read the thread now to see some arguments against this, but just wanted to throw in my impressions as someone who uses the language casually and mainly for numerical-heavy code.


#82

So I just wrote this code in a benchmark; those casts do hurt:

use bencher::Bencher;

fn u8_runner<F: Fn(u8) -> u8>(bench: &mut Bencher, f: F) {
    let mut vs: [u8; std::u8::MAX as usize] = [0; std::u8::MAX as usize];
    for i in 0..u8::max_value() {  // .. i know
        vs[i as usize] = i;
    }
    bench.iter(|| {
        for v in vs.iter_mut() {
            *v = bencher::black_box(f(bencher::black_box(*v)));
        }
    })
}

#83

Doing this with any amount of usefulness requires a rather complicated and fragile analysis, probably with an abstract interpreter. I really don’t think that would be a good idea for Rust’s type system to have. Do you have any link to something about how D does this? I’m not finding anything on their website or in DMD that indicates that this is done.


#84

D has various parts that are not easy to find unless you’ve been a D programmer for some time. D’s Value Range Analysis works only inside a single expression; a simple example (here 155 is OK, 156 is a compile-time error):

ubyte foo1(in uint x) {
    return x % 101 + 155; // OK
}
ubyte foo2(in uint x) {
    return x % 101 + 156; // Error
}
void main() {}

See the compiler error: https://dpaste.dzfl.pl/3767fcbe05e2

The kind of Value Range Analysis I’d like for Rust is similar, but I’d like the range value to be carried between different expressions. To simplify the analysis, the value range should not be carried outside a function (for that it’s better to use contracts or static analysis tools external to the compiler), and it should be computed only for constants (let but not let mut). When the flow analysis takes too long inside very large functions with very high complexity, I think it’s OK to stop computing a value range after some amount of time.

In Rust the value range should not be used, as in D, to perform implicit type conversions (the D example above contains an implicit but lossless u32 -> u8 conversion in the foo1 function), but it should allow code like:

fn foo1(x: u32) -> u8 {
    u8::from(x % 101 + 155)
}
fn main() {}

Here from performs a lossless explicit conversion.