Obviously there are safety issues, like `u8`/`u16` overflow on 32-bit or 64-bit CPUs.
I mean compile-time type issues. Obviously it hits an assert at runtime, but until you run the code you have no idea that you mistakenly used a `u8` or `u16` type instead of `usize`.
I don't see how this works. If there's a conversion between two integer types of different width, one of the directions necessarily involves losing information. Furthermore, I was referring to the scenario where not even a conversion is necessary for a semantic error (so whether you are actually losing information with the conversion doesn't even matter, and consequently the upcast being lossless doesn't help): if you use `u32` as an index or counter or length or whatever, and you hand the indexing/counting operation something that is bigger than 4G elements, you won't be able to access the end of the elements (in the case of indexing) or you'll overflow (in the case of counting).
But my point is exactly that it's not noise. You need the conversion because you are doing something fallible. Hiding it might be more convenient to write but it's certainly more error prone and harder to debug.
The burden is not in the types or the conversion itself. The burden is inherent as it is caused by the kind of impedance mismatch whereby you are trying to do something platform-specific in a more generalized way, or vice versa. Any "solution" that hides the need for extra attention and thought in these cases introduces sloppiness.
Yes, but this ignores situations when you are supposed to use a wider integer for either correctness or generality (with the latter, I again refer to the case when the use of `u32` instead of `usize` prevents 64-bit platforms from handling data structures larger than 2^32). In my definition, that is a bug (at best a design error, at worst an actual loss of data), which is currently prevented by a type error and would slip under the radar with the introduction of implicit widening.
First, we’re talking past each other. Your case is about using `u64` for files, while I was speaking about using `u32` or smaller for things that are known at design time to never ever be big.
So let’s look at them separately:
size > 32-bit
If I’m working with files, I’m supposed to use `u64` instead of `usize`, and in general there’s little need for implicit widening.
However, if a file format uses 32-bit lengths or offsets (and plenty still do), I’d still need to convert them to `u64` for seek, and `usize` for read. Suddenly there are legitimately 3 different types involved. All fine and perfectly correct with implicit widening, and noisy with casts.
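To make the three-type situation concrete, here is a minimal sketch (the struct and its fields are hypothetical, not from any real format): a 32-bit on-disk length has to become a `u64` for a seek position and a `usize` for a buffer size.

```rust
use std::convert::TryFrom;

// Hypothetical record header from a file format that stores
// 32-bit lengths and offsets.
struct RecordHeader {
    offset: u32, // where the payload starts
    len: u32,    // how many bytes it occupies
}

fn payload_range(h: &RecordHeader) -> (u64, usize) {
    // Widening u32 -> u64 for the seek position is lossless...
    let seek_pos = u64::from(h.offset);
    // ...but u32 -> usize must still be checked on 16-bit targets,
    // so three integer types end up involved for a single read.
    let buf_len = usize::try_from(h.len).expect("length exceeds address space");
    (seek_pos, buf_len)
}
```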
size <= 32-bit
That’s the case where I’d like implicit widening.
There are many values in programs that are not related to files, and are never even close to being as large as 4 billion. It may be a number of currently open program windows, number of wheels on a car, number of occupied nodes in a fixed-width b-tree.
As soon as you involve such a quantity in indexing or length comparison, Rust insists on it being `usize`. There is nothing technically wrong in it being `usize`, apart from wasting bytes and larger arithmetic instructions on values that are never as large.
So if I decide to store the value in memory in the smallest type it needs, Rust makes me litter the code with `as usize`. This doesn’t make anything more correct, doesn’t add useful information.
But it only matters when the file is big enough for the architecture, like > 4 GB on 32-bit. So it is not the common case. Why can you not write code like this:
struct BigFile(Vec<u8>);

impl std::ops::Index<u64> for BigFile {
    type Output = u8;
    fn index(&self, index: u64) -> &u8 {
        unimplemented!();
    }
}
and do proper thing only in one place?
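One way the `unimplemented!()` body above might be filled in (a sketch, not the poster's code): the fallible `u64` → `usize` conversion lives in exactly one place, so it cannot be silently truncated by an `as usize` at each call site.

```rust
use std::convert::TryFrom;
use std::ops::Index;

struct BigFile(Vec<u8>);

impl Index<u64> for BigFile {
    type Output = u8;
    fn index(&self, index: u64) -> &u8 {
        // On a 32-bit target, an index above 4 GiB panics here with a
        // clear message instead of wrapping around via `as usize`.
        let i = usize::try_from(index).expect("index exceeds address space");
        &self.0[i]
    }
}
```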
But all of them are case-specific: for one case `uN` is the right index type, for another it is not, and its usage is a mistake.
If it is tedious to convert from `uN` to `usize` back and forth, why not implement a proper `std::ops::Index`?
Why make this decision on language level?
I don't see any wasted bytes: you can hold the index in any type and convert back and forth to `usize` only during indexing.
larger arithmetic instructions on values that are never as large
this is valid, but in some cases it would mean extra instructions for working with small types, since most CPU instructions operate on machine words only.
I consider this a non-issue, because `arr[usize::MAX]` or `let idx = -1; arr[idx as usize]` will also assert at runtime. So one needs to get the value correct when indexing regardless of the type, so I don't see any harm in allowing additional types.
This is the same as how the following compiles, though it obviously panics at runtime:
let shift = std::u128::MAX;
let _ = 1_u8 << shift;
(I agree it would be bad if extending the domain forced it to go from total to non-total, but that absolutely doesn't happen for indexing. If anything, the opposite can happen: if I have a `[T; 256]` lookup table, I know that a `u8` is in-bounds --- that's why the table is the size it is --- but if I have to index by `usize` I can get the value out of bounds.)
These are absolutely different things. `usize` is a machine word; when you want to address something above the 64-bit address space on a 64-bit machine, you are doing something very, very specific. It doesn't happen in most programs.
When you use `u8` for indexing because of some mistake, overflow happens with very high probability.
I’ve always viewed `as` casts as gross but a “cost of doing business”, similar to how items do not participate in type inference. A lot of this flavor of sugar design boils down to compression of pain: the usual case becomes less painful but you wind up with finicky corner-cases.
While I don’t want to make this thread about implicit conversions in the large, I’ll use an example from C++ that I find annoying. `std::optional<T>` is constructed by a non-`explicit` ctor, and implements `operator bool`. So, unit tests that test things returning optionals look like this:
EXPECT_EQ(MyFoo(), kExpected);
unless `MyFoo()` types at `std::optional<bool>`, in which case you must write
EXPECT_EQ(MyFoo(), std::make_optional(kExpected));
to avoid the implicit conversion of `MyFoo()` to `bool` via `operator bool`. The general case is painless but now you’ve got a bit of a trap in the pathological case.
My worry is that implicit conversions (even innocent ones like zero/sign-extending integers) do not play ball with inference and can lead to unpleasant surprises.
For example: if `K: Index<u64>` and `K: Index<usize>`, and further `k: K, i: u32`, what expectation should I have for `k[i]`?
- Pick one by some kind of integer hierarchy (C++ does this in some cases via its complicated promotion rules).
- Pick neither, assert compiler error (C++ does this in some cases, such as `char` into either `short` or `long`).
The former is something I really don’t want because Rust is already complicated and I think C++'s promotion rules are generally agreed to be too complicated. Also, a library addition could cause promotion to choose a different overload… though this is a problem with any overload mechanism, and not necessarily a con.
The latter is far easier to reason about, but is a form of negative reasoning that makes it possible for pure additions to break existing code.
Mind, this doesn’t mean we need to keep `as`. I think that providing `Index<uN>` implementations for the standard containers is not insane, nor is providing widening `Into` conversions and fancy `saturating` and `wrapping` conversions (though `Into<usize>` is definitely not a good idea).
Agreed. "Index into that array with this `u16`/`u32`/..." is a perfectly well-defined operation. It is partial, just like normal indexing. It could be implemented as
fn get<I>(&self, i: I) -> Option<&T> where usize: TryFrom<I> {
    // try_from returns a Result, so .ok() bridges it into this
    // Option-returning method
    self.get_with_usize(usize::try_from(i).ok()?)
}
If either the cast or the indexing fails, we get `None`, otherwise we get `Some`. There is nothing lossy about this.
Is there any situation in which this method would not do what is intended, would lead to subtle bugs or would generally not be desirable? Off the top of my head, I don't see any.
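A self-contained version of that sketch (the wrapper type and the method name `get_any` are mine, standing in for the hypothetical `get_with_usize` plumbing):

```rust
use std::convert::TryFrom;

// Illustrative wrapper around a Vec to host the polymorphic getter.
struct Buf<T>(Vec<T>);

impl<T> Buf<T> {
    fn get_any<I>(&self, i: I) -> Option<&T>
    where
        usize: TryFrom<I>,
    {
        // A failed cast and an out-of-bounds index both collapse to None.
        self.0.get(usize::try_from(i).ok()?)
    }
}
```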
The only time this would be problematic is when doing math on small types that would've otherwise not type checked. A somewhat reasonable example I thought of: a matrix type that uses `(u8, u8)` for indexing into a manually flattened array. The correct way to write this would be (untested)
struct Matrix<T> {
    raw: Box<[T]>,
    width: u8,
    height: u8,
}

impl<T> Index<(u8, u8)> for Matrix<T> {
    type Output = T;
    fn index(&self, ix: (u8, u8)) -> &T {
        let row: usize = ix.0.into();
        let col: usize = ix.1.into();
        &self.raw[col * usize::from(self.width) + row]
    }
}
and the version with unintended overflow is
impl<T> Index<(u8, u8)> for Matrix<T> {
    type Output = T;
    fn index(&self, (row, col): (u8, u8)) -> &T {
        // with implicit widening this would compile, but the whole
        // expression is computed in u8 and can overflow
        &self.raw[col * self.width + row]
    }
}
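To make the trap concrete: for a hypothetical 17×17 matrix, the index of entry (16, 16) computed entirely in `u8` wraps around (here spelled with `wrapping_*` to show the release-mode result explicitly; debug builds would panic instead):

```rust
fn main() {
    let (row, col, width): (u8, u8, u8) = (16, 16, 17);
    // u8-only arithmetic, as in the second impl
    let wrapped = col.wrapping_mul(width).wrapping_add(row);
    // the intended usize arithmetic, as in the first impl
    let correct = usize::from(col) * usize::from(width) + usize::from(row);
    assert_eq!(correct, 288);
    assert_eq!(wrapped, 32); // 288 % 256: silently aliases element 32
}
```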
Interestingly, the version written with `unsafe` would be less likely to fall into this trap, as it would use unchecked indexing or pointer offsetting after checking bounds itself, both of which would continue to be `usize`-only. (A real version would probably store a `*mut T` raw box with `width * height` as the slice length and use pointer offsetting to index after checking its semantic bounds. (And the "cursed" space-saving optimization is to stuff the matrix dimensions in the unused bits of the pointer.))
If you’re on a relatively normal platform where `u32` fits into `usize`, then you shouldn’t have to deal with an `Option` if you write `arr[some_u32]`.
For `get`, though, that seems fine.
I had imagined the same conversion behaviour as Ralf, and would expect any `[idx]` to work like `get(idx).unwrap()`, and I'd also expect automatic conversion to avoid bugs. Right now an access with these types needs to do the conversion itself, and if it fails to consider the differing pointer and `usize` sizes, this can lead to subtle unexpected non-failure. On all 32-bit targets, this does not panic out of 'luck':
vec![0; 1][(1_u64 << 32) as usize]; // idx `0` on 32-bits
If one were to code on 64-bit where everything 'works fine' (and considering that coercion is an easier conversion, while `TryFrom` is also quite new and more verbose), it might be tempting to simply insert an `as usize` instead of properly converting, and it seems to work. But then on a 32-bit deploy suddenly everything goes wrong.
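That hazard can be simulated on any host, with `u32` standing in for a 32-bit `usize` (a sketch of why the checked conversion is the safer spelling):

```rust
fn main() {
    let big: u64 = 1_u64 << 32;
    // `as` truncates silently: this is the "index 0 out of luck" case
    assert_eq!(big as u32, 0);
    // the checked conversion surfaces the problem instead
    assert!(u32::try_from(big).is_err());
}
```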
If that were to already be engrained in indexing, the behaviour would be consistent.
// Some(_) everywhere
vec![0; 1].get(0_u64);
// None everywhere
vec![0; 1].get(1_u64 << 32)
// Would panic everywhere
vec![0; 1][1_u64 << 32];
`vec![0; 1]` doesn't have an element with index `1`, only with index `0`.
Oops, that should have been `.get(0_u64)`. Thanks for pointing it out.
Until one tries to port that code to a (partially-defined but not yet available) RISC-V RV128I system where `usize` is 128 bits. It is naïve to presume that Rust will be replaced by another language before a computer exists where `usize` > 64 bits.
I’m not sure what you are getting at? It reads like you don’t agree with what I said, but I also agree with your conclusion, so I just want to make sure I’ve been understood. My point is that treating `u32` and `u64` as indices for slices would make code more portable if done right. The whole example of `as usize` going wrong is written from the perspective of a naive programmer, to show that current ergonomic choices already may make some code less portable. Of course this becomes even more problematic with future compatibility when thinking about `usize` = `u128` as well.
I’m all for making sure that we don’t assume the size of usize. I don’t think adding some extra impls of Index and IndexMut will lead to that, though. We can handle Index with u8, u16, u32, u64, and u128.
Would it also make sense to allow signed indices for the same reasons?
Allowing signed indices seems much less reasonable to me; some earlier part of the program should have either declared something unsigned or done an appropriate check for negative numbers.
A common argument against indexing using other unsigned types was also that some earlier part of the program should have used `usize` instead – I’m just trying to make sure that we’re drawing the line at the right place here. I agree with you that indexing using unsigned integers is more reasonable than indexing using signed integers.
Indices smaller than `usize` that are stored in efficient data structures (e.g., in an ECS) naturally give rise to index expressions of `uN` types smaller than `usize`. Virtually every programmer encounters such situations frequently. For me the primary thrust of this thread was widening such expressions to `usize` at the point where they are applied as index expressions, but not earlier. That's the use of the unchecked `as usize` conversion that troubles so many of us.
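As a sketch of that pattern (the ECS-flavored names here are made up): indices are stored compactly and widened losslessly only at the point of use, with no `as` cast anywhere.

```rust
// Entity indices stored compactly as u16, widened to usize only
// at the indexing site.
struct Positions {
    data: Vec<(f32, f32)>,
}

impl Positions {
    fn get(&self, entity: u16) -> (f32, f32) {
        // Lossless widening exactly where the index is applied;
        // no truncation is possible on any platform.
        self.data[usize::from(entity)]
    }
}
```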