Average function for primitives

dalance · February 12, 2021, 3:22pm

The average of two numbers can be calculated by (a + b) / 2 naively, but if a and b is i32::MAX, the result will be overflowed. Therefore we should avoid overflow carefully like min(a, b) + abs(a / 2 - b / 2).

So I think this implementation may be useful if this is provided through impl of each primitive integer types.

I searched this topic in this forum and RFCs, but I couldn't find. Is this previously proposed? How about this idea?

dalance · February 12, 2021, 3:44pm

The above implementation is wrong. I should care about a % 2 and b % 2 in some cases. About the implementation, this thread may be helpful.

SkiFire13 · February 12, 2021, 3:58pm

If a is i32::MIN then this also underflows

elidupree · February 12, 2021, 4:03pm

Yeah, it's tricky to define this to have consistent rounding behavior and avoid overflow in all cases. IIRC, I've sometimes implemented it like this:

(a >> 1).wrapping_add(b >> 1).wrapping_add(a & b & 0x1)

This rounds towards neginf; the third term is needed to correct for the case where both the first terms round towards neginf separately.

dalance · February 12, 2021, 4:07pm

I think the difficulty of correct rounding implementation may become a motivation to provide this function through standard library.

Nokel81 · February 12, 2021, 4:41pm

Given that rust concretely defines what >> should be (instead of not like in C (iirc)) then I agree that this would be good to have

CAD97 · February 12, 2021, 6:31pm

Prior art: C++20 std::midpoint from P0811r3.

I think std::lerp is over-specified to the point where it's not useful for it's intended purpose (games/gfx) since handling edge cases is too expensive, but I do think that std::midpoint's specification describes the problem well and is generally what you want from a midpoint/mean/average.

There might be a way to do it branchless, but P0811r3 provides the example implementation of (roughly translated into pseudo Rust by me)

const fn midpoint<Int>(a: Int, b: Int) -> Int {
    if a > b {
        a.wrapping_add(a.wrapping_sub(b)/2)
    } else {
        a.wrapping_sub(b.wrapping_sub(a)/2)
    }
}

const fn midpoint<Float>(a: Float, b: Float) -> Float {
    if a.is_normal() && b.is_normal() {
        a/2 + b/2
    } else {
        (a+b)/2
    }
}

Do note that a+(b-a)/2 works no matter the sign so long as there isn't overflow. The trivial (a+b)/2 truncates towards 0; I'm not exactly sure the rounding behavior of this integer implementation is, but P0811r3 claims it's what you'd typically want from an e.g. binary search midpoint.

user16251 · March 1, 2021, 1:12am

The num crate, which is practically standard, provides this function with the Average trait. One way to compute it efficiently is with (a&b) + ((a^b) >> 1) (using arithmetic shift if the values are signed). On Intel processors you can use add and then rcr.

scottmcm · March 1, 2021, 3:21am

That's really elegant. Is there a nice way to get LLVM to emit that? I didn't have any luck using the things I know how to get rustc to emit...

user16251 · March 4, 2021, 12:29am

I don't know how to get LLVM to emit rcr, but the portable alternative (that does and, xor, shift, add) is probably better: it is correct for signed integers as well (replacing logical shift with arithmetic shift), it can be vectorized, and has the same latency as rcr on at least some microarchitectures. That being said, I've never written something where the performance was contingent on integer averages.

scottmcm · March 4, 2021, 7:47pm

I mention the LLVM incantation because needing to emit LLVM intrinsics to get optimal codegen is one of the reasons it might make sense to have a new method in core.

If the num crate can implement it reasonably without needing anything extra, then it might fall in the same category as div_rem: helpers in num that I do like, but that might not be worth having in core.

jhpratt · March 4, 2021, 9:15pm

Honestly I wish there were more methods from the num crate in core, not fewer. There are some methods I am recreating on my end because I don't want to bring in a dependency that would increase the size of my crate by a third. Having the methods in core would avoid this altogether.

user16251 · March 5, 2021, 12:16am

The rcr alternative isn't necessarily optimal because rcr executes in two cycles (when its immediate operand is 1). The and and xor can execute simultaneously on separate ports, so I think that alternative ends up having the same latency. In general I find it challenging to get LLVM to recognize opportunities to use the carry flag.

tspiteri · March 17, 2021, 5:31pm

This rounds towards minus infinity, while the C++ std::midpoint rounds towards the first operand a. That is, if b < a, then rounding should be towards plus infinity. This can be remedied as:

(a&b) + (((a^b) + (b<a) as i32) >> 1)

Edit: would overflow if a^b is !0 and b<a. Better is:

let x = a ^ b;
let up = (b < a) as i32;
(a & b) + (x >> 1) + (x & up)

user16251 · March 21, 2021, 12:48pm

Is there a compelling reason why std::midpoint(1, 4) != std::midpoint(4, 1)? Is it that signed integer division in C++ (and on plenty of hardware) rounds towards zero?

CAD97 · March 21, 2021, 4:49pm

P0811R3 provides the standard definition for std::midpoint over integral types as "Returns: Half the sum of a and b. If T is an integer type and the sum is odd, the result is rounded towards a. Remarks: No overflow occurs."

The stated reasoning in the rest of P0811R3 is that (paraphrasing heavily) rounding towards a is typically what you'd expect when doing e.g. binary search. (Side note: this also allows the caller to control the rounding.)

P0811R3 also notes that you can change the condition to a>=b or a<=b to round half integers up or down, respectively.

From my understanding, there's five options:

Round towards a
Round towards b
Round up (towards MAX)
Round down (towards MIN)
Round towards 0

Personally, I think only the first or the fifth options are really in the running to be picked. The first isn't commutative, but it has predictable and consistent behavior for ++, +-, -+, and --: bias towards the first argument. The fifth leans on the fact that integer division truncates toward zero for familiarity, and gains commutativity, but loses the consistency that midpoint(x, y) == -midpoint(-x, -y).

It's a trade-off. If always rounding toward zero can be done branchless and rounding toward a can't, that'd make me more likely to support rounding toward zero, but I still think rounding toward a is more useful. (Plus, if it inlines, you can just sort a and b on input to get the the rounding you want at no cost.)

user16251 · March 21, 2021, 5:45pm

I looked at P0118R3, but I didn't see any language justifying rounding towards a. Perhaps I missed something.

There is a sixth option: rounding towards even. It's what I would use for fixed-point arithmetic.

All of these options can be implemented in a branch-free way.

tspiteri · March 21, 2021, 6:21pm

I think the paper mentions that for floating-point numbers, it would make more sense to call this mean rather than midpoint, but it is called that to use the same overload for all types in C++. I'm now thinking it could make sense for midpoint to round towards a for applications like binary search, but mean for floating-point or fixed-point could be rounded like other operations.

CAD97 · March 21, 2021, 6:22pm

I could've sworn it specifically called out it's application to binary search; perhaps that's only for the midpoint of pointers? Anyway, the only actual remnants of the justification for the choice they made seems to be

"it aims to replace the simple expression a+(b-a)/2 regardless of type", "Returns: Half the sum of a and b. If T is an integer type and the sum is odd, the result is rounded towards a.", and (for the pointer version) "As with integers, when the midpoint lies between two pointer values the one closer to a is chosen; for the usual case of a<b, this is compatible with the usual half-open ranges by selecting a when [a,b) is [a,a+1)." and "Returns: A pointer to x[i+(j-i)/2], where the result of the division is truncated towards zero."

So that's where I recall the justification: always returning something inbounds for a half open range.

IMHO, while half-even is a decent option for floating point to integer types, it makes less sense for purely integral types, which should stick to consistent truncation semantics.

tczajka · March 22, 2021, 7:45am

With division rounding towards 0, (a+b)/2 corresponds to the fifth option, but a + (b-a)/2 corresponds to the first option (round towards a).

However (a+b).div_euclid(2) and a + (b-a).div_euclid(2) both correspond to the fourth option (round down).

BTW, I think making "/" round towards zero was a serious mistake. This has caused many bugs in C, it's virtually never what you want for negative numbers. The argument that it's more efficient because of hardware is also only partially true, for example "/ 2" can be implemented as a right shift by 1 bit only if you round down rather than round to 0. With rounding to 0 extra instructions have to be emitted.

In this case I think rounding towards a and rounding down are the only reasonable options.

Topic		Replies	Views
Unsigned integer type inference language design	24	4431	March 25, 2019
Thought: switch the default on overflow checking and provide RFC 560's scoped attribute for checked arithmetic language design	20	2366	October 31, 2021
A tale of two's complement	62	24753	March 25, 2019
(Pre-?)Pre-RFC: Range-restricting wrappers for floating-point types language design	13	3936	March 25, 2019
Pre-RFC: Change abs() to return unsigned integers libs	9	2592	March 25, 2019

Average function for primitives

Related topics