Any RFC for Units of Measure?

You just agreed that you don't need magic numbers to give types in a type list a total order so I don't know why you keep re-stating that magic numbers are required. What did I miss?

For example, an alternative to magic numbers would be to just use a type predicate to specify the total order (e.g. Less<A,B>::Bool). IMO the focus here should be on how specifying a total order affects composability.

Looks to me like without these numbers, s*m/s != m, which is a pretty big deal.

That's equivalent to simplifying Mul< Mul< T<1>, L<1> >, T<-1> > down to L<1>. While sorting types can be helpful, I don't see how sorting, and therefore, a total order, is necessary to perform these simplifications. Sorting might help on reducing the algorithmic complexity or improving the efficiency of the simplification, but that's about it. What am I missing?

What am I missing?

Well, maybe I'm the one missing something, but I had the impression that these numbers were key to deciding equality between types. The alternative being manually specifying something like your Less<A, B>::Bool, which doesn't really scale up, either.

I’ve been working on a (yet-unpublished) crate called sigh (as in SI pronounced as a word). It does more or less what you propose at the type level, and also accounts for SI prefixes. For example, it’s possible to add X cm to Y m and get the correct result. What it can’t do yet is composite units (i.e. m/s, mAh etc), as I haven’t quite figured out how to do that yet.

If anyone is interested, I could open up and publish the (unfinished!) code to make room for collaboration. The main reason I haven’t done so yet is because the project is not finished.

I think it’s better to treat the two issues as separate. It would be first worth answering (in the prior art section after discussing a bit what Boost.Units does) whether we need a total order between units for something at all. What would that buy us? What does that buy Boost.Units? Does this total order need to be global (which as you say won’t scale) to buy us something, or is it enough for it to be system local? And finally, depending on what it buys us, and how much it costs, then explain the rationale of whether we want it or not and why.

I think that independently of the implementation chosen, if we need a total order, and this order must be global to be useful, then that’s bad news. Adding a new base unit would mean having to specify its order with respect to the system somehow, but crates do not know about the base units specified by other crates, so I really don’t know how such a global total order could be implemented in a way that scales.

As to the specific way in which one can implement total ordering of base units, I’d say it would be better to just ignore the specific implementations until the other questions are properly resolved. I’ll read the F# unit publications these days, since maybe the questions I am posing are dumb if one has read those (but the RFC should not assume that everybody has).

The part I’m interested in is composing units, so I think that this is either orthogonal or complementary :slight_smile:

Fair enough. I believe that the order is one of the conditions to being able to compare m*s and s*m, to realize that m*s/m is m, etc. If I'm right, that makes it a blocker.

Well, F# transparently defines the order as follows:

  • type variables come first;
  • concrete types come later;
  • between types of the same kind (variables vs. concrete), sort by lexicographical order of the fully resolved type name.

In Rust, at compiler-level, I believe that we can somehow specify the order between concrete types with a #[derive(Something)], or possibly by piggy-backing on TypeId. For type variables, well, I'm pretty sure that there is already an ordering on them in the compiler.

Now, I need to think of how much we actually need type variables there. I can think of cases in which they are necessary in F# but less so in Rust, due to the different way in which operators are implemented. It might be that we don't need the full power of type variables here.

Good point.

A * B and B * A must give the same type to have an ergonomic library, so either one adds a concept of “unordered tuples” to the language, or one uses a total order to decide whether they should both be (A, B) or (B, A) (depending on whether A <= B or A > B).

The total order needs to be across all crates if you want to multiply units from different crates, which can either by accomplished by compiler support, or by giving each unit a type-level number which contains a type-level encoding of the crate name.

The ordering of type variables needs to be delayed until they are resolved, and then is just the ordering on their concrete values. Resolving the ordering relation on unresolved type variables obviously doesn’t work at all.

So, I’m currently running a few experiments.

It looks like we could achieve something roughly comparable to Boost (including error messages that no human being is meant to understand), without the need for magic constants or a magic order between types, once we have specialization, if specialization is sufficient to implement

trait IfEqThenElse<T,U,True,False> {
  type Type;
}
struct IfEqThenElseImpl;
// `else` case
impl<T,U,True,False> IfEqThenElse<T, U, True, False> for IfEqThenEsleImpl {
  type Type = False;
}
// `then` case
impl<T,True,False> IfEqThenElse<T, T, True, False> for IfEqThenEsleImpl {
  type Type = True;
}

That’s assuming that we eventually do get specialization, that it works as I understand it, that it is powerful enough to implement this operator and that we can find our way around unused type args, etc.

Now, given the amount of type-level hackery required to actually implement units of measure using this technique, I believe that this is a poor implementation of units of measure. More precisely:

  • code is hard to read, hence hard to trust;
  • torturing the type inference algorithm is a recipe for consuming lots of time and memory and ending up exhausting the recursive search limit;
  • no human being will be able to make use of the error messages.

To me, this confirms that we should investigate a type system-level solution, by opposition to a type-level solution. I see the following ways to go forward:

  1. Proceed with the ongoing RFC, attempting to propose the smallest change to the type system that will be powerful enough to encode units of measure without the above problems.
  2. Experiment with a solution based on a library and a linter implementing a minimal type system dedicated to this problem, which would be executed after the main type system.

I believe that 1. would eventually bring the largest benefit, but if it possible to decompose the problem sufficiently to make it fit in 2., this would the best way to determine experimentally whether it’s a good idea in the first place.

Agreed on the general idea. Note that the equivalence relation / normalization is a bit more complicated than this, as we need to somehow simplify A * B * C * B^-1 into A * C.

Agreed. As I mentioned somewhere else, we can probably piggy-back on the TypeId mechanism already in place in Rust.

If I understand what you have in mind, this means that we cannot simplify α * α ^-1 (where α is a type variable). It is possible that this will make some systems impossible to solve. I am also pretty sure that if we attempt to unify α * α ^-1 * kg = kg, we will end up with a maximally generic (i.e. any substitution will work) value for α, so if we wait until we only have concrete types, we won't be able to simplify away this specific α, which will be awkward, possibly problematic.

This doesn't really fit with Rust. Rust crates should be compiled in isolation, and often in parallel. This requires a global order in the whole program to be fixed before anything can be compiled because this order affects type checking.

Mmmmh… Good point. I think we can get away with it, though.

Did you try this or is this just an idea? I am doubtful that specialization will be able to be used for this sort of logic programming (especially to constrain equality of types), because it seems incompatible with rust’s type-checking model and it sounds to me like it could create similar soundness holes to e.g. the specialized impl impl<T> Trait for (T, T). (see niko’s blog post on always applicable impls to read about these soundness holes)

Looking at the tracking issue, it is clear that associated types can be specialized, but there is still some sort of limitation I don’t quite understand; something about writing projections on these types (i.e. <Self as Trait>::AssocTy).

I am not well-educated on these matters, though. I just have the generally vague feeling that “it probably won’t work.”

If you’re talking of IfEqThenElse, no, it’s a guess.

What are the scenarios where user defined units are really useful?

Its not meant as snark, but i have a really hard time imagining use cases that are not served by SI units (plus shorthands).

Imperial should never be used in an engineering context, and if its needed for user facing stuff it can be converted at the application boundaries, like we do with funny unicode representations, but base our internals on utf-8.

Currencies are probably out of scope of any general unit lib anyway because of special rounding rules and context dependent conversions.

Having a fixed set of base units would simplify the whole endavour immensly, and i dont see practical downsides. Disclaimer: Everytime i wanted units in my programs it was in the context of process control or physics stuff, so my view on that matter probably is a little bit narrow.

2 Likes

Well, SI + shorthands would certainly be the main use cases (I’m assuming that you include both Watts and nanometers and parsecs as shorthands, for instance). However, I can think of many other use cases:

  • I don’t see any reason to dismiss imperials, or nauticals, or other exotic measures out of hand. Yes, in an engineering context, they might make no sense, but not everything is engineering, and even if they are restricted to boundaries, they need to be handled properly.
  • Even in physics, not all units are SI. Think, for instance, of logarithmic/exponential units (pH, half-life & co), or radians vs. degrees, or angular speed vs. linear speed, etc.
  • I don’t see any reason to dismiss currencies, either. Yes, users of the system would probably require fixed-point arithmetics with specific rounding rules, but that’s ok with me, we’re not trying to specify representation or conversion, only to prevent people from confusing units.
  • Now, think of statistics: fertility = children * women ^ -1, cases of flu per inhabitent per year, …
  • Think of games: XP * monster ^ -1…
  • Think of everything related to display or printing (dots per inch) or page layout (multiples of line height, multiples of space width, multiples of screen size, etc.)
  • Think of profiling: instructions per second, allocations per second, microseconds per context-switch, etc.

I really think that, if Rust gains these measures, we’ll find ourselves using them in really many applications, without even thinking about it :slight_smile:

6 Likes

Think of everything related to display or printing (dots per inch) or page layout (multiples of line height, multiples of space width, multiples of screen size, etc.)

and a big one, pixels by direction. I don't know how many bugs I could have saved if 10px down had a different unit than 10px right.

1 Like

Ok, I have a very, very early proof of concept here.

It consists in:

  • a module, which defines units of measures and their basic operations;
  • example definitions of the four SI base units;
  • a type-checker, implemented as a driver for rustc.

It’s missing lots of syntactic sugar.

The type-checker correctly accepts:

// Trivial case
fn get_speed(distance: Measure<f64, Meter>, duration: Measure<f64, Second>) -> Measure<f64, PMul<Meter, PInv<Second>>> {
    return (distance / duration).unify();
}

Normally, unify() should be part of the syntactic sugar. There may be a way to hide it without syntactic sugar, but I haven’t found yet.

The following is also accepted:

// Less trivial case, as we need to convert 1/ Second * Meter => Meter * Second
fn get_speed_2(distance: Measure<f64, Meter>, duration: Measure<f64, Second>) -> Measure<f64, PMul<Meter, PInv<Second>>> {
    return ((PDimensionless::new(1.) / duration) * distance ).unify();
}

// Not terribly hard, but it shows that we can still use `fold` & co.
fn get_average<U: Unit>(number: usize) -> Measure<f64, U> {
    let total = (1..number)
        .map(|x| x as f64)
        .map(U::new)
        .fold(U::new(0.), std::ops::Add::add);
    total / (number as f64)
}
/// We can even simplify type variables.
trait Distance: Unit {}
trait Duration: Unit {}

fn get_speed_generic<A: Distance, B: Duration>(distance: Measure<f64, A>, duration: Measure<f64, B>) -> Measure<f64, PMul<A, PInv<B>>> {
    return (distance / duration).unify();
}

And it correctly rejects

/// No, Kilometer is not the same as Meter. Ugly error message for the time being.
struct Kilometer;
impl Unit for Kilometer {}
fn get_speed_bad(distance: Measure<f64, Kilometer>, duration: Measure<f64, Second>) -> Measure<f64, PMul<Meter, PInv<Second>>> {
    return ((PDimensionless::new(1.) / duration) * distance ).unify();
}

Still lots to do, but so far, so good :slight_smile:

2 Likes

Same one, but with readable error messages :slight_smile:

error: Cannot resolve the following units of measures:
  --> examples/simple.rs:42:1
   |
42 | / fn get_speed_bad(distance: Measure<f64, Kilometer>, duration: Measure<f64, Second>) -> Measure<f64, Mul<Meter, Inv<Second>>> {
43 | |     return ((Dimensionless::new(1.) / duration) * distance ).unify();
   | |            --------------------------------------------------------- in this unification
44 | | }
   | |_^ While examining this function
   |
   = note: expected unit of measure: `Kilometer`
              found unit of measure: `yaiouom::si::Meter`

error: aborting due to previous error

I just published the first part of my experiment as a crate. I’ll try and publish the linter proof-of-concept within the next few days.

Then, if anybody is interested, we can get serious about RFC.

3 Likes

Wow. The idea of statically enforcing these units outside the type system is genius. This is quite possibly the most promising idea I have ever seen for units that are actually usable in real code.

(p.s. if something is meant to be glob imported, please don’t give it an innocuous name like Mul that matches something else that people frequently import :wink: )