In this comment @nikomatsakis articulated that he wanted to see more motivation, use cases, blockers, … for using Rust in HPC domains.
HPC is a huge field, and the current issues are largely domain dependent. I can only talk about the current blockers for the particular domain in which I am proficient, which is solving systems of partial differential equations. I’ve decided to split the issues into three categories, from less to more specific: core issues (probably useful outside HPC), fundamental HPC issues (useful in other domains of HPC beyond my own), and domain issues (useful within my domain):
Core issues (useful in other domains beyond HPC):

Native numeric traits: `NativeFloat` for `f32`/`f64` (and maybe `f16`), `NativeSignedInt` for `i8`/`i16`/`i32`/`i64`/`isize`, `NativeUnsignedInt` for `u8`/`u16`/`u32`/`u64`/`usize`, `NativeInt` for all of `iXX`/`uXX`, and maybe `NativeComplex`. This is not about building a hierarchy, modelling mathematical number categories, nor about supporting user-defined `BigXXX` types. These should be simple traits for abstracting over native/primitive machine types. These traits could drive some core language features like `const fn` as trait methods.
Native SIMD support: SIMD vector types/algorithms with software fallback. In particular, users must be able to write `v0 = v1 * v2 + v3;` and have it generate a fused multiply-add (FMA) SIMD instruction when available. This would probably drive the design of plugins and improve eDSL support in Rust.
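A software-fallback sketch of the kind of vector type meant here (the `F32x4` name and API are illustrative, not a proposed design); a native implementation would compile `v1 * v2 + v3` to a single FMA where the target supports it:

```rust
use std::ops::{Add, Mul};

// Illustrative 4-lane SIMD vector type with a scalar software fallback.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl Mul for F32x4 {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        let mut out = [0.0f32; 4];
        for i in 0..4 {
            out[i] = self.0[i] * rhs.0[i];
        }
        F32x4(out)
    }
}

impl Add for F32x4 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = [0.0f32; 4];
        for i in 0..4 {
            out[i] = self.0[i] + rhs.0[i];
        }
        F32x4(out)
    }
}
```

With operator overloading the `v0 = v1 * v2 + v3;` expression already type-checks today; what is missing is the guaranteed lowering to FMA.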
SIMD support on slices of primitive types: as opposed to operations on SIMD vector types, users often want to perform operations on slices: `s0 = s1 * s2 + s3;` where the type of each `sX` is `[f32]`. This requires figuring out the best vector type to use for loads/stores and the corresponding SIMD instructions at compile time (depending on the arch), and possibly at runtime as well (when compiling for multiple archs), plus dealing properly with the remainder, very short slices, … More plugin/eDSL work would be required.
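A scalar sketch of the slice operation in question (the `slice_fma` name is mine); the hard part the text describes is getting a compiler or library to turn the inner loop into vector FMAs with correct remainder handling:

```rust
// s0[i] = s1[i] * s2[i] + s3[i] for all i, element-wise over slices.
// Written scalar here; the goal is for this loop shape to vectorize.
pub fn slice_fma(s0: &mut [f32], s1: &[f32], s2: &[f32], s3: &[f32]) {
    assert!(s0.len() == s1.len() && s1.len() == s2.len() && s2.len() == s3.len());
    for i in 0..s0.len() {
        s0[i] = s1[i].mul_add(s2[i], s3[i]);
    }
}
```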
Bitwise manipulation instructions: like SIMD instructions, but for bit manipulation, plus higher-level algorithms that work on slices of words.

Abstracting over memory layout: high-performance code needs to be able to switch the memory layout of its data in a localized way, without causing a rewrite of the whole application. Tools/techniques/idioms/language support for abstracting over the memory layout, so that it can be changed at will without major code changes, would be really helpful. For example, given a `struct S` with fields `x, y, z` and a slice of `S`, `[S]`, I would like to be able to change the layout of `S` from array-of-structs to struct-of-arrays to be able to benchmark the difference with as little code modification as possible. A possible way to do this could be improved support for Dynamically Sized Types (DSTs), but Jonathan Blow’s Jai language (see the Data-Oriented Structures section) also has an interesting take on this.
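To make the two layouts concrete (type names here are mine): today they are two unrelated definitions, and every access site has to change when switching between them, which is exactly the rewrite cost the text wants to avoid:

```rust
// Array-of-structs (AoS): all fields of one element are contiguous.
#[derive(Copy, Clone)]
pub struct S {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

// Struct-of-arrays (SoA): each field gets its own contiguous array,
// which is typically easier to vectorize when a loop touches few fields.
pub struct SoaS {
    pub x: Vec<f32>,
    pub y: Vec<f32>,
    pub z: Vec<f32>,
}

// The same computation must currently be written once per layout:
pub fn sum_x_aos(v: &[S]) -> f32 {
    v.iter().map(|s| s.x).sum()
}

pub fn sum_x_soa(v: &SoaS) -> f32 {
    v.x.iter().sum()
}
```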
Accelerator support: in the language. I’ll just leave it at this; no major language has this, although things are converging (e.g. C++ executors and HPX support accelerators, and the C++ HSA proposal is evolving fast).
Fundamental HPC issues (useful in other domains within HPC beyond my own):

Multidimensional iteration: iterating over a collection of elements with some multidimensional structure. In general these structures are graphs, and trees play a very important role in HPC. However, in a lot of cases these structures are just regular vector/matrix/tensor-like grids. Being able to efficiently iterate over a rank-N tensor with N different lengths across its dimensions, with/without strides, with low/high (a.k.a. ascending/descending, row/column) linear order of the elements in memory, with runtime/compile-time lengths/strides, and over sub-tiles/matrices/views of these tensors is required for a lot of HPC applications. A couple of C++ experts from the national labs (LBL, Sandia) and AMD have recently written an ISO C++ paper called “polymorphic multidimensional array views” with their (very comprehensive) take on this. Type-level values are one of the C++ features they exploit to make this a zero-cost abstraction. Doing this with external iterators that can still be auto-vectorized is hard.
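The core index arithmetic of such views can be sketched in a few lines (a toy runtime-strided 2-D view, my own illustrative type, nothing like the full generality of the C++ paper): element `(i, j)` lives at offset `i * row_stride + j`, and a sub-matrix is just the same buffer with a different origin and stride.

```rust
// Toy strided 2-D view over a flat buffer; strides are runtime values
// here, while the C++ paper also supports compile-time extents/strides.
pub struct View2<'a> {
    pub data: &'a [f64],
    pub rows: usize,
    pub cols: usize,
    pub row_stride: usize, // > cols when this is a sub-view of a wider matrix
}

impl<'a> View2<'a> {
    pub fn get(&self, i: usize, j: usize) -> f64 {
        self.data[i * self.row_stride + j]
    }

    pub fn sum(&self) -> f64 {
        let mut s = 0.0;
        for i in 0..self.rows {
            for j in 0..self.cols {
                s += self.get(i, j);
            }
        }
        s
    }
}
```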

Linear-algebra vocabulary types and traits: dense and sparse vector and matrix/tensor traits that allow using different implementations of the types (e.g. to store additional data), and native implementations of these types with different memory layouts (e.g. SIMD vector types for small ones, row/column order, …), with/without compile-time sizes… Type-level values, better `const fn` support (probably in traits as well), and maybe even HKTs will be required for this.
Serial linear-algebra algorithms: algorithms that work on the linear-algebra traits and use SIMD and/or bindings to the system BLAS libraries. Ideally one should be able to at least solve a system of equations easily on dense or sparse matrices/vectors with generic syntax, e.g. `let x = linsolve(A, b)`, solve an eigenvalue problem, perform an SVD, QR decomposition, Givens rotations, POD, … All in serial, in shared memory, using SIMD. These are the “fundamental” linear-algebra building blocks for creating more complex and parallel algorithms.
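For concreteness, the `let x = linsolve(A, b)` shape could look like the toy solver below (naive Gaussian elimination without pivoting, my own sketch; a real library would dispatch to BLAS/LAPACK or hand-tuned SIMD kernels, and be generic over the matrix traits above rather than hard-coding `Vec<Vec<f64>>`):

```rust
// Toy dense solver: solves A x = b by Gaussian elimination (no pivoting,
// so it assumes non-zero diagonal pivots; exposition only).
pub fn linsolve(a: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut m = a.to_vec();
    let mut x = b.to_vec();
    // Forward elimination: zero out the entries below the diagonal.
    for k in 0..n {
        for i in k + 1..n {
            let f = m[i][k] / m[k][k];
            for j in k..n {
                let t = f * m[k][j];
                m[i][j] -= t;
            }
            let t = f * x[k];
            x[i] -= t;
        }
    }
    // Back substitution: solve the resulting upper-triangular system.
    for i in (0..n).rev() {
        for j in i + 1..n {
            let t = m[i][j] * x[j];
            x[i] -= t;
        }
        x[i] /= m[i][i];
    }
    x
}
```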
Parallel linear-algebra algorithms: same as above, but in parallel. Ideally one should be able to specify where and maybe how the work is done, e.g. using a tokio reactor or some different executor (a thread pool, something that uses OpenMP, a GPU, …). The latest C++ executors proposal and the parallel algorithms proposal (and HPX’s C++ implementation of these) allow all of this for the C++ STL algorithms. The parallel algorithms should probably return a future and be async.

Linear-algebra EDSL: at the end of the day one wants to write Matlab-like code. Ideally, if I write something like `let x = linsolve(A, y) + linsolve(A, z);`, the matrix `A` should only be “inverted” (exposition only, it is not actually inverted) and loaded into memory once, and the whole thing should maximize cache efficiency, do a single memory traversal through `y` and `z`, and so on. This really requires plugins, creating an expression tree, …
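The “expression tree” idea can be partially sketched with today’s traits (names illustrative): `+` on these wrapper types builds a tree instead of evaluating, and the whole expression is evaluated element-wise in a single traversal when materialized. This is the expression-template technique C++ linear-algebra libraries use; plugins would be needed to go beyond element-wise fusion.

```rust
use std::ops::Add;

// A leaf of the expression tree: a borrowed vector.
#[derive(Copy, Clone)]
pub struct VecRef<'a>(pub &'a [f64]);

// An interior node: the (unevaluated) sum of two sub-expressions.
pub struct Sum<L, R>(pub L, pub R);

pub trait Expr {
    fn eval_at(&self, i: usize) -> f64;
    fn len(&self) -> usize;
}

impl<'a> Expr for VecRef<'a> {
    fn eval_at(&self, i: usize) -> f64 { self.0[i] }
    fn len(&self) -> usize { self.0.len() }
}

impl<L: Expr, R: Expr> Expr for Sum<L, R> {
    fn eval_at(&self, i: usize) -> f64 { self.0.eval_at(i) + self.1.eval_at(i) }
    fn len(&self) -> usize { self.0.len() }
}

// `y + z` builds a `Sum` node rather than allocating a temporary vector.
impl<'a, R: Expr> Add<R> for VecRef<'a> {
    type Output = Sum<VecRef<'a>, R>;
    fn add(self, rhs: R) -> Self::Output { Sum(self, rhs) }
}

// One pass over the whole tree, no intermediate allocations.
pub fn materialize<E: Expr>(e: &E) -> Vec<f64> {
    (0..e.len()).map(|i| e.eval_at(i)).collect()
}
```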
Asynchronous file I/O: the current Rust async I/O story is purely network-I/O focused. Lots of HPC applications need to write to a single huge file (typically in parallel, from different processes at the same time rather than from different threads within the same process). This takes a long time and can be overlapped with work, so… futures to the rescue? There are some widely used C libraries that deal with this (HDF5, NetCDF, Parallel-NetCDF, …), and lots of applications just write their own wrapper over MPI I/O… which leads me to the next point.

A good MPI wrapper: type-safe, using futures, with plugins for serializing datatypes, mapping lambdas to reductions, …

A good geometry library: this depends on vectors, matrices and linear algebra, but it would be great to have a trait-based, very generic, and fast geometry library (a CGAL on steroids). At the end of the day, every application is going to implement its own vector, triangle, surface, polygon… types with their own data quirks and layouts, but the subset of classic primitives is pretty standard (points, quaternions, lines, rays, segments, triangles, squares, polygons, polylines, planes, boxes, AABBs, OBBs, polyhedra, polysurfaces, signed-distance fields, …) and the operations one wants to perform are also standard (intersection, intersection test, split, overlap, union, difference, xor, centroid computation, surface computation, bounding-box computation, translations, rotations, …). When dealing with large numbers of primitives, the data structures are also pretty standard (binary/quad/octrees, point sets, bounding volume hierarchies, half-edge data structures, …). So having everything as traits, plus good implementations of the primitives and data structures that one can reuse to implement one’s own with the extra quirks and then implement the traits for them, would be awesome.
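As a taste of the standard primitives and operations meant here, an axis-aligned bounding box (AABB) with overlap test and union (a concrete sketch; a real library would expose these behind the traits described above):

```rust
// Axis-aligned bounding box in 3-D, one of the classic standard primitives.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Aabb {
    pub min: [f64; 3],
    pub max: [f64; 3],
}

impl Aabb {
    // Two AABBs overlap iff their intervals overlap on every axis.
    pub fn overlaps(&self, other: &Aabb) -> bool {
        (0..3).all(|d| self.min[d] <= other.max[d] && other.min[d] <= self.max[d])
    }

    // Smallest AABB containing both boxes.
    pub fn union(&self, other: &Aabb) -> Aabb {
        let mut min = [0.0; 3];
        let mut max = [0.0; 3];
        for d in 0..3 {
            min[d] = self.min[d].min(other.min[d]);
            max[d] = self.max[d].max(other.max[d]);
        }
        Aabb { min, max }
    }
}
```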

Other generic mathematical algorithms: things that build on the linear-algebra stuff and are commonly needed: interpolation (linear/bilinear/trilinear, least squares, B-splines, radial basis functions, …), integration (quadrature rules such as the trapezoid/Simpson/Newton–Cotes rules, …), projections/restrictions, Krylov solvers, FFT, …
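The simplest members of the interpolation family mentioned above, as a sketch (names mine): linear interpolation, and bilinear interpolation composed from it.

```rust
// Linear interpolation between a and b at parameter t in [0, 1].
pub fn lerp(a: f64, b: f64, t: f64) -> f64 {
    a + t * (b - a)
}

// Bilinear interpolation on a unit cell: interpolate along x on both
// edges, then along y between the two results.
pub fn bilerp(f00: f64, f10: f64, f01: f64, f11: f64, tx: f64, ty: f64) -> f64 {
    lerp(lerp(f00, f10, tx), lerp(f01, f11, tx), ty)
}
```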

Optimization: steepest descent, quasi-Newton, Newton–Krylov, automatic differentiation, …
Domain issues (probably only useful in my domain):

Mesh data structures for PDE solvers: structured, unstructured, block-structured, hierarchical, hierarchical Cartesian, … grids; serialization for them; distributed-memory support; load balancing; …

Scientific visualization: ways to interface the core geometry libraries and mesh data structures with common visualization formats like VTK, to create custom visualizations, and to write plugins for scientific visualization software built on VTK, like ParaView and VisIt, …

Libraries for building PDE solvers: something like deal.II for finite elements, Clawpack for finite volumes, … There are tons of good Python libraries as well.

Multigrid solvers…

For an overview see, e.g., the Trilinos C++ framework from Sandia. The C alternative is Argonne National Laboratory’s PETSc library. Both of them offer the full stack, and they interoperate to some degree.
One thing I realized when writing this is that, as I get closer and closer to my domain, while I would appreciate it if libraries were available, as long as the fundamental libraries and abstractions are there and work, I don’t mind writing my own (since I probably want to do custom stuff anyway). But having to write the full stack of file I/O, distributed computing, linear algebra and solvers, mathematical functions, multidimensional iteration, graphs… just to be able to get started writing what I am interested in (PDE solvers) is a titanic task.
Other people doing HPC will probably have completely different needs in terms of Domain issues, but I guess that for things like Machine Learning, Bioinformatics, HFT, … at least some of the fundamental issues will also apply. If you have similar or different needs please discuss them!