Setting our vision for the 2017 cycle

In terms of FFI, long-term I do think we want better integration for objects with “foreign lifetimes”: Python objects managed via Python reference counting, JavaScript and DOM objects in a browser, objects managed by a foreign garbage collector, EFI firmware objects, and other “external” objects. I’d love to have a better solution for those than just “write a wrapper using a raw pointer”, when a large family of otherwise unrelated objects all use the same “foreign lifetime” mechanism.
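
To make the "wrapper using a raw pointer" status quo concrete, here is a minimal sketch of what such a wrapper tends to look like when it is made generic over one foreign reference-counting mechanism. The `ForeignRefCounted` trait, the `Foreign` handle, and `MockObject` are all hypothetical names invented for this illustration; a real binding would call the foreign runtime's own incref/decref functions instead of the mock.

```rust
use std::cell::Cell;
use std::ptr::NonNull;

/// Hypothetical trait: one implementation per "foreign lifetime" mechanism.
/// Implementors promise that incref/decref keep the object alive correctly.
pub unsafe trait ForeignRefCounted {
    unsafe fn incref(ptr: NonNull<Self>);
    unsafe fn decref(ptr: NonNull<Self>);
}

/// Owning handle around a raw pointer to a foreign, reference-counted object.
pub struct Foreign<T: ForeignRefCounted> {
    ptr: NonNull<T>,
}

impl<T: ForeignRefCounted> Foreign<T> {
    /// Safety: `ptr` must point to a live object, and the caller hands over
    /// exactly one reference to this handle.
    pub unsafe fn from_owned(ptr: NonNull<T>) -> Self {
        Foreign { ptr }
    }

    pub fn as_ptr(&self) -> *mut T {
        self.ptr.as_ptr()
    }
}

impl<T: ForeignRefCounted> Clone for Foreign<T> {
    fn clone(&self) -> Self {
        // A new handle means one more foreign reference.
        unsafe { T::incref(self.ptr) };
        Foreign { ptr: self.ptr }
    }
}

impl<T: ForeignRefCounted> Drop for Foreign<T> {
    fn drop(&mut self) {
        // Give our reference back to the foreign runtime.
        unsafe { T::decref(self.ptr) };
    }
}

/// Stand-in for a foreign object so the sketch is self-contained.
pub struct MockObject {
    pub refs: Cell<usize>,
}

unsafe impl ForeignRefCounted for MockObject {
    unsafe fn incref(ptr: NonNull<Self>) {
        let obj = unsafe { ptr.as_ref() };
        obj.refs.set(obj.refs.get() + 1);
    }

    unsafe fn decref(ptr: NonNull<Self>) {
        let obj = unsafe { ptr.as_ref() };
        obj.refs.set(obj.refs.get() - 1);
    }
}
```

Every family of foreign objects ends up re-deriving roughly this boilerplate by hand, which is the gap being described.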

However, I don’t think that should go in the 2017 roadmap. For 2017, I’d like to see exceptional support for C FFI: better library integration, better build system integration, improved cdylib support, sonames, and anything else needed to transparently put Rust anywhere C can go.
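
For readers who have not seen the C FFI side yet, a minimal sketch of an exported function follows. The name `example_add` is made up for illustration, and the crate is assumed to be built with `crate-type = ["cdylib"]` in its Cargo.toml.

```rust
use std::os::raw::c_int;

/// Hypothetical export, callable from C as `int example_add(int a, int b);`.
/// Older toolchains spell the attribute below `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "C" fn example_add(a: c_int, b: c_int) -> c_int {
    // wrapping_add avoids an overflow panic in debug builds; a panic must
    // not unwind across the C boundary.
    a.wrapping_add(b)
}
```

The function itself is the easy part; the items listed above (build system integration, sonames, library integration) are about everything around it.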

I am not suggesting that we move in a less-safe direction. This is about the frame in which we present things to the world. "Memory safety without GC" is a good description of Rust, but not a good marketing strategy. Or rather, not as good of one as we could have.

This presupposes that having no generics like Go was compatible with Rust's goals. It was not. It would be impossible to have a memory-safe systems language without generics.

This complaint is off topic anyway, because we can't remove generics from the language, not in 2017 or ever (even if it were conceptually possible and a good idea).

Rust's generics are phenomenally hard to grasp for anybody new to generics. The syntax makes progressing in a tutorial actually difficult. Staring at nested angle brackets is like staring at a foreign language expecting it to translate itself. Haskell's generics are really easy to read and therefore easier to grok.

I'd also like to suggest that complaints about generics syntax are exceptionally off topic for this thread. The decision to use angle brackets was made years and years ago. It is about as set in stone as anything can possibly be. It has nothing to do with anything that is possible to do in 2017.

This way editor plugins can be kept up to date with the language spec with no effort.

Parsing is nowhere near the most difficult, or interesting, part of IDE integration.

That's great! Then maybe the issue is more about communication than about needing to improve the situation, so that the information escapes the Rust community and makes it to the Python community.

Other than surveys, I wonder what other metrics we could use to measure this?

Yes! I too came to Rust via Python, entirely skipping C and C++, and I do not really “know what I’m doing”. I need some handholding on calling Rust functions on Python numpy arrays. A library would be lovely, but not strictly required. An “FFI 102” would work as well. It would be nice if that came from the Python side, as there are a lot more Python devs than Rust devs. But if the Rust docs community made tutorials, it would bring in a lot of interest from the Python devs at my place of work.
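
As a rough idea of what the Rust half of such an "FFI 102" could look like, here is a sketch of a function that works in place on a contiguous buffer of doubles. The name `example_scale_in_place` is hypothetical. The assumption is that the Python side passes the data pointer and element count of a contiguous float64 array (for example via ctypes), which is not shown here.

```rust
/// Hypothetical export: multiplies `len` doubles starting at `data` by `factor`.
///
/// Safety: `data` must point to `len` contiguous, initialized f64 values that
/// nothing else reads or writes during the call.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn example_scale_in_place(data: *mut f64, len: usize, factor: f64) {
    if data.is_null() {
        return; // nothing sensible to do with a null buffer
    }
    // View the foreign buffer as a Rust slice for the duration of the call.
    let values = unsafe { std::slice::from_raw_parts_mut(data, len) };
    for value in values.iter_mut() {
        *value *= factor;
    }
}
```

The hard part for newcomers is usually not this function but knowing which guarantees (contiguity, dtype, lifetime of the buffer) the Python side has to uphold.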

I am pretty happy with the current state of things, except that it would be really great if debugging Rust code using Visual Studio's debugger worked approximately as well as debugging C code using that debugger does.

In Visual Studio, and probably other IDEs, it is relatively easy to create custom build rules for building Rust projects using Cargo, and it's really just a minor and infrequent inconvenience.

I think this might be true, but I think there are language features that are blocking progress on many other library and language features: const fn and intuitive support for array references. I feel there have been lots of discussions and proposals about things that were ultimately concluded with "we'll just wait for const fn to solve that", but const fn seems to be getting further away rather than closer. Similarly, improvements to array reference support (e.g. slicing a [T; N] into a [T; N - 1] for some constant N > 0) seem to be blocked on the "generics parameterized by integers" feature. I think implementing these features is likely to have at least as big of an impact as implementing impl Trait return types.
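
To illustrate the array reference point: for one concrete size, the conversion can be written with a runtime-checked `TryFrom` from a slice, as in the sketch below (`tail_of_four` is a made-up name, and this conversion only became available in Rust releases later than this thread). The fully generic form, where the result type is `[T; N - 1]` for any `N`, is the part that needs integer-parameterized generics.

```rust
use std::convert::TryFrom;

/// Drops the first element of a 4-element array and returns a reference to
/// the remaining 3 elements as a fixed-size array.
pub fn tail_of_four(arr: &[u8; 4]) -> &[u8; 3] {
    // The length is statically known to be 3, but the conversion still goes
    // through a runtime-checked slice-to-array TryFrom.
    <&[u8; 3]>::try_from(&arr[1..]).expect("slice has exactly 3 elements")
}
```

Note that the sizes 4 and 3 are hard-coded; one such function per size is exactly the kind of workaround being complained about.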

Rust has great potential for those who want their code to be directly callable from the widest set of languages.

I focus on high-performance/on-demand image asset processing servers and libraries; the next generation of which is Imageflow, an (originally) C library. I recently ported the extern API to a Rust cdylib, some of the Ruby demo server to Rust, and wrote a demo command-line app in Rust as well. I’m not happy with the current state of the Imageflow Rust codebase though; I’m still focused on regaining behavioral parity with the original C.

As a library author, I find Rust compelling because

  • Control flow is obvious and explicit, and it encourages correctness. It is less magical than C++, easier to learn, and has fewer footguns. Lure: fewer bugs.
  • It enforces many kinds of safety, preventing the most common categories of vulnerabilities in libraries. Lure: lower likelihood of catastrophic security vulnerabilities
  • It has no GC, and is theoretically* capable of exposing a C-compatible API that can be used from nearly every popular programming language on earth. Lure: re-use parity with C

Achieving parity with C

Robustness in low-memory conditions

Graceful handling of malloc failure is expected of C libraries and most C servers. See common justifications for abort() on malloc failure - debunked. In my previous-gen software (the server version), malloc failure simply caused that HTTP request to return a 500. There are better ways to handle this, of course, but aborting the process is not one of them.

There is no clear path to achieving this in Rust.

  • There is no reliable way to know whether an API allocates. If you have advice/ideas on creating a compiler plugin to help with this determination, let me know. Knowing which APIs and trait implementations to avoid would permit carefully coding around them, and writing replacements for stdlib collections.

  • Allocation failure nearly always causes an abort instead of a panic.

  • set_oom_handler was created to provide better abort messages, not to allow swapping out abort for panic!. stdlib may not be robust against malloc failures that result in recoverable panics (although this should be easy to verify through automated tests). Others have requested that panic-on-oom at least be possible for collections.

Remember that malloc failure != OOM. Although you should avoid allocations during unwinding (does this happen IRL?), a malloc failure for 192MB (an uncompressed smartphone pic) is unlikely to be followed by a malloc failure for 1KB.

Coming from C, Rust’s current *alloc failure behavior seems a particularly glaring regression in correctness, particularly given its otherwise wonderful focus on robust and correct behavior. The status quo feels very contrary to Rust’s philosophy.

There are many use cases where graceful handling of malloc failure makes sense:

  • You have a work queue. Failed tasks can go to the back of the queue with a decremented retry value.
  • A web server. If a request fails, you toss that state and return a 500.
  • Any kind of scheduler or server with heuristics or resource consumption planning. If you can’t tolerate a malloc failure, you can’t have a feedback loop in your algorithm.
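
For later readers: fallible allocation did eventually become possible for some collections. `Vec::try_reserve` and `Vec::try_reserve_exact` were stabilized in Rust 1.57, well after this thread. A minimal sketch of the web server case under that assumption follows; `alloc_pixels` and `handle_request` are hypothetical names, and the status codes are plain integers rather than any real HTTP library's types.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: allocates a zeroed pixel buffer and reports
/// allocation failure as an error instead of aborting the process.
pub fn alloc_pixels(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?; // the only allocation; may fail gracefully
    buf.resize(len, 0); // capacity is already reserved, so no reallocation
    Ok(buf)
}

/// Hypothetical request handler: a failed allocation only fails this request.
pub fn handle_request(pixel_bytes: usize) -> u16 {
    match alloc_pixels(pixel_bytes) {
        Ok(_buf) => 200, // real code would decode and process the image here
        Err(_) => 500,   // toss the request state, keep the server running
    }
}
```

This only covers code that opts in at each allocation site. Most of the standard library still treats allocation failure as fatal, and on systems with memory overcommit a large reservation can succeed and fail later on first touch.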

I began porting my C library over to Rust when catch_panic reached stable. I did this under the assumption that out-of-memory caused a panic, which turned out to be a common misconception on Reddit.
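
For reference, the API that reached stable is `std::panic::catch_unwind`. A minimal sketch of using it at a C boundary is below; `example_process` and its return codes are invented for illustration. It catches panics only, so it does nothing for allocation failures that abort.

```rust
use std::panic;

/// Hypothetical FFI entry point: returns 0 on success and -1 if the Rust
/// code behind it panicked. Add the no_mangle attribute to export it by name.
pub extern "C" fn example_process(value: i32) -> i32 {
    let outcome = panic::catch_unwind(|| {
        if value < 0 {
            panic!("negative input: {}", value);
        }
        value
    });
    match outcome {
        Ok(_) => 0,
        Err(_) => -1, // the panic stops here instead of crossing into C
    }
}
```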

Also +1 for Vulkano’s TROUBLES.md; Imageflow will hopefully use Vulkano sometime next year.

Error reporting in production

There’s a common pattern in C libraries.

  1. Create a context. The context allocates space for an error message and stacktrace.
  2. Pass the context to every function call thereafter.
  3. Every function call returns NULL or false upon failure, but first populates the error ID, error message (which can include key variable state), and the first row of the backtrace with line number, function name, and file name. As the failure cascades, each function explicitly adds itself to the backtrace, usually via a small macro.
  4. The end user can call an API to copy the error message and backtrace to any buffer they provide.
  5. This means that rare and troublesome production bugs are not nearly so painful to debug. You always have a useful backtrace and clear error message, if, at least, you’ve been consistent in following the simple conventions required.
  6. Users of your library get very clear error messages, with backtraces, and error codes for dealing with recoverable situations.
  7. The app can log those errors and backtraces and continue to run.

Can one accomplish the same result in Rust? What about in a Rust sandwich (Rust code which both exposes an FFI and consumes another, and both expose this kind of error reporting)?
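
One possible shape for this in Rust, as a sketch only: the error value itself carries the code, message, and explicitly recorded frames, and a small macro plays the role of the C macro that adds the current location. `Frame`, `TracedError`, `here!`, and the two decode functions are all hypothetical names, not an existing crate's API.

```rust
use std::fmt;

/// One manually recorded backtrace row.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub file: &'static str,
    pub line: u32,
    pub function: &'static str,
}

/// Hypothetical error type: error code, message, and explicit frames.
#[derive(Debug)]
pub struct TracedError {
    pub code: i32,
    pub message: String,
    pub frames: Vec<Frame>,
}

impl TracedError {
    pub fn new(code: i32, message: impl Into<String>) -> Self {
        TracedError { code, message: message.into(), frames: Vec::new() }
    }

    /// Appends the caller's frame as the failure cascades upward.
    pub fn at(mut self, frame: Frame) -> Self {
        self.frames.push(frame);
        self
    }
}

impl fmt::Display for TracedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "error {}: {}", self.code, self.message)?;
        for frame in &self.frames {
            writeln!(f, "  at {} ({}:{})", frame.function, frame.file, frame.line)?;
        }
        Ok(())
    }
}

/// Counterpart of the small C macro: captures file and line at the call site.
macro_rules! here {
    ($function:expr) => {
        Frame { file: file!(), line: line!(), function: $function }
    };
}

fn decode_header(bytes: &[u8]) -> Result<u8, TracedError> {
    bytes.first().copied().ok_or_else(|| {
        TracedError::new(10, format!("empty input, len={}", bytes.len()))
            .at(here!("decode_header"))
    })
}

pub fn decode_image(bytes: &[u8]) -> Result<u8, TracedError> {
    // Each layer adds itself to the trace before passing the error on.
    decode_header(bytes).map_err(|e| e.at(here!("decode_image")))
}
```

Unlike the C pattern, this sketch allocates while reporting (the `String` and the `Vec`), so it does not combine well with the malloc-failure requirement above; a preallocated context would be needed for that. For the FFI side, the `Display` output could be copied into a caller-provided buffer.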

If possible, but simply undocumented, I’d be happy to help.

Vectorization & Performance

It’s unlikely that Imageflow will become 100% Rust within the next. Not necessarily for lack of effort or desire, but rather due to performance constraints (assuming other blockers disappear).

Imageflow currently depends on libjpeg-turbo and libpng, both of which use large quantities of assembler to achieve the required performance.

For Imageflow itself, GCC offers enough hints that we can get very excellent vectorization with very carefully tuned use of vectorizable C and a few SIMD intrinsics. We stop short of actually using assembler.

Clang/LLVM actually doesn’t fare as well as GCC, and produces significantly slower runtime code for Imageflow. Should I expect more from Rust?

Come to think of it, image codecs are a great way to evaluate runtime performance.

Image codecs are the perfect storm of complexity

  • Extreme focus on performance (or they’re not usable - they’re a common bottleneck)
  • Highly complex state machines
  • Malicious input
  • Massive matrices.
  • Graceful malloc failure handling.
  • Often used on embedded and low-ram devices.
  • Flexible I/O interfaces
  • Async (resumable I/O) support. For real, in C, with setjmp.
  • Defective specifications which enable attacks.
  • Defective files you have to support … because Photoshop
  • High numbers of data dependencies
  • Complex metadata formats; a single image may contain IPTC, EXIF, and XMP (xml) metadata.
  • Aforementioned metadata includes rotation data AND color profile information, both required for processing.
  • Color profile conversions may require runtime code generation.
  • Algorithms tend not to be well optimized by compilers; assembler proliferates.
  • Branching too common for porting more than certain passes to the GPU.
  • APIs are used from all languages, and must be flexible. (There aren’t many codecs written in languages other than C/C++).
  • Often use or permit custom allocators per context.

Are image codecs something Rust intends to be good at in 2017?

Would a Rust image codec be sufficiently interesting as a real-world benchmark that its performance issues would inspire compiler improvements?

It’s 1am, so I’ll conclude with two hopes:

  • I would love to see an official document tracking Rust’s parity status for various patterns in C. Maybe this can tie into the C/C++ to Rust guide mentioned earlier.
  • I would love to see Rust improve its parity with C for the things above, particularly around malloc failures.

I have not 'proposed' to remove anything from Rust. I have read the book/manuals/&c. I understand the language's design. All I have said is that 1) Rust documentation is not beginner-orientated enough, that is, it bogs the reader down in meta-programming too early-on, and 2) the generics syntax is hard to grok.

I'd like to suggest that you are dismissing the feedback from an interested Rust patron in a thread that is all about, to quote the OP: "Rust should have a lower learning curve"

It saddens me that you are being so elitist :confused:

I see no reason for it not to be possible to translate your C macros to Rust. What challenges did you encounter? Could you give an example of your C code?

Rust uses LLVM for vectorization and will continue to do so for the foreseeable future. Unless your vectorization is inhibited by an aliasing problem (which I doubt, if GCC optimizes it correctly), rustc can't be any better than clang. Do you have code examples?
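
To show what the aliasing remark refers to, here is a small sketch (the `scale_add` name is made up). Because `dst` is `&mut` and `src` is `&`, the two slices cannot overlap, which is the kind of guarantee C code only gets from `restrict`. Whether this particular loop is actually vectorized depends on the target, the optimization level, and the compiler version, so it would need measuring.

```rust
/// Adds `src * gain` into `dst`, element by element.
pub fn scale_add(dst: &mut [f32], src: &[f32], gain: f32) {
    // zip stops at the shorter slice, so no bounds checks are needed in the
    // loop body.
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s * gain;
    }
}
```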

The generics syntax is not going to change post 1.0. Deal with it.

Also, in Rust, generics are not meta-programming - lifetime generics are fundamental to ordinary programming.
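
A minimal illustration of that point (the `longer` function is just an example): even a plain helper that returns one of its arguments is generic over a lifetime, with no meta-programming involved.

```rust
/// Returns whichever string slice is longer. The lifetime parameter says the
/// result borrows from the inputs and must not outlive them.
pub fn longer<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() {
        left
    } else {
        right
    }
}
```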

Note that cargo does not need the internet strictly, but it will automatically try to download missing packages.

You can use cargo with alternative ways of supplying dependencies, one of them being cargo vendor (the alexcrichton/cargo-vendor repository on GitHub is archived, as the subcommand is now part of Cargo itself).

This is going to improve with additional tooling becoming available. Help from people with specific needs is always appreciated, because we cannot anticipate needs no one mentions.

May I suggest that there's also the possibility that @pcwalton just misunderstood the angle you were coming from?

I don't think that this was meant to be elitist. It's just that there's absolutely no possibility that generics syntax will change at this point. It doesn't matter if this would improve the learning curve, that ship has sailed.

Apart from this, angle brackets are used for generics in most mainstream languages. I fail to see how the learning curve would be improved by deviating from the most popular syntax. And personally, I don't find the parenthesis-free syntax of Haskell easier to read.

Something I’m interested in is a particular point that my co-speaker and I included in our rustconf talk, “the playrust classifier”, which describes the current state of Rust’s ML community. In essence, we stated that the ML/numeric ecosystem is fragmented, with many crates providing similar functionality but different APIs. For example, matrix implementations are different between crates. This limits natural interoperability between crates that are in the same domain. Interoperability is a great feature of a language like Python that, for example, builds many of its numeric libraries around the numpy array. I’d be interested to hear if others have experienced similar crate fragmentation in domains other than ML/numerics.

How do we suggest and/or standardize core data structures/objects to be used across crates that operate in the same domain? Is it a top-down process, in which the core team evangelizes particular projects? Or is it bottom-up, where it’s up to the developers of a fundamental project to gather others around it? Can rust have formal “core” projects in each technical domain that we suggest developers rally around to build on top of? How do we as a community decide that a data structure or object should be standardized? Can we do this in a way that promotes interoperability without destroying competition?
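
To make the interoperability problem concrete, here is a sketch of one bottom-up option: a small shared trait that differently stored matrices implement, so that algorithms can be written once. `MatrixView`, `RowMajor`, `Nested`, and `trace` are hypothetical names for this illustration, not the API of any existing crate.

```rust
/// Hypothetical shared interface that several crates could agree on.
pub trait MatrixView {
    fn rows(&self) -> usize;
    fn cols(&self) -> usize;
    fn get(&self, row: usize, col: usize) -> f64;
}

/// One crate's storage: a flat, row-major buffer.
pub struct RowMajor {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f64>,
}

/// Another crate's storage: a vector of rows.
pub struct Nested(pub Vec<Vec<f64>>);

impl MatrixView for RowMajor {
    fn rows(&self) -> usize { self.rows }
    fn cols(&self) -> usize { self.cols }
    fn get(&self, row: usize, col: usize) -> f64 { self.data[row * self.cols + col] }
}

impl MatrixView for Nested {
    fn rows(&self) -> usize { self.0.len() }
    fn cols(&self) -> usize { self.0.first().map_or(0, |r| r.len()) }
    fn get(&self, row: usize, col: usize) -> f64 { self.0[row][col] }
}

/// An algorithm written once against the shared trait: sum of the diagonal.
pub fn trace<M: MatrixView>(m: &M) -> f64 {
    (0..m.rows().min(m.cols())).map(|i| m.get(i, i)).sum()
}
```

The code is the easy part. The open question above is who defines such a trait and how crates come to agree on it.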

@aturon we talked about this today so would love to hear your thoughts on this!

I’ve started a thread on crates.io discoverability and community engagement: Crates.io Discoverability and Engagement: starting the conversation

My proposal touches briefly on the fragmentation issues brought up by @pegasos1; the idea is domain-specific “working groups” similar to Rust’s various teams.

Note that Python did not standardize on numpy early. numpy is the result of many failed older projects. Some of its predecessors were written by core teams staffed by the biggest names in Python, including Guido van Rossum. The second attempt was Numeric, but it did not serve all use cases, so Numarray was released. Many packages supported both for a long time. NumPy was an N+1 attempt to solve it. While not the first, it was successful. (Note I was not involved then. This is based on interviews of the involved parties.)

TLDR. NumPy is a great example of the advantages of a unified interface. But it is not obvious how a community gets to that state quickly. Python did not get there by the top down method, despite trying. And the bottom up method sometimes takes a long time.

Are you proposing to alter the documentation, or alter the generic syntax?

Docs could be improved in some way (never enough docs!), but changing the syntax is nigh impossible. Because of Rust's backward compatibility guarantees, changing the generics syntax would mean Rust going 2.x, and that is something the devs have stated they will never, ever do. The syntax was agreed to a looong time ago and, as @troplin said, it's based on C-family generics syntax (e.g. C++, C#, Java...).

Note that there was discussion about, not changing, but adding a simplified syntax (Can't find the exact comments, but I think there were some straw-man proposals of any Type for universals and some Type for existentials in function signatures, a la print_area(shape : any HasArea) in Minimal `impl Trait` by Kimundi · Pull Request #1522 · rust-lang/rfcs · GitHub or https://github.com/rust-lang/rfcs/pull/1603). Sorry for the offtopic.

It’s somewhat on topic, but the idea is that syntax can be extended in backwards compatible ways. Changing Foo<T> into Foo[T] or Foo T is not backwards compatible.
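
As an illustration of a backwards-compatible extension: Rust later stabilized `impl Trait` in argument position (Rust 1.26), which sits alongside the angle-bracket form rather than replacing it. The `HasArea` trait comes from the proposal mentioned above; `Square` and the two function names are made up for this sketch.

```rust
pub trait HasArea {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);

impl HasArea for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// The original angle-bracket form, still valid.
pub fn area_generic<T: HasArea>(shape: &T) -> f64 {
    shape.area()
}

/// The later shorthand; existing code using the form above keeps compiling.
pub fn area_shorthand(shape: &impl HasArea) -> f64 {
    shape.area()
}
```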