Let's talk about incremental compilation, linking, and tradition


#1

Hi there! I haven’t made much of a name for myself in the Rust world yet, so I hope it’s appropriate for me to bring up this matter. My experience with Rust so far has been wonderful (I’m using it for osdev), but one thing sticks out to me in particular - how Rust code is built.

The norm with other languages that compile through LLVM (C and C++ being the obvious ones) is to use headers to describe your API and data structures, and then implement them elsewhere. This has several significant advantages. Among them:

  • You can compile a single source file into an unlinked object
    • It follows that you can link them all together later and only build modified files (incremental compilation)
  • The API is very clearly defined through header files
    • Alternate implementations of the same API are easy to build and use as a result
    • You don’t have to go reading through the code to learn the API
    • You don’t have to have shared libraries installed to compile against them
  • Interop with other languages is easy
    • C and C++ and Assembly all get along smashingly without any extra work
    • Mixed language projects have no additional effort required to maintain

I realize that we may be a bit too entrenched to revisit this, but it concerns me enough to bring it up. With my own project, which has been going against the grain on this issue, I’ve found that it’s entirely feasible to go the header route (or something hacky that resembles it, in my case). There are many benefits to describing your API without forcing the implementation to accompany it, and I can’t think of anything but drawbacks for the opposite position. Can someone shed light on why headers aren’t involved? And if there is no strong reason, is it too late to fix this?


#2

One drawback of headers is information duplication - you have function prototypes in both header and implementation files, and when prototypes change, you have to change two places instead of one.

I don’t claim that it’s enough to justify this particular decision - just outlining the tradeoff.


#3

The claim about interop isn’t really true at all. You can write header files for (extern’d) Rust functions which C/C++ can call, and you can generate declarations for C functions from C header files using rust-bindgen (or write them manually). There is no requirement that Rust use header files for this to be possible. And in order to have perfect interop with header files, Rust would have to natively use a superset of C syntax for them, which is not possible at all.
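To make the first half of that concrete, here is a minimal sketch of a Rust function that uses the C calling convention, together with the C prototype one would write for it by hand. The names `Point` and `point_sum` are hypothetical and only serve as an illustration.

```rust
// Hypothetical names throughout; an illustration, not an existing API.

// A struct with C-compatible layout, so a C header can declare the same fields:
//     struct Point { int32_t x; int32_t y; };
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// Uses the C calling convention. The hand-written C prototype would be:
//     int32_t point_sum(struct Point p);
// To export it under this exact symbol name, the function also needs the
// `no_mangle` attribute, which is left out here to keep the sketch minimal.
pub extern "C" fn point_sum(p: Point) -> i32 {
    // Wrapping arithmetic avoids a panic (and thus unwinding) on overflow.
    p.x.wrapping_add(p.y)
}
```

The function stays callable from ordinary Rust code as well, so nothing about the Rust side of the project has to be organised around the header.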

I don’t see how you can even find this useful - C++ header files are completely unusable from C. You have to either only expose a C-friendly interface in the header file, or have two sets of header files. And you kind of miss the point of using C++ if you’re always dropping back down to the C subset at every corner. The same is true of Rust.

The second point (learning the API without reading the code) is easily mitigated by simple tools like tagbar, speedbar, and doxygen. The first point (compiling files separately) is a legitimate issue in incrementally building Rust projects, but header files are not a good solution.

I cannot name any languages besides C/C++ which use header files on LLVM. The majority of languages I’ve seen which compile to LLVM are implementations of high-level languages which have full module systems.


#4

With respect to incremental compilation, it would make a lot of sense if the compiler extracted the information that is normally found in header files, i.e. the signatures of things, and stored those somewhere. Using this information, each function could be translated independently and thus incrementally and in parallel. There are a few things that need to know about the whole crate (e.g. “coherence”), and the resolve pass for things that are referenced in interfaces/signatures would have to be re-run in order to know what needs to be re-compiled, but those things are relatively light-weight compared to other compiler passes, such as type-checking/inference, “trans”, and especially LLVM’s codegen. If I had a few months of spare time on my hands, I’d give this approach a try right now :smile:
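A rough sketch of the core idea, assuming the compiler stores a fingerprint of each function’s signature separately from its body. The types and names below (`Function`, `Rebuild`, `plan`) are made up for illustration and have nothing to do with rustc’s actual internals.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical model of one function as the compiler might record it.
#[derive(Clone)]
struct Function {
    signature: String, // what a header file would have declared
    body: String,      // the implementation
}

// Fingerprint a piece of text; a real compiler would hash its own data structures.
fn fingerprint(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
enum Rebuild {
    Nothing,
    OnlyThisFunction,
    ThisFunctionAndCallers,
}

// Decide what has to be re-translated after an edit.
fn plan(old: &Function, new: &Function) -> Rebuild {
    if fingerprint(&old.signature) != fingerprint(&new.signature) {
        // The "header" information changed, so users of it are affected too.
        Rebuild::ThisFunctionAndCallers
    } else if fingerprint(&old.body) != fingerprint(&new.body) {
        // Only the implementation changed; callers can be left alone.
        Rebuild::OnlyThisFunction
    } else {
        Rebuild::Nothing
    }
}
```

This ignores inlining and generics, where a body change can leak into callers, so it should be read as the simplest possible version of the scheme.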


#5

I think the concern with duplication (and a number of similar concerns) can be addressed by having rustc generate headers. Thoughts?


#6

If the idea is for the compiler to read only the headers of imported modules, then you have a C++-like mess: any generic or (unless you want to give up on non-LTO operation) inline-worthy functions would have to be stuck in the header, and you have constant tension between dumping stuff in there to take advantage of these things and leaving it in the source file to keep things prettier (and increase compilation speed, but no reason to think /that/ would be recreated).

Meanwhile, it’s not actually necessary to do this to get incremental compilation. On the contrary, a superior method to C’s is to have the compiler manage it and maintain dependencies at a finer grain than per-file: thus only one function in a large source file may need to be recompiled; (critically) modifications in a file containing API or structure definitions need only force recompilation of code that relies on the particular items changed, unlike the situation with C++ where touching some common header often means recompiling the entire project; and modifications to generic functions need not cause recompilation of dependencies unless they were actually inlined into them (since this won’t happen at -O0, this is a nice benefit when prioritizing compilation speed over all else). Oh, and since this is basically equivalent to LTO, you get better inlining (i.e. across source files) without needing to redo codegen from scratch every time, like normal LTO.
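As a sketch of what tracking dependencies per item rather than per file could look like, the function below takes a map from each item to the items it relies on and returns everything that has to be recompiled when one item changes. The function name and the map layout are assumptions for illustration, and the sketch is deliberately conservative: it propagates through every dependency edge and does not model the inlining refinement.

```rust
use std::collections::{BTreeSet, HashMap};

// `uses` maps each item (function, struct, ...) to the items it relies on.
// Returns the changed item plus everything that depends on it, transitively.
fn items_to_recompile(uses: &HashMap<&str, Vec<&str>>, changed: &str) -> BTreeSet<String> {
    let mut dirty = BTreeSet::new();
    dirty.insert(changed.to_string());
    let mut worklist = vec![changed.to_string()];

    while let Some(current) = worklist.pop() {
        for (item, relied_on) in uses {
            let depends_on_current = relied_on.iter().any(|d| *d == current.as_str());
            // `insert` returns false if the item was already marked dirty.
            if depends_on_current && dirty.insert(item.to_string()) {
                worklist.push(item.to_string());
            }
        }
    }
    dirty
}
```

With a per-file scheme, touching a file that defines both a widely used struct and an unrelated helper would invalidate every user of that file; here only the users of the item that actually changed are marked.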

If you do this, use of header files would have negligible benefit to compilation speed. Rust does not currently have anything of the sort, but I heard someone was going to work on incremental compilation, which I hope is something along the lines of the above…

If header files are to be consumed chiefly by humans, as documentation and/or to more clearly visualize what API is being exposed, that does not apply. However, I’m not sure how much advantage they have over Javadoc/librustdoc-like generated documentation. (I can definitely get behind keeping it in the editor rather than needing to use a slow web browser. But this doesn’t need to be part of the language.)


#7

Note also that the C++ proposals for modules - both Daveed Vandevoorde’s original one and Doug Gregor’s current one - drop header files in favour of a much more Rust-like approach. Going the other way would be an odd move.

To your “Alternate implementations of the same API are easy to build and use as a result” and “You don’t have to go reading through the code to learn the API” points, where are traits falling short for you?
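For concreteness, a minimal sketch of a trait playing the role of a header-declared API with two interchangeable implementations. `Console`, `PlainConsole`, `ShoutingConsole`, and `announce` are hypothetical names chosen for the example.

```rust
// The trait is the "header": it states the API without any implementation.
trait Console {
    fn write(&mut self, text: &str);
    fn contents(&self) -> &str;
}

// First implementation: stores text unchanged.
struct PlainConsole {
    buf: String,
}

impl Console for PlainConsole {
    fn write(&mut self, text: &str) {
        self.buf.push_str(text);
    }
    fn contents(&self) -> &str {
        &self.buf
    }
}

// Alternate implementation of the same API: stores text in upper case.
struct ShoutingConsole {
    buf: String,
}

impl Console for ShoutingConsole {
    fn write(&mut self, text: &str) {
        self.buf.push_str(&text.to_uppercase());
    }
    fn contents(&self) -> &str {
        &self.buf
    }
}

// Code written against the trait works with either implementation.
fn announce<C: Console>(console: &mut C) {
    console.write("kernel booted");
}
```

Calling `announce` with a `PlainConsole` or a `ShoutingConsole` needs no change to `announce` itself, which is the property the "alternate implementations" bullet asks for.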


#8

I was a C programmer, and for the last couple of months I’ve been making a living writing D.

The implementation of D modules has a bunch of issues (see this everlasting bug https://issues.dlang.org/show_bug.cgi?id=314), but it is still miles ahead of C headers in terms of compile times, maintainability, etc. Please don’t go back.

Also, note that D supports “interface files”:

When an import declaration is processed in a D source file, the compiler searches for the D source file corresponding to the import, and processes that source file to extract the information needed from it. Alternatively, the compiler can instead look for a corresponding D interface file.

A D interface file contains only what an import of the module needs, rather than the whole implementation of that module. The advantages of using a D interface file for imports rather than a D source file are:

  • D interface files are often significantly smaller and much faster to process than the corresponding D source file.
  • They can be used to hide the source code, for example, one can ship an object code library along with D interface files rather than the complete source code.

http://dlang.org/dmd-linux.html#interface-files


#9

I feel like a change like this does not affect the public use of Rust, so it does not need to be in for 1.0.

If you come up with a concrete proposal, I would love to hear it. If you have a sound argument and win people over with hard evidence of better overall software development, I’m sure more people would back you.

For now though, you sound like “if we bring back header files, things will be better!!” which sounds like malarkey. Don’t get me wrong, I follow your work and know that you have a good basis for this knowledge, but you need hard evidence to win over this crowd.

Did that make sense? On mobile and formatting is harder.


#10

Auto-generated header equivalents would be fine.

C++ has a stupid design where the definition can’t be parsed until it’s seen the declaration in the header, with slight syntax changes like ‘defaults can only be in the header’. It’s utterly infuriating that such an amazingly powerful piece of software can have such a stupid (almost seemingly deliberate) problem in it … if only they made some syntax additions (which only affect parsing), that could be fixed.


#11

There is also the possibility of something like OCaml’s module system (maybe without the parametric feature).

In that system you just expose an interface (the .mli file), which is implemented in the .ml file. It also gives you the possibility of making each module a unit of compilation.

http://caml.inria.fr/pub/docs/manual-ocaml/moduleexamples.html
https://realworldocaml.org/v1/en/html/first-class-modules.html


#12

There is a big meandering discussion of ML modules here: https://github.com/rust-lang/rfcs/issues/493.

I have a cargo package containing only types and traits, https://github.com/Ericson2314/rust-net/tree/master/data_link/interface/src — clearly this would be better as some sort of sig / header.

On the other hand, caching signatures between builds can be used to speed up recompilation.

Headers are therefore useful for both caching and expressiveness, so there should be two different kinds (e.g. the same syntax but different file extensions) to distinguish between the two use cases.