Pre-RFC: first-class support for compile-to-Rust languages


#1

There’s an ongoing meta discussion about accepting new language features. “Syntax sugar” (features that make Rust more pleasant to write, but don’t add anything that couldn’t be done with uglier syntax) seems to be the most common and the most controversial kind.

JavaScript syntax had its catalyst in CoffeeScript. CoffeeScript was free to experiment with frivolous features that couldn’t have been added to JavaScript. Eventually JS distilled and adopted some of the most successful features (e.g. arrow functions, for of), and the other questionable ideas remained CoffeeScript’s problem, not JavaScript’s.

I think it would be fantastic if Rust had support for languages compiled (“transpiled”) to Rust, so that the community could independently experiment with new syntax and new features without burdening the core language:

  • Just as JavaScript experiments with future syntax using Babel, Rust could test try {}, async, try fn, borrow[..#], delegate * and whatever neat idea is floated without having it in the nightly compiler. Crates using such syntax could even be used with the stable compiler and (after compilation to Rust 1.x) published on crates.io. This way new syntax could get a lot more real-world use before being evaluated for inclusion in the core language.

  • There could be CoffeeRust. Python-like indentation? struct {.field = value} syntax? Automatic reference counting? Implicit modules? Regex literals? Features that are breaking/unacceptable for Rust proper could still have their fringe life in languages on top of Rust.

  • And finally, DSLs (e.g. my new favourite template engine, ructe) are useful, but a bit unpleasant to work with when compiler errors refer to autogenerated Rust code rather than the non-Rust source.


The MVP for this could be as simple as support for an equivalent of C’s #line directive, and the rest could be hacked with build.rs.
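To make the #line idea concrete, here is a minimal sketch of how a tool could map a line of generated code back to the original source. It assumes C-style `#line N "file"` markers embedded in the generated text; rustc does not understand such markers today, and the names `Origin`, `parse_directive` and `resolve` are made up for illustration.

```rust
// Sketch only: resolves a line of generated code back to its original source
// using C-style `#line N "file"` markers. The marker syntax mirrors C and is
// an assumption here; rustc has no such directive.

/// Original location that a generated line maps back to.
#[derive(Debug, PartialEq)]
pub struct Origin {
    pub file: String,
    pub line: usize,
}

/// Parses `#line N "file"`, returning (N, file) on success.
fn parse_directive(text: &str) -> Option<(usize, String)> {
    let rest = text.trim().strip_prefix("#line ")?;
    let (num, file) = rest.trim().split_once(' ')?;
    let file = file.trim().trim_matches('"');
    Some((num.parse().ok()?, file.to_string()))
}

/// Returns the origin of the 1-based line `target` of `generated`, or None if
/// the line is out of range, is itself a directive, or precedes any directive.
pub fn resolve(generated: &str, target: usize) -> Option<Origin> {
    let mut current: Option<Origin> = None;
    for (idx, text) in generated.lines().enumerate() {
        let is_target = idx + 1 == target;
        if let Some((line, file)) = parse_directive(text) {
            if is_target {
                return None;
            }
            // As in C, the directive gives the line number of the NEXT line.
            current = Some(Origin { file, line });
            continue;
        }
        if is_target {
            return current;
        }
        // Every ordinary generated line advances the original line by one.
        if let Some(origin) = current.as_mut() {
            origin.line += 1;
        }
    }
    None
}
```

An error reported on a generated line could then be rewritten to point at the file and line that `resolve` returns before it is shown to the user.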


#2

A small subset of this is already possible, of course, by using macros.

I’m not saying writing DSLs with macros is easy, especially with macro-by-example. Rather, with statement-proc-macro, it’s very possible to write super-Rust transpilers. And with the full procmacro API allowing span manipulation, it’s possible (though not exactly easy) to get manageable errors out as well.

That said, a #line-like mechanism would be valuable for more holistic spit-out-a-whole-.rs pipelines.

Another interesting far-future idea is compile-to-MIR languages. In the JVM and .NET worlds, languages interoperate because they all target the same VM, which supports seamless calls between anything that runs on it. GraalVM may even be able to extend that to further languages.

In a way, it’s a higher-level version of C foreign functions, one that respects higher-level concepts like structs, generics, and other language specifics. If MIR ever comes to a point where it can serve as a common target for other languages, Rust could see a similar proliferation of niche languages as the JVM has seen.

More work would be involved, as GC’d runtimes like the JVM or .NET have more insurance against bad bytecode eating everyone’s laundry, but I think it could be possible in the future.


#3

An interface generating TokenStream instead of text files with #line would be nicer indeed.

However, the current procedural macros aren’t great for it, since they must be explicitly invoked from a regular Rust file.

Since you wouldn’t want to create a boilerplate .rs file for every .non-rs file, the macro would have to parse and emit the whole project in one go. That’s not too bad if you want to replace the module system:

lib.rs:

#![feature(proc_macro)]
extern crate rust_coffee;
use rust_coffee::start;
start!();

but if your goal was to make a “like Rust, but with a new feature” language, then that would be a bit annoying. You’d probably want the ability for rustc to invoke a proc_macro on every file it is about to parse.


#4

I think Idris’s “Elaborator Reflection” and the associated paper Elaborator Reflection: Extending Idris in Idris could be interesting. To make Rust more effective as a host language, I think you need some way to interact with the type checker (but this will of course expose more compiler internals, which we may not be comfortable with at this stage).


#5

Here’s another idea utilizing proc-macro plumbing: allow lexing an external file that attaches Span information so that you can generate code with Spans from external files. This precludes anything that can’t work based off of Rust’s lexer (such as whitespace-significant grammars), but it would work for other grammar extensions that it doesn’t choke on.

You could handle the problem of boilerplate .rs files by just having a build.rs step to create them.

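// build.rs sketch: write a stub .rs next to each .rs.coffee in src/ (not recursive)
use std::{fs, io::{self, Write}};

fn main() -> io::Result<()> {
    for entry in fs::read_dir("src")? {
        let path = entry?.path();
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or_default();
        if let Some(stem) = name.strip_suffix(".rs.coffee") {
            // The stub only invokes the (hypothetical) rust_coffee! macro on the real source.
            let mut rs = fs::File::create(path.with_file_name(format!("{stem}.rs")))?;
            writeln!(rs, r#"rust_coffee!("{name}");"#)?;
        }
    }
    Ok(())
}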

Then the rust_coffee! macro would lex the given .rs.coffee with Rust’s lexer, therefore getting Span information, and by preserving that, allow Rust’s errors to point at the .rs.coffee source rather than being completely opaque hiding behind the macro.

Keeping track of those spans and other careful transformations could hopefully maintain workable compiler errors from Rust, and any added semantics would have to be transpiler-checked anyway.
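As a rough illustration of what “preserving spans” means, here is a toy desugaring pass written in plain Rust. It does not use the proc_macro API; `Span`, `Token`, `lex`, `desugar` and the `unless` keyword are all hypothetical stand-ins, and the lexer just splits on whitespace.

```rust
// Sketch only: a desugaring pass that keeps source positions on generated
// tokens. These types are illustrative and are NOT the proc_macro API.

/// 1-based line and byte column of a token in the original (non-Rust) file.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub line: usize,
    pub col: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: String,
    pub span: Span,
}

/// Splits source text into whitespace-separated tokens, recording positions.
pub fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    for (line_idx, line) in source.lines().enumerate() {
        let mut col = 0;
        for word in line.split_whitespace() {
            // Find this word at or after the end of the previous token.
            let start = col + line[col..].find(word).unwrap();
            tokens.push(Token {
                text: word.to_string(),
                span: Span { line: line_idx + 1, col: start + 1 },
            });
            col = start + word.len();
        }
    }
    tokens
}

/// Rewrites the made-up keyword `unless` into `if !`. Both generated tokens
/// inherit the span of `unless`, so a diagnostic attached to either of them
/// would still point at the right place in the original file.
pub fn desugar(tokens: Vec<Token>) -> Vec<Token> {
    let mut out = Vec::with_capacity(tokens.len());
    for tok in tokens {
        if tok.text == "unless" {
            out.push(Token { text: "if".to_string(), span: tok.span });
            out.push(Token { text: "!".to_string(), span: tok.span });
        } else {
            out.push(tok);
        }
    }
    out
}
```

A real implementation would presumably do the same thing with proc_macro tokens, copying the span of the sugar onto whatever it expands to.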

I’d expect any Rust++ transpiler to include a tool to autogenerate the glue code, which would likely be the lib.rs/main.rs to load into the transpiled code plus a build.rs to handle gluing in other source files.

Actually, if we can get a nightly-experimental (or even fork-experimental) way to turn an external file into a TokenStream I’d be interested in helping create an experimental replacement-driven Rust++ framework.


#6

Many times I’ve imagined some magic proc macro that would let me just write

#[bindgen(libpng)]
mod libpng_sys;

or even more aggressively,

#[transpile(ecmascript, vue.js)]
mod vuejs;

#[transpile(sql, schema.sql)]
mod schema;

and have the code transpiled or AOT-compiled, with a Rust interface generated…


#7

#line is a gross hack. I think the Source Map standard makes much more sense. It is used by JavaScript tools, but the standard itself has no JavaScript-specific parts.


#8

I think this is a great idea. Tastes, as well as needs, differ between populations, and it is hard to find a language that can unify them all. The libs team has a policy that they first check whether a feature can be implemented inside an external crate, and only if it seems important enough do they add it to the libraries. I think it would be great if there were a similar policy for the language proper. Everyone would profit: those who want the sugar, as they can now experiment with even more sugar, and those who don’t like the sugar, as they only have to deal with it if they encounter a codebase that uses that sugar. There could be subsets that are best suited for initial learners. We wouldn’t have to integrate the dialectal ratchet into the language itself.

I also like the idea that build.rs is responsible for the compile-to-rust process. This way, sources uploaded to crates.io remain editable and crates.io isn’t just a repository of compiled artifacts like npm is.


#9

Generating boilerplate for a proc_macro almost works. If you wrap a .rs.coffee file in a proc macro in an .rs copy of the file, rustc will point to the right line, but in the wrong file. It’ll show .rs instead of .rs.coffee.

It’s close, but, for example, I click file paths in the terminal to open the file/line in my editor, so rustc pointing to .rs instead of .rs.coffee would open the wrong file for me.

The ability to create custom SourceFile/Span objects would be needed for the proc macro to correct the paths.


#10

So based on feedback so far, #line is too ugly. proc_macro is promising (and can help avoid exposing temporary .rs files). Custom attributes look neat. Spans seem to be the Rust way to do proper source maps.

To put it all together, it could be something like this:

 #[transpile(c)] // some way to select which transpiler you want
 // Transpilers may need config options. Options could be added as extra attributes.
 #[transpile(c, define="HAS_STDINT=1", include_path="../includes")] 
 #[path = "foo.c"] // the standard Rust attr can be used to select the path. 
 // If no path is set, the transpiler should have ability to infer it (modname + extension)
 mod foo;

Such an attribute would invoke a special proc_macro that produces a TokenStream. However, the input to the proc macro would not be the usual TokenStream, but some higher-level object that would expose the module path and custom attributes, and offer the ability to get either the standard TokenStream (for Rust++-like languages) or raw source code (for everything else).
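To sketch what that higher-level object might carry, here is a plain-Rust mock-up. Nothing like this exists in rustc or proc_macro; `TranspileInput`, its fields and its methods are all hypothetical, and the path rule is just the “modname + extension” fallback described in the comments of the example above.

```rust
use std::path::PathBuf;

// Sketch only: a hypothetical input object for a transpiler macro.
// No such type exists in rustc or proc_macro.
pub struct TranspileInput {
    /// Name of the module the attribute is attached to, e.g. `foo` for `mod foo;`.
    pub module_name: String,
    /// Transpiler selected in `#[transpile(...)]`, e.g. `c`.
    pub transpiler: String,
    /// Extra key/value options, e.g. `define="HAS_STDINT=1"`.
    pub options: Vec<(String, String)>,
    /// Value of `#[path = "..."]`, if one was given.
    pub explicit_path: Option<PathBuf>,
}

impl TranspileInput {
    /// Path of the foreign source file: an explicit `#[path]` wins,
    /// otherwise it is inferred as module name + default extension.
    pub fn source_path(&self, default_extension: &str) -> PathBuf {
        match &self.explicit_path {
            Some(path) => path.clone(),
            None => PathBuf::from(format!("{}.{}", self.module_name, default_extension)),
        }
    }

    /// Looks up a configuration option by key.
    pub fn option(&self, key: &str) -> Option<&str> {
        self.options
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v.as_str())
    }
}
```

A transpiler for C would call `source_path("c")` to find `foo.c`, read it as raw source, and return a TokenStream; a Rust++-like transpiler would ask for tokens instead.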


#11

I’m very interested in this. At my work we are currently working on an external DSL for describing binary data formats (in a similar vein to this paper), and we are going to be compiling this to Rust, and possibly other languages in the future too. This will be type checked using our own type system, and ideally we’d be able to catch all type errors before our Rust code gets to rustc. It would still be handy to have some kind of source-mapping available to us - I think @nrc had an RFC on this several years ago?

Another example would be LALRPOP - currently it doesn’t do any type checking on the Rust snippets, and type errors in those snippets can be a pain to associate with the offending line in the .lalrpop file.


#12

Ah, here’s the RFC on sourcemaps: Supporting code generators with source maps and multiple source directories

It was by @erickt, not @nrc! My bad!