Pre-RfC: Stabilized syntax extensions. Sort of

Preamble

I was thinking about plugins and serde and it occurred to me that there are three types of common plugins, with different interactions with stability and development workflow:

  • Lints. These are not necessary for compilation, but are a tool to help the developer. It would be very hard to stabilize these.
  • Internal syntax extensions: These are syntax extensions used internally to make code nice. For example, the syntax extensions defined in html5ever are for usage within the library. These could be replaced by things like Mako templates if necessary. They are not a core feature of the library; technically the generated code is all that is necessary.
  • Exported syntax extensions: These are exported by libraries to make life easier for other libraries. These are a core feature of the library. phf is an example of this. However, libraries which use phf technically only need the generated code, in most cases. For some libraries like [string-cache][1], the macros are used by multiple libraries downstream too, though in the case of string-cache there are non-macro alternatives to them.

This proposal makes it possible to use libraries with internal syntax extensions (and libraries which transitively use external syntax extensions) on stable Rust builds. In other words, if the dependency graph is like:

plugin P <- library A <- library B <- binary C

And only A uses the plugin, it will be possible to compile B and C with stable Rust, but not P or A. At the same time, those wishing to use nightlies can do so without any change in experience.

Currently, we have [syntex][2] by the brilliant @erickt, which offers a solution by doing expansion manually with a libsyntax clone. However, it's a bit hard to set stuff up so that a library may compile with or without syntax extensions depending on the user's choice, and additionally it sort of cheats semver: plugin libraries will continue to be flaky and unstable with syntex.

This (pre) RfC takes inspiration from the idea of syntex and makes it into a baked-in, easy-to-use system which doesn't cause semver problems.

RfC

Overview

At its core, we basically allow users to upload two versions of the same crate. One is a "stabilized" crate which has all of the syntax extensions expanded and is checked for usage of any unstable features. When someone else decides to use the library and wishes to use stable Rust, this expanded copy of the library will be downloaded instead, and the downloaded Cargo.toml will be stripped of any plugin deps.
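The manifest rewrite for the stabilized copy could look roughly like the sketch below. It assumes a hypothetical, already-parsed `Manifest`/`Dependency` representation (no TOML handling), and that whether a dependency is a plugin has already been looked up from that dependency's own `plugin = true` setting.

```rust
/// Hypothetical, simplified dependency entry. `is_plugin` is assumed to have been
/// determined from the dependency's own manifest (`plugin = true`).
#[derive(Debug, Clone, PartialEq)]
struct Dependency {
    name: String,
    version: String,
    is_plugin: bool,
}

/// Hypothetical, simplified manifest: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Manifest {
    name: String,
    version: String,
    dependencies: Vec<Dependency>,
}

/// Build the manifest shipped with the "stabilized" upload: identical to the
/// original except that plugin dependencies are dropped, because their output
/// has already been expanded into the uploaded source.
fn stabilized_manifest(original: &Manifest) -> Manifest {
    Manifest {
        dependencies: original
            .dependencies
            .iter()
            .filter(|d| !d.is_plugin)
            .cloned()
            .collect(),
        ..original.clone()
    }
}

fn main() {
    let library_a = Manifest {
        name: "library-a".to_string(),
        version: "0.1.0".to_string(),
        dependencies: vec![
            Dependency { name: "plugin-p".into(), version: "0.1".into(), is_plugin: true },
            Dependency { name: "libc".into(), version: "0.1".into(), is_plugin: false },
        ],
    };
    let stable = stabilized_manifest(&library_a);
    let names: Vec<&str> = stable.dependencies.iter().map(|d| d.name.as_str()).collect();
    println!("{:?}", names); // ["libc"]
}
```

Everything except the dependency list is copied unchanged, which matches the idea that the two uploads differ only in how they are built.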

This only applies to libraries on Crates. While the infra here could be extended to support git deps (using branches), I'm limiting the scope of this RfC to Crates for now.

Detailed design

Changes to rustc

Rustc nightly should get a -Z as-stable option (IIRC this is already sort of possible with the right cfg flags) which makes a nightly compiler behave as if it were on the stable channel, turning feature gates into hard errors.

For full flexibility, it would be nice if rustc could have a more nuanced expansion mode where only syntax extensions not defined by rustc itself are expanded. This means that ordinary macros still work and macro stability is still enforced (otherwise, macros with unstable internals would fail once expanded). Complex nesting might still be broken, though.
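The selection rule for such a partial expansion mode can be sketched as a filter over macro invocations. The `Origin` and `Invocation` types are hypothetical stand-ins, not rustc internals; the point is that builtin macros are left as invocations so that the stable compiler expands them later with their stability exemptions intact.

```rust
/// Where a macro or syntax extension is defined (hypothetical classification).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Origin {
    /// Defined by rustc/std itself, e.g. `println!`.
    Builtin,
    /// Defined by a third-party compiler plugin.
    Plugin,
}

#[derive(Debug, Clone)]
struct Invocation {
    name: String,
    origin: Origin,
}

/// Names of the invocations the partial-expansion mode would expand now.
/// Builtin macros are skipped and stay in the published source unexpanded.
fn to_expand(invocations: &[Invocation]) -> Vec<&str> {
    invocations
        .iter()
        .filter(|i| i.origin == Origin::Plugin)
        .map(|i| i.name.as_str())
        .collect()
}

fn main() {
    let invocations = vec![
        Invocation { name: "println".to_string(), origin: Origin::Builtin },
        Invocation { name: "phf_map".to_string(), origin: Origin::Plugin },
        Invocation { name: "format".to_string(), origin: Origin::Builtin },
    ];
    println!("{:?}", to_expand(&invocations)); // ["phf_map"]
}
```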

An additional wishlist item would be for that expansion mode to expand file-by-file to maintain the file structure as much as possible.

It would also be nice if readable gensym-ing were possible when necessary.
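One hypothetical reading of "readable gensym-ing" is sketched below: keep the human-readable base name and append a numeric suffix only when it would collide with a name already in use. The `Gensym` type is illustrative only and ignores scoping; a real implementation would have to respect hygiene rather than just avoiding textual collisions.

```rust
use std::collections::HashSet;

/// Hypothetical readable gensym: fresh identifiers stay close to their base name.
struct Gensym {
    used: HashSet<String>,
}

impl Gensym {
    /// Seed with identifiers already present in the expanded file.
    fn new<I: IntoIterator<Item = String>>(existing: I) -> Self {
        Gensym { used: existing.into_iter().collect() }
    }

    /// Return `base` if unused, otherwise `base_1`, `base_2`, ... (first free one).
    fn fresh(&mut self, base: &str) -> String {
        if self.used.insert(base.to_string()) {
            return base.to_string();
        }
        let mut n = 1;
        loop {
            let candidate = format!("{}_{}", base, n);
            if self.used.insert(candidate.clone()) {
                return candidate;
            }
            n += 1;
        }
    }
}

fn main() {
    let mut g = Gensym::new(vec!["tmp".to_string(), "i".to_string()]);
    println!("{}", g.fresh("tmp")); // tmp_1
    println!("{}", g.fresh("tmp")); // tmp_2
    println!("{}", g.fresh("len")); // len
}
```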

Changes to Crates.io

Crates can now have a "stabilized" version in addition to the regular one. It is encouraged to upload "stabilized" versions even if your library does not use syntax extensions, so that we can add some visual indication ("compiles on stable Rust!") later.

ā€œstabilizedā€ crates should only be different from regular ones as far as their build process goes (one uses syntax extensions, one doesnā€™t, but the post-expansion code is the same). I think that further code fragmentation for providing ā€œstableā€ and ā€œnightlyā€ versions of a library should be done via feature flags.

Changes to cargo

cargo build and cargo publish have an optional flag --stabilize. This will recompile from scratch, and do the following things:

  • Except for libraries with plugin = true, compile everything with -Z as-stable. Fail if a dependency needs unstable Rust.
  • For all non-plugin libraries from Crates, fetch and build "stabilized" versions if possible. If a library doesn't have a stabilized version and fails to build with -Z as-stable, bail. It's up to the library owner to publish stabilized crates. Existing stable libraries won't be affected by this, since they will build fine with -Z as-stable.
  • For all plugin=true crates which are direct dependencies of the crate being built or any of the path deps (i.e., deps which are part of the repo and will be in the same bundle), compile with nightly.
  • Expand the repo and all path deps with --pretty=expanded (or something better as described in the above section) using a nightly.
  • Compile these crates with -Z as-stable.
  • If we are publishing, upload the expanded version of the crates.
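The per-crate decisions in the steps above can be sketched as a small planning function. The `Crate`, `Source` and `Step` types are hypothetical, and the sketch assumes its input lists only the crates still in the graph after stabilized manifests have dropped their plugin deps (so every remaining plugin is a direct dep of the crate or one of its path deps).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Registry,
    Path,
}

#[derive(Debug, Clone)]
struct Crate {
    name: &'static str,
    source: Source,
    is_plugin: bool,
    /// Registry crates only: a "stabilized" upload exists.
    has_stabilized: bool,
    /// The regular version builds under `-Z as-stable`.
    builds_as_stable: bool,
}

#[derive(Debug, PartialEq)]
enum Step {
    BuildPluginOnNightly(&'static str),
    FetchStabilized(&'static str),
    BuildAsStable(&'static str),
    ExpandThenBuildAsStable(&'static str),
}

/// Decide what `--stabilize` would do for each crate, or bail with an error.
fn plan(crates: &[Crate]) -> Result<Vec<Step>, String> {
    let mut steps = Vec::new();
    for c in crates {
        let step = match (c.is_plugin, c.source) {
            // Plugins are the only things compiled with full nightly features.
            (true, _) => Step::BuildPluginOnNightly(c.name),
            // The repo and its path deps get expanded on nightly, then checked as stable.
            (false, Source::Path) => Step::ExpandThenBuildAsStable(c.name),
            // Prefer a stabilized upload when one exists.
            (false, Source::Registry) if c.has_stabilized => Step::FetchStabilized(c.name),
            // Already-stable libraries are unaffected.
            (false, Source::Registry) if c.builds_as_stable => Step::BuildAsStable(c.name),
            (false, Source::Registry) => {
                return Err(format!("{} needs unstable Rust and has no stabilized version", c.name))
            }
        };
        steps.push(step);
    }
    Ok(steps)
}

fn main() {
    let graph = [
        Crate { name: "plugin_p", source: Source::Registry, is_plugin: true, has_stabilized: false, builds_as_stable: false },
        Crate { name: "library_a", source: Source::Path, is_plugin: false, has_stabilized: false, builds_as_stable: false },
        Crate { name: "libc", source: Source::Registry, is_plugin: false, has_stabilized: false, builds_as_stable: true },
    ];
    match plan(&graph) {
        Ok(steps) => {
            for s in steps {
                println!("{:?}", s);
            }
        }
        Err(e) => eprintln!("error: {}", e),
    }
}
```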

--stabilize is mainly an option for the publishers of crates which use syntax extensions (e.g. the authors of library A in the dependency chain above). It's only part of cargo build so that things can be tested locally without causing a premature publish.

If cargo build/cargo publish are called with a stable compiler or with, they should build by fetching stabilized versions whenever possible, and upload the package as a "stabilized" one. We can of course have a --no-stabilized option to both to opt out of this, and a --with-stabilized option for cargo build that opts in to this when the compiler is nightly. This way nightly users can compile the libraries with regular expansion info, instead of having to debug generated code when something goes wrong.
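The fetch rule described above can be summarized as a small decision function. This is a sketch only: the flag names `--no-stabilized` and `--with-stabilized` come from the proposal, but the function and types are hypothetical, and falling back to the regular upload when no stabilized one exists is an assumption about what "whenever possible" means.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Channel {
    Stable,
    Nightly,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Variant {
    Regular,
    Stabilized,
}

/// `no_stabilized` mirrors --no-stabilized; `with_stabilized` mirrors --with-stabilized.
fn variant_to_fetch(
    channel: Channel,
    no_stabilized: bool,
    with_stabilized: bool,
    has_stabilized: bool,
) -> Variant {
    let want_stabilized = match channel {
        // Stable compilers opt in by default; --no-stabilized opts out.
        Channel::Stable => !no_stabilized,
        // Nightly compilers opt in only with --with-stabilized.
        Channel::Nightly => with_stabilized,
    };
    if want_stabilized && has_stabilized {
        Variant::Stabilized
    } else {
        Variant::Regular
    }
}

fn main() {
    println!("{:?}", variant_to_fetch(Channel::Stable, false, false, true)); // Stabilized
    println!("{:?}", variant_to_fetch(Channel::Nightly, false, false, true)); // Regular
}
```

Keeping nightly on the regular upload by default is what preserves the original expansion info for debugging.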

These flags are intended for users of libraries which use syntax extensions, e.g. the authors of library B and binary C in the chain above.

It's also worth considering whether cargo publish should by default do both a nightly and a stable publish when a nightly compiler is available, unless it's configured or told otherwise.

What this means for users

Plugin writers

Carry on. In case of hygiene issues, macro wrappers may be necessary.

For plugins like atom!() which are intended to be used by all child libraries, provide a non-plugin interface.

Library writers

Use --stabilize whenever possible. If your crate uses syntax extensions, try to ensure that they're hygienic, and publish both nightly and stabilized versions.

Library/binary users

If you like nightly, use Cargo with the appropriate flags to fetch libraries full of syntax extension goodness. If you like stable, use cargo with the appropriate flags to fetch stabilized versions.

(One of these two will be the default; I don't know which yet.)

Unresolved questions

  • What should the defaults for Cargo be?
  • Should/can we do this in a way that macro stability isn't affected? Regular --pretty=expanded will produce code which won't compile when there's internal macro stability involved. If we can selectively expand things (and don't expand external syntax extensions that have macros fed to them as input), we can cover a broad range of edge cases, I think.
  • How do we handle hygiene?

Alternatives

  • Continue using syntex. It's not perfect, but it does accomplish the job pretty nicely.
  • Avoid syntax extensions until they get stabilized (which could be a while).

cc @erickt @kmcallister @eddyb @alexcrichton @sfackler @nrc
[1]: https://github.com/servo/string-cache/
[2]: https://github.com/erickt/rust-syntex


An interesting idea… I do wonder how --pretty=expanded followed by -Z as-stable is going to work, given that various stdlib-provided macros expand to code that uses unstable features.

As an aside: I've been playing with a little fork of rustc that adds "extern macros". Rather than use libsyntax as the interface, the macro is written as a completely independent binary, with the contents of the macro invocation fed in through stdin, and the expanded code passed out through stdout.

This works really well for simple cases. I was trying to work out how to build a framework around syntex so that, from a syntex macro's point of view, little if anything changes. I kinda got stuck on the fact that you cannot know which grammar element the macro's output is valid as: you have to pick one, and you only get one shot at it. This is a problem because, in order to go back through stdout, I have to serialise the produced AST node as tokens.

I was also a bit stuck on how to trick the Codemap into producing correct source locations, although I believe that's at least theoretically tractable by modifying Syntex. :stuck_out_tongue:

When I brought all this up in IRC, no one seemed very interested, so I assume I'm the only one who thinks this would be a great way to solve the stability problem: text (either as Rust source or as JSON-encoded TT structures) has no stability issues, and using independent processes avoids all linking issues. You would still use syntex in the macro, except that instead of trying to parse, expand and re-emit the source (the output of which you then import using include!), you just do it all inline.
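A minimal sketch of what such an "extern macro" binary might look like, assuming a plain-text protocol in which the compiler writes the macro's input to stdin and reads Rust source back from stdout. The `names!` macro and the `expand` function are hypothetical (the fork described above is not shown here). As noted, the output's grammar category has to be fixed in advance; this one always emits items.

```rust
use std::io::{self, Read, Write};

/// Expansion logic for a hypothetical `names!(foo, bar)` macro that turns each
/// identifier into a string constant. Input and output are plain Rust text.
fn expand(input: &str) -> String {
    let mut out = String::new();
    for name in input.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        out.push_str(&format!(
            "pub const {}: &str = \"{}\";\n",
            name.to_uppercase(),
            name
        ));
    }
    out
}

fn main() -> io::Result<()> {
    // The host compiler would write the macro's input, as source text, to stdin...
    let mut input = String::new();
    io::stdin().read_to_string(&mut input)?;
    // ...and read the expansion back from stdout.
    io::stdout().write_all(expand(&input).as_bytes())?;
    Ok(())
}
```

Because only text crosses the process boundary, nothing here links against libsyntax; the catch, raised in the replies that follow, is what happens when the macro needs to parse its input as an AST.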


So this was similar to something I'd proposed at Portland. The main issue is that we have to handle an unstable AST, which you don't address: this binary won't compile in the first place if the AST changes. Whatever form it is in, anything that must be compiled and run on the user's machine while building will have this stability issue. Syntex does, libsyntax does, and expansion binaries do too. That is why the proposal above skirts this problem by never getting involved with plugins on the user's machine :smile:

My old proposal was:

  • Mark the AST representation as "stable", but allow for the addition of enum variants and fields to variants.
  • Plugins must use wildcards in their matches
  • Allow plugins using this limited API to compile on stable

It's still hard to do in practice.
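The wildcard rule from that old proposal can be illustrated with a toy enum. `Expr` here is a hypothetical stand-in, not libsyntax's `ast::Expr`. For reference, today's `#[non_exhaustive]` attribute (stable since Rust 1.40) enforces exactly this rule on downstream crates, though it has no effect inside the crate that defines the enum, as in this single-file sketch.

```rust
/// Toy stand-in for an AST enum that the compiler may extend with new variants.
#[non_exhaustive]
#[derive(Debug)]
pub enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

/// A plugin-side helper that counts literals it knows how to reach.
/// The `_` arm is what keeps it compiling when new variants are added.
fn count_literals(e: &Expr) -> usize {
    match e {
        Expr::Lit(_) => 1,
        Expr::Add(a, b) => count_literals(a) + count_literals(b),
        // Variants this plugin does not handle (including future ones) are opaque.
        _ => 0,
    }
}

fn main() {
    let e = Expr::Add(
        Box::new(Expr::Lit(1)),
        Box::new(Expr::Neg(Box::new(Expr::Lit(2)))),
    );
    // The literal under `Neg` is not counted, because `Neg` hits the wildcard arm.
    println!("{}", count_literals(&e)); // 1
}
```

The cost is visible even in the toy: a plugin silently ignores anything it does not recognize, which is part of why this approach limits internals development.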

My initial impression is that this is a totally viable idea, but adding all this extra cargo machinery is pretty scary - this seems like a solution that will not have a long lifespan, after which we'll be stuck with some of these designs.

There's a lot of overlap here with the idea of cargo publishing fully-annotated expanded source to avoid breakage due to unanticipated resolution conflicts. It probably makes sense to consider both problems together.


Disclaimer: I could just be stupid.

I don't understand. One of the major points of using plain text is that you never directly represent the AST. In the very simplest case, you just emit and consume Rust source text, which has to have a stable representation. The only way it doesn't is if you've broken backwards compatibility.

The next (and for the foreseeable future, last) step up is to emit encoded TTs, probably as JSON. The point with that is that the representation is specifically not fixed; additional fields and token types may be added over time, and programs have to be prepared to deal with, say, a <=> token suddenly showing up. However, again, the token representation is unlikely to change much over time, and any changes that do occur can be papered over.
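A sketch of what "being prepared for a `<=>` token" might mean for a consumer of loosely-typed tokens. The `Token` type and `classify`/`render` functions are hypothetical, and JSON decoding is omitted (it would need a crate such as serde_json); the point is that unrecognized token kinds are preserved as `Unknown` and passed through verbatim instead of being rejected.

```rust
/// Hypothetical loosely-typed token, as it might be decoded from {"t": "+", ...}.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Punct(String),
    /// Anything this tool doesn't recognize, e.g. a token added in a later compiler.
    Unknown(String),
}

/// Punctuation this (old) tool knows about; deliberately incomplete.
const KNOWN_PUNCT: &[&str] = &[
    "+", "-", "*", "/", "=", "==", "<", ">", "<=", ">=", ",", ";", "(", ")",
];

fn classify(t: &str) -> Token {
    let starts_like_ident = t
        .chars()
        .next()
        .map_or(false, |c| c.is_alphabetic() || c == '_');
    if KNOWN_PUNCT.contains(&t) {
        Token::Punct(t.to_string())
    } else if starts_like_ident && t.chars().all(|c| c.is_alphanumeric() || c == '_') {
        Token::Ident(t.to_string())
    } else {
        Token::Unknown(t.to_string())
    }
}

/// Re-emit tokens as text; unknown tokens round-trip unchanged.
fn render(tokens: &[Token]) -> String {
    tokens
        .iter()
        .map(|t| match t {
            Token::Ident(s) | Token::Punct(s) | Token::Unknown(s) => s.as_str(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let tokens: Vec<Token> = ["a", "<=>", "b"].iter().map(|t| classify(t)).collect();
    println!("{:?}", tokens[1]); // Unknown("<=>")
    println!("{}", render(&tokens)); // a <=> b
}
```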

At this level, backward and forward compatibility issues should be relatively straightforward to deal with. As you said, the problem comes when you start trying to deal with libsyntax/libsyntex for AST parsing. I have two solutions (and a cop-out) to this:

  • First of all, I did a survey of procedural macros a little while ago, and found that with one exception, plugins don't even inspect the AST nodes they parse; they treat them like black boxes, just like macro_rules! does. As such, I believe that, for most procedural macros, we could get away with a significantly less comprehensive interface that just lets you ask the compiler "hey, here's a span of tokens; split them into 'the tokens at the start that form an expression' and 'everything after that'."

  • Cop-out: who cares? If a plugin pins a particular version of libsyntex, who cares if an incompatible upgrade lands down the line? The macro in question might grow a "doesn't work on Rust 1.4+" flag, but that doesn't violate backward compatibility: any code written prior to the version of the language with new, unparseable constructs will not contain those constructs, and so will continue to build on new compilers. It's only if you feed the macro something it wasn't expecting that it will blow up, in which case you'll need to go through the pain of updating libsyntex and the macro accordingly. Life is hard. :slight_smile:

  • The long term solution is, I think, to just write a new parser that unlike the rustc-internal one allows for "unknown syntax" nodes. If you really need to be able to parse Rust grammar, but you also want to extend it, you're going to need some way to inject extra productions anyway, and you're not likely to be doing that with libsyntax/libsyntex. This would be needed for things like custom #[derive] attributes and the like.

So I don't see how the instability of libsyntax is an impediment to stabilising a useful procedural macro interface. It's a pain, sure, but it seems eminently possible to move forward, irrespective.

It seems like a good idea to allow doing this, and the method seems reasonable. I share brson's worries that it is quite a lot of infra work for a temporary fix. I worry that it might end up not being a temporary fix, and having this kind of thing hanging around forever is kind of gross.

I'm also wary about encouraging any more use of current syntax extensions. The easier that is, the harder it will be to replace them with 'new' syntax extensions, whenever that happens. (I've been thinking quite a bit about what that might look like; hopefully we can start the design process properly soon.)

So, I'm of two minds…


As I understand it your solution involves encoding the syntax extension as a binary. Syntax extensions can operate on the AST, so the binary needs to be fed a stable AST, and it needs to link to the correct libsyntax libraries to use any utilities. Pinning syntex is a solution here, but it seems a bit icky.

You could use the same solution as mine by enforcing wildcard matches, but we'd already sort of decided that that solution would be hackish (and it would still limit internals development).

Well, this mode could be expanded in the future to run things like rustfix and replace unstable and deprecated APIs with their stable variants. I think the idea of such a mode is a long-lived thing, but its usage for syntax extensions is short lived.

IMO having key libraries be hard to use on stable (eg html5ever) is more of a problem than having features in tooling stick around for too long.

I don't think we can avoid this really. The folks who use syntax extensions are aware that they are unstable and okay with churn AFAICT. If you give people a tool, they will use it; it's hard to expect people to avoid these.

No, I'm not dealing with the AST at all. Because the libsyntax AST is unstable, the only options I'm considering are source code fragments, and TTs serialised in some loosely-typed form like JSON (e.g. {"t": "+", "span": "some_hex"} for a single token). Also, I'm only considering this for macros and custom derivings; it obviously wouldn't help with lints.

I'd like to have some kind of basic mechanism for making simple requests of the host compiler (here's a sequence of TTs, how many from the start form a complete expression?), but that sort of thing can fundamentally be done by an independent library, which may or may not be syntex.
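To make the shape of that query concrete, here is a deliberately naive approximation: treat everything up to the first `,` or `;` that is not nested inside `()`, `[]` or `{}` as the expression. The function name is hypothetical, and this is a heuristic, not what rustc's parser does; it gets real Rust wrong in cases such as closure parameter lists like `|a, b|`, which is exactly why the answer should come from a real parser.

```rust
/// Heuristic stand-in for "how many tokens from the start form one expression?".
/// Stops at the first top-level `,` or `;`; otherwise consumes all tokens.
fn expr_prefix_len(tokens: &[&str]) -> usize {
    let mut depth: usize = 0;
    for (i, t) in tokens.iter().enumerate() {
        match *t {
            "(" | "[" | "{" => depth += 1,
            // saturating_sub tolerates stray closing delimiters.
            ")" | "]" | "}" => depth = depth.saturating_sub(1),
            "," | ";" if depth == 0 => return i,
            _ => {}
        }
    }
    tokens.len()
}

fn main() {
    let tokens = ["f", "(", "a", ",", "b", ")", ",", "rest"];
    let n = expr_prefix_len(&tokens);
    let (expr, rest) = tokens.split_at(n);
    println!("{:?} | {:?}", expr, rest); // ["f", "(", "a", ",", "b", ")"] | [",", "rest"]
}
```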

Again, the whole point of this idea is to avoid the existing extension mechanism.

So ... what does the binary do then?

Oh, I get it, your binary will only operate on tts.

That's rather limited, really; the places (like html5ever) where syntax extensions do well need to be able to output AST stuff. Without AST manipulations syntax extensions are severely limited in power, and in that case a better solution would be to stabilize TT-syntax extensions (which seems doable).

How is it limited, if the binary can do AST parsing itself and any reasonable output AST is encodable as Rust source? Maybe I'm missing something…

To create a binary which can parse the AST, it needs to use a stable AST representation from libsyntax; an upgrade of the AST will make this plugin binary stop compiling. Or it needs to be uploaded as a standalone binary.

Syntex solves this, but now you have to pin a syntex version.

My point is that we can avoid all this by just uploading the expanded or partially expanded source directly.

While I think this could work in principle, my biggest concern is the risk that we'd end up with more projects using nightly than we'd want. Exported syntax extensions like phf or serde's macros have the tendency to be used a lot in end-user code, so they could virally pull people into using nightly.

It'd also be worth getting @nikomatsakis's opinion on this. When I last talked to him, he gave me the impression that we might be able to stabilize syntax extensions sooner rather than later.

It also might be worth extending this idea to making cargo itself pluggable, and experimenting with ways to implement something like this in it. For example, in syntex the build.rs template is pretty much copy-pasted into all the projects. I've toyed around with the idea that we could create a cargo plugin that lets me write one template to do this for end users. It'd be far more convenient if end users only had to add this to their projects to get their syntax extensions expanded:

[cargo-plugins]
syntex_cargo_plugin = "*"
phf_cargo_plugin = "*"

[syntex]
files = ["src/lib.rs.in", "tests/test.rs.in"]
