Pre-RFC: `cargo-script` for everyone

est31 · April 8, 2023, 3:53pm

I think this proposal is really great, fundamentally, although there are a huge number of design questions.

One of the big selling points of Rust is its robustness, and it should not be compromised here IMO.

Personally I think it's very important to support lockfiles inside the toml, and not to make assumptions about editions, i.e. to keep 2015 as the default if nothing else is specified. That said, since this is a new feature, there is an opportunity to default to a newer edition like 2021 or even 2024 (basically whatever is newest at the point of stabilization, and then never change it). Otherwise it will harm the edition mechanism: upgrades of the toolchain should not introduce breakage, which is one of the core pieces of the editions RFC.

Especially when it comes to editions, people will probably omit it more often than not and then run into problems after upgrading their toolchain.

For inline lockfiles we'd probably benefit from a more compact way of specifying lockfiles, ideally just the list of version numbers of all used crates. Then cargo can infer the rest by running the classical resolver algorithm and pretending that the non-mentioned version numbers don't exist/were yanked/never published. Hashes, while great for .lock files, are maybe not required that much here. As a default behaviour, if no inline lockfile is present, the script could just create a standard lock file in the same directory with the name of the .rs plus ".lock" at the end. Then it's up to users whether they want to inline the lockfile (via a command) or whether they want to distribute that separate file (or not distribute it and take the risk of breakage upon themselves).
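To make the "pretend the other versions don't exist" idea concrete, here is a minimal sketch. The `name@version` one-line format and both function names are made up for illustration; nothing here is an existing cargo format or API. It only shows the filtering step a resolver could apply before running as usual.

```rust
use std::collections::HashMap;

/// Parse a hypothetical compact lock line such as "serde@1.0.159 regex@1.7.3".
/// Returns None if any entry is not of the form `name@version`.
fn parse_compact_lock(line: &str) -> Option<HashMap<String, String>> {
    let mut pins = HashMap::new();
    for entry in line.split_whitespace() {
        let (name, version) = entry.split_once('@')?;
        if name.is_empty() || version.is_empty() {
            return None;
        }
        pins.insert(name.to_string(), version.to_string());
    }
    Some(pins)
}

/// Hide every published version of a pinned crate except the pinned one,
/// as if the others had been yanked. Unpinned crates are left untouched.
fn visible_versions<'a>(
    name: &str,
    published: &[&'a str],
    pins: &HashMap<String, String>,
) -> Vec<&'a str> {
    match pins.get(name) {
        Some(pinned) => published
            .iter()
            .copied()
            .filter(|v| *v == pinned.as_str())
            .collect(),
        None => published.to_vec(),
    }
}

fn main() {
    // Sample data only; the version lists are illustrative.
    let pins = parse_compact_lock("serde@1.0.159 regex@1.7.3").expect("valid lock line");
    let serde = visible_versions("serde", &["1.0.158", "1.0.159", "1.0.160"], &pins);
    println!("{:?}", serde);
}
```

With this shape, a crate that is missing from the inline list still resolves normally, which is where the risk of breakage would come from.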

mjw · April 8, 2023, 4:39pm

If cargo-eval is mainly intended as a "gentle introduction to cargo" then there's an advantage to using cargo.toml format, but if its main purpose is "adding support for directly-executable scripts to Rust" then I think a simpler dedicated form of configuration would be better.

That would avoid having to tell people "read the cargo.toml docs, but these parts are forbidden and those parts have different defaults".

The things supported in the new form of configuration could then start out very minimal, and more options could be added when the need appears.

In particular, possibly later we might find we want things to be configurable here that don't make sense in cargo.toml (eg, a choice of toolchain).

Possibly starting by requiring exact version specifications for dependencies would allow deferring questions of how to deal with lockfiles.

burntsushi · April 8, 2023, 4:49pm

I love the idea. I especially appreciate it as a tool for normalizing how to report minimal examples in issues. Lots of people forget their Cargo.toml. It doesn't always matter, but it does sometimes.

I've also been slowly moving a lot of my bash/Python programs over to Rust. Well, it's less of an explicit porting effort and more of a "all new scripts over some arbitrary line of perceived complexity just get written in Rust." This sounds like something I would use instead of my little script to build them explicitly.

steffahn · April 8, 2023, 5:50pm

For some context: as far as I’m aware, the main (or only?) reason why a default edition (being 2015) exists in the first place is that editions were introduced a lot later than Rust 1.0, so the default edition exists for backwards compatibility.

In my personal opinion, the best choice is no default: You don’t tell me your edition, then I don’t compile your program.

No default is best, since every (eternally stable) default will be terribly outdated and thus terribly useless, eventually. (That, plus my personal experience with godbolt.org as mentioned above, that subtle problems emerging from unknowingly being in an old edition can be really really confusing.) Since having no default is possible for a cargo-eval tool (we don’t need the edition-2015 default for backwards compatibility), that option is a strong contender.
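A rough sketch of what "no edition, no compile" could look like. It assumes the manifest text has already been extracted from the script and uses naive line matching instead of a real TOML parser; `required_edition`, `EditionError` and the list of known editions are names invented for this example.

```rust
/// Editions known when this thread was written; purely illustrative.
const KNOWN_EDITIONS: [&str; 3] = ["2015", "2018", "2021"];

#[derive(Debug, PartialEq)]
enum EditionError {
    Missing,
    Unknown(String),
}

/// Find `edition = "..."` in manifest text using naive line matching
/// (a real tool would use a TOML parser). A missing edition is an error,
/// never a silent default.
fn required_edition(manifest: &str) -> Result<&str, EditionError> {
    for line in manifest.lines() {
        let Some((key, value)) = line.split_once('=') else { continue };
        if key.trim() != "edition" {
            continue;
        }
        let value = value.trim().trim_matches('"');
        return if KNOWN_EDITIONS.contains(&value) {
            Ok(value)
        } else {
            Err(EditionError::Unknown(value.to_string()))
        };
    }
    Err(EditionError::Missing)
}

fn main() {
    match required_edition("[package]\nname = \"demo\"\n") {
        Ok(edition) => println!("edition {edition}"),
        Err(err) => eprintln!("refusing to compile: {err:?}"),
    }
}
```

This only understands the `[package]` table style with `edition` on its own line, not inline tables.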

est31 · April 8, 2023, 6:39pm

Those are good points in favour of erroring if no edition is specified. Half of the reason why I suggested the support for an edition default was because then simple files could do without the overhead of specifying a toml at all: a hello world should be as boilerplate free as possible. The canonical example shouldn't have to include four/five lines of boilerplate:

#!/usr/bin/env cargo-eval
//! ```cargo
//! [package]
//! edition = "2021"
//! ```

Your suggestion upthread to have multiple binaries (cargo-eval-2018, cargo-eval-2021, etc.) sounds nice, but I'm a bit unsure if it's a good idea to have that many binaries: in the end this would mean that every three years we are adding one binary. Since Rust is a (ardent fans would say the) language for the next 40 years, that'd mean over a dozen cargo-eval symlinks down the line, unless macOS supports the -S param for env in a couple of years.

What about cargo-eval interpreting any comment block starting in the first line as toml? It'd then be two/three lines of boilerplate only:

#!/usr/bin/env cargo-eval
// package = { edition = "2021" }

Comment block being defined as: either a block comment starting at the next line, or the largest set of consecutive lines starting with C++ style // comments.

I'm not sure if the rustdoc compatibility is worth the additional two lines for // ```cargo and // ``` .
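For illustration, the comment-block rule described above could be sketched like this. `leading_comment_block` is a hypothetical helper, not part of any existing tool, and it only handles plain `//` lines and a single `/* ... */` block directly after the shebang line.

```rust
/// Return the text of the comment block that directly follows the shebang
/// line, with comment markers removed, or None if there is no such block.
fn leading_comment_block(source: &str) -> Option<String> {
    // Skip the shebang line if present (`#![...]` is an attribute, not a shebang).
    let rest = if source.starts_with("#!") && !source.starts_with("#![") {
        source.split_once('\n').map_or("", |(_, tail)| tail)
    } else {
        source
    };

    // Case 1: a /* ... */ block comment starting on the next line.
    if let Some(after_open) = rest.strip_prefix("/*") {
        let (body, _) = after_open.split_once("*/")?;
        return Some(body.trim().to_string());
    }

    // Case 2: the longest run of consecutive lines starting with `//`.
    let lines: Vec<&str> = rest
        .lines()
        .map_while(|line| line.strip_prefix("//"))
        .map(|line| line.trim())
        .collect();
    if lines.is_empty() {
        None
    } else {
        Some(lines.join("\n"))
    }
}

fn main() {
    let script =
        "#!/usr/bin/env cargo-eval\n// package = { edition = \"2021\" }\n\nfn main() {}\n";
    println!("{:?}", leading_comment_block(script));
}
```

The returned string would then be handed to a TOML parser. A blank line ends the `//` run, so an ordinary comment further down the file is not picked up.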

simonbuchan · April 9, 2023, 4:38am

I've independently rediscovered the cargo xtask pattern (great to finally have a name for it), and I use it in nearly every midsized project now, so I would prefer something closer to that use case if I had to pick. So I think this approach should be focused on the "single file Rust program" use case. That would mean that generating and committing lock files is probably a bad idea? I don't know how you fix that though.

samsieber · April 9, 2023, 4:51am

Assuming that we're going to update the script to inline the lockfile....

We could have the default behavior for when no edition is specified be to update the script with the current latest edition. That makes scripts reproducible, and yet still easy to write.

Aloso · April 9, 2023, 11:45am

I understand that there are two distinct use cases:

1. experimenting with or sharing short-lived scripts (e.g. for reproducing bugs)
2. longer-lived scripts for automation (similar to cargo xtask and npm run)

For the former, conciseness is particularly important, but for the latter, stability (with a fixed edition and lockfile) is required. The RFC currently focuses mostly on the first use case, but many people (including me) are also looking for a solution for the second one.

Note that a solution designed specifically for use case 2 may look very different. I'm thinking about a fixed directory structure under a scripts/ folder, similar to benches/ and examples/. Then a script at ./scripts/foo.rs could be invoked simply as cargo eval foo. Furthermore, scripts for use case 2 can use the workspace's lockfile, which may not be desirable for use case 1. Therefore, it might be better to have two different tools for the use cases, rather than one which tries to do both, but isn't very good at either.
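As a small sketch of that lookup, assuming a hypothetical `script_path` helper and a known workspace root (neither exists in cargo today), mapping a name to a file under scripts/ could look like this:

```rust
use std::path::{Path, PathBuf};

/// Map a script name to a path under a hypothetical `scripts/` folder,
/// so that `foo` and `foo.rs` both resolve to `<root>/scripts/foo.rs`.
fn script_path(workspace_root: &Path, name: &str) -> PathBuf {
    let mut path = workspace_root.join("scripts").join(name);
    if path.extension().is_none() {
        // The file extension is optional on the command line.
        path.set_extension("rs");
    }
    path
}

fn main() {
    println!("{}", script_path(Path::new("."), "foo").display());
}
```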

Another unrelated point: I don't think the shebang to make the file itself executable is necessary: it only saves you a few keystrokes and complicates the design quite a bit. Specifically, it makes it impossible to pass cargo flags to the program, and may require a different file extension because of Windows. And how much shorter is it really? For comparison:

./hello_world.rs
# vs
cargo eval hello_world.rs
cargo e hello_world.rs    # abbreviated command
cargo e hello_world       # if file extension is optional

That's only 3 characters longer. Users can still add the shebang if they want, but I wouldn't make it the recommended/official way to invoke scripts.

davidsk · April 9, 2023, 11:54am

I have been thinking about the lock file. The size seems an obvious problem. Maybe we can overcome that with a smaller representation. Let's say we only need the name and version of every crate and we drop the hash and any formatting.

I took a look at the lock file for one of my small utility Rust projects and it had 38 entries. The dependency names had an average length of 10 characters (thanks to entries such as winapi-x86_64-pc-windows-gnu). Let's say 9 is more normal. As we need at least 6 characters for the version and two delimiters to keep everything separated, that means:

38 * (9 + 6 + 2) =~ 646 characters

This fits in about 8 lines if you wrap at column 80 (646 / 80 ≈ 8.1).
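The same back-of-the-envelope estimate as a tiny program, in case anyone wants to plug in the numbers from their own lock file. `compact_lock_chars` is just a name picked for this example.

```rust
/// Estimated size of a compact, human-readable inline lockfile:
/// each entry is a crate name, a version, and two delimiter characters.
fn compact_lock_chars(entries: usize, avg_name_len: usize, version_len: usize) -> usize {
    entries * (avg_name_len + version_len + 2)
}

fn main() {
    let total = compact_lock_chars(38, 9, 6);
    let width = 80;
    // prints "646 characters: 8 full lines plus 6 characters"
    println!(
        "{total} characters: {} full lines plus {} characters",
        total / width,
        total % width
    );
}
```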

Let's go further just for fun. What would happen if we drop the need for a human to read it? Let's assume the average crate name draws from [a-z | 0-9 | _ | -]. That is 38 different characters, so we need 6 bits to represent each one. Ignoring alpha, beta and release-candidate versions, and version components higher than u8::MAX, the version needs 3 * 8 bits at most. Thus we can encode the same information in 38 * (9 * 6 + 3 * 8) = 2964 bits, or about 371 bytes. The real fun begins if we use UTF-8 to the max: we can pack 3 bytes into the visual space of one character (the lines will get kinda colorful, though mainly �). That gets us down to roughly 124 columns of unreadable magic, or about a line and a half at 80 columns.

Aloso · April 9, 2023, 12:08pm

I'm strongly against putting compressed binary data into a file of source code, even if it is valid UTF-8.

I also don't like the idea that executing a file may change said file. I often edit source code while running it, so this would lead to version conflicts. VS Code automatically reloads open files when they are changed, but many other editors (I'm thinking of vim) don't. And even VS Code can't magically resolve conflicts when there are unsaved changes.

davidsk · April 9, 2023, 12:41pm

Unfortunately there is a third use case: scripts that extend/customize the OS. Some of those have fixed locations. And users might want to keep all their system scripts in one place, for example ~/.local/bin, and the others in ~/bin.

It's needed when you cannot control how the script is called, for example when you extend/customize the OS by placing files in a directory that get executed when an event happens (see the message of the day).

Such scripts also need to be robust: they may be in place for years and may move between machines.

I do not know how to make this work without an inline lock file or creating one in the same directory when the script runs. I think I would prefer inline, as it lessens the chance of forgetting about the lock file when moving scripts around. It's a difficult choice though.

Aloso · April 9, 2023, 12:48pm

In that case, why can't the script be built with cargo build or cargo install, resulting in a single binary that doesn't even depend on cargo and will work basically forever and without internet access?

DitherWither · April 9, 2023, 12:49pm

Yeah, if you only store the crate name and version, the size goes down drastically. In my case, it went from 70 kB to 5.3 kB.

This is while keeping it fairly readable. This is an example lock file created using regex + find and replace

The random format I created is fairly similar to yaml, just not requiring a space after the crate name. I am not saying that this format is good, or should be used. But using a more compact format could be a good idea.

DitherWither · April 9, 2023, 1:00pm

I also think that putting the lockfile inside the source code is a bad idea. It adds a lot of lines to the source code that you should not edit, which is imo bad design.

Creating a lockfile in the same directory might be a better idea:

hello.rs has a lockfile of hello.lock or hello.rs.lock. This seems like a better idea to me, but there might be issues with this as well.
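The two naming options are easy to compare in code. A minimal sketch, with both function names invented for this example:

```rust
use std::path::{Path, PathBuf};

/// `hello.rs` -> `hello.rs.lock` (append ".lock" to the full file name).
fn lock_path_appended(script: &Path) -> PathBuf {
    let mut name = script.as_os_str().to_os_string();
    name.push(".lock");
    PathBuf::from(name)
}

/// `hello.rs` -> `hello.lock` (replace the extension).
fn lock_path_replaced(script: &Path) -> PathBuf {
    script.with_extension("lock")
}

fn main() {
    let script = Path::new("bin/hello.rs");
    println!("{}", lock_path_appended(script).display());
    println!("{}", lock_path_replaced(script).display());
}
```

One possible issue with replacing the extension: a script named `hello` and a script named `hello.rs` in the same directory would both map to `hello.lock`, while appending keeps them apart.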

drewkett · April 9, 2023, 1:12pm

Regarding having cargo eval inline the lock file, that could be an optional flag to enable that behavior. Something like cargo eval --save or cargo eval --lock. This gives people who want that behavior an easy way to accomplish it without making rewriting the script file the default behavior which seems undesirable.

If it is not the default behavior but is allowed behavior, we’ll also want it to warn (or error?) if the inlined lock file is out of date.

I’d lean towards the default behavior being to stick it in the target directory somehow as someone mentioned above. I don’t love tooling dropping new files in current directory and if these scripts get put into a bin directory, I also wouldn’t want it to try to put a lock file in the same location as the file in the bin directory.

kpreid · April 9, 2023, 3:08pm

Bug reports also often need stability to reproduce the exact conditions for the bug.

toc · April 9, 2023, 4:14pm

It seems to me lockfile/dependency resolution needs to be considered up front, even if mechanisms of embedding the lockfile are punted on.

Startup time is important for scripts, possibly even when the script changes. Can we cache the lockfile in .cargo/ along with the cached binary to avoid rebuilding dependencies?
Can I easily share a lockfile ("drop this in the right spot in .cargo/") to get a consistent build across machines (for when I forgot to embed it, or embedding was punted on)?
Can we (plan to) have support for cargo eval --embed-lockfile? Or cargo build --embed-lockfile, I don't think I usually want to run the script when generating its lockfile.
Special formats are crazy, this should just be a standard lockfile inside a '''Cargo.lock comment block. Probably (by default) at the end of the file. Some later effort can come up with a compacted lockfile format.

steffahn · April 9, 2023, 5:00pm

Even more extreme approaches for compactification I can come up with:

The crate index gives a list of crate versions in chronological order, right? We could simply count those, with the effect that most crate versions could be as short as 1 byte (if there’s fewer than 128 versions published, using a variable-length encoding).

The dependency tree of a crate can be traversed in some canonical order, and if that’s done, we might not need crate names at all, and the lockfile-information could be only a list of versions. Just start with the first dependency, then continue with depth-first traversal, and every time a new crate is encountered, the version is the next version in the list.

Unfortunately avoiding the crate names breaks whenever the list of dependencies is changed. But, ignoring that [1], it’s pretty neat, and your example of 38 dependencies would boil down to typically not much more than 38 bytes of information, which is about 50 characters without using anything but nice printable ASCII characters.

[1] and maybe… just maybe it’s not entirely impossible to work around this problem
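A sketch of the variable-length encoding part, using unsigned LEB128 as one possible choice (the thread does not fix a particular encoding). It assumes each version is already represented by its position in the crate's chronological list of published versions, and that the list of indices follows some canonical traversal order of the dependency tree. All function names are made up for this example.

```rust
/// Encode a version index (its position in the crate's chronological
/// list of published versions) as an unsigned LEB128 varint.
fn encode_varint(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80); // high bit set: more bytes follow
    }
}

/// Decode one varint from the front of `bytes`, returning the value and
/// the number of bytes consumed, or None on truncated or oversized input.
fn decode_varint(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        value |= u32::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Encode version indices in traversal order, with no crate names at all.
fn encode_lock(indices: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    for &index in indices {
        encode_varint(index, &mut out);
    }
    out
}

/// Decode the whole byte string back into version indices.
fn decode_lock(mut bytes: &[u8]) -> Option<Vec<u32>> {
    let mut indices = Vec::new();
    while !bytes.is_empty() {
        let (value, used) = decode_varint(bytes)?;
        indices.push(value);
        bytes = &bytes[used..];
    }
    Some(indices)
}

fn main() {
    let bytes = encode_lock(&[3, 127, 128, 300]);
    println!("{} bytes: {:?}", bytes.len(), bytes);
    println!("{:?}", decode_lock(&bytes));
}
```

Any index below 128 takes a single byte, which matches the "most crate versions could be as short as 1 byte" estimate.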

vi0 · April 9, 2023, 5:15pm

Maybe instead of omitting hashes from inlined lockfiles completely, one general hash for the entire tree can be included instead of multiple per-crate ones?
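Roughly what a single tree-wide hash could look like, as a sketch only. `Locked` and `tree_digest` are hypothetical names, and `DefaultHasher` from the standard library stands in for a real cryptographic hash such as SHA-256: it is not collision resistant and its output is not guaranteed to be stable across Rust releases, so it is only good enough to show the structure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One locked dependency: name, version, and its per-crate checksum.
#[derive(Hash, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Locked {
    name: String,
    version: String,
    checksum: String,
}

/// Combine all per-crate checksums into a single digest for the whole tree.
/// Entries are sorted first so the result does not depend on input order.
/// DefaultHasher is NOT cryptographic; it is a placeholder for this sketch.
fn tree_digest(entries: &[Locked]) -> u64 {
    let mut sorted = entries.to_vec();
    sorted.sort();
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Sample data only; the checksums are placeholders.
    let entries = vec![
        Locked { name: "a".into(), version: "1.0.0".into(), checksum: "aa".into() },
        Locked { name: "b".into(), version: "2.0.0".into(), checksum: "bb".into() },
    ];
    println!("{:016x}", tree_digest(&entries));
}
```

The trade-off is that a mismatch only tells you that something in the tree changed, not which crate.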

steffahn · April 9, 2023, 5:17pm

One thing that just came to mind was formatting. rustfmt also has an edition argument that one might not always want to specify by hand, so it (or some other command) would presumably want to support reading the script’s edition information somehow?

Given that formatting tooling updates the file already anyways, maybe it’s even desirable to have a combined command that can do many things at once to tidy up a script, like

formatting
creating/updating a (compressed?) included lock file
filling in underspecified or unspecified (e.g. via the user typing = "*" or some new syntax) crate dependency versions, resulting in functionality akin to cargo add

Given how popular automatic formatting already is, people who already have a workflow to do formatting could thus simply integrate the other source-file-modifying operations into it.
