[Pre-RFC]: cargo new templates (v2)

I just had a thought: are templates intended to be reusable within the same project? That is, can we create a template that meets all of the Rust API guidelines for creating new struct/enum types and that is designed to be reused within the same project? This will affect the engine's design.

Thanks for these detailed responses again! I've updated the RFC. One of the questions was poorly worded on my part so I updated it:

Where do the docs need to be updated in the Rust Lang book or official docs?

That one I can probably do, but open to feedback.

I also think we should create even a minimal example in Git to include. Anyone is welcome to do that, otherwise I will this weekend.

Hmm...I am not sure I fully understand. Here's what I'm hearing/imagining:

I have a crate: [rocket](https://rocket.rs/). Inside the crate, I have a template called hello_rocket.

I don't think you should be able to use hello_rocket within the rocket crate/project.

It can be used in a new project, but not the project where it lives. Am I following you correctly?

Yes, but I was thinking of a different use case. Suppose that the Rust developers decide to create a new template project repository, but one that isn't geared towards any particular crate; it's just there to help developers write better code. There are several different types of templates in there, including one for creating new struct types that implement all of the appropriate traits as described in the Interoperability portion of the Rust API guidelines checklist. As an end user, you might want to use the template to generate new structs within your project over and over again (e.g., the template ensures that you've implemented the traits, that there are (empty) doc comment blocks with FIXME in all the appropriate places, etc.). You'll use this template to generate types Foo, Bar, Baz, etc., which means that you're using the same template within your project over and over again.

Note that this behavior could be extremely useful within a project. Think about a large project like rustc, which may have its own coding guidelines and standards. By implementing a template for new objects, an author just needs to fill in the blanks, rather than remember what the guidelines are. So, I believe that you should be able to use templates within the crate/project that they are defined in.

But thinking on this, I think we've found another question: how big or small can templates be? Do they have to be an entire project/crate at a time, or can they go all the way down to a one-liner? I know that as a practical matter, having a template that produces a single variable is silly, but the question is really 'is it allowed?'

I just checked a couple of minutes ago. Google picked up the post, and it is the only hit. :grin:

It worked for me too!! That's so awesome! :smiley:

Hmm...I agree that is a valid use case, but to keep this MVP small, I would say we should not include it in this scope. I think that can live on its own without being part of this MVP. I believe that problem can be solved by existing solutions.

However, it could be discussed in the future. Thoughts?

I think we might be venturing out into something separate. I'd like to keep this MVP focused and keep the scope small. This is definitely a valid thing. I've seen solutions like hygen used in the JavaScript ecosystem, but I think that's a different problem to solve (and frankly out of the scope of this issue).

Happy to add it to "Related issues that are considered out of scope for this RFC:"

Thoughts?

I concede that it could be discussed in the future, but the reason I keep thinking about it is because so much of the infrastructure for templates is fully reusable; at some point, the engine is generating code, and the only difference is going to be how much code it generates. So, may as well start thinking about it here and now!

I want to turn attention to how templates are supposed to function for a moment. Basically, what is the underlying engine going to do, and what is it going to be able to do? Is the engine going to be allowed to do arbitrary things? Or is it going to be sandboxed? How will it be sandboxed? Are we going to develop a new domain specific language (DSL) for templates alone? Can we use a language that is already available?

My vote is to use the absolute simplest method I can think of; copy/paste/text substitution. This would involve making the convention that each child of the template directory is a complete template. Thus:

- mycrate
    - templates
        - A 
            - Template.toml
            - src
                - Cargo.toml
                - src
                    - file1.rs
                    - file2.rs
        - B
            - Template.toml
            - src
                - file1.rs

cargo --template would interpret this as there being two templates, A and B. Within a given template is the Template.toml file. This file enumerates all of the keys that are available within that template, along with any help information that is needed. E.g. A/Template.toml might contain:

[package]
name = "A"
version = "0.1.0"
authors = ["Alice B Caring <abc@place.com>"]
edition = "2018"
description = """A small, complete template for the [foo-bar-baz](https://crates.io/crates/foo-bar-baz) crate.

This template will create a small, but complete, project that uses the [foo-bar-baz](https://crates.io/crates/foo-bar-baz) crate.  
"""
keywords = ["foo", "bar", "baz"]
categories = ["template"]
license = "MIT OR Apache-2.0" # It would probably be better to require that templates use the same license as the original crate.

[badges]
maintenance = {status = "experimental"}

[dependencies]
foo-bar-baz = {version = ">= 0.4, <= 0.6", features = ["serde"], uuid = "B23901D91F984C39888E468F531E0F97"}

[keys]
authors = {description = "A list of strings of authors like 'Alice B Caring <abc@place.com>'", default = ""}
date = {description = "The date that the file was generated.  Can be any form you want, it will be directly copied into the template.", default = ""}

Most of those keys are directly copied from the Cargo docs, and so should already be familiar to users of the tool. The only new keys are uuid, which contains a uuid generated by the trick I mentioned above (by the way, uuid would have to become a new key for Cargo.toml files), and the whole keys section, which (I hope) should be relatively obvious.

When using cargo --template, you might have subcommands like:

  • cargo --template list, which would return A and B to you.
  • cargo --template show A would give you a pretty-printed version of A/Template.toml.
  • cargo --template generate --authors "Alice B Caring <abc@place.com>, Donald E Francis <def@place.com>" --date "Thursday, 29 February 2024" A ~/Documents/repositories/my_project/ which actually generates your file(s) from the template.
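
As a rough illustration of the list subcommand, here is a minimal sketch using only the standard library. The function name `list_templates` is hypothetical, and the rule that a child directory only counts as a template when it contains a Template.toml is my assumption; the convention above just says that each child of the templates directory is a template.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the sorted names of all templates found under `templates_dir`.
/// Assumption: a child directory is a template only if it has a `Template.toml`.
fn list_templates(templates_dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(templates_dir)? {
        let path = entry?.path();
        if path.is_dir() && path.join("Template.toml").is_file() {
            // Skip names that are not valid UTF-8 rather than failing.
            if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
                names.push(name.to_owned());
            }
        }
    }
    // `read_dir` gives no ordering guarantee, so sort for stable output.
    names.sort();
    Ok(names)
}
```

Called on mycrate/templates from the layout above, this would yield A and B.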

The generate command would have two required arguments: the first is the name of the template to use (A in the example above), and the second is the location where you want the template to be copied to (~/Documents/repositories/my_project/ above). The complete contents of a template's src directory would be copied directly to that path; thus, when using template A I would expect to see a directory structure like:

- ~/Documents/repositories/my_project
    - Cargo.toml
    - src
        - file1.rs
        - file2.rs

but if I used template B, I'd see something like this:

- ~/Documents/repositories/my_project
    - file1.rs

This actually solves the 'what is a template?' question; to use template B correctly, I would need to use cargo --template generate --authors "Alice B Caring <abc@place.com>, Donald E Francis <def@place.com>" --date "Thursday, 29 February 2024" B ~/Documents/repositories/my_project/src/, which would place file1.rs where it belongs.
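
The copy step of generate can be sketched as a plain recursive directory copy. This is only an illustration under my own assumptions: `copy_template` is a hypothetical name, symlinks get no special treatment, and existing files are overwritten without asking, so a real tool would have to check for conflicts first.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies everything inside `template_src` (a template's `src` directory)
/// into `dest`, creating directories as needed.
/// Note: `fs::copy` silently overwrites existing files.
fn copy_template(template_src: &Path, dest: &Path) -> io::Result<()> {
    fs::create_dir_all(dest)?;
    for entry in fs::read_dir(template_src)? {
        let entry = entry?;
        let from = entry.path();
        let to = dest.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            // Recurse so nested directories keep their layout.
            copy_template(&from, &to)?;
        } else {
            fs::copy(&from, &to)?;
        }
    }
    Ok(())
}
```

Because the contents of src are copied rather than src itself, pointing the destination at my_project/src/ for template B puts file1.rs directly in that directory.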

The engine would be a simple text substitution engine. It would search for text strings of the form {{key}} and replace them with either what is on the command line, or the specified default in the Template.toml file. If a key doesn't have a default and is not specified on the command line, then the engine would issue an error about the missing key (maybe with the key's description copied from the Template.toml file), and require that the template user fix the errors before continuing.
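
To make the substitution rule concrete, here is a minimal sketch of such an engine. The names `render` and `resolve` are hypothetical, and treating an unterminated `{{` as an error is my assumption. The important property is that substituted text is appended to the output and never scanned again.

```rust
use std::collections::HashMap;

/// Merges the defaults from Template.toml with command-line values.
/// Values given on the command line win over defaults.
fn resolve(
    defaults: &HashMap<String, String>,
    cli: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut values = defaults.clone();
    values.extend(cli.iter().map(|(k, v)| (k.clone(), v.clone())));
    values
}

/// Replaces every `{{key}}` in `text` with its value from `values`.
/// A key with no value is reported as an error instead of being left in place.
fn render(text: &str, values: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the opening brackets.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        let end = after_open
            .find("}}")
            .ok_or_else(|| "unterminated `{{`".to_string())?;
        let key = &after_open[..end];
        let value = values
            .get(key)
            .ok_or_else(|| format!("missing value for key `{}`", key))?;
        // The value goes straight to the output and is never rescanned.
        out.push_str(value);
        rest = &after_open[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```

With authors supplied on the command line and date defaulting to an empty string, rendering `// {{authors}} {{date}}` fills in both keys, while a template that mentions `{{license}}` fails with a missing-key error.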

This solves the major security problems because the template isn't being executed, and you can't really do a DoS attack on text substitution (notice how the method above doesn't permit re-evaluation of the generated text, which means you don't get a billion-laughs style attack).

This leaves two issues; how do template authors indicate that they really want the string {{ or }}, and how do you reuse a template for the same destination path multiple times in a row?

  • How do template authors indicate that they really want to have {{ or }} in the source text? A method that could work is to pre-define the keys {{left double curly brackets}} and {{right double curly brackets}}, which template authors could then put into their templates as needed. The engine wouldn't need any modification as it would substitute the appropriate strings in directly whenever it encountered those keys. The only tricky part would be binary blobs (e.g., audio or image files that were part of the template). My suggestion is to have a cargo --template from <source path> <dest path> command. It does the reverse of the template engine, copying everything from <source path> to <dest path> almost verbatim; the difference obviously being that whenever the engine encountered {{ it would substitute {{left double curly brackets}} in, and similarly for }}.
  • What happens when a user has the same destination path multiple times in a row? The trivial answer is that the engine simply and blindly overwrites the contents of the destination path. This would be a horrible user experience, so let's not do that. Instead, cargo --template generate could have additional switches, maybe -f, --force and -i, --interactive, which would be mutually exclusive. If the user selected -f, then the destination path would be overwritten. If -i were selected, then the engine would ask what to do for each destination file that was going to be overwritten (probably overwrite, don't overwrite, compare with the user's diff tool, etc.; whatever you'd expect from a good VCS when merging in conflicts). If neither switch is selected, then the engine would quit with an error warning about the conflict. Note that it would probably be nice to have a key argument to rename the file to make this easier to use.
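
Both answers can be sketched in a few lines. This is an illustration only: `escape_braces`, `OnConflict`, and `conflict_policy` are hypothetical names, and binary files are not handled here. The escaping half is what a `from` command would apply to each text file.

```rust
const LEFT_KEY: &str = "{{left double curly brackets}}";
const RIGHT_KEY: &str = "{{right double curly brackets}}";

/// Escapes literal `{{` and `}}` in a single left-to-right pass.
fn escape_braces(source: &str) -> String {
    let mut out = String::with_capacity(source.len());
    let mut rest = source;
    while !rest.is_empty() {
        if let Some(tail) = rest.strip_prefix("{{") {
            out.push_str(LEFT_KEY);
            rest = tail;
        } else if let Some(tail) = rest.strip_prefix("}}") {
            out.push_str(RIGHT_KEY);
            rest = tail;
        } else {
            // `rest` is non-empty here, so there is always a next char.
            let ch = rest.chars().next().unwrap();
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}

/// What to do when a destination file already exists.
#[derive(Debug, PartialEq)]
enum OnConflict {
    Abort,     // neither switch given: stop with an error
    Overwrite, // -f, --force
    Ask,       // -i, --interactive
}

/// Maps the two switches to a policy, rejecting the combination of both.
fn conflict_policy(force: bool, interactive: bool) -> Result<OnConflict, String> {
    match (force, interactive) {
        (true, true) => Err("--force and --interactive are mutually exclusive".to_string()),
        (true, false) => Ok(OnConflict::Overwrite),
        (false, true) => Ok(OnConflict::Ask),
        (false, false) => Ok(OnConflict::Abort),
    }
}
```

The single pass matters: running two `str::replace` calls in sequence would escape the `}}` that closes the freshly inserted `{{left double curly brackets}}` key a second time.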

I think that something like the above would be sufficient for most templates; can anyone think of something where it wouldn't work?

I would prefer that dates be required to be formatted in sort order, perhaps with some optionality on the separator character. Although I prefer yyyy.mm.dd I could live with yyyy-mm-dd.

The potential confusability of U.S. mm.dd.yyyy vs the widely-used dd.mm.yyyy format needs to be precluded. Additionally, yyyy.mm.dd is the only possible order in Chinese and derivative languages, which always order from larger domain to smaller subdomain.

If we have to choose a date format, either

  • use the ISO date format YYYY-MM-DD
  • have no knowledge of date format, it's just more text

Also, @ckaran, you're creeping the scope of templates again.

For the MVP, we need exactly the following:

  • A package published to crates-io can provide associated templates.
  • These templates contain a Cargo.toml and optionally other files.
  • The generic template functionality offers customization of the exact same set of items as regular cargo [new|init], that is, the package name, the package authors, and the source control. (Yes, cargo new supports source control other than git!)
  • cargo [new|init] --template package/template (or w/e syntax) grabs template template from package package and instantiates it.

To this end, I suggest:

  • The Cargo.toml of a template must include the following text (or some other placeholder) literally at some point, which will be replaced wholesale by the template engine.
    [package]
    name = ""
    authors = []
    
  • While future extension to a "full" templating engine should not be blocked, it is expressly not part of the MVP
    • And I would suggest that the "full templating" engine is opted into with a separate toml from Cargo.toml that would provide a glob-based include list for what files to do replacement on.
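
A minimal sketch of the wholesale replacement, assuming the placeholder is exactly the three lines shown above. The function name `instantiate_manifest` is hypothetical, and Rust's `{:?}` formatting is used only as a stand-in for proper TOML string escaping; real code should use a TOML writer.

```rust
/// The literal placeholder a template's Cargo.toml is assumed to contain.
const PLACEHOLDER: &str = "[package]\nname = \"\"\nauthors = []";

/// Replaces the placeholder with a filled-in `[package]` header.
/// Returns `None` if the manifest does not contain the placeholder.
fn instantiate_manifest(manifest: &str, name: &str, authors: &[&str]) -> Option<String> {
    if !manifest.contains(PLACEHOLDER) {
        return None;
    }
    // Assumption: `{:?}` quoting is close enough to TOML for simple
    // names and author strings; it is not a general TOML encoder.
    let header = format!("[package]\nname = {:?}\nauthors = {:?}", name, authors);
    // Only the first occurrence is replaced; everything else is kept verbatim.
    Some(manifest.replacen(PLACEHOLDER, &header, 1))
}
```

Everything outside the placeholder, such as the edition or the dependencies, passes through untouched, which is what keeps this smaller than a full templating engine.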

And I don't really get the point of the UUID thing. The templates are already attached to a published crates-io package. This also handles versioning, package:SEMVER/template would provide a semver bound on the source of the template, package/template would be a shortcut for package:*/template.
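
To illustrate the shortcut rule, here is a small parser for that syntax. The syntax itself is only a suggestion in this thread, the names `TemplateSpec` and `parse_spec` are hypothetical, and the version requirement is kept as an unvalidated string.

```rust
/// A parsed `package:SEMVER/template` reference.
#[derive(Debug, PartialEq)]
struct TemplateSpec {
    package: String,
    version_req: String,
    template: String,
}

/// Parses `package/template` or `package:SEMVER/template`.
/// A missing version requirement defaults to `*`.
fn parse_spec(spec: &str) -> Option<TemplateSpec> {
    // Split the crate part from the template name at the first slash.
    let (source, template) = spec.split_once('/')?;
    let (package, version_req) = match source.split_once(':') {
        Some((package, req)) => (package, req),
        None => (source, "*"),
    };
    if package.is_empty() || version_req.is_empty() || template.is_empty() {
        return None;
    }
    Some(TemplateSpec {
        package: package.to_string(),
        version_req: version_req.to_string(),
        template: template.to_string(),
    })
}
```

So `rocket/hello_rocket` and `rocket:*/hello_rocket` parse to the same value, while `rocket:0.4/hello_rocket` carries the bound `0.4`.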

I personally prefer the last idea (no knowledge of the date format), and was hoping that it would come across in my example (the fact that it didn't is my fault). I view this as a text substitution engine only; if we have a required format for dates, then that implies that the engine is somehow able to parse the dates, and possibly reformat them into some other form. I view that as being out of the scope of this proposal.

I just re-read the proposal, and you're right, I misunderstood what was being proposed. I thought that the proposal was a more complete template engine.

That said, I still disagree with the idea of limiting ourselves to only the currently available keys. At the end of the day, even a fairly trivial template engine is going to have to do some amount of duplication + text substitution. The difference is that there would need to be additional code to reject keys that aren't already defined within cargo. I'm removing that code from my idea of the proposal. :wink:

Given the scope of this proposal (everything is on crates.io), you're right. We could easily remove that from my example without any significant loss.

I have no real problem with the ISO date format. As to not requiring dates to have a format, that presumes that no-one will grep the template by date except the creator. I've spent way too much effort in the past compensating for people who think that they are the only person who will ever need to search for dates in their material. IMO, all such non-standard formats should be consigned to private documents that are never posted for others to review or use.

I believe the intent of the example wasn't for the templated date to be special in any way; rather, it's just an arbitrary bit of text that is included, say, to copy/paste into a copyright notice. As far as the template engine would be concerned, it is just another string.

You'll also find that legal documents almost exclusively use the month-name format as opposed to ISO format or any other numerical format, because it is impossible to misrepresent "29 February 2024" as any date other than the intended one.

Very true, because legal documents usually are interpreted relative to the laws of the country within which they are written. But different languages and countries require that dates be expressed in differing ways and orders. That's why, when documents are expected to be shared across cultures, it's useful to use an internationally standardized format rather than one with only regional significance.

Aside: Date order is the one aspect of the movie Avatar that James Cameron got wrong. It is inconceivable that the date order used 100+ years in the future will not be the larger-context to smaller-subcontext descriptive order required by the Chinese language, which with its derivatives is the milk tongue of over 20% of the human race.

Yes, that was exactly what I intended (including the part about the copyright notice, good job reading my mind!).

@Tom-Phinney, I do see your point about date strings needing to be in a standard format, but the issue is that we then need to decide what the date actually is. For example, at the time I'm writing this, the date is:

  • Sunday, 19 April 2020 (Gregorian calendar)
  • 25th of Nisan, 5780 (Jewish calendar)
  • 25 Shaban 1441 (Muslim calendar)
  • Sunday, 19 April, year 2 of the Reiwa period (Japanese calendar)

Etc., etc., etc. The precise way you number a date depends on where in the world you're from. Forcing everyone to use a particular format/date system means that one of the strengths of Rust (everything is Unicode) is instantly weakened as a convention is imposed which forces some groups to give up their own, preferred date system. I know that Rust uses English keywords, and that the standard library and most crates are also in English, but nothing prevents Rust from adding a few new keywords (e.g. 「と」 for 'if' in Japanese) to the language, nor is there any particular reason not to expect users to create something like crates.io.jp. (I'm not saying that it will happen, and I personally think it's a really bad idea to do this, but rustc already has 99% of what it needs to make this happen).

Beyond that, it also means that the engine would need to have a long list of known keys. {{left double curly brackets}} and {{right double curly brackets}} are already special, but the engine doesn't need any special logic, they're just predefined keys. {{date}} being a specially known key also means that the engine would need to be able to parse dates correctly, and to give good error messages when the parse fails for some reason, which adds a huge amount of complexity. And once you start with {{date}}, you might as well throw in {{authors}}, {{license}}, etc., etc., etc.

If, after this proposal is completed, accepted, and has been in use for a while to test how well it works, we want to come back and extend it with some kind of validation engine I'm open to that, but I really think that validation is out of scope for this.

While I sincerely wish this would be true, remember that the metric system was invented in the late 1700s, and certain parts of the world still haven't switched over to it fully. :wink:

Thank you @ckaran, @Tom-Phinney and @CAD97 for the additional comments!

I've made a few minor adjustments. I think we're ready to end this Pre-RFC and move on to the next stage. I'm going to end the discussion here. If you'd like to be tagged on GitHub when I post the RFC, please leave your GitHub handle here and I'll make sure to include you :slight_smile:

My GitHub handle and my handle here are the same; ckaran. Thanks!
