Exclude .github and .gitignore from published packages by default?

Recently I was a bit surprised that Cargo includes .github/ and .gitignore in packages generated for crates hosted in a single-crate repository (e.g. see here). Interestingly, Cargo already skips files matched by .gitignore, but the .gitignore file itself gets included. Both items are useless in published packages and I don't see any practical reason for including them.

I think it's worth excluding them by default instead of requiring everyone to manually add exclude = [".github/", ".gitignore"] to Cargo.toml. It may even be worth excluding all hidden files (i.e. those starting with .) by default; you would be able to override this with the include field if necessary.

What do you think?

I think that a policy of excluding all "hidden" files by default makes a lot of sense.

It is also a possibly breaking change and should be gated on package.edition.

Could the export-ignore property be used instead? I don't see why something should be in the .crate if git archive would ignore it too. I just don't want to see cargo grow a menagerie of default exclusions for every forge's hidden directory (sr.ht, Forgejo, GitLab) that changes based on the version in use.

As for .gitignore itself, it could be useful to see what may be missing. But again, I think export-ignore is a better source for this.

I'm not familiar with the conventions for what is included in an archive. Would we risk going against convention?

Also, how many people even know of this and then set this? I suspect this won't be helpful enough to be worth the effort.

Ignoring hidden files seems like it can make an immediate difference but we'd need to work out the transition plan.

Some dotfiles may be important (.gitattributes perhaps, though there's no git infrastructure to query them). Perhaps .cargo files that are committed? It's really hard to say that "all dotfiles are unnecessary" IMO. I'm not sure that export-ignore is all that well-known, but I think using existing mechanisms, even if "obscure", would be better than making Yet Another Exclusion Mechanism™ (see also xkcd#927) that needs to be considered when figuring out "why is this being ignored?". Who knows, maybe cargo can even raise awareness of export-ignore to be used as intended in more places :)

I'm having a hard time identifying a dotfile that would be relevant for most projects. As you mentioned, there is no git. .cargo/config.toml is ignored and excluding it by default would better communicate that.

I'm mostly concerned about tooling that doesn't have in-source markers. Are there dotfiles that help things like rustfmt, cargo doc, LSPs or AI metadata/configuration files, etc. that would be useful from an extracted .crate? I don't think anyone is formatting crated code nor are they doing feature stuff not already enabled elsewhere. It's fine if the answer is "no", I just want to make sure the "hidden file" heuristic isn't something that's likely to frustrate in the future (and that when someone archaeologies their way here, it was at least considered).

If someone has the dump of all published crates, they could produce a frequency table of dotfiles.
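As a rough sketch of what such a frequency table could look like: the function below counts, for each top-level dot-prefixed entry, how many crates contain it. The name `dotfile_frequency` and the input shape (one list of package-relative paths per crate) are assumptions for illustration; reading the actual crates.io dump is left out.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::path::{Component, Path};

/// Hypothetical helper: for each top-level entry starting with a dot,
/// count the number of crates that ship it.
/// Assumes every path is relative to the package root.
fn dotfile_frequency(crates: &[Vec<&str>]) -> BTreeMap<String, usize> {
    let mut counts: BTreeMap<String, usize> = BTreeMap::new();
    for files in crates {
        // Collect distinct entries first so each crate counts at most once per entry.
        let mut seen = BTreeSet::new();
        for &p in files {
            // First normal component, e.g. ".github" from ".github/workflows/ci.yml".
            let first = Path::new(p).components().find_map(|c| match c {
                Component::Normal(s) => s.to_str(),
                _ => None,
            });
            if let Some(name) = first {
                if name.starts_with('.') {
                    seen.insert(name);
                }
            }
        }
        for name in seen {
            *counts.entry(name.to_string()).or_insert(0) += 1;
        }
    }
    counts
}
```

Sorting the resulting map by count would give the table; entries that show up in only a handful of crates are the ones worth looking at by hand.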

A noisy compromise could be to warn at publish time on any dotfile not explicitly mentioned as “include” or “exclude”. Then the list of hardcoded exceptions (like .git) has less reason to grow.
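To make the warning idea concrete, here is a minimal sketch. It is not how Cargo works today, and it simplifies heavily: `covers` does plain path-prefix matching, whereas the real include/exclude fields take gitignore-style patterns. Both function names are made up for this example.

```rust
/// Hypothetical, simplified matcher: a pattern covers a path if it names
/// the path itself or one of its parent directories. No glob support.
fn covers(pattern: &str, path: &str) -> bool {
    let pattern = pattern.trim_end_matches('/');
    path == pattern
        || path
            .strip_prefix(pattern)
            .map_or(false, |rest| rest.starts_with('/'))
}

/// Returns the files that have a dot-prefixed component and are not
/// covered by any `include` or `exclude` entry, i.e. the ones that
/// would trigger a warning at publish time under this proposal.
/// Assumes package-relative paths without a leading "./".
fn unmentioned_dotfiles<'a>(
    files: &[&'a str],
    include: &[&str],
    exclude: &[&str],
) -> Vec<&'a str> {
    let mut warnings = Vec::new();
    for &path in files {
        let is_dot = path.split('/').any(|part| part.starts_with('.'));
        let mentioned = include
            .iter()
            .chain(exclude.iter())
            .any(|pat| covers(pat, path));
        if is_dot && !mentioned {
            warnings.push(path);
        }
    }
    warnings
}
```

With exclude = [".github/", ".gitignore"], a package that also contains .rustfmt.toml would get a single warning for that file.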

Would it be possible to do a crater run to see whether removing dotfiles from published packages breaks anything?

I feel like .gitignore shouldn't be excluded (it conveys important information and is tiny enough), but everything that .gitignore excludes should be excluded by default by cargo. However, this would break crates that rely on generating code to publish on crates.io but don't want to commit this code to git (arguably they should).

Excluding dotfiles by default is kind of random, because some tools have config files as dotfiles, others don't, and there's no real difference between them.

The local cache of a downloaded crate isn't in a git repo and thus .gitignore wouldn't do anything. If you vendor into a git repo, then it still shouldn't matter given that you are not supposed to modify the source directory under any circumstances. cargo publish even checks that you don't modify the source directory during a regular build and IMO we should use file permissions to make the crate cache read-only.

This is already the case:

If include is not specified, then the following files will be excluded:

  • If the package is not in a git repository, all “hidden” files starting with a dot will be skipped.
  • If the package is in a git repository, any files that are ignored by the gitignore rules of the repository and global git configuration will be skipped.

... though I didn't realize cargo already skips dotfiles for non-git repos!
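My reading of the quoted rule, as a small sketch. This is not Cargo's actual code: `git_ignored` stands in for a real gitignore lookup, and treating any dot-prefixed path component as "hidden" is my assumption.

```rust
use std::path::Path;

/// Hypothetical helper: true if any component of `path` starts with a dot
/// (ignoring the special "." and ".." components).
fn is_hidden(path: &Path) -> bool {
    path.components().any(|c| {
        c.as_os_str()
            .to_str()
            .map_or(false, |s| s.starts_with('.') && s != "." && s != "..")
    })
}

/// Sketch of the documented default when `include` is not specified.
/// `git_ignored` is a placeholder for "matched by the repository's
/// gitignore rules or the global git configuration".
fn excluded_by_default(path: &Path, in_git_repo: bool, git_ignored: bool) -> bool {
    if in_git_repo {
        // Inside a git repo only the gitignore rules apply, so a
        // tracked dotfile such as .gitignore is packaged.
        git_ignored
    } else {
        // Outside a git repo every hidden file is skipped.
        is_hidden(path)
    }
}
```

So .github/workflows/ci.yml is skipped outside a git repository but packaged inside one, which is exactly the behaviour the original post ran into.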

Huh... Interesting. I wonder why this inconsistency was introduced. Is it because skipping dotfiles was considered to be "replaced" by .gitignore? If so, I think dotfiles should be ignored regardless of the VCS in use.

I created a Cargo issue with a proposal to remove the git exception:

I think a crater run might actually overestimate the amount of breakage.

From experimenting locally, it seems that cargo package (and presumably also cargo publish) attempts to compile the crate without access to any of the ignored files. So something like include_bytes!(".foo") would get caught before uploading.

These files are usually small; I don’t mind.
