Build scripts "PWD" and "CARGO_MANIFEST_DIR" strangeness

Ref: Issues · terry90/internationalization-rs · GitHub

I discovered some strangeness associated with non-top-level build scripts, and I'm trying to figure out how it's supposed to work, and whether this is a bug in Cargo.

The internationalization crate, which I was trying to use unmodified, runs a build script that creates some macros. The macros are based on language files, which it looks for on a path relative to PWD. This yields some very strange results, including non-repeatable builds.

The Cargo documentation says "The directory containing the manifest for the package being built (the package containing the build script). Also note that this is the value of the current working directory of the build script when it starts." This is not quite right. What seems to happen is that, starting from a clean state (cargo clean), PWD (and, I think, CARGO_MANIFEST_DIR) are set to the top-level manifest directory, where the top-level Cargo.toml file lives. But on later builds, PWD can point somewhere under the "target" directory. I've captured the "output" file generated by the build script and can see this. I've had a project where "cargo clean" had to be run between generating debug and release builds. This seems wrong.
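To see exactly what a build script receives, a small diagnostic build.rs along these lines can report all three values on every run. This is a sketch, not code from the internationalization crate; the helper name `describe_dirs` is made up. Cargo only displays `cargo:warning=` lines for local (path) packages by default, so for a crates.io dependency you would run `cargo build -vv` or read the build script's `output` file under `target/`.

```rust
// build.rs (sketch): report the directory-related values this build script sees.
use std::env;

/// Collects PWD, CARGO_MANIFEST_DIR and the real working directory as "NAME=value" strings.
fn describe_dirs() -> Vec<String> {
    // Environment variables may be missing; show that explicitly instead of panicking.
    let var_or_unset = |name: &str| env::var(name).unwrap_or_else(|_| "<unset>".to_string());
    // current_dir() asks the OS for the actual working directory; it does not read PWD.
    let cwd = match env::current_dir() {
        Ok(path) => path.display().to_string(),
        Err(err) => format!("<error: {err}>"),
    };
    vec![
        format!("PWD={}", var_or_unset("PWD")),
        format!("CARGO_MANIFEST_DIR={}", var_or_unset("CARGO_MANIFEST_DIR")),
        format!("current_dir={cwd}"),
    ]
}

fn main() {
    for line in describe_dirs() {
        // Cargo turns `cargo:warning=` lines into warnings in its build output.
        println!("cargo:warning={line}");
    }
}
```

Comparing the three lines between a clean build and a rebuild shows directly whether PWD disagrees with the other two.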

How it should work isn't clear. When a dependent crate is being built, its Cargo.toml file is somewhere in the files Cargo has downloaded, and PWD ought to point there. But it doesn't, at least not consistently.

Build scripts don't get an environment variable that gives the location of the top-level build directory, although Cargo knows it and targets are relative to it. Is that intentional, or an omission? Having a dependent crate depend on an externally provided file is useful. It's fairly common to want to reference a file that is parsed and used to drive code generation, and the code generator is usually in a dependent crate. Was it decided that this is a bad thing? The documentation seems to be silent on this.
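One pattern that avoids guessing at a "top-level" directory is for the dependency's build script to take the location explicitly from an environment variable the user sets, and to tell Cargo to re-run when that variable changes. A sketch, assuming a hypothetical variable named `LOCALES_DIR` and a fallback to a `locales` folder inside the dependency's own package; neither is part of the internationalization crate.

```rust
// build.rs of the dependency (sketch). `LOCALES_DIR` is a hypothetical, user-set variable.
use std::env;
use std::path::PathBuf;

/// Picks the locales directory: an explicit, non-empty override if given, otherwise a
/// folder inside this package (CARGO_MANIFEST_DIR). Never a path relative to PWD.
fn locales_dir(override_dir: Option<String>, manifest_dir: &str) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(manifest_dir).join("locales"),
    }
}

fn main() {
    // Ask Cargo to re-run this script whenever the hypothetical override changes.
    println!("cargo:rerun-if-env-changed=LOCALES_DIR");
    // Cargo always sets CARGO_MANIFEST_DIR when it runs a build script.
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").expect("not run by Cargo");
    let dir = locales_dir(env::var("LOCALES_DIR").ok(), &manifest_dir);
    // Also re-run if the chosen path changes.
    println!("cargo:rerun-if-changed={}", dir.display());
    // ... parse the files in `dir` and generate code into OUT_DIR ...
}
```

The user could set the variable in the shell or in the `[env]` table of `.cargo/config.toml`. An absolute path is safest, because a relative one would be resolved against the build script's working directory, which is the dependency's own package directory.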

The motivation for this macro approach to internationalization, by the way, is egui, an immediate mode GUI system. Values of type "&str" are read on every graphics frame, so you don't want to be doing a lookup. So the runtime internationalization packages are too slow for looking up all the menu items.
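To make the performance argument concrete: code generated at build time can reduce a translated label to a `match` on the current language that returns a `&'static str`, whereas a runtime catalog typically does a keyed map lookup on every call. The sketch below is hypothetical (the `Lang` enum, `menu_file`, and the `Catalog` type are made up for illustration) and is not meant to show what the internationalization crate's macros expand to.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Lang {
    En,
    Fr,
}

/// What build-time code generation might emit: a plain match, no hashing or allocation.
fn menu_file(lang: Lang) -> &'static str {
    match lang {
        Lang::En => "File",
        Lang::Fr => "Fichier",
    }
}

/// A runtime catalog for one language: every call hashes the key and probes the map.
struct Catalog {
    entries: HashMap<&'static str, String>,
}

impl Catalog {
    fn get(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(String::as_str)
    }
}

fn main() {
    // In an immediate mode GUI, a call like this would run for every label on every frame.
    println!("{}", menu_file(Lang::Fr));
}
```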

So, is this a Cargo bug, a Cargo documentation error, or bad design of internationalization?

(Ubuntu 20.04 LTS, rustc 1.61.0)

Not necessarily: you can override the target directory and have it shared between different “top-level builds” (and that’s sort of the effect a workspace gives too).

Actually, what fills in the PWD environment variable? Is it set by the shell, so that it might not be updated when Cargo changes the current working directory for the subprocess?

That's a good question. The Cargo documentation implies that PWD, CARGO_MANIFEST_DIR, and the actual working directory should all agree. But it's possible that they get out of sync somewhere. If that happens, it should be considered a bug.

I tried using a local copy of internationalization. So I changed my top level Cargo.toml file from

internationalization = "^0.0.2"

to

internationalization = { path = "../rustcode/internationalization-rs", package = "locales" }

("package" is needed because the crate name on crates.io and the package name in the crate's Cargo.toml are different.)

Now, if I do that, CARGO_MANIFEST_DIR, when running the build script for internationalization/locales, is always pointed to the "path" value, which is correct. Not what the designer of internationalization wanted, but not wrong per the Cargo manual.

I now suspect that PWD, and maybe CARGO_MANIFEST_DIR behave strangely when using crates from crates.io. Sometimes it's the root Cargo.toml directory, and sometimes it's something under "target". This is kind of hard to test without putting a test crate into crates.io. The cases where dependencies are local or in the same repository seem to be fine.

I think this is the issue. Cargo doesn't mention PWD, only "the build script’s current directory", which you can discover by calling std::env::current_dir() or similar system APIs. But PWD is a shell thing.
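The distinction is easy to demonstrate in-process: `std::env::set_current_dir` changes the real working directory, which is what `std::env::current_dir()` reports, but it does not touch the `PWD` environment variable, which is just an inherited string. Likewise, a child started with `std::process::Command::current_dir` receives its parent's `PWD` unchanged unless the launcher overwrites it. A minimal sketch; `pwd_and_cwd` is a made-up helper name.

```rust
use std::env;
use std::path::PathBuf;

/// Returns the PWD environment variable (if set) and the process's actual working directory.
fn pwd_and_cwd() -> (Option<String>, PathBuf) {
    let pwd = env::var("PWD").ok();
    let cwd = env::current_dir().expect("current directory is not accessible");
    (pwd, cwd)
}

fn main() {
    let (pwd_before, _) = pwd_and_cwd();
    // Change the real working directory of this process.
    env::set_current_dir(env::temp_dir()).expect("cannot change directory");
    let (pwd_after, cwd_after) = pwd_and_cwd();
    // PWD is only an environment variable; changing directory does not rewrite it.
    assert_eq!(pwd_before, pwd_after);
    println!("PWD         = {:?}", pwd_after);
    println!("current_dir = {}", cwd_after.display());
}
```

A build script that needs its working directory should therefore call `current_dir()` (or, better, use `CARGO_MANIFEST_DIR`) rather than read `PWD`.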


Right. It's bad form to get them out of sync, but it's certainly possible.

It now looks like the internationalization crate only works because of a bug. In a clean build, PWD apparently is the manifest directory of the whole project, which is wrong but makes the crate work. In rebuilds, it can apparently have other values.

Is there any legitimate way to find the manifest directory of the whole project from a build script? That was the intent in the internationalization crate.

It seems to me that rather than doing this in internationalization's build script, this should really be code that you call from the build script of your top-level project. But then I suppose they would also need you to call some initialization with that built data at runtime.

Yes, that would be a better architecture.
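A sketch of that architecture, under stated assumptions: the i18n library would export an ordinary function (here a hypothetical `generate_module`) that the top-level project's build.rs calls with paths that belong to the top-level package, writing the result into `OUT_DIR`. The application then pulls the code in with `include!(concat!(env!("OUT_DIR"), "/strings.rs"));`. The `key = text` input format and the `locales/fr.txt` path are invented for illustration, and keys are assumed to be valid Rust identifiers.

```rust
// Top-level build.rs (sketch). `generate_module` stands in for a function an i18n
// library could export; it is hypothetical, not part of the internationalization crate.
use std::env;
use std::fs;
use std::path::Path;

/// Turns lines of the form `key = text` into `pub const KEY: &str = "text";` items.
fn generate_module(source: &str) -> String {
    let mut out = String::new();
    for line in source.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // skip blank lines and comments
        }
        if let Some((key, text)) = line.split_once('=') {
            // `{:?}` on a &str produces a quoted, escaped Rust string literal.
            out.push_str(&format!(
                "pub const {}: &str = {:?};\n",
                key.trim().to_uppercase(),
                text.trim()
            ));
        }
    }
    out
}

fn main() {
    // Both paths belong to the top-level package, so nothing has to be guessed.
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").expect("set by Cargo");
    let out_dir = env::var("OUT_DIR").expect("set by Cargo");
    let input = Path::new(&manifest_dir).join("locales").join("fr.txt");
    println!("cargo:rerun-if-changed={}", input.display());
    let source = fs::read_to_string(&input).expect("cannot read locale file");
    fs::write(Path::new(&out_dir).join("strings.rs"), generate_module(&source))
        .expect("cannot write generated code");
}
```

Because the generated items are plain `&'static str` constants, egui code can use them directly without any lookup at runtime.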

(Internationalization for Rust is not in good shape. This simple crate doesn't work, and seems to be abandoned. "fluent" has a following, but it's much more complicated, has a lot of moving parts, and its own docs say it's unfinished. On the help forums, someone wrote "I don't think there is a consensus here yet.". I guess Rust is English-only for now.)

No, the same build script run can be reused for multiple “projects” when using a shared target directory (such as is used by default when using a workspace).

Right, dependencies are a directed acyclic graph, not a tree. Therefore, parents are not unique, and looking towards the root for information at compile time is thus a no-no. Therefore, the design of the internationalization crate is fundamentally flawed.
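The point about non-unique parents can be made concrete with a toy dependency graph: in a workspace where two binaries both depend on the same library, asking "which package is my top level?" from the library's side has two answers, and with a shared target directory one build script run may serve both. The crate names below are made up.

```rust
use std::collections::BTreeMap;

/// Returns every package that lists `target` as a direct dependency.
fn dependents<'a>(graph: &BTreeMap<&'a str, Vec<&'a str>>, target: &str) -> Vec<&'a str> {
    graph
        .iter()
        .filter(|(_, deps)| deps.iter().any(|d| *d == target))
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    // Hypothetical workspace: two apps share one i18n dependency.
    let mut graph = BTreeMap::new();
    graph.insert("app_gui", vec!["i18n", "egui"]);
    graph.insert("app_cli", vec!["i18n"]);
    graph.insert("i18n", vec![]);
    // Two parents, so "the" root of i18n is not well defined.
    println!("{:?}", dependents(&graph, "i18n"));
}
```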

Thanks, all.