core::path::Path

Hello Rustians!

I’m not proposing an API yet; I’m trying to understand whether this direction has already been considered or is architecturally blocked.

The idea: Path feels like it could share a template/pattern-style abstraction: borrowed, allocation-free, usable in core, and separated from filesystem/OS behavior.

Questions:

  • Is core::path impossible mainly because of platform path semantics / OsStr?
  • Would a smaller abstraction, e.g. path-ish pattern parsing without IO, make sense?
  • I am not 100% educated here and will gladly read any existing discussions; I could not find any on this topic.

In general I was viewing it from an architecture perspective (network and filesystem core structure, as in std::net and std::fs) rather than looking too much into the past. It is more of an "ideal world" view: how much can be kept as generic guidance, for later hardware and platform adaptation.

Related: there's an issue about also making it possible to deal with Windows paths (in a lexical way) on Unix and vice versa.


I'd love to see core::path::Path and alloc::path::PathBuf. And I'd love to see Windows path handling and Unix path handling available on all platforms.


For the ecosystem, I had in mind:

core::path::Path<Syntax>  *borrowed path structure*

alloc::path::PathBuf<Syntax>  *owned path buffer*

std::path::Path  *native OS path, OsStr-backed, filesystem-facing*

A large fraction of code using std::path::Path is not asking the OS anything. It is using Path as a typed lexical structure.

So the idea is a lower-level generic path abstraction, while the filesystem-facing parts remain in std. Maybe there is room below std for a pure lexical path type.

That is some of the motivation, and I believe it's tempting to start with a small first practical experiment.

For example, a tar archive uses Unix-style paths regardless of the host OS, and a Git/tooling crate may need to reason about Windows path restrictions while running on Linux. Those are path-syntax problems, not filesystem-access problems.
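
To make the tar/Git example concrete, here is a minimal sketch using only `core` string APIs. The function names are hypothetical, and the reserved-character list is a deliberately partial assumption, not the full set of Windows naming rules:

```rust
/// Hypothetical helper: iterate over the non-empty components of a
/// '/'-separated path, the way an archive format would read it.
/// The host OS separator plays no role here.
pub fn slash_components<'a>(path: &'a str) -> impl Iterator<Item = &'a str> {
    path.split('/').filter(|c| !c.is_empty())
}

/// Hypothetical helper: the last non-empty component, if any.
pub fn slash_file_name(path: &str) -> Option<&str> {
    slash_components(path).last()
}

/// Hypothetical check: does a single component contain a character that
/// Windows reserves in file names? Partial list only (assumption); it
/// ignores reserved device names, trailing dots, and similar rules.
pub fn has_windows_reserved_char(component: &str) -> bool {
    component
        .chars()
        .any(|c| matches!(c, '<' | '>' | ':' | '"' | '|' | '?' | '*'))
}
```

Both questions are answered from the text of the path alone, so the same result comes out on Linux, Windows, or a no_std target. A backslash is treated as an ordinary character by `slash_components`, which is what an archive-style syntax would want.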

Risks I see in this design:

  • Currently, items in core and std with the same paths are generally identical. Changing that makes it harder to correctly migrate code to no_std compatibility, and also interferes with the possible future where instead of a core/alloc/std split, we have std with feature flags. (There is, currently, one exception to this principle already, std::panic::PanicInfo, but that came with providing an alternative and deprecating the existing name.)
  • Individual operating system variants or versions may have particular quirks of path parsing that std::path::Path needs to be aligned with, which may not be as simple to define as having a single Unix type and a single Windows type. (I don’t know if there are any examples of this today.) This means that more than two syntax types would have to be introduced, and this might put Rust in the position of either causing a breaking type mismatch, or having security bugs.

The goal is not to make path semantics less platform-specific.

The goal is to stop making purely lexical path handling unnecessarily std-specific.

My view is mostly ecosystem-oriented: Rust has a large amount of code that could be more harmonious with no_std/alloc if path-shaped data had a shared layer below std.

Security-wise, I agree the API must not imply that a generic lexical path type can predict how a real OS/filesystem will resolve a path. That boundary should be explicit and central, not an afterthought.

So the promise would be:

this type models a documented lexical syntax;
it does not model OS path resolution.

Key points:

  • std::path::Path should remain the native, target-specific, OsStr-backed path type.

  • Filesystem resolution, canonicalization, platform quirks, permissions, symlinks, device/mount context, and other security-sensitive behavior should remain in std or platform-specific crates.

  • A hypothetical core::path::Path<Syntax> should only model documented lexical syntax.

  • It should not claim to exactly match Windows, Linux, macOS, or any particular filesystem or operating system.

  • The analogy to core::net is limited but useful: core::net::SocketAddr is an address value, not a connected socket and not a model of the whole network environment.

  • Likewise, core::path would be path vocabulary, not filesystem access.

  • The ecosystem motivation is that a lot of Rust code uses std::path::Path for lexical structure, not actual filesystem I/O. This is personally my strongest argument: I can patch a crate to use a core path type and see whether anything was actually platform-specific. If nothing was, the crate does not have to inherit that dependency from std. I think of entire Rust crates, as independent library-like projects, as puzzle pieces that almost fit together. A lot of people write excellent code without considering this perspective at all, so we would upgrade that at a fundamental level, in the smallest steps that are safe to take.

  • That creates unnecessary std coupling for code that could otherwise work in core, alloc, or no_std contexts.

  • A lower-level lexical path type could give crates a shared vocabulary for components, prefixes, file names, extensions, relative/absolute handling, and lexical normalization.

  • This could reduce duplicated partial parsers and accidental host-platform assumptions, while keeping native path behavior in std::path / std::fs.
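
As an illustration of the "lexical syntax only" promise in the points above, here is a small sketch of an allocation-free check. The function name is hypothetical, and it assumes a '/'-separated syntax where "." and ".." have their usual lexical meaning:

```rust
/// Hypothetical helper: returns true if a '/'-separated path, read purely
/// as text, would climb above the directory it starts from.
///
/// This is a lexical statement only. It says nothing about symlinks,
/// mount points, or how a real OS would resolve the path.
pub fn escapes_base_lexically(path: &str) -> bool {
    let mut depth: usize = 0;
    for component in path.split('/') {
        match component {
            // Empty components (from "//" or a leading '/') and "." are no-ops.
            "" | "." => {}
            ".." => {
                if depth == 0 {
                    return true;
                }
                depth -= 1;
            }
            _ => depth += 1,
        }
    }
    false
}
```

A check like this is the kind of thing archive extractors and tooling crates tend to reimplement. It needs neither std nor alloc, but its documentation would have to be explicit that a `false` result is not a security guarantee against the real filesystem.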

// core::path 

pub enum Unix {}
pub enum Windows {}

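pub struct Path<Syntax: ?Sized> {
    // marks the syntax without storing a value of it
    _syntax: core::marker::PhantomData<Syntax>,
    // borrowed lexical bytes (unsized tail, similar to str)
    bytes: [u8],
}

pub struct Component<'a, Syntax: ?Sized> {
    // borrowed component
    bytes: &'a [u8],
    _syntax: core::marker::PhantomData<Syntax>,
}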

// or go more neutral for any syntax
core::path::Path<syntax::Posix>
core::path::Path<syntax::Dos>

// cleanest abstraction is probably
core::path::Path<Syntax>

My view is only about Rust-to-Rust scenarios across no_std and platform targets: more like linker- and compiler-level code compatibility, which is naturally better when it shares common ground.
My intention here is small but, I hope, beneficial overall: that Rust codebases glue together more easily.

pub trait PathSyntax {
    const PRIMARY_SEPARATOR: u8;
    const HAS_ROOT: bool;
    const CASE_SENSITIVE_DEFAULT: bool;
...
}

Presets could then, though nothing here is set in stone, be named by grammar shape, like:

core::path::syntax::Slash
core::path::syntax::DrivePrefix

pub enum Slash {}
pub enum DrivePrefix {}

type SlashPath = Path<Slash>;
type DrivePath = Path<DrivePrefix>;
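
To show how the trait and the presets could fit together, here is a rough sketch. It is a simplification under stated assumptions: the trait is reduced to the three constants listed above, the constant values chosen for each preset are my guesses, and `PathRef` is a hypothetical borrowed wrapper used instead of an unsized `Path` type to keep the example free of unsafe code:

```rust
use core::marker::PhantomData;

/// Reduced to the three constants named in the thread.
pub trait PathSyntax {
    const PRIMARY_SEPARATOR: u8;
    const HAS_ROOT: bool;
    const CASE_SENSITIVE_DEFAULT: bool;
}

pub enum Slash {}
pub enum DrivePrefix {}

// The values below are illustrative assumptions, not a specification.
impl PathSyntax for Slash {
    const PRIMARY_SEPARATOR: u8 = b'/';
    const HAS_ROOT: bool = true;
    const CASE_SENSITIVE_DEFAULT: bool = true;
}

impl PathSyntax for DrivePrefix {
    const PRIMARY_SEPARATOR: u8 = b'\\';
    const HAS_ROOT: bool = true;
    const CASE_SENSITIVE_DEFAULT: bool = false;
}

/// Hypothetical borrowed path: raw bytes plus a compile-time syntax tag.
pub struct PathRef<'a, S: PathSyntax> {
    bytes: &'a [u8],
    _syntax: PhantomData<S>,
}

impl<'a, S: PathSyntax> PathRef<'a, S> {
    pub fn new(bytes: &'a [u8]) -> Self {
        PathRef { bytes, _syntax: PhantomData }
    }

    /// Non-empty components, split on the syntax's primary separator only.
    pub fn components(self) -> impl Iterator<Item = &'a [u8]> {
        let bytes: &'a [u8] = self.bytes;
        bytes
            .split(|b| *b == S::PRIMARY_SEPARATOR)
            .filter(|c| !c.is_empty())
    }
}

pub type SlashPath<'a> = PathRef<'a, Slash>;
pub type DrivePath<'a> = PathRef<'a, DrivePrefix>;
```

Because the syntax is a type parameter, a `SlashPath` and a `DrivePath` cannot be mixed up by accident, and the separator is resolved at compile time. This sketch does not parse the drive prefix itself; "C:" simply comes out as the first component.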

RelativePath<Sep: AsciiChar> could be made platform independent fairly easily. A relative path is just (conceptually at least) an array of OsStr separated by a character. Most of the complexity in Path comes with having different types of root.

There is also the problem that Win32 allows both / and \ as separators but that could be worked around, either by RelativePath taking multiple separators or by it being an error on use in a Windows context (like null in Path is today).
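
Both workarounds can be sketched in a few lines. The function names are hypothetical and the paths are treated as plain bytes, which is an assumption rather than a statement about how OsStr would be handled:

```rust
/// Hypothetical option 1: split a relative path on any byte listed in
/// `separators`, so that both '/' and '\\' can act as separators.
pub fn relative_components<'a>(
    path: &'a [u8],
    separators: &'a [u8],
) -> impl Iterator<Item = &'a [u8]> {
    path.split(move |b| separators.contains(b))
        .filter(|c| !c.is_empty())
}

/// Hypothetical option 2: keep a single separator, and report the position
/// of a byte that would be ambiguous in the target context, so the caller
/// can turn it into an error on use.
pub fn find_ambiguous_byte(path: &[u8], ambiguous: u8) -> Result<(), usize> {
    match path.iter().position(|b| *b == ambiguous) {
        Some(index) => Err(index),
        None => Ok(()),
    }
}
```

Option 1 accepts more inputs but makes round-tripping lossy, since the original separator is forgotten. Option 2 keeps the type simple and pushes the decision to the point where the path meets a Windows context.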


I want to thank you for your responses. This is actually too big a side quest for me to navigate. Plus, my standpoint is that a Time/Duration/Instant kind of idea or experiment would be more worthwhile.