Is interning as a general primitive something that should be in the standard library?

Title: Pre-RFC: List and VariableList: identity-based storage primitives with interning

Every system that accepts input from humans operates on two concepts that are currently absent from core:

  1. A way to store and retrieve values by a stable integer identity (1-based ordinal, where 0 is the null sentinel — the only meaningful null for a computer)
  2. A way to recognize that two inputs are the same value — interning
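
To make point 1 concrete, here is a minimal sketch of a 1-based ordinal identity in Rust. It assumes a `NonZeroUsize`-backed ID, so that `Option<Id>` uses 0 as its `None` value and costs no extra space. `Id` and `OrdinalStore` are hypothetical names used only for illustration. They are not the API of the `List<T>` mentioned in this post.

```rust
use std::num::NonZeroUsize;

/// Hypothetical 1-based identity. Because it wraps `NonZeroUsize` and is
/// `#[repr(transparent)]`, `Option<Id>` is guaranteed to be the size of a
/// `usize`, with the raw value 0 playing the role of `None` (the null sentinel).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Id(NonZeroUsize);

impl Id {
    /// Converts a raw integer (e.g. read from storage); 0 maps to `None`.
    pub fn from_raw(raw: usize) -> Option<Id> {
        NonZeroUsize::new(raw).map(Id)
    }

    /// Returns the 1-based ordinal.
    pub fn get(self) -> usize {
        self.0.get()
    }
}

/// Hypothetical fixed-width store addressed by `Id` (illustration only).
pub struct OrdinalStore<T> {
    items: Vec<T>,
}

impl<T> OrdinalStore<T> {
    pub fn new() -> Self {
        OrdinalStore { items: Vec::new() }
    }

    /// Appends a value and returns its 1-based identity.
    pub fn push(&mut self, value: T) -> Id {
        self.items.push(value);
        // After a push, len() is at least 1, so this cannot be the null sentinel.
        Id::from_raw(self.items.len()).expect("len is non-zero after push")
    }

    /// Looks up a value by identity; ordinal k is stored at index k - 1.
    pub fn get(&self, id: Id) -> Option<&T> {
        self.items.get(id.get() - 1)
    }
}

fn main() {
    let mut store = OrdinalStore::new();
    let a = store.push("alpha");
    let b = store.push("beta");
    assert_eq!(a.get(), 1);
    assert_eq!(store.get(b), Some(&"beta"));
    assert!(Id::from_raw(0).is_none());
    assert_eq!(std::mem::size_of::<Option<Id>>(), std::mem::size_of::<usize>());
}
```

This design choice lets "no value" be represented by the integer 0 without a separate flag.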

slice and Vec are memory representations. They have no notion of identity or equality across contexts. You cannot intern a Vec. This is a gap, not a stylistic choice.

I have been using two primitives to fill this gap:

  • List<T>: fixed-width unit store, addressed by usize identity
  • VariableList<T>: variable-width unit store, same identity model, with interning support
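
Below is a rough sketch of how a variable-width store with interning could work. It assumes byte-slice values packed back-to-back into one buffer, plus an end-offset table and a `HashMap` used for deduplication. `VarStore`, `intern`, and `get` are hypothetical names, and this is not the linked `VariableList<T>` implementation.

```rust
use std::collections::HashMap;

/// Hypothetical variable-width store. All values live back-to-back in `bytes`;
/// identity k (1-based) covers bytes[ends[k-2]..ends[k-1]], starting at 0 for k = 1.
#[derive(Default)]
pub struct VarStore {
    bytes: Vec<u8>,
    ends: Vec<usize>,                // ends[i] = end offset of the value with ordinal i + 1
    lookup: HashMap<Vec<u8>, usize>, // content -> ordinal, used for interning
}

impl VarStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the existing ordinal if `value` was stored before; otherwise
    /// appends it and returns a fresh ordinal. Ordinal 0 is never returned.
    pub fn intern(&mut self, value: &[u8]) -> usize {
        if let Some(&id) = self.lookup.get(value) {
            return id;
        }
        self.bytes.extend_from_slice(value);
        self.ends.push(self.bytes.len());
        let id = self.ends.len(); // 1-based ordinal of the new value
        self.lookup.insert(value.to_vec(), id);
        id
    }

    /// Resolves an ordinal to its bytes; 0 (the null sentinel) and unknown ids give None.
    pub fn get(&self, id: usize) -> Option<&[u8]> {
        if id == 0 || id > self.ends.len() {
            return None;
        }
        let start = if id == 1 { 0 } else { self.ends[id - 2] };
        Some(&self.bytes[start..self.ends[id - 1]])
    }
}

fn main() {
    let mut store = VarStore::new();
    let apple = store.intern(b"apple");
    let kiwi = store.intern(b"kiwi");
    assert_eq!(apple, 1);
    assert_eq!(kiwi, 2);
    assert_eq!(store.intern(b"apple"), apple); // same content, same identity
    assert_eq!(store.get(kiwi), Some(&b"kiwi"[..]));
    assert_eq!(store.get(0), None);
}
```

To keep the sketch short, it stores each value twice: once in the buffer and once as a `HashMap` key. A more careful version could avoid that duplication.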

Both are small enough to copy into a project without installing a crate.

Source: context-engine/src/list.rs in animagram-jp/context-engine (GitHub, main branch)

These two concepts — ordinal identity and value interning — are not primitives in the sense of bits or ordering. But they are unavoidable the moment a computer does anything for a human. Every system that serves human intent must identify "which one" and recognize "the same one." That is not a library concern. It belongs in core.

I am not proposing a specific API surface yet. I want to know: is this a gap others have felt? And is interning as a general primitive (not just for strings) something worth pursuing in core?


English is not my first language — please forgive any awkward phrasing.

Thank you for reading.

I’m going to assume your intentions are good, but I fear AI has substantially misled you. For one, Vec is not in core, and what you’re describing could at best be in alloc. Second, people can easily implement interning in library crates; for instance, rustc itself implements and uses interning. I also found this crate from a quick search: internment

Rust is very unlikely to add this to alloc or std, given that structures that implement interning:

  • are not widespread “vocabulary types”,
  • do not need special language support (they can be implemented in libraries),
  • do not have libraries with hundreds of millions of downloads and strong user pressure to add the types to std, as in past cases where things were added to std.

Personally, I have never felt a gap from lack of interning in std.

Maybe this is a gap in expectations, but it’s extremely normal for Rust crates to pull in a lot of other crates as dependencies.


Thank you for the feedback, Mr. robofinch.

To clarify: I was not suggesting that Vec belongs in core. I mentioned slice and Vec only to say that they are not what I wanted here.

You are right that I was imprecise about core vs alloc. Since the implementation relies on alloc, the more accurate claim is that these concepts are absent from core/alloc, and if they were to be added anywhere, alloc would be the appropriate place. That was my mistake.

Thank you again for taking the time to respond.

It’s a fairly bold claim that something is absent from the standard library when it seems so hard (at least to me) to even understand what exactly you’re talking about.

Or maybe there's indeed a language barrier.

Anyway, having stared at the definitions of List and VariableList you've linked for just a few minutes, I'm having a hard time figuring out what exactly they're used for, though maybe I could figure it out eventually by studying the example(s) in your documentation for longer [relevant links for anyone else, since the docs are more approachable than the raw source code on GitHub: VariableList, List]


You reference interning, though that term generally corresponds more to concepts like the ones implemented by the crate @robofinch linked above, and I'm not completely sure whether the concept you're suggesting is missing is even strongly related to that.


As @robofinch already mentioned, the standard library (including core, alloc and std) deliberately doesn't contain a lot of things; for more context on that philosophy, feel free to look into previous discussions or write-ups, e.g. this blog post.

If you want to discuss or present the ideas of your crate to others, maybe do that first, before combining it with the suggestion that it must be included in the standard library. The users forum, over on users.rust-lang.org, may be a better place for that kind of discussion. 😉


Apart from everything else, this is a strong claim, requiring equally strong evidence, which you haven't provided at all.

I would personally argue that interning (and, more generally, memoization) is only sometimes useful. It can't be used, for instance, when you need to process an unbounded stream of data using a constant amount of space, which is the most common case in my experience for data-crunching tasks (and yes, that data stream might be coming from a human).

It's also very easy to build an interner on top of things that the standard library already gives you, e.g. HashMap<String, u32>, which means the benefit of adding an "official" interner to the standard library would be quite limited.
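
For illustration, here is a minimal sketch of such an interner built only from standard-library types. It assumes a `Vec<String>` kept alongside the `HashMap<String, u32>` for the reverse lookup. `Interner`, `intern`, and `resolve` are hypothetical names. Symbols here start at 0; shifting them by one would give the 1-based scheme from the original post.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Minimal string interner: `ids` maps each distinct string to a u32 symbol,
/// and `strings` maps a symbol back to its string.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the symbol for `s`, allocating a new one the first time `s` is seen.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = u32::try_from(self.strings.len()).expect("more than u32::MAX symbols");
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    /// Resolves a symbol back to its string, or None if it was never issued.
    fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(String::as_str)
    }
}

fn main() {
    let mut interner = Interner::default();
    let foo = interner.intern("foo");
    let bar = interner.intern("bar");
    assert_eq!(interner.intern("foo"), foo);
    assert_ne!(foo, bar);
    assert_eq!(interner.resolve(bar), Some("bar"));
    // Every distinct input adds an entry, so memory grows with the number of
    // distinct values seen; this is the constant-space limitation noted above.
    assert_eq!(interner.strings.len(), 2);
}
```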


Thank you, Mr. steffahn and Mr. zackw. Please excuse me for combining my answers into one reply.

  • First, I placed "proposal for addition to the standard library" as my conclusion, hoping it would make clear why I was seeking agreement in this forum. However, this created unnecessary tension by conflicting with Rust's design philosophy of avoiding increased complexity in the standard library. I sincerely apologize for that, and I am genuinely grateful to those who took the time to reply.
  • Regarding the content and focus of the discussion: the English words I used in my proposal, including "intern (to place inside)," deviate from their narrow meanings in the software field, which I think made the post difficult to follow. It might have been better had my original post started from: "Since any system exists to serve human intent, it must always derive an equivalence concept autonomously from within, rather than relying on the user to define it externally. The most practical means for a computer to do this is a combination of three concepts: binary values, equivalence, and ordering." But, no, perhaps not much would have changed, since that framing still depends on the reader resolving it internally.
  • I also appreciate the suggestion that users.rust-lang.org would be a more appropriate venue. I intend to prepare a publicly available system as a concrete basis for discussion before posting there.
  • Overall, I had not realized that, because each of my design decisions is tightly coupled to structural insights about the real world, they are impossible to present piecemeal. Thank you again for your responses.

Andyou