Refactoring std for ultimate portability

Coming from the Debian/Ubuntu side of things: there are scripts one runs at every new version of a library to check whether symbols have appeared or been removed compared to previous versions. This is then used to enforce proper versioning; if symbols disappear, you get angry mails from the downstream maintainer. How strictly this is enforced varies heavily between maintainers, but technically you're breaking the ABI if symbols disappear.

This seems very error-prone to me. While it works for getting the first "just up and running" code, how do you know when you've found the last panic, i.e., when your program is ready to ship (without implementing the entire PAL, of which you only use a fraction)?

I would much prefer many platform-dependent crates, i.e., pal_allocsystem_unix, pal_file_unix, pal_threads_unix and so on. It helps with crate interdependencies, and it also avoids having to implement functions you don't need.

That looks very much like what I had in mind.

The question is whether there’d be interest and motivation to do the same for Rust…

I think “pal” is the wrong layer for memchr/memrchr. It’s not a unique or important function in itself, but it’s representative. An efficient way to find a byte value in a &[u8] should already be available in libcore; it’s needed for slices and strings sooner or later (we just haven’t gotten there yet). At the very least, the portable implementation should already live there. If possible, there should be an option to inject a platform-specific implementation to use instead.
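A minimal sketch of what the portable fallback in libcore could look like, using hypothetical free functions `memchr` and `memrchr` over `&[u8]` (illustrative names, not an existing core API). A platform could then inject an optimized version behind the same signature.

```rust
/// Portable fallback: index of the first occurrence of `needle` in `haystack`.
pub fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Portable fallback: index of the last occurrence of `needle` in `haystack`.
pub fn memrchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().rposition(|&b| b == needle)
}
```

The naive byte-by-byte scan is only the baseline; an optimized variant (word-at-a-time or SIMD) would keep the signature and change only the body.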

In comparison, we’re making use of the various highly platform-specific implementations of memcpy that the C library provides. I would want to be able to provide functions of the same type, implemented in Rust.
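As a sketch of "functions of the same type, implemented in Rust": a byte-copying function with C's `memcpy` signature. It is given the hypothetical name `rust_memcpy` rather than `memcpy` so this example does not clash with the C library's symbol; exporting it as `memcpy` would also need care to keep the compiler from lowering the copy loop back into a call to `memcpy`.

```rust
use core::ffi::c_void;

/// Same type as C's `memcpy`, implemented in Rust with a simple byte loop.
///
/// # Safety
/// `dest` and `src` must each be valid for `n` bytes and must not overlap.
pub unsafe extern "C" fn rust_memcpy(dest: *mut c_void, src: *const c_void, n: usize) -> *mut c_void {
    let d = dest as *mut u8;
    let s = src as *const u8;
    let mut i = 0;
    while i < n {
        // Copy one byte at a time; a tuned version would copy words or vectors.
        unsafe { *d.add(i) = *s.add(i) };
        i += 1;
    }
    // Like C's memcpy, return the destination pointer.
    dest
}
```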


We're breaking the ABI every time we release a new compiler version anyway.

Presumably by running the std test suite. Or grepping for panic!("Not implemented in std").

This can come at a later stage, but as previous attempts at this have shown there are many interdependencies and getting this right is quite complex. While this could (should?) be a future goal, I think we need to implement this proposal first.

We could use weak linking here and allow memchr to be overridden by libc at link time. I don't know which platforms support it.

I expect Error can be handled with the inner-type pattern, where pal_common defines a common Error type with less features than the one in std.
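A minimal sketch of the inner-type pattern, assuming a hypothetical `pal_common` module with a small portable error, and a std-side error (`std_like::Error` here) that wraps it as one variant of its internal representation while adding features the PAL doesn't need, such as custom messages. Module and type names are illustrative only.

```rust
/// Hypothetical `pal_common` layer: a small, portable error type.
mod pal_common {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ErrorKind {
        NotFound,
        PermissionDenied,
        Unsupported,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Error {
        pub kind: ErrorKind,
    }
}

/// Hypothetical std-level error with the PAL error as its inner type.
mod std_like {
    use super::pal_common;

    #[derive(Debug)]
    enum Repr {
        Pal(pal_common::Error),
        Custom(String),
    }

    #[derive(Debug)]
    pub struct Error {
        repr: Repr,
    }

    impl Error {
        /// A std-only feature the PAL error doesn't have.
        pub fn custom(msg: &str) -> Self {
            Error { repr: Repr::Custom(msg.to_string()) }
        }

        /// The PAL error kind, if this error originated in the PAL.
        pub fn pal_kind(&self) -> Option<pal_common::ErrorKind> {
            match &self.repr {
                Repr::Pal(e) => Some(e.kind),
                Repr::Custom(_) => None,
            }
        }

        /// The custom message, if there is one.
        pub fn message(&self) -> Option<&str> {
            match &self.repr {
                Repr::Custom(m) => Some(m.as_str()),
                Repr::Pal(_) => None,
            }
        }
    }

    /// Lets `?` lift PAL errors into the std error.
    impl From<pal_common::Error> for Error {
        fn from(e: pal_common::Error) -> Self {
            Error { repr: Repr::Pal(e) }
        }
    }
}
```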

I'm not sure, but I suspect there are semantic differences that would be exposed.

This would be a reasonable approach.

It's hard to guess whether that is sufficient. io::Error is used for many purposes in std that are not associated with those traits.

It depends on what you mean by platform-specific. The example here showed that it is platform-specific in that the implementation uses platform-specific features.

Those features are just convenient optimizations, though, because libc happens to have efficient implementations of memchr (and one other function I can't recall offhand). There is nothing about the definition of C strings that is tied to an operating system (though the definition of c_char is ABI-specific, and c_char is one of the few C types std defines), so one could implement them without calling into libc; in that sense CStrings are platform-independent.
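To illustrate that C strings don't inherently need libc, here is a minimal sketch with hypothetical helpers `to_c_bytes` and `c_strlen` covering the two core operations: rejecting interior NUL bytes and appending a terminator, and measuring the length up to the first NUL. std's own `CString::new` performs a similar interior-NUL check.

```rust
/// Build a NUL-terminated buffer from `bytes` without calling into libc.
/// Returns the position of an interior NUL byte as the error, since such
/// input cannot be represented as a C string.
pub fn to_c_bytes(bytes: &[u8]) -> Result<Vec<u8>, usize> {
    if let Some(pos) = bytes.iter().position(|&b| b == 0) {
        return Err(pos);
    }
    let mut v = Vec::with_capacity(bytes.len() + 1);
    v.extend_from_slice(bytes);
    v.push(0); // the terminator
    Ok(v)
}

/// Portable `strlen`: number of bytes before the first NUL.
/// Returns `None` if the buffer contains no terminator at all.
pub fn c_strlen(buf: &[u8]) -> Option<usize> {
    buf.iter().position(|&b| b == 0)
}
```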

I'd be interested to know your motivation for not having CString (I understand not wanting to link to libc). If it's just because you don't need it, that's fine.

There are various ways one could imagine not implementing and/or not exposing CStrings. Scenarios are one of them.

(Looks like CStrings have been well discussed in this thread so I won't comment further).

grep for panic! is the simple answer. I'm not sure if there's any particularly better strategy, except where there are useful defaults. One must either stub out functionality or implement everything at once.

I mentioned this in passing as potential future work. Any appropriate division here will become more clear as the work progresses.

Any chance of factoring out the parts of std that use ambient authority? I’ve been working on this since 2012, but not making much progress.

I’d be interested in at least designing a lint for it… with that and a call-graph tool, I think I could make good headway.
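For readers unfamiliar with the term: "ambient authority" means code can reach resources (like the whole filesystem) just by naming them, e.g. `File::open("/etc/passwd")`. A capability-style alternative passes around a handle that grants access only to a subtree. The `Dir` type below is a hypothetical sketch of that idea, not a proposed std API; a real implementation would also have to handle symlinks, which this one ignores.

```rust
use std::fs::File;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical capability: code holding a `Dir` can only reach paths beneath `root`.
pub struct Dir {
    root: PathBuf,
}

impl Dir {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Dir { root: root.into() }
    }

    /// Resolve `rel` beneath the root. Absolute paths, prefixes and `..`
    /// are rejected so the caller cannot escape the granted directory.
    pub fn resolve(&self, rel: &Path) -> Option<PathBuf> {
        let mut out = self.root.clone();
        for c in rel.components() {
            match c {
                Component::Normal(part) => out.push(part),
                Component::CurDir => {}
                Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
            }
        }
        Some(out)
    }

    /// Open a file for reading, but only through this capability.
    pub fn open(&self, rel: &Path) -> io::Result<File> {
        let path = self.resolve(rel).ok_or_else(|| {
            io::Error::new(io::ErrorKind::PermissionDenied, "path escapes capability")
        })?;
        File::open(path)
    }
}
```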

grep for panic! is the simple answer.

Using a custom macro like pal_default_panic! would make the situation even more obvious. panic! might be used in other parts of the code, so grepping for it would return more than just the relevant results. But that's just a detail.

unimplemented! already exists, and shouldn’t be used in std for anything other than the PAL defaults.
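A sketch of the default-stub idea, modelling the PAL as a hypothetical `Pal` trait with default methods purely for illustration (the thread doesn't settle how the PAL is expressed). A port implements only what it needs; everything else falls back to `unimplemented!()`, which is easy to grep for and panics with a descriptive message if it is ever reached.

```rust
/// Hypothetical PAL interface with grep-able default stubs.
pub trait Pal {
    /// Required: every port must provide this.
    fn page_size(&self) -> usize;

    /// Optional: ports that don't support threads inherit this stub.
    fn spawn_thread(&self, _f: Box<dyn FnOnce() + Send>) {
        unimplemented!("PAL: spawn_thread is not implemented for this platform")
    }
}

/// A minimal port that provides only `page_size`.
pub struct TinyPlatform;

impl Pal for TinyPlatform {
    fn page_size(&self) -> usize {
        4096
    }
}
```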


Extrapolating from those two blanket impls, there's the general principle of defining functions with a minimal error type and using where std::io::Error: From<MyMinimalError> in the std wrapper.
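A sketch of that principle under stated assumptions: `MyMinimalError` and the `PalRead` trait are hypothetical PAL-level items, `SliceReader` is a toy reader, and `read_exact_std` is a hypothetical std-side wrapper. The `io::Error: From<R::Error>` bound is what lets `?` convert the minimal error into `io::Error` at the std boundary.

```rust
use std::io;

/// Hypothetical minimal error returned by a PAL-level function.
#[derive(Debug)]
pub struct MyMinimalError;

impl From<MyMinimalError> for io::Error {
    fn from(_: MyMinimalError) -> Self {
        io::Error::new(io::ErrorKind::Other, "PAL operation failed")
    }
}

/// Hypothetical PAL-level reader with an associated minimal error type.
pub trait PalRead {
    type Error;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error>;
}

/// Std-side wrapper: works for any PAL reader whose error converts into `io::Error`.
pub fn read_exact_std<R>(r: &mut R, buf: &mut [u8]) -> io::Result<()>
where
    R: PalRead,
    io::Error: From<R::Error>,
{
    let mut filled = 0;
    while filled < buf.len() {
        // `?` uses the `From<R::Error>` bound to convert the error.
        let n = r.read(&mut buf[filled..])?;
        if n == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "early EOF"));
        }
        filled += n;
    }
    Ok(())
}

/// A toy PAL reader over an in-memory slice.
pub struct SliceReader<'a>(pub &'a [u8]);

impl PalRead for SliceReader<'_> {
    type Error = MyMinimalError;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, MyMinimalError> {
        let n = buf.len().min(self.0.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        self.0 = &self.0[n..];
        Ok(n)
    }
}
```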

I expect Error can be handled with the inner-type pattern, where pal_common defines a common Error type with less features than the one in std.

Maybe that's still needed, but it works just fine in conjunction with mine. Impls like Cursor<T>, &mut [u8], Buf* and the other wrappers ought to be perfectly portable and live below the PAL. I think that just leaves the standard streams and similar things in std::io to be defined in the PAL.
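To illustrate why such impls can live below the PAL: writing into `&mut [u8]` needs nothing platform-specific. The sketch uses a hypothetical `PortableWrite` trait (not `std::io::Write`), but its body mirrors how std's `Write for &mut [u8]` behaves: copy as many bytes as fit, then advance the slice past them.

```rust
/// Hypothetical portable write trait living below the PAL.
pub trait PortableWrite {
    /// Write as much of `data` as possible and return the number of bytes written.
    fn write(&mut self, data: &[u8]) -> usize;
}

impl PortableWrite for &mut [u8] {
    fn write(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(self.len());
        // Temporarily take the slice out of `self` so it can be split.
        let (head, tail) = core::mem::take(self).split_at_mut(n);
        head.copy_from_slice(&data[..n]);
        // The remaining, unwritten part becomes the new target.
        *self = tail;
        n
    }
}
```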

I feel I have to mention that PAL is a great name. It’s the German name for the SEP from the Hitchhiker’s Guide to the Galaxy, and indeed for most Rust programmers, the PAL will be somebody else’s problem. :wink:


I would love to see a future where everyone uses Rust, or at least one where it is much easier to port Rust to new platforms :wink:.

With regard to the UI case and the test harness case mentioned by @DanielKeep, some bikeshedding/an idea:

I think both can be done with something like singular (module?) interfaces: basically interfaces that have exactly one implementation, chosen at compile time (ignoring the "multiple versions of the same crate" case). They can't be used with generics; they are only meant to state (at module level?) that something of a certain structure exists (applicable to structs, enums, modules etc., including possibly partial variations of them). A crate could require such an interface to be satisfied (by other crates, possibly depending on this crate) or could provide an implementation satisfying it.

EDIT: just to clarify, singular interfaces are also not meant to be used as trait objects (they are not traits), and they don't have to specify the complete public interface of the implementation satisfying them; a partial specification is OK as long as it doesn't conflict.
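Today's Rust can approximate a "singular interface" with `cfg`-selected modules: exactly one `imp` module is compiled per target and re-exported under a fixed name. A minimal sketch (the module contents are hypothetical); unlike the proposed feature, the compiler does not check that every `imp` provides the same items, so a missing item only shows up when building for that platform.

```rust
// Each target gets exactly one `imp` module, chosen at compile time.
#[cfg(unix)]
mod imp {
    pub const PLATFORM: &str = "unix";
    pub fn path_separator() -> char {
        '/'
    }
}

#[cfg(windows)]
mod imp {
    pub const PLATFORM: &str = "windows";
    pub fn path_separator() -> char {
        '\\'
    }
}

#[cfg(not(any(unix, windows)))]
mod imp {
    pub const PLATFORM: &str = "other";
    pub fn path_separator() -> char {
        '/'
    }
}

// Downstream code names only this "interface" path, never a specific `imp`.
pub use imp::{path_separator, PLATFORM};
```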

And while I think such interfaces would be nice for some cases of dependency inversion in combination with platform-specific code, I'm not sure I would want to see them used outside of these cases. And both the UI case and the test harness case mentioned by @DanielKeep can be solved differently, I think.

Related: Pre-RFC: providable

Thanks @jethrogb. The main difference from my idea is that mine is not bound to traits and doesn't use impl. That way you can use it for parts that require structs, e.g. defining that there has to be a struct PathBuf whose (non-trait) impl has a method fn new() -> PathBuf etc., and also that std::path::PathBuf should be a valid path to access it.
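Part of the "struct `PathBuf` whose impl has `fn new() -> PathBuf`" requirement can already be checked in today's Rust with a `const _` fn-pointer assertion. A sketch with a hypothetical `imp::PathBuf`; it is weaker than the proposal, since each required signature has to be asserted by hand and nothing enforces the `std::path::PathBuf` path itself.

```rust
// Hypothetical platform-specific module providing `PathBuf`.
mod imp {
    pub struct PathBuf {
        inner: Vec<u8>,
    }

    impl PathBuf {
        pub fn new() -> PathBuf {
            PathBuf { inner: Vec::new() }
        }

        pub fn len(&self) -> usize {
            self.inner.len()
        }
    }
}

// The name downstream code uses.
pub use imp::PathBuf;

// Compile-time interface check: this fails to compile unless
// `PathBuf::new` exists and has type `fn() -> PathBuf`.
const _: fn() -> PathBuf = PathBuf::new;
```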

Is there a way to make symbols public with respect to building std, but private with respect to using it? It probably doesn't work when the Rust parts of std are dynamically linked in, but if a single (possibly dynamically linked) std artifact is produced, it could work, right? (I'm not saying we should do it; I just wonder whether we could if it were needed.)

When building a shared object you can choose which symbols to expose. So yes, if you were to build the entire runtime into a single shared object a lot of internal symbols would disappear. However, you'd of course lose all the pluggability you'd gain from using separate objects.

Could this functionality be provided by an annotation, using the #[] syntax? Maybe like #[provides(item.path)] for the implementation and #[providable] for the definition.

Maybe I misunderstand those annotations (I don’t know what they’re called in Rust).

I think so; you could see #[provides(item.path)] as somewhat analogous to #[lang="some_lang_item"], but more general. And #[providable] would be used on a struct, a struct+impl, or a trait (though there might be some corner cases).

I will move this part of the discussion to the mentioned pre-rfc :smile:

Also, while I think this kind of mechanism can be a great help for std and other bigger libraries, where the logical separation of components is not always on the same plane as the OS-specific/OS-independent separation, I don’t know if it would be good for the Rust language in general.

I think per function providables are a good idea.

@brson I’d like to keep moving on this. What would be a path forward on this? Reading this thread there weren’t really any objections to your proposal. Does someone just need to do the work to implement your proposal? Do we need to do some more design first? Do we need an RFC?