Since I plan to apply for this GSoC project, I hereby present my ideas and a draft of my project proposal. Any discussion, comments and feedback are welcome. The following section outlines my proposal’s concept.
Thank you for your time and input.
Project Proposal
I propose to implement a proof-of-concept tool to check SemVer compatibility of library crates in the Rust ecosystem, as suggested on the list of official projects. The expectation is for it to be integrated with cargo and surrounding infrastructure to maximize ease of use and adoption amongst library authors. To reach this goal, three key problems have to be solved:
- The infrastructure necessary to obtain and represent the public interface of a crate during compile time needs to be carefully designed and integrated in cargo and a compiler plugin run by rustc.
- Consequently, the interface description needs to be serialized and stored in a well-specified format in a standard location, to allow further processing by the tool and possibly other components in the future. Alternatively, an approach where no information is ever permanently dumped to disk can be pursued, but it in turn needs machinery to obtain interface descriptions for past versions of the currently compiled crate, as well as the current version.
- Finally, the core of the tool, an algorithm to verify SemVer-compatibility of two interface descriptions and the corresponding version numbers, needs to be designed and implemented, with special consideration applied to ensure the robustness and flexibility of the resulting code.
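To illustrate the version-number half of the last point, here is a minimal sketch of checking whether a version change is large enough for a given kind of interface change. The names `Bump`, `Version`, `actual_bump` and `is_sufficient` are hypothetical and only serve this example; pre-release identifiers and the special conventions for 0.x versions are deliberately ignored.

```rust
/// Kind of version bump. The derived ordering is Patch < Minor < Major.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    Patch,
    Minor,
    Major,
}

/// A simplified version number without pre-release or build metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// Classify the step from `old` to `new`.
/// Returns `None` if `new` is not strictly greater than `old`.
pub fn actual_bump(old: Version, new: Version) -> Option<Bump> {
    if new.major != old.major {
        return if new.major > old.major { Some(Bump::Major) } else { None };
    }
    if new.minor != old.minor {
        return if new.minor > old.minor { Some(Bump::Minor) } else { None };
    }
    if new.patch > old.patch {
        Some(Bump::Patch)
    } else {
        None
    }
}

/// Check that the version change is at least as large as the bump
/// required by the interface changes.
pub fn is_sufficient(old: Version, new: Version, required: Bump) -> bool {
    actual_bump(old, new).map_or(false, |bump| bump >= required)
}
```

With this, going from 1.2.3 to 1.3.0 would be accepted for a change requiring a minor bump, but rejected for one requiring a major bump.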
This structure for the task is proposed because it allows for maximal separation of concerns in the final product, which in turn eases further refinements and additions to the design of individual components. This is especially useful when we consider that the features need to be implemented across multiple codebases, as some components need direct access to the compiler’s data structures, while others interact with cargo to obtain version information and possibly other metadata.
Based on this rough design, a few discussion points remain:
- We need to determine the most beneficial way of obtaining the interface representation of the already published crate version. The simplest approach is to store it in a serialized form alongside (or even as part of) Cargo.lock, that is, in a file on disk. This has a few benefits, most prominently the ability to reuse the data stored that way in other tools and for other purposes. However, the decision to add more metadata to a typical crate’s repository should not be taken lightly. Alternatives include integration with a VCS, such as git, or pulling the reference version from crates.io or another source. Both options trade in some of the flexibility of the first approach for a less intrusive change to the already established functionality of cargo.
- Concrete implementation details for the components described above need to be decided upon. A thorough look at projects with similar requirements and structure, like clippy, and its integration with cargo, could serve as guidance here.
Obviously, it is also necessary to outline the basic working of the algorithm enforcing SemVer compatibility. Since this requires operating on crate interface descriptions, I assume a naive implementation of such a data structure, which provides a mapping from identifiers to their type signatures and definitions, as well as the set of available types and identifiers.
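Such a naive interface description could be sketched as follows. This is only an assumption about its shape: the names `Item` and `CrateInterface` are hypothetical, and signatures and definitions are kept as plain strings, whereas a real implementation would work on the compiler’s own data structures.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A publicly visible item. Signatures and definitions are plain strings
/// here, which is a simplification made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Item {
    Function { signature: String },
    Type { definition: String },
}

/// Naive interface description: a mapping from identifiers (paths)
/// to the items they name.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct CrateInterface {
    items: BTreeMap<String, Item>,
}

impl CrateInterface {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register an item, returning the previous item with that path, if any.
    pub fn insert(&mut self, path: &str, item: Item) -> Option<Item> {
        self.items.insert(path.to_string(), item)
    }

    /// Look up the signature or definition behind an identifier.
    pub fn get(&self, path: &str) -> Option<&Item> {
        self.items.get(path)
    }

    /// The set of all available identifiers.
    pub fn identifiers(&self) -> BTreeSet<&str> {
        self.items.keys().map(String::as_str).collect()
    }

    /// The set of identifiers that name types.
    pub fn type_names(&self) -> BTreeSet<&str> {
        self.items
            .iter()
            .filter(|(_, item)| matches!(item, Item::Type { .. }))
            .map(|(path, _)| path.as_str())
            .collect()
    }
}
```

An ordered map is used so that iterating over two interface descriptions yields identifiers in a stable order, which should make comparing them and reporting differences more predictable.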
A rough sketch of the algorithm could look like the following:
- Identify the removals and additions of signatures and/or types between versions. The addition of any element will force at least a minor version bump. The removal of any element, in turn, forces a major version bump.
- For each identifier common between versions, examine the changes to its signature, if any. Changes that generalize the definition and can’t introduce any ambiguity in user code (which would result in a compile error) force at least a minor version bump. All other changes are considered breaking, and require a major version bump.
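The two steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions: an interface is reduced to a map from identifier to a signature string, the names `Bump`, `Interface` and `required_bump` are hypothetical, and the hard question of whether a change is a safe generalization is delegated to a caller-supplied predicate. Treating an unchanged interface as requiring only a patch bump is also my assumption.

```rust
use std::collections::BTreeMap;

/// Kind of version bump. The derived ordering is Patch < Minor < Major.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    Patch,
    Minor,
    Major,
}

/// Simplified interface: identifier -> signature (as a string).
pub type Interface = BTreeMap<String, String>;

/// Compute the smallest bump permitted by the rules sketched above.
/// `is_generalization(old_sig, new_sig)` must decide whether a changed
/// signature is a generalization that cannot break user code.
pub fn required_bump<F>(old: &Interface, new: &Interface, is_generalization: F) -> Bump
where
    F: Fn(&str, &str) -> bool,
{
    let mut bump = Bump::Patch;

    for (name, old_sig) in old {
        let change = match new.get(name) {
            // Removed item: breaking.
            None => Bump::Major,
            // Unchanged item: nothing to do.
            Some(new_sig) if new_sig == old_sig => Bump::Patch,
            // Changed, but only generalized: minor.
            Some(new_sig) if is_generalization(old_sig.as_str(), new_sig.as_str()) => {
                Bump::Minor
            }
            // Any other change is conservatively treated as breaking.
            Some(_) => Bump::Major,
        };
        bump = bump.max(change);
    }

    // Any added item forces at least a minor bump.
    if new.keys().any(|name| !old.contains_key(name)) {
        bump = bump.max(Bump::Minor);
    }

    bump
}
```

Passing a predicate that always returns `false` gives the most conservative behaviour, where every signature change counts as breaking.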
The actual implementation would have to consider many more details than presented here, as generics, traits, and different trait implementations all interact subtly. It is important to note, however, that absence of specialization in a change does not imply backwards-compatibility. This hints at a core problem the proposed project will face: the generated version number is only a suggestion, since it’s generated in a conservative fashion whenever possible, but won’t include any information on function implementations in the first iterations of the project.