Routing and extraction in Tide: a first sketch

FWIW, I'm totally happy with this design. Tide is providing a strong "incorrect code won't run" guarantee and I don't personally care if this is done at compile time or run time.

Could the "Compile-Time Checked" API be a layer above the non-compile-time-checked API? It seems like it could. That way everyone is happy, and the lower-level API could be implemented either in parallel with the higher-level, compile-time-checked API or with a mindset not to preclude it.

That would be a good compromise to my mind and the best of both worlds.

Thoughts?

As a general statement, probably yes; but the inverse is perhaps the better solution. In other words, it might be more optimizer-friendly to build the run-time-checked API atop the compile-time-checked one, if possible.

I'm having difficulty imagining how that would work (not saying it can't). Could you provide some notional idea of how that might work?

So approaching this from a general POV (as opposed to the specifics of web stuff/Tide, so I can't say anything about the feasibility of this for Tide): if you consider, for example, a simply typed lambda calculus represented as an AST Term in some Agda encoding (an example I found), then you can have a run-time-checked version of this, which is RawTerm; to go between these you have typecheck : RawTerm -> Maybe Term and then forget : Term -> RawTerm. You'd similarly run a sort of "mini" type checker mapping the run-time version to the compile-time version.
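Here's a minimal Rust sketch of that pattern on a toy expression language. (Rust can't index Term by its type the way the Agda encoding can, so this version only enforces well-typedness at construction; all names here are illustrative.)

```rust
// Untyped, run-time-checked representation: anything parses into this.
enum RawTerm {
    Lit(i64),
    Truth(bool),
    Add(Box<RawTerm>, Box<RawTerm>),
}

// Typed representation: the invariant is that Add only ever holds Ints.
enum Term {
    Int(i64),
    Bool(bool),
    Add(Box<Term>, Box<Term>),
}

// The "mini type checker": run-time version -> compile-time version.
fn typecheck(raw: &RawTerm) -> Option<Term> {
    match raw {
        RawTerm::Lit(n) => Some(Term::Int(*n)),
        RawTerm::Truth(b) => Some(Term::Bool(*b)),
        RawTerm::Add(l, r) => match (typecheck(l)?, typecheck(r)?) {
            (lt @ Term::Int(_), rt @ Term::Int(_)) => {
                Some(Term::Add(Box::new(lt), Box::new(rt)))
            }
            _ => None, // e.g. adding a Bool is rejected here
        },
    }
}

// Forgetting the types is always possible.
fn forget(term: &Term) -> RawTerm {
    match term {
        Term::Int(n) => RawTerm::Lit(*n),
        Term::Bool(b) => RawTerm::Truth(*b),
        Term::Add(l, r) => {
            RawTerm::Add(Box::new(forget(l)), Box::new(forget(r)))
        }
    }
}
```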

A large part of the compile-time guarantees spoken for by @josh and some others is not just something that hypothetically might be possible to implement, but actually something that is implemented and seems to work fine in warp. And I for one (more) would really love to see that kind of static, compile-time route-sanity guarantee in "the standard Rust web service framework" if there ever is one.

Back in Nickel I even wrote a macro to make URL matching statically guaranteed. In case it might inspire anything, here's the macro definition and here's some usage of it.

If the only place you can run the router (as opposed to the entire server) is your staging environment, then I think you have other issues. Having a mechanism to allow testing server code in a local development environment is a key part that software engineers often overlook, to their (and their code's) peril. If this encourages people to improve that situation, that would be a good thing.

More specifically, if warp does compile-time checking well, then why should Tide do it as well? I, for one, find warp's routing hard to understand. I didn't read the code or dig deep, but I read the posts about it and just sort of assumed it worked rather than stopping to understand it. If Tide is supposed to be easy to understand, I think that sort of response from someone reading about the routing in Tide would be a really bad thing.

I think the discussion about compile-time checking of routes is missing the forest for the trees. I think it is a nice feature, but it is not in my top 3 biggest concerns when choosing a web framework. This routing and extracting proposal is well-done and I do not have any major concerns.

Also, I appreciate the design goals. I am a big fan of avoiding macros and code generation at the core. I also like that the routing is kept simple. I like that frameworks, such as warp, are experimenting with new ways of describing routes. Tide should be standardizing on well-understood approaches.


I quite agree, but there already exist several well-done web frameworks in Rust, and I think "the tide should rise" and take the best from all of them.


👍

I'd like to see Tide adopt the best-in-class approaches from existing frameworks.

And one of the things that most appeals to me about writing web code in Rust is how much the Rust compiler has my back and checks things for me at compile time.


Another thing that I haven't seen in the blog entry or this discussion: what about subrouters?

I want to be able to define a router that might define, for example, "GET /login", "POST /login", "POST /logout", etc. in a function provided by a library, and have the application be able to "mount" that under e.g. "/auth", making the application support "GET /auth/login", "POST /auth/login", "POST /auth/logout", etc. Is there a plan for this in Tide?

Ideally, I'd even want to be able to tell the function that some argument(s) have been extracted before the point where the subrouter is mounted, so that the handlers used by the subrouter get those arguments as well as any extracted by the subrouter itself.
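To make the shape of that concrete, here's the kind of thing I'd imagine on top of the API sketched in the post (Router and nest are entirely made up for illustration; this is pseudocode, not anything Tide actually offers):

```rust
// In the auth library: a self-contained bundle of routes.
fn auth_routes() -> Router {
    let mut router = Router::new();
    router.at("/login").get(login_form);
    router.at("/login").post(do_login);
    router.at("/logout").post(do_logout);
    router
}

// In the application: mounting under "/auth" yields
// GET /auth/login, POST /auth/login, POST /auth/logout.
app.at("/auth").nest(auth_routes());
```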

There are other downsides: it requires adding type parameters throughout most of the core framework types to track the number of "holes" present, and dealing with those type parameters in extractor implementations. When you get things wrong, the error you'd be presented with from the compiler would be a type mismatch about large, ghost type parameters that are used for this type-level programming. By contrast, catching the error at construction time makes it very easy to provide a clear, informative error message.

These kinds of obscure errors are part of what I'm trying to avoid in the design by favoring "plain" Rust. I'm already a bit bummed by the need for the ghost Kind parameter for the Endpoint trait, but that at least is almost entirely invisible to users of the framework.

In short, there are non-trivial costs in complexity and overall UX to introducing this extra level of tracking.
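For contrast, the construction-time check can be as simple as something like this (simplified, hypothetical code rather than the actual implementation):

```rust
/// Hypothetical check run when a route is registered: each endpoint
/// reports, via its extractor types, how many `{}` segments it consumes.
fn check_route(pattern: &str, holes_expected: usize) {
    let holes_in_pattern = pattern.matches("{}").count();
    if holes_in_pattern != holes_expected {
        panic!(
            "route `{}` has {} `{{}}` segment(s), but its endpoint \
             extracts {} path argument(s)",
            pattern, holes_in_pattern, holes_expected
        );
    }
}
```

Because this runs when the app is constructed, any test that so much as builds the app will hit it, and the panic message can name the offending route directly.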

OTOH, there are a lot of bugs that the type checker doesn't catch, but that tests do. In this case we have a situation where we can guarantee catching mistakes if you run any test whatsoever. Testing and debugging are part of the normal workflow; why eat the above costs for this very specific kind of trivial bug that we're guaranteed to catch during the normal development process?

In other words:

These remarks sound reasonable when stated at that level of generality, but when you look at the sum total of practical concerns in play in this particular case, the tradeoff just doesn't make sense, for the reasons outlined above. I think it's always important in design discussions to avoid dogmatically following general principles, and instead look at the end-to-end UX.


Interesting question! My personal instinct is to support dynamic configuration unless there's a strong reason not to. Can you say more about what you have in mind re: "static config done really strongly"? What features/benefits would you anticipate?

Yes, I fully agree! In particular, it should be possible to apply these kinds of "guards" to a bundle of endpoints and have that clearly laid out in the table of contents. What I'm trying to avoid, though, is for the guard process to be part of route selection and instigate a fallback/rematch behavior when a guard fails. I'll follow up on this in the next post.

Agreed, thanks!

Yes indeed! I didn't want this post to get too large, but mounting subrouters in the way you describe is easily compatible with the sketched API.

Agreed. With const generics, it might be possible to actually use these names, i.e. by saying foo: Path<"id", u64>.


This is what I was coming here to talk about.

I'm uncomfortable with the positional {} argument syntax paired with signatures like (mut db: AppState<Database>, id: Path<usize>, msg: Json<Message>), because it feels like it requires the same kind of non-local reasoning that Rust eschews by refusing to infer the types of function arguments.

I don't like that, if I were to pick up an unfamiliar codebase, I'd have to check information that's part of neither the async fn set_message declaration nor the app.at("/message/{}").put(set_message); call, and may not even be part of the same crate (i.e. the Extractor impls on the types), to feel confident that I understood which positional arguments in the URL map to which function arguments.

Named arguments (i.e. {id}) would avoid that problem.
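For concreteness, here are the two styles side by side, using the API sketched in the post:

```rust
// Positional: which `{}` feeds `id` depends on argument order,
// so the reader has to consult the extractor impls to be sure.
app.at("/message/{}").put(set_message);

// Named: the route itself says which segment is the id.
app.at("/message/{id}").put(set_message);
```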


Just to clarify this point: I'd be pretty surprised if that happened much in practice; I'd expect the framework to provide extractors like Path and Glob that correspond directly to the URL matching syntax, and for those to be nearly universally used. Custom extractors will almost certainly be focused on parsing out other request data.

IOW, looking at just the endpoint definition, it should always be clear which arguments are extracted from the URL.

That said, I do agree that there are downsides to doing this positionally: I can imagine refactoring the URL structure of your app, but forgetting to update the endpoints, and getting some confusing errors as a result. With named parameters, either things would just keep working, or you'd more quickly get to a clear-cut error.

It's worth talking about what Actix-Web does here, which is basically "support both". In particular, for Path<T> in Actix, the T needs to be deserializable from the path match. So you can do named matches, but then you need to write a dedicated struct with field names matching the URL pattern. That seems pretty heavyweight to me, and I'd like to see if we can do better.
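From memory, the Actix-Web version of a named match looks roughly like this (exact imports and signatures vary by version):

```rust
use actix_web::Path;
use serde_derive::Deserialize;

// The struct's field names must match the named segments of the
// route pattern, e.g. "/message/{id}".
#[derive(Deserialize)]
struct MessageParams {
    id: u64,
}

// Path<T> derefs to the inner struct, so fields are directly accessible.
fn set_message(params: Path<MessageParams>) -> String {
    format!("setting message {}", params.id)
}
```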

As I mentioned above, one possibility would be to eventually leverage const generics, making it possible to write the name directly in the type, getting the best of both worlds: Path<"id", u64>.
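Very roughly, and assuming a future Rust where const generic parameters can be &'static str values (they can't today, so this is speculative syntax):

```rust
// Speculative: a string-valued const parameter would tie the extractor
// to a named URL segment at the type level.
struct Path<const NAME: &'static str, T>(T);

// An endpoint signature would then name the segment it consumes:
// async fn set_message(id: Path<"id", u64>, msg: Json<Message>) { ... }
```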


So I'm relatively neutral on this specific proposal, but I think it's worth highlighting an important feature of the compile-time vs. test-time nature of this, specifically in terms of cycle time for development. In my experience, there seem to be rough order-of-magnitude differences between the feedback you get from:

  1. the compiler (especially with RLS)
  2. automated test suites (assuming they're running on every change)
  3. manual local testing
  4. CI (presumably == automated equivalent of what you do in 2 and 3)
  5. deploying to staging (for manual testing)
  6. deploying to live (i.e. when production traffic is meaningfully different from what you can test on staging)

That doesn't mean we should put everything in the compiler all the time. Indeed, there are some things that are too expensive or difficult to test anywhere but production (all the way up at layer 6)! However, I think explicitly being aware of the cost in cycle time is useful, as is being explicit about why we think something is worth slotting into layer 2 vs. layer 1 (or layer 3 or 4 or 5 or 6!).

I think a lot of people who end up in Rust tend to prefer putting as much as possible into layer 1 here, because we have so often been bitten by things that sit at layer 2 in other languages or frameworks, where it takes a lot of time to figure out why they broke. This happened to me in a C# app just last week, and chasing it down was not fun.

Again: that's not an argument for what to do in this case; it's just trying to make explicit what I think is implicit in a lot of these discussions, so it can be more effectively discussed.

Edit: I am going to extract this, elaborate on it slightly, and turn it into a blog post. Seems more generally useful. 🤓

Edit later with the promised post: Scales of Feedback Time in Software Development


This completely makes sense. It would be even more awesome if we could pipe these filters for certain routes.

At the very minimum, can't we have a generic fallback that redirects to an error page, for example?

Yep, I agree 100% with what you spelled out there, and didn't mean to say there's no benefit to layer 1 over layer 2. It's just the tradeoffs in this very specific case that concern me.


Ah, yes, we'll want some mechanism along those lines. Ideally this will fit naturally with whatever mechanism we use for "guards" as well.

To be clear, what I mean by "fallback" is more specifically: a case where a URL is matched by a route, but we then cancel that match and look for another, lower-ranked match.

Well, the example I alluded to is a good one, but perhaps not useful if you're not familiar with it. In the embedded-hal ecosystem, there's a pattern for allocating hardware resources (mapping timers, DMA controllers, device pins, etc.) at compile time that prevents conflicts with type-checking failures. It does involve macros, but it's pure delightful magic.

In the context of Tide, I assume this would need to involve declaring a number of things as more explicit types, rather than strings: path elements and endpoints and filter middleware. Then there would be compile failures based on (say) the borrow checker discovering that two endpoints were both trying to claim the same path entity.
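As a toy sketch of that idea (all types here invented for illustration): if each path entity is a singleton value that an endpoint must claim by move, a second claim is a compile error, just like double-claiming a pin.

```rust
// One zero-sized singleton value per path entity, instead of a string.
struct LoginPath;

struct Endpoint {
    // Owning the path value is what "claims" it.
    _path: LoginPath,
}

fn claim(path: LoginPath) -> Endpoint {
    Endpoint { _path: path }
}

fn main() {
    let login = LoginPath;
    let _a = claim(login);
    // let _b = claim(login); // error[E0382]: use of moved value: `login`
}
```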

To me, this would be most useful in a model where the routing is not table-of-contents based. If I declare in an endpoint function which bit of namespace it should be listening to, compile-time resolution and conflict checking seem more valuable. It provides immediate feedback as I'm constructing something that's inherently distributed and hard to keep track of in my head, but that the tooling can track well, just like in the hardware-resource case. It perhaps also supports better reuse, since I can assemble an app/service out of multiple components without risking a runtime conflict detection/crash. It seems closer to what Rocket does than the model here.

On the other hand, it means that every web app has to be compiled, which suits something I'm going to flash to a microcontroller far better than it does many web apps.

Hence my question: what is Tide targeting? If it's targeting more dynamic configurations and environments, perhaps the routing setup API needs to factor that in, returning Results for attempted routing changes rather than panicking?