Routing and extraction in Tide: a first sketch

Another thing that I haven’t seen in the blog entry or this discussion: What about subrouters?

I want to be able to define a router that might define for example “GET /login”, “POST /login”, “POST /logout” etc in a function provided by a library, and have the application be able to “mount” that under e.g. “/auth”, making the application support “GET /auth/login”, “POST /auth/login”, “POST /auth/logout” etc. Is there a plan for this in Tide?

Ideally, I’d even want to be able to tell the function that some argument(s) has been extracted before the point where the subrouter is mounted, so that the handlers used by the subrouter should get that argument as well as any extracted by the subrouter.
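To make the request concrete, here is a rough sketch of what “mounting” could mean. Everything in it (`Router`, `at`, `mount`, `lookup`, `auth_routes`) is a hypothetical name made up for illustration, not Tide’s API, and it only covers prefixing; passing already-extracted arguments down to the subrouter is not modelled.

```rust
// Hypothetical sketch: `Router`, `at`, `mount` and `lookup` are illustrative
// names, not Tide's actual API.
type Handler = fn() -> &'static str;

#[derive(Default)]
struct Router {
    // (method, path, handler) triples in registration order.
    routes: Vec<(&'static str, String, Handler)>,
}

impl Router {
    fn at(&mut self, method: &'static str, path: &str, handler: Handler) -> &mut Self {
        self.routes.push((method, path.to_string(), handler));
        self
    }

    /// Re-registers every route of `sub` with `prefix` prepended to its path.
    fn mount(&mut self, prefix: &str, sub: Router) -> &mut Self {
        let prefix = prefix.trim_end_matches('/');
        for (method, path, handler) in sub.routes {
            self.routes.push((method, format!("{}{}", prefix, path), handler));
        }
        self
    }

    /// Exact-match lookup; enough to show the effect of mounting.
    fn lookup(&self, method: &str, path: &str) -> Option<Handler> {
        self.routes
            .iter()
            .find(|(m, p, _)| *m == method && p.as_str() == path)
            .map(|(_, _, h)| *h)
    }
}

fn login_form() -> &'static str { "login form" }
fn do_login() -> &'static str { "logged in" }
fn do_logout() -> &'static str { "logged out" }

/// What a library might export: routes relative to wherever they get mounted.
fn auth_routes() -> Router {
    let mut r = Router::default();
    r.at("GET", "/login", login_form)
        .at("POST", "/login", do_login)
        .at("POST", "/logout", do_logout);
    r
}
```

With this, an application would call something like `app.mount("/auth", auth_routes())` and end up serving “GET /auth/login”, “POST /auth/login” and “POST /auth/logout”, while “GET /login” stays unrouted.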

There are other downsides: it requires adding type parameters throughout most of the core framework types to track the number of “holes” present, and dealing with those type parameters in extractor implementations. When you get things wrong, the error you’d be presented with from the compiler would be a type mismatch about large, ghost type parameters that are used for this type-level programming. By contrast, catching the error at construction time makes it very easy to provide a clear, informative error message.

These kinds of obscure errors are part of what I’m trying to avoid in the design by favoring “plain” Rust. I’m already a bit bummed by the need for the ghost Kind parameter for the Endpoint trait, but that at least is almost entirely invisible to users of the framework.

In short, there are non-trivial costs in complexity and overall UX to introducing this extra level of tracking.

OTOH, there are a lot of bugs that the type checker doesn’t catch, but that tests do. In this case we have a situation where we can guarantee catching mistakes if you run any test whatsoever. Testing and debugging are part of the normal workflow; why eat the above costs on this very specific kind of trivial bug that we’re guaranteed to catch during the normal development process?
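As a minimal sketch of what a construction-time check might look like: the functions below are hypothetical, and `path_args` is just a stand-in for “the number of `Path<_>` arguments the endpoint declares”, which a real framework would have to derive from the endpoint’s type.

```rust
/// Counts the positional `{}` holes in a route pattern.
fn count_holes(pattern: &str) -> usize {
    pattern.matches("{}").count()
}

/// Hypothetical registration-time check: compares the holes in the pattern
/// with the number of path arguments the endpoint expects.
fn check_route(pattern: &str, path_args: usize) -> Result<(), String> {
    let holes = count_holes(pattern);
    if holes == path_args {
        Ok(())
    } else {
        Err(format!(
            "route `{}` has {} `{{}}` segment(s), but the endpoint takes {} Path argument(s)",
            pattern, holes, path_args
        ))
    }
}

/// Panicking variant for use while the route table is being built, so that
/// any test that constructs the app trips over a mismatch.
fn register(pattern: &str, path_args: usize) {
    if let Err(msg) = check_route(pattern, path_args) {
        panic!("{}", msg);
    }
}
```

Because the mismatch surfaces while the table is built rather than when a request arrives, even a test that only constructs the app and never sends a request would hit it, and the message can name the route directly.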

In other words:

These remarks sound reasonable when stated at that level of generality, but when you look at the sum total of practical concerns in play in this particular case, the tradeoff just doesn’t make sense, for the reasons outlined above. I think it’s always important in design discussions to avoid dogmatically following general principles, and instead look at the end-to-end UX.

Interesting question! My personal instinct is to support dynamic configuration unless there’s a strong reason not to. Can you say more about what you have in mind re: “static config done really strongly”? What features/benefits would you anticipate?

Yes, I fully agree! In particular, it should be possible to apply these kinds of “guards” to a bundle of endpoints and have that clearly laid out in the table of contents. What I’m trying to avoid, though, is for the guard process to be part of route selection and instigate a fallback/rematch behavior when a guard fails. I’ll follow up on this in the next post.

Agreed, thanks!

Yes indeed! I didn’t want this post to get too large, but mounting subrouters in the way you describe is easily compatible with the sketched API.

Agreed. With const generics, it might be possible to actually use these names, i.e. by saying foo: Path<"id", u64>.

This is what I was coming here to talk about.

I’m uncomfortable with the positional {} argument syntax paired with signatures like (mut db: AppState<Database>, id: Path<usize>, msg: Json<Message>), because it feels like it requires the same kind of non-local reasoning that Rust eschews by refusing to infer the types of function arguments.

I don’t like that, if I were to pick up an unfamiliar codebase, I’d have to check information that’s part of neither the async fn set_message declaration nor the app.at("/message/{}").put(set_message); call and may not even be part of the same crate (ie. impl Extractors on the types) to feel confident that I understood which positional arguments in the URL map to which function arguments.

Named arguments (ie. {id}) would avoid that problem.

Just to clarify this point: I’d be pretty surprised if that happened much in practice; I’d expect the framework to provide extractors like Path and Glob that correspond directly to the URL matching syntax, and for those to be nearly universally used. Custom extractors will almost certainly be focused on parsing out other request data.

IOW, looking at just the endpoint definition, it should always be clear which arguments are extracted from the URL.

That said, I do agree that there are downsides to doing this positionally: I can imagine refactoring the URL structure of your app, but forgetting to update the endpoints, and getting some confusing errors as a result. With named parameters, either things would just keep working, or you’d more quickly get to a clear-cut error.
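To illustrate the refactoring point, here is a small sketch of named matching. `match_named` is a hypothetical helper, not part of any framework; it handles only whole-segment `{name}` placeholders.

```rust
use std::collections::HashMap;

/// Matches `path` against a pattern such as "/s/{story_pk}/{chapter_idx}"
/// and returns the captured segments keyed by placeholder name.
fn match_named<'p, 's>(pattern: &'p str, path: &'s str) -> Option<HashMap<&'p str, &'s str>> {
    let mut captures = HashMap::new();
    let mut pattern_segments = pattern.split('/');
    let mut path_segments = path.split('/');
    loop {
        match (pattern_segments.next(), path_segments.next()) {
            // Both ran out at the same time: every segment matched.
            (None, None) => return Some(captures),
            (Some(pat), Some(seg)) => {
                let name = pat.strip_prefix('{').and_then(|p| p.strip_suffix('}'));
                if let Some(name) = name {
                    // A placeholder must capture a non-empty segment.
                    if seg.is_empty() {
                        return None;
                    }
                    captures.insert(name, seg);
                } else if pat != seg {
                    return None;
                }
            }
            // One side has more segments than the other.
            _ => return None,
        }
    }
}
```

If the URL structure is later changed from "/s/{story_pk}/{chapter_idx}" to "/s/{chapter_idx}/{story_pk}", a lookup by name still yields the right value for `story_pk`, whereas a positional lookup would silently swap the two.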

It’s worth talking about what Actix-Web does here, which is basically “support both”. In particular, for Path<T> in Actix, the T needs to be deserializable from the path match. So you can do named matches, but then you need to write a dedicated struct with field names matching the URL pattern. That seems pretty heavyweight to me, and I’d like to see if we can do better.

As I mentioned above, one possibility would be to eventually leverage const generics, making it possible to write the name directly in the type, getting the best of both worlds: Path<"id", u64>.

So I’m relatively neutral on this specific proposal, but I think it’s worth highlighting an important feature of the compile-time vs. test-time nature of this, specifically in terms of cycle time for development. In my experience, there seem to be rough order-of-magnitude differences between the feedback you get from—

  1. the compiler (especially with RLS)
  2. automated test suites (assuming they’re running on every build change)
  3. manual local testing
  4. CI (presumably == automated equivalent of what you do in 2 and 3)
  5. deploying to staging (for manual testing)
  6. deploying to live (i.e. when production traffic is meaningfully different from what you can test on staging)

That doesn’t mean we should put everything in the compiler all the time. Indeed, there are some things that are too expensive or difficult to test anywhere but production (all the way up at layer 6)! However, I think it’s useful to be explicitly aware of the cost in cycle time, and to be explicit about why we think something is worth slotting into layer 2 vs. layer 1 (or layer 3 or 4 or 5 or 6!).

I think a lot of people who end up in Rust tend to prefer putting as much as possible into layer 1 here because we have so often been bitten by things that are at layer 2 in other languages or frameworks and take a lot of time to figure out why they broke at layer 2. This happened to me in a C♯ app just last week, and chasing it down was not fun.

Again: that’s not an argument for what to do in this case; it’s just trying to make explicit what I think is implicit in a lot of these discussions so it can be more effectively discussed.

Edit: I am going to extract this and elaborate on it slightly and turn it into a blog post. Seems more generally useful.

Edit later with the promised post: Scales of Feedback Time in Software Development

This completely makes sense. It would be even more awesome if we could pipe these filters for certain routes.

At the very minimum, can’t we have a generic fallback that will redirect to the error page, for example?

Yep, I agree 100% with what you spelled out there, and didn’t mean to say there’s no benefit for layer 1 vs layer 2. It’s just the tradeoffs in this very specific case that concern me.

Ah, yes, we’ll want some mechanism along those lines. Ideally this will fit naturally with whatever mechanism we use for “guards” as well.

To be clear, what I mean by “fallback” is more specifically: a case where a URL is matched by a route, but we then cancel that match and look for another, lower-ranked match.

Well, the example I alluded to is a good one, but perhaps not useful if you’re not familiar with it. In the embedded-hal ecosystem, there’s a pattern for allocating hardware resources (mapping timers, DMA controllers, device pins, etc.) at compile time that prevents conflicts by turning them into type-checking failures. It does involve macros but it’s pure delightful magic.

In the context of Tide, I assume this would need to involve declaring a number of things as more explicit types, rather than strings: path elements and endpoints and filter middleware. Then there would be compile failures based on (say) the borrow checker discovering that two endpoints were both trying to claim the same path entity.

To me, this would be most useful in a model where the routing is not table-of-contents based. If I declare in an endpoint function which bit of namespace it should be listening to, compile-time resolution and conflict checking seems more valuable. It provides immediate feedback as I’m constructing something that’s inherently distributed and hard to keep track of in my head, but that the tooling can do well - just like in the hardware resource case. It perhaps also supports better re-use, since I can assemble an app / service out of multiple components without risking a runtime conflict detection/crash. It seems closer to what Rocket does than the model here.

On the other hand, it means that every web app has to be compiled, which is far more suitable for something I’m going to flash to a microcontroller than it is for many web apps.

Hence my question: what is Tide targeting? If it’s targeting more dynamic configurations and environments, perhaps the routing setup API needs to factor that in, returning Results for attempted routing changes rather than panicking?
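As a sketch of what I mean (all names here are invented, and a real route table would store endpoints rather than strings): a fallible `try_add` for dynamic configuration, with a panicking `add` layered on top for apps whose routes are fixed at startup.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum RouteError {
    /// The same method and pattern were already registered.
    Conflict { method: String, pattern: String },
}

#[derive(Default)]
struct RouteTable {
    // Keyed by (method, pattern); the value is a stand-in for an endpoint.
    routes: HashMap<(String, String), &'static str>,
}

impl RouteTable {
    /// Fallible registration: the caller decides how to react to a conflict.
    fn try_add(
        &mut self,
        method: &str,
        pattern: &str,
        endpoint: &'static str,
    ) -> Result<(), RouteError> {
        let key = (method.to_string(), pattern.to_string());
        if self.routes.contains_key(&key) {
            return Err(RouteError::Conflict { method: key.0, pattern: key.1 });
        }
        self.routes.insert(key, endpoint);
        Ok(())
    }

    /// Convenience wrapper for statically configured apps: panics on conflict.
    fn add(&mut self, method: &str, pattern: &str, endpoint: &'static str) {
        self.try_add(method, pattern, endpoint)
            .expect("conflicting route registration");
    }
}
```

A service that reloads its routing at runtime could log or reject the bad change and keep serving, while a prototype can keep using the panicking form.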

As an analogy: like IP routing, where there can be overlapping routes, but there’s always an unambiguous choice (the most specific route, and perhaps some other priorities and metrics for detailed cases) for any given packet - and then the packet goes on to whatever later fate from there.
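A toy version of that selection rule, assuming positional `{}` wildcards and ranking purely by the number of literal segments (both `specificity` and `select` are made-up names, and this ranking is only one of many possible ones):

```rust
/// Returns the number of literally matched segments if `pattern` matches
/// `path`, or `None` if it does not match at all. The empty segment before
/// the leading '/' counts as a literal for every pattern alike.
fn specificity(pattern: &str, path: &str) -> Option<usize> {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    if pat.len() != segs.len() {
        return None;
    }
    let mut literal = 0;
    for (p, s) in pat.iter().zip(&segs) {
        if *p == "{}" {
            // A wildcard must capture a non-empty segment.
            if s.is_empty() {
                return None;
            }
        } else if p == s {
            literal += 1;
        } else {
            return None;
        }
    }
    Some(literal)
}

/// Picks the single most specific matching pattern. The choice is final:
/// nothing here lets an endpoint reject the match and fall through.
fn select<'a>(patterns: &[&'a str], path: &str) -> Option<&'a str> {
    patterns
        .iter()
        .copied()
        .filter_map(|p| specificity(p, path).map(|score| (score, p)))
        .max_by_key(|(score, _)| *score)
        .map(|(_, p)| p)
}
```

Two patterns can tie under this ranking (for example "/{}/new" and "/users/{}" against "/users/new"), so a real router would need a further tie-breaking rule or would have to reject such tables when they are built.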

It’s in large part a question of semantics: is the routing decision based on matching some set of declarative conditions (URL, method, other things) that can be hashed into a simple lookup, or is there any opportunity for dynamic routing evaluation (i.e. can I write some hook to influence the routing decision)? Hooks might look like filters or guards, or they might look like extractors that help construct the destination for routing lookup; it’s all just about how these are arranged and where and when they run.

If routes aren’t going to be allowed to ‘reject and retry the next-best route’, regardless of whether we call it a match failure or a fallback after processing a rejection, the other classic mechanism is a kind of internal redirect: a handler manipulates the request in some way and returns a response that doesn’t go back to the client, but instead resubmits the manipulated request to be routed again from the top, presumably to a new target.
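Roughly, and with every name below invented for the sake of the example, an internal redirect could be modelled as a handler outcome that the dispatcher feeds back into routing, with a hop limit so that a redirect loop ends in an error instead of spinning forever:

```rust
/// What a handler can hand back to the dispatcher in this sketch.
enum Outcome {
    /// A final response body to send to the client.
    Response(String),
    /// Route again from the top with this rewritten path.
    Reroute(String),
}

type Handler = fn(&str) -> Outcome;

/// Upper bound on how many internal redirects are followed per request.
const MAX_HOPS: usize = 4;

fn old_home(_path: &str) -> Outcome { Outcome::Reroute("/home".to_string()) }
fn home(_path: &str) -> Outcome { Outcome::Response("welcome".to_string()) }
fn looping(_path: &str) -> Outcome { Outcome::Reroute("/loop".to_string()) }

/// Stand-in for the real route table.
fn route(path: &str) -> Option<Handler> {
    let handler: Handler = match path {
        "/old-home" => old_home,
        "/home" => home,
        "/loop" => looping,
        _ => return None,
    };
    Some(handler)
}

fn dispatch(path: &str) -> Result<String, String> {
    let mut current = path.to_string();
    // One initial routing attempt plus at most MAX_HOPS redirects.
    for _ in 0..=MAX_HOPS {
        let handler = route(&current).ok_or_else(|| format!("no route for {}", current))?;
        match handler(&current) {
            Outcome::Response(body) => return Ok(body),
            Outcome::Reroute(next) => current = next,
        }
    }
    Err(format!("exceeded the limit of {} internal redirects", MAX_HOPS))
}
```

The client never sees the intermediate hop, which is the point: the rewrite stays server-side instead of being bounced through a 30x response.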

These kinds of mechanisms can be dangerous and tricky to get right, but this is the web, and one of the dangers is that if something useful isn’t provided server-side, it will be done using 30x redirects via the client and expose more application logic to potential tampering by untrusted clients.

A clear evaluation of the use cases for this, and a clear articulation of how to use whatever inheritance/specialisation/fallback/redirect/rewrite mechanism to achieve things for users, is important.

That reminds me of another reason it’s at odds with what I understand to be idiomatic in normal Rust code: It has the same weaknesses @killercup pointed out in having a function take multiple bools instead of a set of purpose-specific two-variant enums.

As a Canadian, this reminds me of how, when you see 01/02/2020, you have no idea whether that’s February 1st (Canadian DD/MM/YYYY date format) or January 2nd (American MM/DD/YYYY date format) without out-of-band information.

For example, many sites have URLs of the form /namespace/{int}/{int}/ which could get messed up in a refactoring. Examples include:

  • The StackExchange ecosystem uses /a/{answer}/{userid} for their “share a link to this answer” URLs.

  • Fanfiction (and original fiction) sites tend to use these patterns:

    • /s/{story_pk}/{chapter_idx}/* (Fanfiction.net and FictionPress)
    • /works/{story_pk}/chapters/{chapter_pk} (Archive of Our Own)
    • /Story-{story_pk}-{chapter_idx}/* (Twisting the Hellmouth)
    • /story/{story_pk}/{chapter_idx}/* (FiMFiction)

Note that FiMFiction also exposes chapter primary keys via their /chapters/download/{chapter_pk}/{format_name} route, making for a third integer value which could be confused with the others during a refactor.

The C2 lists on Fanfiction.net are an especially noteworthy example because they essentially abuse the path component of the URL to pass a bunch of integer query parameters.

Suppose they wanted to switch from PHP to Rust as part of a plan to reduce the tendency for bugs and regressions to slip in, but they don’t want to break anyone’s bookmarks.

Should the router API really be introducing another place to mess up a URL like this?

https://www.fanfiction.net/community/Alternate-First-Contact-War/88942/99/4/1/0/10/0/0/

Bear in mind that most of those parameters appear to be indexes into <select> elements, so confusing them would produce no error… you’d just get subtly wrong content in the results listing that gets returned. (eg. if you mixed up the values for the sort order and time range filter, it’d still work and you probably wouldn’t readily notice.)

(Sorry for the delayed reply. The power went out while I was typing the first version of this and it only just came back on.)

Obviously, the answer to this is frunk::LabelledGeneric. (This suggestion is only partly in jest. It’s a works-today version of const str in types.)

EDIT: context provided below.

Yep, as I said up thread, I definitely agree that there are downsides here!

What’d be most helpful is to brainstorm ideas for how to solve the problem while still achieving the other goals laid out in the post. I mentioned a couple possibilities in my earlier reply:

  • The Actix approach for named URL parameters, which requires defining a custom struct for the endpoint with field names corresponding to match variables (see the “Path” section here). This approach would fit fine with the proposed model for Tide, but I worry about the ergonomics.

  • Holding out for const generics so that you can write the name as part of the type (Path<"id", T>).

What do people think about these? Are there other options?

Interesting! I wasn’t able to make heads or tails of the linked API page – do you have an example handy that would show how it could be used this way?

Four blog posts by the author(s) of frunk:

HList: https://beachape.com/blog/2017/03/12/gentle-intro-to-type-level-recursion-in-Rust-from-zero-to-frunk-hlist-sculpting/
Generic: https://beachape.com/blog/2017/02/04/rust-generic-not-generics/
LabelledGeneric: https://beachape.com/blog/2017/03/04/labelledgeneric-in-rust-what-why-how/
Sculpt: https://beachape.com/blog/2017/04/12/boilerplate-free-struct-transforms-in-rust/

(The chronological order is LabelledGeneric, HList, Generic, Sculpt, but this reading order is better for understanding the type-level hackery going on here; this follows the abstraction layers.)

Hah, encoding type-level strings by creating a type for each possible character and using tuples of them to form identifiers – yikes! While I agree it could be made to work, that approach would definitely cut against the goals around sticking to “plain Rust”.

Here’s another idea, one that’s a midpoint between Actix and const generics. We could introduce a FromUrlSegment trait:

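// `Sized` is needed because `parse` returns `Self` by value inside a `Result`.
trait FromUrlSegment: Sized {
    // Name of the URL match (e.g. "user_id") that this type is extracted from.
    const MATCH_NAME: &'static str;
    type Error;
    fn parse(segment: &str) -> Result<Self, Self::Error>;
}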

and have Path<T> require that T: FromUrlSegment, using the MATCH_NAME to determine which match to extract.

Then you could define:

struct UserId(u64);

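impl FromUrlSegment for UserId {
    const MATCH_NAME: &'static str = "user_id";
    type Error = ();
    // One possible body: parse the segment as a u64 and drop the error details.
    fn parse(segment: &str) -> Result<UserId, ()> {
        segment.parse().map(UserId).map_err(|_| ())
    }
}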

and for some route "/users/{user_id}" you could have an endpoint:

async fn get_user(id: Path<UserId>) -> Json<User>;

Interestingly, this approach promotes best practices anyway, in the sense that it’s good practice to avoid working directly with e.g. raw u64 values precisely because of the potential for confusion; wrapping with UserId makes the intent more clear.
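To show how `Path<T>` might consume `MATCH_NAME`, here is a self-contained sketch. The `extract` method, the `ExtractError` type, and the idea that the router hands over its captures as a map keyed by name are all assumptions for illustration; the trait is repeated (with a `Sized` bound and a concrete error type) so the snippet stands alone.

```rust
use std::collections::HashMap;

trait FromUrlSegment: Sized {
    const MATCH_NAME: &'static str;
    type Error;
    fn parse(segment: &str) -> Result<Self, Self::Error>;
}

#[derive(Debug, PartialEq)]
struct UserId(u64);

impl FromUrlSegment for UserId {
    const MATCH_NAME: &'static str = "user_id";
    type Error = std::num::ParseIntError;
    fn parse(segment: &str) -> Result<Self, Self::Error> {
        segment.parse().map(UserId)
    }
}

struct Path<T>(T);

#[derive(Debug)]
enum ExtractError<E> {
    /// The route pattern had no match with the expected name.
    Missing(&'static str),
    /// The segment was present but failed to parse.
    Invalid(E),
}

impl<T: FromUrlSegment> Path<T> {
    /// `captured` stands in for whatever the router matched, keyed by name.
    fn extract(captured: &HashMap<&str, &str>) -> Result<Self, ExtractError<T::Error>> {
        let segment = match captured.get(T::MATCH_NAME) {
            Some(segment) => segment,
            None => return Err(ExtractError::Missing(T::MATCH_NAME)),
        };
        T::parse(segment).map(Path).map_err(ExtractError::Invalid)
    }
}
```

Note that the `Missing` case depends only on the route pattern and the endpoint’s argument types, so in principle it could be reported when the route is registered rather than per request.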

It doesn’t seem too far-fetched to have a custom derive for FromUrlSegment too…

Riffing further on this: we could do away with Path<T> entirely, in favor of directly deriving an Extractor impl:

#[derive(Extractor)]
#[extractor(path, component = "user_id")]
struct UserId(u64);

which gets rid of the annoying need for .0 to strip away the Path wrapper. (This of course assumes that you want to use the UserId wrapper type rather than immediately working with the raw u64).

Thinking about the run-time vs. compile-time checking, it does seem like perhaps the correct approach here is to stick with the run-time approach (I agree it’s generally easier to understand and is simpler from the user’s perspective due to being “just Rust”) for the core tide crate, but provide some API (perhaps in a separate crate) that generates the appropriate invocation and checks it. This is kind of similar to how structopt works for Clap, though there I believe there’s not much compile-time verification happening; adding that should be feasible though, I’d guess. That way we get the best of both worlds: compile-time checking for more serious applications where the cost is worth it, and run-time checking for prototyping.

I think the other benefit here is that the macro crate might not need to be written now; it could be fleshed out separately. If tide exposed a “check if this set of arguments is correct” function that runs without actually starting the server, it would probably be fairly easy to implement the macro-based API, I assume, even if it doesn’t get the full benefits.

I found the repo: https://github.com/rust-net-web/tide

Is that the current state that contributors should start from or are there local changes that are waiting to be pushed?