Routing and extraction in Tide: a first sketch


#41

As an analogy: like IP routing, where there can be overlapping routes, but there’s always an unambiguous choice (the most specific route, and perhaps some other priorities and metrics for detailed cases) for any given packet - and then the packet goes on to whatever later fate from there.
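
To make the analogy concrete, here is a minimal sketch of "most specific route wins", assuming routes are plain path prefixes and that specificity simply means prefix length. `Route`, `select_route` and the handler names are hypothetical and not part of any Tide API.

```rust
/// A hypothetical route: a path prefix plus the name of the handler it selects.
struct Route {
    prefix: &'static str,
    handler: &'static str,
}

/// Pick the matching route with the longest prefix, much as an IP router
/// picks the most specific matching network.
/// Simplification: this compares raw string prefixes and ignores segment
/// boundaries, so "/users" also matches "/usersfoo".
fn select_route<'a>(routes: &'a [Route], path: &str) -> Option<&'a Route> {
    routes
        .iter()
        .filter(|r| path.starts_with(r.prefix))
        .max_by_key(|r| r.prefix.len())
}
```

Overlapping routes are allowed here, yet any given path still resolves to exactly one route or to none.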

It’s in large part a question of semantics: is the routing decision based on matching some set of declarative conditions (URL, method, other things) that can be hashed into a simple lookup, or is there any opportunity for dynamic routing evaluation (where I can write some hook to influence the routing decision)? Hooks might look like filters or guards, or they might look like extractors that help construct the destination for routing lookup; it’s all just about how these are arranged and where and when they run.

If routes aren’t going to be allowed to ‘reject and retry the next-best route’, regardless of whether we call it a match failure or a fallback after processing reject, the other classic mechanism is a kind of internal redirect: a handler manipulates the request in some way and returns a response that doesn’t go back to the client, but instead resubmits the manipulated request to route again from the top, presumably to a new target.
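
A rough sketch of such an internal redirect, assuming a handler can return either a final response or a rewritten path. The `Outcome` enum, the toy routing table in `handle`, and the hop limit are all assumptions made for illustration.

```rust
/// What a hypothetical handler can return: a final response, or a rewritten
/// path that should be routed again from the top.
enum Outcome {
    Respond(String),
    Reroute(String),
}

/// Toy routing table: exact-match lookup from path to handler behaviour.
fn handle(path: &str) -> Outcome {
    match path {
        "/old-profile" => Outcome::Reroute("/users/me".to_string()),
        "/users/me" => Outcome::Respond("profile page".to_string()),
        "/loop" => Outcome::Reroute("/loop".to_string()),
        _ => Outcome::Respond("404".to_string()),
    }
}

/// Route a request, following at most `max_hops` internal redirects so that
/// a rewrite cycle cannot spin forever.
fn dispatch(path: &str, max_hops: usize) -> Result<String, String> {
    let mut current = path.to_string();
    for _ in 0..=max_hops {
        match handle(&current) {
            Outcome::Respond(body) => return Ok(body),
            Outcome::Reroute(next) => current = next,
        }
    }
    Err(format!("too many internal redirects, last target {}", current))
}
```

The hop limit is one way to tame the "dangerous and tricky" part: a handler that keeps rewriting to itself fails fast on the server and never reaches the client.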

These kinds of mechanisms can be dangerous and tricky to get right, but this is the web, and one of the dangers is that if something useful isn’t provided server-side, it will be done using 30x redirects via the client and expose more application logic to potential tampering by untrusted clients.

A clear evaluation of the use cases for this, and a clear articulation of how to use whatever inheritance/specialisation/fallback/redirect/rewrite mechanism to achieve things for users, is important.


#42

That reminds me of another reason it’s at odds with what I understand to be idiomatic in normal Rust code: It has the same weaknesses @killercup pointed out in having a function take multiple bools instead of a set of purpose-specific two-variant enums.

As a Canadian, this reminds me of how, when you see 01/02/2020, you have no idea whether that’s February 1st (Canadian DD/MM/YYYY date format) or January 2nd (American MM/DD/YYYY date format) without out-of-band information.

For example, many sites have URLs of the form /namespace/{int}/{int}/ which could get messed up in a refactoring. Examples include:

  • The StackExchange ecosystem uses /a/{answer}/{userid} for their “share a link to this answer” URLs.

  • Fanfiction (and original fiction) sites tend to use these patterns:

    • /s/{story_pk}/{chapter_idx}/* (Fanfiction.net and FictionPress)
    • /works/{story_pk}/chapters/{chapter_pk} (Archive of Our Own)
    • /Story-{story_pk}-{chapter_idx}/* (Twisting the Hellmouth)
    • /story/{story_pk}/{chapter_idx}/* (FiMFiction)

Note that FiMFiction also exposes chapter primary keys via their /chapters/download/{chapter_pk}/{format_name} route, making for a third integer value which could be confused with the others during a refactor.

The C2 lists on Fanfiction.net are an especially noteworthy example because they essentially abuse the path component of the URL to pass a bunch of integer query parameters.

Suppose they wanted to switch from PHP to Rust as part of a plan to reduce the tendency for bugs and regressions to slip in, but they don’t want to break anyone’s bookmarks.

Should the router API really be introducing another place to mess up a URL like this?

https://www.fanfiction.net/community/Alternate-First-Contact-War/88942/99/4/1/0/10/0/0/

Bear in mind that most of those parameters appear to be indexes into <select> elements, so confusing them would produce no error… you’d just get subtly wrong content in the results listing that gets returned. (eg. if you mixed up the values for the sort order and time range filter, it’d still work and you probably wouldn’t readily notice.)
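
As a sketch of how distinct wrapper types could guard against this kind of mix-up, here is a parser for the /s/{story_pk}/{chapter_idx}/* shape. `StoryPk`, `ChapterIdx`, `parse_story_path` and `chapter_title` are hypothetical names; nothing here is taken from a real router.

```rust
/// Distinct wrapper types so a story key cannot be passed where a chapter
/// index is expected, even though both are plain integers in the URL.
#[derive(Debug, PartialEq, Clone, Copy)]
struct StoryPk(u64);

#[derive(Debug, PartialEq, Clone, Copy)]
struct ChapterIdx(u32);

/// Parse a path of the form `/s/{story_pk}/{chapter_idx}/...`.
/// This is the single place where positional order matters.
fn parse_story_path(path: &str) -> Option<(StoryPk, ChapterIdx)> {
    let mut segments = path.trim_start_matches('/').split('/');
    if segments.next()? != "s" {
        return None;
    }
    let story: u64 = segments.next()?.parse().ok()?;
    let chapter: u32 = segments.next()?.parse().ok()?;
    Some((StoryPk(story), ChapterIdx(chapter)))
}

/// Swapping the two arguments at a call site is now a type error rather than
/// a silently wrong page.
fn chapter_title(story: StoryPk, chapter: ChapterIdx) -> String {
    format!("story {} chapter {}", story.0, chapter.0)
}
```

This does not remove the risk inside `parse_story_path` itself, but it confines the positional confusion to one function instead of every handler signature.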

(Sorry for the delayed reply. The power went out while I was typing the first version of this and it only just came back on.)


#43

Obviously, the answer to this is frunk::LabelledGeneric. (This suggestion is only partly in jest. It’s a works-today version of const str in types.)

EDIT: context provided below.


#44

Yep, as I said up thread, I definitely agree that there are downsides here!

What’d be most helpful is to brainstorm ideas for how to solve the problem while still achieving the other goals laid out in the post. I mentioned a couple possibilities in my earlier reply:

  • The Actix approach for named URL parameters, which requires defining a custom struct for the endpoint with field names corresponding to match variables (see the “Path” section here). This approach would fit fine with the proposed model for Tide, but I worry about the ergonomics.

  • Holding out for const generics so that you can write the name as part of the type (Path<"id", T>).

What do people think about these? Are there other options?


#45

Interesting! I wasn’t able to make heads or tails of the linked API page – do you have an example handy that would show how it could be used this way?


#46

Four blog posts by the author(s) of frunk:

HList: https://beachape.com/blog/2017/03/12/gentle-intro-to-type-level-recursion-in-Rust-from-zero-to-frunk-hlist-sculpting/
Generic: https://beachape.com/blog/2017/02/04/rust-generic-not-generics/
LabelledGeneric: https://beachape.com/blog/2017/03/04/labelledgeneric-in-rust-what-why-how/
Sculpt: https://beachape.com/blog/2017/04/12/boilerplate-free-struct-transforms-in-rust/

(The chronological order is LabelledGeneric, HList, Generic, Sculpt, but this reading order is better for understanding the type-level hackery going on here; this follows the abstraction layers.)


#47

Hah, encoding type-level strings by creating a type for each possible character and using tuples of them to form identifiers – yikes! While I agree it could be made to work, that approach would definitely cut against the goals around sticking to “plain Rust”.

Here’s another idea, one that’s a midpoint between Actix and const generics. We could introduce a FromUrlSegment trait:

trait FromUrlSegment: Sized {
    const MATCH_NAME: &'static str;
    type Error;
    fn parse(segment: &str) -> Result<Self, Self::Error>;
}

and have Path<T> require that T: FromUrlSegment, using the MATCH_NAME to determine which match to extract.

Then you could define:

struct UserId(u64);

impl FromUrlSegment for UserId {
    const MATCH_NAME: &'static str = "user_id";
    type Error = ();
    fn parse(segment: &str) -> Result<UserId, ()> { /* ... */ }
}

and for some route "/users/{user_id}" you could have an endpoint:

async fn get_user(id: Path<UserId>) -> Json<User>;

Interestingly, this approach promotes best practices anyway, in the sense that it’s good practice to avoid working directly with e.g. raw u64 values precisely because of the potential for confusion; wrapping with UserId makes the intent more clear.

It doesn’t seem too far-fetched to have a custom derive for FromUrlSegment too…
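
For anyone who wants to try the idea, here is a self-contained sketch of the trait with the `parse` body filled in. The `extract` helper is a hypothetical stand-in for whatever `Path<T>` would do internally, the captures are modelled as a plain `HashMap`, and the error type is switched from `()` to `ParseIntError` purely for illustration.

```rust
use std::collections::HashMap;

/// The trait from the post, with a `Sized` bound so `Result<Self, _>` is valid.
trait FromUrlSegment: Sized {
    const MATCH_NAME: &'static str;
    type Error;
    fn parse(segment: &str) -> Result<Self, Self::Error>;
}

#[derive(Debug, PartialEq)]
struct UserId(u64);

impl FromUrlSegment for UserId {
    const MATCH_NAME: &'static str = "user_id";
    type Error = std::num::ParseIntError;
    fn parse(segment: &str) -> Result<UserId, Self::Error> {
        segment.parse().map(UserId)
    }
}

/// Hypothetical core of `Path<T>`: look up the capture named `T::MATCH_NAME`
/// and parse it. `None` means the route did not capture that name at all.
fn extract<T: FromUrlSegment>(
    captures: &HashMap<&str, &str>,
) -> Option<Result<T, T::Error>> {
    captures.get(T::MATCH_NAME).map(|segment| T::parse(segment))
}
```

Keeping "name not captured" (`None`) apart from "captured but unparseable" (`Some(Err(_))`) would let a framework treat the first as a configuration bug and the second as a bad request.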


#48

Riffing further on this: we could do away with Path<T> entirely, in favor of directly deriving an Extractor impl:

#[derive(Extractor)]
#[extractor(path, component = "user_id")]
struct UserId(u64);

which gets rid of the annoying need for .0 to strip away the Path wrapper. (This of course assumes that you want to use the UserId wrapper type rather than immediately working with the raw u64).
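
A guess at what such a derive might expand to, written out by hand. The `Request` struct and the shape of the `Extractor` trait are assumptions made for this sketch; the real trait may well look different.

```rust
use std::collections::HashMap;

/// Hypothetical request view: just the named path captures.
struct Request<'a> {
    path_params: HashMap<&'a str, &'a str>,
}

/// Hypothetical extractor trait, simplified to a synchronous `Option`.
trait Extractor: Sized {
    fn extract(req: &Request<'_>) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
struct UserId(u64);

// Roughly what `#[derive(Extractor)]` with
// `#[extractor(path, component = "user_id")]` could generate for a
// single-field tuple struct.
impl Extractor for UserId {
    fn extract(req: &Request<'_>) -> Option<Self> {
        let raw = req.path_params.get("user_id")?;
        raw.parse().ok().map(UserId)
    }
}

/// An endpoint then receives the wrapper directly, with no `Path<_>` layer.
fn get_user(id: UserId) -> String {
    format!("user #{}", id.0)
}
```

Writing the expansion out like this also keeps the macro optional: anyone who prefers plain Rust can implement the trait by hand.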


#49

Thinking about the run-time vs. compile-time checking, it does seem like perhaps the correct approach here is to stick with the run-time approach (I agree it’s generally easier to understand and is simpler from the user’s perspective due to being “just Rust”) for the core tide crate but provide some API (perhaps in a separate crate) that generates the appropriate invocation and checks it. This is kind of similar to how structopt works for Clap, though there I believe there’s not much compile-time verification happening; adding that should be feasible though, I’d guess. That way we get the best of both worlds: compile-time checking for more serious applications where the cost is worth it and run-time checking for prototyping.

I think the other benefit here is that the macro crate might not need to be written now; it could be fleshed out separately. If tide exposed a “check if this set of arguments is correct” without actually starting the server, it would probably be fairly easy to implement the macro-based API, I assume – even if it doesn’t get the full benefits.
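
A minimal sketch of what such a "check without starting the server" entry point could look like, assuming route patterns use `{name}` placeholders and that each endpoint can report the names it wants to extract. Both `placeholders` and `check_route` are hypothetical.

```rust
/// Collect the `{name}` placeholders that appear in a route pattern.
fn placeholders(pattern: &str) -> Vec<&str> {
    pattern
        .split('/')
        .filter_map(|seg| seg.strip_prefix('{')?.strip_suffix('}'))
        .collect()
}

/// Hypothetical startup check: every name an endpoint wants to extract must
/// be captured by the pattern. On failure, returns the missing names.
fn check_route<'a>(pattern: &str, wanted: &[&'a str]) -> Result<(), Vec<&'a str>> {
    let captured = placeholders(pattern);
    let missing: Vec<&str> = wanted
        .iter()
        .copied()
        .filter(|name| !captured.contains(name))
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

A macro crate could call the same check at compile time, while the core crate runs it once at startup or from a unit test.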


#50

I found the repo: https://github.com/rust-net-web/tide

Is that the current state that contributors should start from or are there local changes that are waiting to be pushed?


#51

That’s an empty repo at the moment. In the near future I will get things into a good state for contribution and file issues for what needs to be done etc.


#52

I’d like to point out that a big downside of this seems to be that it explodes compile times (although it may be the middleware system in warp that does this, I’m not sure…). Personally I’d rather have faster compiles with runtime startup checking like proposed. If we can get incremental compile times down to say ~2 seconds then doing runtime checking becomes much less of a big deal. And IMO server dev in Rust will be pretty painful anyway if we can’t get the compile times down.


#53

Reposting this from reddit, because I only just noticed the link to this thread:

This is pretty similar to the design I’ve been working on (https://github.com/nicoburns/rustdi). So consider me broadly in favour of the design. I have some thoughts:

  1. I don’t think use of macros is a problem per se. I think the issues with Rocket are:

    • Nightly only
    • No async
    • As per /u/matthieum’s comment, I really dislike the coupling between handlers and routes.

    That said, it probably does make sense to see how far we can get in “normal rust”, and then fall back to macros when necessary. A key thing for me is that when a macro is used, the end-user should understand what the macro results in, and be able to write the same thing manually, even if doing so is less ergonomic and convenient.

    One thing I was using macros for was to allow the user to “request” an owned value, immutable ref, or mutable ref simply by using T, &T or &mut T in their function signature. I’m not quite sure how that would work with Extractors, which I wasn’t using but which seem like a good idea.

  2. Why does (rust’s type system work such that) the Endpoint trait requires the Kind parameter? I came across this when trying to implement rustdi too. In the following case:

    impl<T, Ret> Foo for T where T: Fn() -> Ret { ... }

    and

    impl<T, Ret, T0> Foo for T where T: Fn(T0) -> Ret { ... }

    Shouldn’t rustc be able to tell that the two implementations are for disjoint sets of types? i.e. that a function/closure with one parameter can never be the same type as a function/closure with zero parameters…

  3. It seems a little unfortunate that the parameters passed into the extractor trait need to be hardcoded. I wonder if there is any way to generalise that interface such that the parameters available can be expanded by the caller, and such that different frameworks that provided different data could have some level of inter-operation. I guess this would introduce an extra level of indirection, but I am unclear on how significant an impact this would have on performance…

  4. I’m really looking forward to seeing the design for middleware. I liked warp’s take on that. Specifically the property that middleware could be composed and provide values to endpoint handlers, and so long as the handler happened to match the composed middleware’s type signature, it would work (the disadvantage seemed to be compile times and confusing error messages, which are both a pretty big deal IMO). It would be even better if order didn’t matter, and middleware provided values could be ignored by individual handlers as they saw fit.

  5. I’d love to see the database example fleshed out more. What fields would a database handle have (are there internal synchronisation primitives?)? I’m assuming it would somehow enable you to get access to a future that would run on some kind of thread pool / queue? It might just be me, but I’m really struggling to work out the best way of implementing DB access using an async web framework and a sync DB client (which is what we mostly seem to have atm, and many people seem to think should be entirely sufficient or even preferable to truly async db clients).


#54

I think holding out for const generics seems entirely reasonable. I consider const generics a pretty fundamental missing part of rust that a whole load of APIs won’t be properly ergonomic without. Also, progress on const generics seems to be going pretty well. I’m anticipating it landing on stable sometime in 2019? And can’t see rust web servers being in a “recommended without reservations” state until then anyway…


#55

I really like this idea of using Constant Generics. I can’t wait to have CG support in Rust. It will definitely take things to the next level.


#56

If you manually implement the Fn trait it’s entirely possible to have a single callable take multiple different argument sets (playground example). Still unstable (and a real pain to actually invoke), but the current trait definitions allow for it.
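
Since one type can in principle implement both Fn() and Fn(T0), the two blanket impls from #53 cannot be treated as disjoint. A sketch of the marker-parameter workaround on stable Rust follows; the `Endpoint` trait here is a simplified stand-in with a hypothetical `respond` method, not Tide’s actual definition.

```rust
/// `Kind` is a marker recording the argument list the callable takes.
/// `Endpoint<()>` and `Endpoint<(A,)>` are different traits as far as
/// coherence is concerned, so the two blanket impls below do not overlap.
trait Endpoint<Kind> {
    fn respond(&self, segment: &str) -> String;
}

// Zero-argument callables ignore the path segment.
impl<F> Endpoint<()> for F
where
    F: Fn() -> String,
{
    fn respond(&self, _segment: &str) -> String {
        (self)()
    }
}

// One-argument callables get the segment parsed into their argument type.
impl<F, A> Endpoint<(A,)> for F
where
    F: Fn(A) -> String,
    A: std::str::FromStr,
{
    fn respond(&self, segment: &str) -> String {
        match segment.parse::<A>() {
            Ok(arg) => (self)(arg),
            Err(_) => "bad request".to_string(),
        }
    }
}

/// Generic entry point; `Kind` is inferred from the callable's signature.
fn serve<Kind, E: Endpoint<Kind>>(endpoint: E, segment: &str) -> String {
    endpoint.respond(segment)
}

fn index() -> String {
    "index".to_string()
}

fn show_user(id: u64) -> String {
    format!("user {}", id)
}
```

Callers never name `Kind` explicitly: `serve(index, "x")` and `serve(show_user, "42")` each select the one impl whose `Fn` bound the function satisfies.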


#57

I like what’s in the post and think waiting for some improvements to Rust for a better UX is totally worth it.

Something that seems missing (or that I didn’t see) is a way to resolve (called reverse in Django) a URL in the views.

Let’s say you have the following routes:

order-details: /orders/{id}
checkout: /checkout

and imagine you want to redirect the user after a POST on the checkout route to its order page with the newly created order ID. In Django you would redirect to the URL given by reverse("order-details", kwargs={'id': order.id}).

This requires 2 things (well 3, I’ll expand on the last one a bit later):

  • named routes: if I change a URL, I only want to change it at one place, not everywhere
  • named parameters: we could potentially skip that but it avoids tons of bugs and improves readability, so it would be sad to not have them

The last point, which would be nice provided we have named routes, is namespaces. It has been mentioned before as subrouters providing some routes.

If I do router.mount("/auth", "auth", &auth_urls) with the parameters being (prefix, namespace, router), I should be able to redirect to /auth/login by doing reverse("auth:login") for example if we follow the Django example and there is a view named login in the auth subrouter.
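
A small sketch of how named routes, named parameters and namespaces could fit together. This `Router` only tracks names and patterns; its `add`, `mount` and `reverse` methods are hypothetical and merely mirror the Django-style calls described above.

```rust
use std::collections::HashMap;

/// Hypothetical router that only tracks named route patterns.
#[derive(Default)]
struct Router {
    named: HashMap<String, String>,
}

impl Router {
    /// Register a pattern such as `/orders/{id}` under a name.
    fn add(&mut self, name: &str, pattern: &str) {
        self.named.insert(name.to_string(), pattern.to_string());
    }

    /// Mount a sub-router: its names become `namespace:name` and its
    /// patterns gain the prefix.
    fn mount(&mut self, prefix: &str, namespace: &str, sub: &Router) {
        for (name, pattern) in &sub.named {
            self.named.insert(
                format!("{}:{}", namespace, name),
                format!("{}{}", prefix, pattern),
            );
        }
    }

    /// Build a URL from a route name and named parameters. Returns `None`
    /// if the name is unknown or a placeholder is left unfilled.
    fn reverse(&self, name: &str, params: &[(&str, &str)]) -> Option<String> {
        let mut url = self.named.get(name)?.clone();
        for (key, value) in params {
            url = url.replace(&format!("{{{}}}", key), value);
        }
        if url.contains('{') {
            None
        } else {
            Some(url)
        }
    }
}
```

With this in place, the checkout handler could redirect to `reverse("order-details", &[("id", "42")])` without hard-coding /orders/42, so changing the pattern in one place is enough.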


#58

I think it goes both ways: this can be a showcase for what can be done with the new features, too.


#59

Addressing @aturon’s original post, I think looking at routing and extraction at this early point in the development process is too narrow.

Security, consistency and reliability need to be the guiding considerations in any networking software. If we had these components in a web framework it would take web programming to another level. On the other hand, without them Rust’s value is diminished. Why lean on Rust’s memory-safety if our network software isn’t secure? Why lean on Rust’s reliability if our network software isn’t fault tolerant? Why prevent data-race inconsistencies if our distributed data isn’t consistent?

Consider the example of a web server that is a very basic digital bank. Let’s say it returns a bank balance or debits one account by a u64 while at the same time crediting another account with the same amount. Hopefully, this should be easy to build but unfortunately, without solid security, consistency and reliability support it becomes far from straightforward.
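
To show the smallest piece of that example, here is a sketch of an all-or-nothing transfer over an in-memory map, with every name (`TransferError`, `transfer`) hypothetical. It covers only the in-process arithmetic and deliberately leaves out authorization, persistence and any distributed consistency, which are the hard parts the post is pointing at.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum TransferError {
    UnknownAccount,
    InsufficientFunds,
    Overflow,
}

/// Debit `from` and credit `to` by `amount`, or change nothing at all.
/// Every check runs before either balance is written.
fn transfer(
    balances: &mut HashMap<String, u64>,
    from: &str,
    to: &str,
    amount: u64,
) -> Result<(), TransferError> {
    let from_balance = *balances.get(from).ok_or(TransferError::UnknownAccount)?;
    let to_balance = *balances.get(to).ok_or(TransferError::UnknownAccount)?;
    let debited = from_balance
        .checked_sub(amount)
        .ok_or(TransferError::InsufficientFunds)?;
    if from == to {
        // A self-transfer with sufficient funds leaves the balance unchanged.
        return Ok(());
    }
    let credited = to_balance
        .checked_add(amount)
        .ok_or(TransferError::Overflow)?;
    balances.insert(from.to_string(), debited);
    balances.insert(to.to_string(), credited);
    Ok(())
}
```

Even this toy version needs three distinct failure cases, which hints at how much of the real work lies outside routing and extraction.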

One of Rocket’s good points is that it makes input validation and authorization explicit. One of Actix’s good points is that the Actor model is potentially a good solution for reliability and consistency. I’d like to see something along the lines of the banking server example above as one test of a successful outcome to the Tide project.


#60

It seems like annotations go against the design goals of plain rust. To me, an annotation and a macro are not that different for the user.