Kickstarting a database-wg

It uses prepared statements internally, just like every other part of diesels api.

Using this layered approach will involve a significant performance overhead because you need to go back and forth between a typed representation and several string representations. (As seen from the 6. layer). Diesel does currently something differently in some cases: It skips building the actual sql query, because we have already build that query before and it's now in the prepared statement cache. The corresponding prepared statement could be received using a statically known type id. Such optimizations cannot work if you have to pass a sql statement though several layers that expect some sort of sql string.

Returning a "non-exhaustive enum of all possible types across all databases" is not an option here because some database systems allow user defined types.

How would that be to the currently provided derives for Queryable, Insertable and AsChangeset?

Hm, then how would the API for Diesel drivers look like? Or are you suggesting that for diesel the Driver is tightly coupled to the implementation?

From what I know about Diesel (and it's not much) it turns queries like select * from users into a type like Selectable<Star, From<users>>, and uses its TypeId to cache that query.

At the lowest level there is the Connection trait. Everything inside of that is implementation specific, though there are several shared components like the StatementCache or the TransactionManager.

Basically yes (The types are a bit different). That only works in specific cases, otherwise the sql is used as key.

Ok, so hypothetically speaking, if some future ConnectionDriver returned PrepareStatement or at least some kind of PreparedStatementId would that still work for Diesel, or does Diesel require some specific scenario for it to cache its SQL statements?

Hi @spacekookie. I was wondering if there is any update on this working group?

2 Likes

Since the governance working group has been established, it could be interesting to create a issue on their github repository if there are people intererested in starting the working group.

I would wait with async till the current futures API is fully stabilized and battle tested. It’s already enough pain to transition futures-0.1 stuff over and I think we’ll still see a lot of changes in patterns and best practices.

  1. let SQL be SQL, we do not need something like
someTableStruct.select(foo).where(bar).desc().
  1. Mappers to map Records from DB row to real Structs. the mapper rules need to be easy customized and easy to extend.
  2. more DB drivers.
  3. type safe, away from SQL injection,
  4. dynamic SQL like mybatis did.
  5. IDE friendly

While in principle I agree, here are some problems with your statements

These are opposing requirements. The moment you want type-safety, you lose plain SQL, the moment, you want dynamic queries (without SQL injections) you lose plain SQL.

In essence, SQL doesn't have any type-safety, but you can abstract that type-safety, using various tricks.

someTableStruct.select(foo).where(bar).desc() is an attempt to make runtime errors in SQL into compile errors in your IDE/console.

Sure you can write like in MyBatis via configurations or annotations, but those aren't IDE friendly, not in the same way that regular language constructs are IDE friendly. Basically, MyBatis' way is the most IDE unfriendly way to achieve your goals. Followed closely by proc-macro.

That said, I'm not sure there is a right or wrong answer here. Each approach is a trade-off.

Go for extra strictness? You either go with super complex type as diesel/proc-macro and you end up with some kind of monomorphization monster that takes ages to compile.

Go for a more lax approach? Well, now you get more runtime errors, and refactoring code becomes a huge problem.

Go for just plain SQL in configuration/annotation? Same as the previous attempt, only more IDE hostile.

What does type safe mean here? Is this in reference to type safety when parsing a stream of bytes?

I don't think this is true. You still have the concept of prepared statements and functions. The sql injection issue comes from constructing strings and not escaping the data correctly. In a prepared statement, the query string and parameters are sent over the wire to the db separately and it's up to the db to interpolate the strings.

I mean people could still construct strings but surely no one does that. That would be like storing passwords as plain text.

Man, you should take this show on the road. I fell out of my chair with laughter with that joke. :slight_smile:

5 Likes

Just don't allow to pass arbitrary strings to execute as SQL. Instead, only allow to execute an SqlString datatype that only supports these operations:

  1. Concatenating &'static str, ensuring they have balanced quotes and inserting spaces before and after
  2. Concatenating datatypes that are escaped and single quoted if they are strings and surrounded by spaces.
  3. Concatenating &str as a single identifier that is escaped and double quoted as an SQL identifier and surrounded by spaces

This makes SQL injection impossible without using unsafe transmutes, while allowing all use cases.

I'm sorry for chipping in and I tried searching through the comments but couldn't find it: It would be nice to have something like Diesel that can be used with different backends. I don't even mean diesel with it's ORM, I mean something that accepts SQL and can be used with sqlite/postgres/mariadb/.. This prevented me from ever using diesel, as it was a) pretty darn complicated compared to my good old self made SQL and b) also didn't work for multiple backends. (Using auto-increment ? already impossible with sqlite in diesel, so not even on compilation-per-backend usable) I started using sled for all internal storage so far, which is nice but doesn't work when you want to give the user a choice in his favorite DB.

If we have this non-typesafe version we can still put something like diesel on top.

2 Likes

It would be nice to have something like Diesel that can be used with different backends.

Using different backends only works as long as you have an application that uses SQL only for simple things like a single filter or something like that. As soon as you try to build complexes things you will run into the following issues:

  • Need to use database specific SQL dialects to write a query in a performant way
  • Slightly differences in behavior of some SQL constructs in different databases
  • Features just not supported by one of the databases, need to find a hacky workaround for them

To workaround this points there a 2 solutions in my opinion:

  • Fall back to a commonly accepted SQL subset, which removes basically all advanced and even some "common" features
  • Make it clear which functionality works with which backend

Additionally side note: It is certainly possible to develop diesel applications with swapable backend. You just need to stick to SQL set supported by all databases. See diesel_cli for an example.

I don't even mean diesel with it's ORM, I mean something that accepts SQL and can be used with sqlite/postgres/mariadb/..

As already stated above: There is sql_query in diesel for exactly this use case. Also I don't think diesel is a classical ORM, more a query builder with some functionality to map data to data structures. Both parts are separated, it's possible to use only one or the other.

Do you mean that there is no way to get back the generated id using diesel? If so yes this is not supported by diesel because the underlying databases (sqlite and mysql) do not provide anything which would allow us to implement a solution that works in every use case (like postgres returning feature). By not providing some solution here out of the box we basically force the user to make their own decision based on their use case here.

If we have this non-typesafe version we can still put something like diesel on top.

See my comments above. It don't think it would be that simple, at least not without making compresses at performance.

I was thinking about something like ODBC, of course it's then up to the user to use only a subset of supported SQL.

i don't know why everybody expects async version, i am sure jdbc is a blocking process and whenever they want a sync call they just open a thread and wait for the response , whats i am saying is its not hard to wrap your db calls with async function if its necessary than making it default to async.

:smiley: no hard feeling my question regarding this workgroup is is this going to focus on protocol like java does with jdbc or is this working group responsible for making db drivers?

It doesn't exactly work this way. To create typesafe driver, you will need a definition of the data that is (sent to/from or stored in, depending on your approach, diesel is the only ORM that I know that's tied to wire protocol, others focus on database storage model) database so that you can map driver types to Rust types that are convenient for people to use. The layers are mixed, its not really that one is "on top of the other".

I think you misunderstood. When you execute a statement on DB you use an interface that's type safe (e.g. Users.select().where(id>3)) or type unsafe (e.g. SELECT * FROM users WHERE id=3), it's next to impossible to combine these two interfaces. Also MyBatis does exactly that, it executes SQL from annotations.

Right, but I was responding to the idea that the "type unsafe" version being susceptible to SQL injection. But it's usually written query("SELECT * FROM users WHERE id=?", 3) and then on the wire the string and parameters are sent separately, so there is no injection taking place. So SQL injection doesn't have to be a risk when using raw SQL. But you could get some people who incorrectly construct strings by misuisng the API. :frowning:

As for the type safety of the interface, I think Users.select().where(id>3) is only facetiously type safe. It is translated into a query in a wire format and sent. Then the result comes over the wire and is parsed. It's certainly not going to be more type safe than the results of a raw SQL in a query being converted using e.g. serde to a particular struct format (e.g. query::<User>("SELECT * FROM user WHERE id=?", 3);).

And I said you misunderstood me. I never claimed that type-safe interface is sql injection safe, just that interfaces are incompatible. Although type safety is in theory easier to check for sql injection than random string (e.g. is this production string safe "SELECT * FROM " + table + " WHERE "+ sqlWhere?).

In essence you can't have type safety and type dynamism at the same time.

The advantage of type-safety is that your API is easier to be checked at compile time, not to mention you can prohibit some syntax on type level. E.g you can pass at compile time "SELECT * FROM FROM FROM", but this is impossible in type-safe manner.