Kickstarting a database-wg

Trying to summarise some points that have been raised in this discussion in order to structure it for people joining in later.

  • There’s interest in working on both database abstractions as well as database-development tooling
  • People have feelings about ORMs
  • Common database abstraction interfaces should be a basic priority
    • Especially for pipelining/batching queries
  • Work on tooling and libraries bottom up: don’t start with an ORM
  • There are many efforts already in motion: no need to start from scratch

Personally, I feel that higher-level abstractions (such as not having to write SQL by hand) are desirable, but they can be added later on top of a solid base of libraries and tooling.

I also feel that the points I just listed would make for a good guiding charter for a working group. Feel free to raise new points or challenge existing ones, though.

2 Likes

I also consider using SQL much more straightforward, but handling every flavor and implementation of the language is a challenge. T-SQL in particular is quite opinionated on what it will and won’t allow. Syntax also differs between different versions of the same RDBMS.

1 Like

Ruby’s Sequel library has done a great job as an SQL builder, managing the split between staying generic and letting you drop down to database-specific features quite well. https://sequel.jeremyevans.net/

I’m also sure the author is willing to talk about a few of the particularities there, he’s quite approachable and knowledgeable.

Hi,

Yes, as a first step Rust needs to expand its support for the databases that already exist on the market, and maybe the working group can try to define a sufficiently generic way to interact with databases that feels modern while still allowing some deeply specialized, vendor-specific implementations (yes, this is easier said than done).

One specific contribution I think the working group could also make relates to standards: having a reference implementation of an SQL parser and validator, for example, seems like a good idea to me.

I have a few years of commercial experience building a (NoSQL) database as well as collaborating on database driver development, so I would really like to be part of an eventual database-interaction API definition process.

In my free time I’m also one of the few people trying to build parts of a database in Rust; my personal project is Persy. So I would definitely like to see a few databases fully implemented in Rust at some point in the future, but as an application developer I would prefer to first see good support for some major existing vendors.

That is why an SQL builder is pretty complicated to make. Generating CRUD is easy, but that is all.

My impression is that the idea is to put everything on top of an SQL builder? That is nice, but tangential. It could certainly be a sub-crate.

Not everyone needs a real abstraction over all the many flavors of SQL on the market. Using an ORM/builder “just in case I need to move to another DB later” is a niche concern, and unnecessary most of the time. And I say this as somebody who runs code across 7 RDBMSs! (But how many times did I actually need to abstract the SQL? In just 1 project among dozens.)

Many things exist that truly are universal and allow building the rest of the nice “sugar” on top:

  • A dynamic container of results
  • A trait or enum to hold the database values (for example, I see how the sqlite crate and the PG one differ on this)
  • How to introspect query results (columns with names and types, how many rows were affected, what the last inserted ID was, etc.)
  • How to pass a raw SQL string with optional parameters, and a uniform way to send those parameters
  • Not limiting the interface to positional fields, but also allowing access by column name

Frankly, most of this is already covered by the DB APIs of Python, .NET, etc.
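A rough sketch (all names here are hypothetical, not an existing crate) of what those universal pieces could look like in Rust: a dynamically typed value enum, a row with access by position or by name, result introspection, and a uniform raw-SQL-plus-parameters entry point:

/// A dynamically typed database value, roughly the union of what the
/// sqlite and postgres crates expose today.
#[derive(Debug, Clone)]
pub enum Value {
    Null,
    Integer(i64),
    Real(f64),
    Text(String),
    Blob(Vec<u8>),
}

/// One row of a dynamic result set.
pub struct Row {
    columns: Vec<String>,
    values: Vec<Value>,
}

impl Row {
    /// Access by position.
    pub fn get(&self, index: usize) -> Option<&Value> {
        self.values.get(index)
    }

    /// Access by column name, not just by position.
    pub fn get_by_name(&self, name: &str) -> Option<&Value> {
        let idx = self.columns.iter().position(|c| c.as_str() == name)?;
        self.values.get(idx)
    }
}

/// Result introspection: affected rows, last inserted ID, and the rows.
pub struct QueryResult {
    pub rows_affected: u64,
    pub last_insert_id: Option<i64>,
    pub rows: Vec<Row>,
}

/// The uniform entry point: a raw SQL string plus optional parameters.
/// (The error type is simplified to String for the sketch.)
pub trait Connection {
    fn execute(&mut self, sql: &str, params: &[Value]) -> Result<QueryResult, String>;
}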


With respect to other ways of building an SQL string, Rust has a nice trick with the Into trait: if the interface accepts anything that implements Into, we are free to build the string however we like.
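A minimal sketch of that Into trick (the names are made up): the execution function accepts anything convertible into an SQL string, so callers can pass a plain &str, a String, or their own builder type:

struct SelectBuilder {
    table: String,
    columns: Vec<String>,
}

// Any type that can turn itself into SQL text works with the interface.
impl From<SelectBuilder> for String {
    fn from(b: SelectBuilder) -> String {
        format!("SELECT {} FROM {}", b.columns.join(", "), b.table)
    }
}

// Hypothetical execution entry point: anything implementing Into<String>.
fn execute(sql: impl Into<String>) {
    let sql: String = sql.into();
    println!("running: {}", sql);
}

fn main() {
    execute("SELECT 1");
    execute(SelectBuilder {
        table: "users".into(),
        columns: vec!["id".into(), "name".into()],
    });
}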

Hi, and thanks @spacekookie for starting this. I’m the person who wrote the “Are We Database Yet?” Reddit post, so thanks for sharing this link there.

My interests are in:

  • creating common interfaces for databases (async and sync), which reduces overhead for developers spending hours picking up a database crate
  • creating/improving bulk data interaction with databases

Both were sparked by struggling to use a few database crates for different kinds of workloads. I think Diesel would benefit from standardisation as it would become easier to support more databases. Serde support for ‘all’ databases would also benefit users who aren’t using Diesel, but want a type-safe interface into their databases.

I’ve used Apache Arrow since its inception (in the Python world), and the work that @andygrove and the other Rust committers on Arrow are doing inspired me to start contributing to its Rust implementation.

My personal goal on Arrow is to increase IO support: our CSV support is now only missing date/time read/write, and JSON support is underway. I started tinkering with batch/bulk database IO using Arrow, and I’m planning on working on that over the coming months.

I think there might/would already be enough people interested in async, web and OLTP use-cases in general, so I’d like to put my hand up to also work on our OLAP use-cases.

2 Likes

Hi! I’m interested too. I wrote a bunch of database-related crates, and am interested in new things:

  • Sanakirja is at the moment a database backend where you can store some data structures in a file, in a transactional way. The concurrency model allows writers and readers in parallel (under certain relatively light restrictions). I’m interested in expanding it to include more data structures (still transactional).

  • Pleingres is an asynchronous PostgreSQL client, with a few macros to turn Rust structs into SQL requests.

Since I’ve worked quite a bit on patches and versions before, I’m also interested in writing a more declarative migration tool than what is available at the moment. I’m sure we can find sounder algorithms than what has been attempted in the past, that would work for 99% of the cases (or even 100% of my cases).

2 Likes

@spacekookie maybe a good time to create an organisation so that we can track things better there?

1 Like

@Dylan-DPC is it this one? https://github.com/rust-db

2 Likes

Yup, I already created an org. I’ll send you some invites. @spacejam made an organisation repo. Let’s create some tracking issues there and discuss.

Also I’d want to have regular synced meetings, at least in the beginning. This can be either in text or on video, depending on how people feel about stuff.

Hi there, I’m also interested in contributing to a database WG (specifically, interfaces to relational databases).

I’m currently (co)maintaining several related crates:

  • wundergraph is a crate that allows you to serve your database schema as a GraphQL API in an efficient way. Currently it supports PostgreSQL and SQLite, but it should be easy to add support for whatever database is supported by diesel. The crate is currently WIP and needs some minor improvements (mostly documentation and cleanup) for an initial release.
  • diesel is a Safe, Extensible Query Builder and ORM for Rust.
  • diesel-oci is an extension crate for diesel that adds support for the Oracle database system. It’s working, but requires two unstable language features (specialization and re-rebalance-coherence) and contains several bad (but fixable) hacks (related to blob/string sizes).

So generally I’m interested in a working group that focuses on building interfaces to relational databases.

Additionally some random thoughts:

  • Diesel at its core is a query builder and not an ORM. At its core it provides an (extensible) DSL to construct SQL queries and a mechanism to load results into Rust types. Additionally, it provides several ORM-like features on top of that to map inserts/changesets/results from/to Rust structs. (That’s more like what is described as a data mapper above.) Looking at Slick without ever having used it (I just looked quickly through the basic docs): it looks quite similar to diesel. Most things have a slightly different syntax but are also supported by diesel. I haven’t looked far enough to tell exactly which things are fundamentally impossible in diesel. (Yes, some things are not implemented yet, but mostly it is possible to do them in a third-party crate.)
  • Going all-async is a mistake in my opinion. Async only makes sense when the underlying database system actually supports it. For example, SQLite does not support async (and in my opinion it doesn’t make sense to add it there, because the database is actually running inside your program, so there is no IO to wait for). Additionally (at least for diesel), an async interface needs to solve the following problem: assume there is a connection type called Connection that is Send but not Sync. The following code should return a future that implements Send, because otherwise you won’t be able to interact with tokio:
// Assume `Connection` is `Send` but not `Sync`.
async fn foo(conn: &Connection) {}
async fn bar(conn: &Connection) {}

// This future holds a `&Connection` across await points; `&Connection` is
// only `Send` if `Connection` is `Sync`, so the future is not `Send` and
// cannot be spawned onto tokio's threaded executor.
async fn do_something(conn: &Connection) {
    foo(conn).await;
    bar(conn).await;
}

That brings me to the missing pieces in diesel:

  • Working with schemas that are unknown at compile time: This is mostly possible today using sql and sql_query, but requires that you know the types returned by a given query (something like: that query will return two integers; see the sketch after this list). This will hopefully change with diesel 2.0, where we plan to add a more flexible API for that.
  • Using the same table more than once in a query: Diesel is currently missing support for (table) aliasing, therefore it is not possible to reference a table more than once in a query (for example, self joins). There is already a proposed API for this, which is waiting on feedback from actually interested users.
  • Group by support: Diesel currently lacks support for group by clauses and for mixing aggregate and non-aggregate expressions in a select clause. We plan to improve our internal type system to add support for that, but there are some open questions left.
  • Documentation: It seems that it is quite hard for beginners to actually find what they are looking for in diesel’s documentation. Additional documentation about how to add support for custom types, how to abstract over diesel, or even how to write a third-party backend is basically completely missing. (Getting help on implementing those points would be great.)
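Roughly, the sql_query route looks like this today (diesel 1.x-era attribute syntax shown; imports and attributes differ in diesel 2.0, and the table and query here are just placeholders):

#[macro_use]
extern crate diesel;

use diesel::pg::PgConnection;
use diesel::prelude::*;
use diesel::sql_query;
use diesel::sql_types::{Integer, Text};

// "That query will return an integer and a text column" has to be spelled
// out by hand; diesel cannot infer it for a dynamic SQL string.
#[derive(QueryableByName)]
struct UserRow {
    #[sql_type = "Integer"]
    id: i32,
    #[sql_type = "Text"]
    name: String,
}

fn load_users(conn: &PgConnection) -> diesel::QueryResult<Vec<UserRow>> {
    sql_query("SELECT id, name FROM users").load(conn)
}
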
3 Likes

I think even for basic CRUD it’s not always straightforward. Take the CREATE TABLE IF NOT EXISTS syntax as an example.

For most flavors of SQL this is very simple. For T-SQL, the syntax doesn’t exist. Instead you have to do this:

IF NOT EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'test_table_123') 
BEGIN 
    CREATE TABLE [test_table_123] 
    ( 
        [id] INT NULL, 
        [info] NVARCHAR(50) NULL 
    ) 
END

If you don’t have permissions to read from that system view (which defaults to displaying only what the user has access to) then trying to run this can lead to errors if that table actually does exist. This is especially pernicious if you use separate credentials to run DML and DDL operations (which many do, so that a reader can’t DROP TABLE).

You might say that a user should make sure to give permissions on that view to the service doing the DDL, but users would have to look at your source code to know if you’re using INFORMATION_SCHEMA, sys.tables, or sys.objects…or even OBJECT_ID() IS NULL in your IF NOT EXISTS call. There’s unfortunately more than one way to do this.

All of this is to say, yeah, it’s super hard. :smiley:

Thank you @weiznich for the summary.

So we have the github.com/rust-db/coordination repository so please go ahead and open tracking issues.

Otherwise, discussion can also happen at this zulip stream: https://rust-lang.zulipchat.com/#narrow/stream/193127-wg-database

It’s pretty surprising that Diesel is missing what seem to be extremely basic features like table aliasing and “group by”.

If there is not enough manpower to produce a reasonably complete query builder (i.e. at the very least supporting the full syntax of SELECT/INSERT/UPDATE/DELETE for PostgreSQL and MySQL), maybe we should start with something simpler, like an SQL-based “DB API” that abstracts over databases but does not touch the SQL itself?

And then proceed to add a version with strong typing by passing the SQL to the test database at build time and asking it about the columns of the resultset and their types, as well as the inferred parameter types (at least PostgreSQL can do that with PQdescribePrepared).
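For what it’s worth, the synchronous postgres crate already exposes this kind of metadata after a prepare, so the build-time check could be prototyped along these lines (connection string and query are placeholders, and method names may vary between crate versions):

use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    // Placeholder connection string pointing at the test database.
    let mut client = Client::connect("host=localhost user=postgres dbname=test", NoTls)?;

    // Preparing asks the server to describe the statement without running it.
    let stmt = client.prepare("SELECT id, name FROM users WHERE id = $1")?;

    // Inferred parameter types...
    for ty in stmt.params() {
        println!("param type: {}", ty.name());
    }

    // ...and result columns with their types.
    for col in stmt.columns() {
        println!("column {}: {}", col.name(), col.type_().name());
    }

    Ok(())
}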

This way it’s much simpler to write a crate that can support all features of all databases (albeit without automatic application portability between databases), and using SQL also means that the user already knows how to write queries and doesn’t have to learn yet another query builder.

:+1: I use Dapper in C# because all I really want is a nice way to pass parameters and get the results into a nice type.

Turns out that nobody needed it badly enough to sit down and implement it. Therefore I would conclude that it is not so basic.

That is already there. You are looking for sql_query

The full syntax of SELECT/INSERT/UPDATE/DELETE for PostgreSQL and MySQL is quite large and also a moving target. What about extensions: which should be supported and which not? Therefore I think it is much more important to build the foundational parts of the query builder and make the rest customizable, so everyone is able to extend the parts they need.

That assumes that queries are static, which is not always the case. Diesel already supports the use case of static queries quite well. Besides that, having the type of the query at compile time allows us to do some interesting optimisations regarding prepared statements.

I mean making that a separate crate and focusing on it as Rust’s core way to access SQL databases that everything else builds on top of.

I agree that having a common low-level SQL access API with query(), execute(), … methods would be very valuable.

Go has the sql package, which has the benefit of being in the standard library, and is the accepted standard that all database drivers implement and that the entire ecosystem builds on top of.

I have often wished for something similar for Rust. Right now you are forced to deal with considerably different APIs between e.g. postgres, mysql, and rusqlite.

Getting the API right in a generic way might be tricky, especially around a Row type, but it seems achievable to me.
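To make the Row concern concrete, here is the kind of shape I would expect (all names hypothetical): typed access needs a conversion trait that every driver’s raw values can feed into, and that trait is where the generic design gets tricky:

pub struct Error(pub String);

/// Whatever raw form a driver hands back for a single column value.
pub enum RawValue {
    Int(i64),
    Text(String),
    // ... more variants per driver
}

/// Conversion from a driver value into a user-facing Rust type.
pub trait FromValue: Sized {
    fn from_value(raw: &RawValue) -> Result<Self, Error>;
}

impl FromValue for i64 {
    fn from_value(raw: &RawValue) -> Result<Self, Error> {
        match raw {
            RawValue::Int(i) => Ok(*i),
            _ => Err(Error("type mismatch".into())),
        }
    }
}

pub struct Row {
    values: Vec<RawValue>,
}

impl Row {
    /// Typed access by column index, e.g. row.get::<i64>(0).
    pub fn get<T: FromValue>(&self, idx: usize) -> Result<T, Error> {
        let raw = self
            .values
            .get(idx)
            .ok_or_else(|| Error("no such column".into()))?;
        T::from_value(raw)
    }
}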

5 Likes

At first look that sounds straightforward, but unfortunately it is not that easy. I see the following problems with this approach:

  • How does it handle prepared statements, which are quite fundamental to writing performant applications that access a database?
    • Does it handle prepared statements internally by maintaining a cache based on hashing the SQL? -> That approach would mean diesel could not use such a crate, because we need to handle our own cache for optimization reasons. (For example, we do not need to build the actual SQL in specific cases.)
    • Does it provide a separate API for that? That would greatly complicate the API compared to the currently available diesel sql_query interface.
  • How does it handle binding values as part of the SQL query?
    • Does it just use the placeholders that are expected by the database? Those are different for each database, which means that you now have a common API but need different SQL strings for each supported database in your application.
    • Does it use one common placeholder and replace it internally with the appropriate one? Now you end up with an API that is doing string processing on SQL strings internally. To do that in the right way you need to parse the SQL to be sure you don’t accidentally replace the wrong part of the query.
  • How do you handle transforming values to and from the database representation? This most likely involves some dynamically typed container. Crates building on top of that will take a performance hit from it. For example, diesel knows in most cases what the result types of a query are and therefore can load the values into the right Rust types without any overhead.

Additionally, for diesel: there are several features that require access to the underlying database-specific implementation, for example passing a callback as a custom function to a SQLite database.

I may be a bit biased here, but nearly everything is already there inside of diesel. Some things are not public yet. So what exactly is missing for you? Maybe we could work on exposing that.

Does the current “sql_query” interface not support prepared statements at all? That doesn’t seem suitable for a production-level API since parsing SQL on every query is suboptimal.

I think an ideal crate would provide three layers:

  1. An explicit API with per-connection prepared statement objects
  2. An API with a query object and an internal cache of per-(query, connection)-pair prepared statement objects for API 1 (with entries dropped when either the query or the connection is dropped)
  3. An API that provides global “interning” of query objects provided by API 2

Applications would normally use API 2, using API 3 only for dynamic queries.
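A much-simplified sketch of layer 2 (keyed by the SQL text rather than by query-object identity, and without the drop-based eviction described above; all names hypothetical):

use std::collections::HashMap;

// Placeholders for the layer-1 API: explicit per-connection prepared statements.
pub struct Connection;
pub struct PreparedStatement;

impl Connection {
    pub fn prepare(&mut self, sql: &str) -> PreparedStatement {
        // A real driver would do a parse/describe round trip here and keep
        // a server-side statement handle.
        let _ = sql;
        PreparedStatement
    }
}

// Layer 2: reuse prepared statements across repeated queries on a connection.
pub struct CachingConnection {
    inner: Connection,
    cache: HashMap<String, PreparedStatement>,
}

impl CachingConnection {
    pub fn prepared(&mut self, sql: &str) -> &PreparedStatement {
        // Prepare only on a cache miss.
        if !self.cache.contains_key(sql) {
            let stmt = self.inner.prepare(sql);
            self.cache.insert(sql.to_owned(), stmt);
        }
        &self.cache[sql]
    }
}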

Again, several layers:

  1. An API that just passes the SQL to the database,
  2. A simple string-based SQL builder
  3. Something that can tokenize SQL and do basic transforms between databases like changing placeholders
  4. An untyped SQL AST with a parser, renderer and transforms
  5. A dynamically typed SQL AST
  6. A statically typed SQL AST like what Diesel has currently
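As an illustration of the placeholder transform in layer 3, a deliberately naive sketch that rewrites ? markers into PostgreSQL-style $1, $2, ...; a real implementation would need the tokenizer mentioned above so that string literals, comments and quoted identifiers are left alone:

fn question_marks_to_dollars(sql: &str) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut n = 0;
    for ch in sql.chars() {
        if ch == '?' {
            // Replace the n-th ? with $n.
            n += 1;
            out.push('$');
            out.push_str(&n.to_string());
        } else {
            out.push(ch);
        }
    }
    out
}

// question_marks_to_dollars("SELECT * FROM users WHERE id = ? AND name = ?")
// == "SELECT * FROM users WHERE id = $1 AND name = $2"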

Probably have a “ResultReader” interface with a method for every database type that allows reading one value of that type from the database connection and returns an error (or garbage, if the types are not self-describing) if the type is incorrect, and also something that returns a non-exhaustive enum of all possible types across all databases.
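Very roughly, something like this (names made up; the per-type method list would be far longer in practice):

#[non_exhaustive]
#[derive(Debug)]
pub enum AnyValue {
    Null,
    Bool(bool),
    Int64(i64),
    Double(f64),
    Text(String),
    Bytes(Vec<u8>),
    // ... every type any supported database can produce
}

#[derive(Debug)]
pub struct TypeError;

pub trait ResultReader {
    // One method per database type; returns an error (or garbage, for
    // formats that are not self-describing) if the next value has a
    // different type.
    fn read_i64(&mut self) -> Result<i64, TypeError>;
    fn read_f64(&mut self) -> Result<f64, TypeError>;
    fn read_text(&mut self) -> Result<String, TypeError>;
    fn read_bytes(&mut self) -> Result<Vec<u8>, TypeError>;

    // The dynamically typed fallback returning the non-exhaustive enum.
    fn read_any(&mut self) -> Result<AnyValue, TypeError>;
}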

Also provide database-specific extension traits that allow reading the raw representation.

On top of that, provide higher level interfaces like custom derive deserializers and serde deserializers. Same for serializing.