Kickstarting a database-wg

This is a really interesting idea which I’ve not seen before. To clarify, instead of directly mapping a result set -> struct (or vice versa), you’re building an abstraction layer that goes from DB -> Gateway -> Repository -> Relation -> Changeset -> Struct?

Will each flavor of RDBMS have its own gateway implementation? Just want to make sure I’ve got this straight in my head.

Regardless, I’m really interested in participating in this WG! I’m relatively new to Rust (coming from a DBA/SysAdmin/ETL Dev background).

Along those lines, yes. It’s inspired by https://rom-rb.org/ and Elixir’s Ecto. The nice thing about that approach is that it doesn’t force databases to be RDBMS, while still allowing for specificity: RDBMS are queried with SQL (or with their particular SQL dialect), while other databases are queried by other means.
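To make the layering from the question concrete, here is a purely hypothetical sketch; every trait and type name below is invented to illustrate the Gateway -> Repository -> Relation -> Changeset split and is not a proposed API:

/// A gateway encapsulates how a concrete backend is queried (SQL dialect, wire protocol, ...).
trait Gateway {
    type Query;
    type Row;
    fn run(&self, query: Self::Query) -> Vec<Self::Row>;
}

/// A relation describes a dataset (a table, view, collection) on top of a gateway.
trait Relation<G: Gateway> {
    fn query(&self) -> G::Query;
}

/// A changeset validates and prepares data on its way into the database.
trait Changeset<T> {
    fn apply(self, value: T) -> Result<T, String>;
}

/// A repository maps rows coming out of a relation into domain structs.
trait Repository<G: Gateway, R: Relation<G>, T> {
    fn load(&self, gateway: &G, relation: &R) -> Vec<T>;
}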

rom-rb is an especially interesting library, as it is at least the fourth attempt by those folks at a properly flexible mapper, and the earlier ones weren’t bad :).

The core idea is that mapping data out of the database is an important application concern, and that attempting to uniformly map between domain and database leads to bad design. Abstracting too hard over the database is also a problem: there are reasons to choose one database over another.

I have two comments here: Rust is not evented-async-first, but it has extremely good support for encapsulating async components. It’s completely feasible (and IMHO even useful) to run multiple schedulers in one program, and handover between them is extremely well structured. I agree with you that evented async DB drivers are not necessarily the first thing I’ll do, but having the interface be async helps. Many database interfaces are mapped onto thread pools, though, as in the following:

// Sketch: the blocking query runs on a thread pool (`DB` is a placeholder handle),
// and the caller simply awaits the result.
async fn read_from_db(id: u64, threadpool: DB) {
    let row = threadpool.get_row(id).await;
}

Waiting for a thread to finish is much better abstracted in an async fashion than in a sync fashion, and very much possible in Rust.
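For example, here is a minimal sketch (assuming tokio; the driver call is imaginary) of awaiting work that actually runs on a thread pool:

// The blocking database call runs on tokio's blocking thread pool;
// the caller simply awaits its completion instead of parking a thread.
async fn read_row(id: u64) -> String {
    tokio::task::spawn_blocking(move || {
        // Imagine a blocking driver call here.
        format!("row {}", id)
    })
    .await
    .expect("blocking task panicked")
}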

Async maps very well to sync. You can just run one async task in its own lightweight executor.
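As a minimal sketch of that (using block_on from the futures crate; the query function is hypothetical):

use futures::executor::block_on;

// Hypothetical async query function.
async fn fetch_user_name(id: u64) -> String {
    format!("user-{}", id)
}

// Synchronous wrapper: runs the single async task on a lightweight executor.
fn fetch_user_name_sync(id: u64) -> String {
    block_on(fetch_user_name(id))
}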

SQL has a lot of issues that make working with it painful for library authors. Django wouldn’t be nearly as useful without its ORM. That’s not to say SQL should be avoided at all costs, but a layer above it is often required and should be easily available when desired.

How do you migrate the addition of an extension on PostgreSQL?

By declaring that I want to use some library/application module/crate that requires it, and having that tooling generate whatever is required for it to work. It might use SQL, it might not (MySQL may use Percona tooling instead of plain SQL); I don’t need to know. It might do different things depending on which database I am using, or on its version and configuration.

Or the removal of 10 tables, to be merged into one, which needs a massive SQL rewrite in the middle?

It’s trivial with Django: add the new table (makemigrations), add a data migration, update the app to use it, remove the 10 old tables (makemigrations), done. Two of the three migration steps are generated automatically, forward and back. With “write your own SQL” it’s a lot more work of the kind computers do better than people.

“Not writing SQL” is nice for basic stuff.

Yes. But there are many applications that do a lot of basic stuff, and having it done in an automated way makes things a lot easier to change. Writing a “basic” migration in a database-agnostic way is actually anything but basic.

A 2-layer or 3-layer design can work well

Conceptually maybe, but in practice you will have the database driver integrated at all levels, because layers 2 and 3 require features layer 1 doesn’t need or know about.


That is my point, actually. I’m not against building nice stuff on top. But it’s important that that “stuff” not dominate the design of the basic interface, because it will constrain the possibilities and also make it hard to actually implement all the possible ideas.

For the other points, it’s a matter of degree. I consider using SQL much simpler and easier than trying to understand a layer on top, especially for complex stuff. I have used the Django ORM and SQLAlchemy a lot, so I understand the benefits of having idiomatic sugar. But I have also needed to work around every single ORM library in my career. So that is just my concern: to keep it simple.


Trying to summarise some points that have been raised in this discussion in order to structure it for people joining in later.

  • There’s interest in working on both database abstractions as well as database-development tooling
  • People have feelings about ORMs
  • Common database abstraction interfaces should be a basic priority
    • Especially for pipelining/batching queries
  • Work on tooling and libraries bottom up: don’t start with an ORM
  • There are many efforts already in motion: no need to start from scratch

Personally I feel that higher-level abstractions (such as not having to write SQL by hand, etc.) are desirable. But they can be added later, on top of a solid base of libraries and tooling.

I also feel that the points I just listed would make for a good guiding charter for a working group. Feel free to raise new points or challenge existing ones, though.


I also consider using SQL much more straightforward, but handling every flavor and implementation of the language is a challenge. T-SQL in particular is quite opinionated on what it will and won’t allow. Syntax also differs between different versions of the same RDBMS.


Ruby’s Sequel library has done a great job as an SQL builder that manages the split between staying generic and letting you drop down to database-specific features. https://sequel.jeremyevans.net/

I’m also sure the author is willing to talk about a few of the particularities there; he’s quite approachable and knowledgeable.

Hi,

Yes, as a first step Rust needs to expand support for the databases that already exist on the market, and maybe the working group can try to define a sufficiently generic way to interact with databases that feels modern while still allowing deeply specialized implementations for each vendor (yes, this is easier said than done).

Another specific contribution I think the working group can make is related to standards; having a reference implementation of an SQL parser and validator, for example, seems like a good idea.

I have a few years of commercial experience building a (NoSQL) database as well as collaborating on database driver development, so I would really like to be part of an eventual database interaction API definition process.

In my free time I’m also one of the few people trying to build parts of a database in Rust; my personal project is Persy. So I would definitely like to see a few databases fully implemented in Rust at some point in the future, but as an application developer I’d prefer to first see good support for the major existing vendors.

That is why an SQL builder is pretty complicated to make. Generating CRUD is easy, but that is all.

Is it my impression, or is the idea to put everything on top of an SQL builder? That is nice, but tangential. It can certainly be a sub-crate.

Not everyone needs a real abstraction over all the many flavors of SQL on the market. Using an ORM/builder “just in case I need to move to another DB later” is a niche concern, and unnecessary most of the time. And I say this as somebody who has run code across 7 RDBMS! (But how many times did I actually need to abstract the SQL? In just 1 project among dozens.)

There are many things that truly are universal and that allow building the rest of the nice “sugar” on top:

  • A dynamic container of results
  • A trait or enum to hold database values (for example, the sqlite crate and the PG one currently differ on this)
  • How to introspect query results (columns with names and types, how many rows were affected, what the last inserted ID was, etc.)
  • How to pass a raw SQL string with optional parameters, i.e. a uniform way to send parameters
  • Not limiting the interface to positional fields, but also allowing access by column name

Frankly, most of this is already covered by Python’s DB-API, the equivalent in .NET, etc.
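A rough sketch of what those universal pieces could look like in Rust (every name here is hypothetical, not a proposed API):

/// A value type independent of any particular driver.
#[derive(Debug, Clone)]
pub enum Value {
    Null,
    Integer(i64),
    Real(f64),
    Text(String),
    Blob(Vec<u8>),
}

/// Introspection data for a query: column names/types, rows affected, last inserted ID.
pub struct QueryInfo {
    pub columns: Vec<(String, String)>, // (column name, type name)
    pub rows_affected: u64,
    pub last_insert_id: Option<i64>,
}

/// A dynamic container of results, addressable by position or by column name.
pub struct Row {
    columns: Vec<String>,
    values: Vec<Value>,
}

impl Row {
    pub fn get(&self, index: usize) -> Option<&Value> {
        self.values.get(index)
    }

    pub fn get_by_name(&self, name: &str) -> Option<&Value> {
        let idx = self.columns.iter().position(|c| c == name)?;
        self.values.get(idx)
    }
}

/// Raw SQL with a uniform, driver-independent way to pass parameters.
pub trait Connection {
    type Error;

    fn execute(&mut self, sql: &str, params: &[Value]) -> Result<QueryInfo, Self::Error>;
    fn query(&mut self, sql: &str, params: &[Value]) -> Result<Vec<Row>, Self::Error>;
}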


With respect to other ways of building an SQL string, Rust has a nice trick with the Into trait: if the interface accepts anything that implements Into (for example Into<String>), we are free to build the SQL however we like.
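A small sketch of that trick (the builder type is made up for illustration):

// The function accepts anything that can be turned into an SQL string,
// so callers can pass a &str, a String, or their own builder type.
struct SelectAll {
    table: String,
}

impl From<SelectAll> for String {
    fn from(q: SelectAll) -> String {
        format!("SELECT * FROM {}", q.table)
    }
}

fn run_query<S: Into<String>>(sql: S) {
    let sql: String = sql.into();
    println!("executing: {}", sql);
}

fn main() {
    run_query("SELECT 1");                          // plain &str
    run_query(SelectAll { table: "users".into() }); // builder type
}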

Hi and thanks @spacekookie for starting this. I’m the person who wrote the Reddit post about “Are We Database Yet?”, so thanks for sharing this link on the reddit post.

My interests are in:

  • creating common interfaces for databases (async and sync), which reduces overhead for developers spending hours picking up a database crate
  • creating/improving bulk data interaction with databases

Both were sparked by struggling to use a few database crates for different kinds of workloads. I think Diesel would benefit from standardisation as it would become easier to support more databases. Serde support for ‘all’ databases would also benefit users who aren’t using Diesel, but want a type-safe interface into their databases.

I’ve used Apache Arrow since its inception (Python world), and the work that @andygrove and the other Rust committers on Arrow are doing inspired me to start contributing to its Rust implementation.

My personal goal on Arrow is to increase IO support: CSV support is now only missing date/time read/write, and JSON support is underway. I started tinkering with batch/bulk database IO using Arrow, and I’m planning on working on that over the coming months.

I think there are probably already enough people interested in async, web and OLTP use cases in general, so I’d like to put my hand up to also work on OLAP use cases.


Hi! I’m interested too. I wrote a bunch of database-related crates, and am interested in new things:

  • Sanakirja is at the moment a database backend where you can store data structures in a file, in a transactional way. The concurrency model allows writers and readers in parallel (under certain relatively light restrictions). I’m interested in expanding it to include more data structures (still transactional).

  • Pleingres is an asynchronous PostgreSQL client, with a few macros to turn Rust structs into SQL requests.

Since I’ve worked quite a bit on patches and versions before, I’m also interested in writing a more declarative migration tool than what is available at the moment. I’m sure we can find sounder algorithms than what has been attempted in the past, that would work for 99% of the cases (or even 100% of my cases).


@spacekookie maybe a good time to create an organisation so that we can track things better there?


@Dylan-DPC is it this one? https://github.com/rust-db


Yup, I already created an org. I’ll send you some invites. @spacejam made an organisation repo. Let’s create some tracking issues there and discuss.

Also I’d want to have regular synced meetings, at least in the beginning. This can be either in text or on video, depending on how people feel about stuff.

Hi there, I’m also interested in contributing to a (interface to relational) database WG.

I’m currently (co)maintaining several related crates:

  • wundergraph is a crate that allows you to serve your database schema as a GraphQL API in an efficient way. Currently it supports PostgreSQL and SQLite, but it should be easy to add support for whatever database is supported by diesel. The crate is currently a WIP and needs some minor improvements before an initial release (mostly documentation and cleanup).
  • diesel is a Safe, Extensible Query Builder and ORM for Rust.
  • diesel-oci is an extension crate for diesel that adds support for the Oracle database system. It’s working, but requires two unstable language features (specialization and re-rebalance-coherence) and contains several bad (but fixable) hacks (related to blob/string sizes).

So generally I’m interested in a working group that focuses on building interfaces to relational databases.

Additionally some random thoughts:

  • Diesel at its core is a query builder, not an ORM. At its core it provides an (extensible) DSL to construct SQL queries and a mechanism to load results into Rust types. On top of that it provides several ORM-like features to map inserts/changesets/results from/to Rust structs (that’s closer to what is described as a data mapper above). Looking at Slick without ever having used it (I just looked quickly through the basic docs): it looks quite similar to diesel. Most things have a slightly different syntax but are also supported by diesel. I haven’t looked far enough to tell exactly what is fundamentally impossible in diesel. (Yes, some things are not implemented yet, but mostly it is possible to do them in a third-party crate.)
  • Going all-async is a mistake in my opinion. Async only makes sense when the underlying database system actually supports it. For example, SQLite does not support async (and in my opinion it doesn’t make sense to support that, because the database is actually running inside your program, so there is no IO to wait for). Additionally (at least for diesel), an async interface needs to solve the following problem: assume there is a connection type called Connection that is Send but not Sync; the following code should return a future that implements Send, because otherwise you won’t be able to interact with tokio:
// `&Connection` is only `Send` if `Connection` is `Sync`, so a future that
// holds `conn` across an `.await` point is not `Send` here, and `tokio::spawn`
// requires `Send` futures.
async fn foo(conn: &Connection) {}
async fn bar(conn: &Connection) {}

async fn do_something(conn: &Connection) {
    foo(conn).await;
    bar(conn).await;
}

That brings me to the missing pieces in diesel:

  • Working with schemas that are unknown at compile time: This is mostly possible today by using sql and sql_query, but it requires that you know the types returned by a given query (something like: this query will return two integers); see the sketch after this list. This will hopefully change with diesel 2.0, where we plan to add a more flexible API for that.
  • Using the same table more than once in a query: Diesel is currently missing support for (table) aliasing, so it is not possible to reference a table more than once in a query (for example, self joins). There is already a proposed API for this, which is waiting on feedback from interested users.
  • Group by support: Diesel currently lacks support for group by clauses and for mixing aggregate and non-aggregate expressions in a select clause. We plan to improve our internal type system to add support for that, but there are some open questions left.
  • Documentation: It seems quite hard for beginners to actually find what they are looking for in diesel’s documentation. Additional documentation about how to add support for custom types, how to abstract over diesel, or even how to write a third-party backend is basically completely missing. (Getting help with those points would be great.)
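As a sketch of the sql_query route mentioned above (using diesel 2.x-style syntax; the table and struct are invented for illustration):

use diesel::prelude::*;
use diesel::sql_query;
use diesel::sql_types::{Integer, Text};

// The types returned by the raw query have to be known up front.
#[derive(QueryableByName, Debug)]
struct UserRow {
    #[diesel(sql_type = Integer)]
    id: i32,
    #[diesel(sql_type = Text)]
    name: String,
}

fn load_users(conn: &mut PgConnection) -> QueryResult<Vec<UserRow>> {
    sql_query("SELECT id, name FROM users WHERE id > $1")
        .bind::<Integer, _>(42)
        .load(conn)
}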

I think even basic CRUD isn’t always straightforward. Take the CREATE TABLE IF NOT EXISTS syntax as an example.

For most flavors of SQL this is very simple. For T-SQL, the syntax doesn’t exist. Instead you have to do this:

IF NOT EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'test_table_123') 
BEGIN 
    CREATE TABLE [test_table_123] 
    ( 
        [id] INT NULL, 
        [info] NVARCHAR(50) NULL 
    ) 
END

If you don’t have permissions to read from that system view (which defaults to displaying only what the user has access to) then trying to run this can lead to errors if that table actually does exist. This is especially pernicious if you use separate credentials to run DML and DDL operations (which many do, so that a reader can’t DROP TABLE).

You might say that a user should make sure to give permissions on that view to the service doing the DDL, but users would have to look at your source code to know if you’re using INFORMATION_SCHEMA, sys.tables, or sys.objects…or even OBJECT_ID() IS NULL in your IF NOT EXISTS call. There’s unfortunately more than one way to do this.

All of this is to say, yeah, it’s super hard. :smiley:

Thank you @weiznich for the summary.

We now have the github.com/rust-db/coordination repository, so please go ahead and open tracking issues there.

Otherwise, discussion can also happen at this zulip stream: https://rust-lang.zulipchat.com/#narrow/stream/193127-wg-database

It’s pretty surprising that Diesel is missing what seem to be extremely basic features like table aliasing and “group by”.

If there is not enough manpower to produce a reasonably complete query builder (i.e. at the very least supporting the full syntax of SELECT/INSERT/UPDATE/DELETE for PostgreSQL and MySQL), maybe we should start with something simpler, like an SQL-based “DB API” that abstracts over databases but passes the SQL through untouched?

And then proceed to add a version with strong typing by passing the SQL to the test database at build time and asking it about the columns of the resultset and their types, as well as the inferred parameter types (at least PostgreSQL can do that with PQdescribePrepared).
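As a rough sketch of that describe step (assuming the postgres crate; the connection string and query are placeholders):

use postgres::{Client, NoTls};

// Prepare the statement against a test database and ask it for the inferred
// parameter types and the result columns, without ever executing the query.
fn describe_query(client: &mut Client, sql: &str) -> Result<(), postgres::Error> {
    let stmt = client.prepare(sql)?;

    for param in stmt.params() {
        println!("parameter type: {}", param.name());
    }
    for col in stmt.columns() {
        println!("column {} has type {}", col.name(), col.type_().name());
    }
    Ok(())
}

fn main() -> Result<(), postgres::Error> {
    let mut client = Client::connect("host=localhost user=postgres dbname=test", NoTls)?;
    describe_query(&mut client, "SELECT id, name FROM users WHERE id = $1")
}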

This way it’s much simpler to write a crate that can support all features of all databases (albeit without automatic application portability between databases), and using SQL also means that the user already knows how to write queries and doesn’t have to learn yet another query builder.

:+1: I use Dapper in C# because all I really want is a nice way to pass parameters and get the results into a nice type.