edgedb-python: Why `execute()` doesn't support arguments?

My understanding that switching from fetchone to execute should be seamless if you don’t need the result. But it’s not the case:

  File "app.py", line 76, in login
    await request.app['edb'].execute("""
TypeError: execute() got an unexpected keyword argument 'cookie_id'

The error is confusing.

Technically, I understand that it’s because there is such thing in a protocol. And that thing in protocol may make sense for scripts or something, but it doesn’t make sense from the point of view of the user of python bindings (other database bindings have no such issue as far as I remember).

So perhaps execute should accept arguments and silently switch to fetchone internally when at least one argument is specified?

About this issue

Original URL
State: closed
Created 5 years ago
Reactions: 2
Comments: 19 (19 by maintainers)

Commits related to this issue

API rename: "fetch" to "query" This commit renames the following APIs: * fetchall() -> query() * fetchone() -> query_one() * fetchall_json() -> query_json() * fetchone_json() -> query_one_json() Th... — committed to edgedb/edgedb-python by 1st1 4 years ago
API rename: "fetch" to "query" This commit renames the following APIs: * fetchall() -> query() * fetchone() -> query_one() * fetchall_json() -> query_json() * fetchone_json() -> query_one_json() Th... — committed to edgedb/edgedb-python by 1st1 4 years ago
v0.9.0a1 * API rename: "fetch" to "query". See issue #51 for details. — committed to edgedb/edgedb-python by 1st1 4 years ago
v0.9.0a1 * API rename: "fetch" to "query". See issue #51 for details. — committed to edgedb/edgedb-python by 1st1 4 years ago
Rename "fetch*" methods to "query*" See https://github.com/edgedb/edgedb-python/issues/51 for the relevant discussion. — committed to edgedb/edgedb-js by 1st1 4 years ago
Rename "fetch*" methods to "query*" See https://github.com/edgedb/edgedb-python/issues/51 for the relevant discussion. — committed to edgedb/edgedb-js by 1st1 4 years ago
v0.9.0 * API rename: "fetch" to "query" (#51) * Expose `query*()` methods in the Pool class directly — committed to edgedb/edgedb-python by 1st1 4 years ago
v0.9.0 * API rename: "fetch" to "query" (#51) * Expose `query*()` methods in the Pool class directly — committed to edgedb/edgedb-python by 1st1 4 years ago
Rename "fetch*" methods to "query*" See https://github.com/edgedb/edgedb-python/issues/51 for the relevant discussion. — committed to renawolford6/edg-edb-javascript by renawolford6 4 years ago

Most upvoted comments

I started to write a big comment in edb/server/edgecon.pyx to explain why we compile one EdgeDB SimpleQuery message to one PostgreSQL SimpleQuery message only to see that it’s no longer the case. The protocol implementation saw a lot of churn in the beginning so I’ll use this issue to clear a few things up and document what I think should be our API strategy for language bindings.

Simple Query Protocol

Originally we did have a 1 to 1 mapping between our SimpleQuery messages and what we send to PostgreSQL. Eventually we changed our strategy to 1 to many. Here’s why:

A Postgres SimpleQuery message commits transactions immediately on COMMIT even if there happen to occur exceptions right after it:

START TRANSACTION;
  CREATE TABLE foo(...);
COMMIT;

SELECT 1 / 0;
# Error! But the "foo" table has been created already and will stay.

So SQL transactions within a single SimpleQuery are immediately committed. Here’s a more relevant example for us:
```
START TRANSACTION;
  CREATE TABLE foo(...);
COMMIT;

SELECT pg_sleep(100);

START TRANSACTION;
  CREATE TABLE bar(...);
COMMIT;
```
Essentially, we create a “foo” table, wait a bit, and create a “bar” table. The “foo” table is committed right before the pg_sleep() call.
Why is this important in the context of EdgeDB?

PostgreSQL is solely responsible for its state.

EdgeDB is responsible for handling its own state spread across several processes (current schema version, current session and system settings, etc) and for making sure that PostgreSQL’s state is in full sync with EdgeDB’s state.

This complication means that a single SimpleQuery with EdgeQL like the below:
```
START TRANSACTION;
  ALTER TYPE foo { ... };
COMMIT;

SELECT sys::sleep(100);

START TRANSACTION;
  CREATE TYPE bar { ...};
COMMIT;
```
Would be compiled into 3 SQL blocks each run via Postgres’ SimpleQuery:
```
START TRANSACTION;
  ALTER TABLE foo (...);
COMMIT;
```
EdgeDB will know that the first query block modifies the schema state if executed to completion. Therefore EdgeDB will signal all compilers to reload schema.
```
SELECT sys::sleep(100);
```
```
START TRANSACTION;
  CREATE TABLE bar (...);
COMMIT;
```
EdgeDB will know that the last query block also modifies the schema state. Therefore EdgeDB will signal all compilers to reload schema again.

Extended Query Protocol

In PostgreSQL the Simple Query protocol is just one way of executing queries. It can execute a batch of SQL queries but:

it doesn’t allow to use query arguments;
it returns data in TEXT encoding.

There’s an alternative, called Extended Query protocol. It allows to return data in BINARY, to receive query parameters, to prepare named and anonymous statements, etc.

EdgeDB Extended Query protocol is a pretty much one-to-one mapping to Postgres Extended Query protocol. We do a bunch of tricks on the server side to maintain an LRU cache and auto-prepare queries, but generally it’s the same thing.

EdgeDB Protocol & Python API

Currently in EdgeDB Python API we have the following APIs:

Connection.fetchone()—Implemented via our Extended Query protocol version. Returns data, accepts arguments. Only works with single-command queries that return singular results.
Connection.fetchall()—Implemented via our Extended Query protocol version. Returns data, accepts arguments. Works with any single-command query.
Connection.execute()—Implemented via our version of Simple Query protocol. Does not return data (EdgeDB ignores any data returned from Postgres’ Simple Query). Works with any single- or multi-command query.

Paul’s original question is valid: the inability of Connection.execute() to receive arguments is inconvenient. Especially in cases when a user just wants to insert/update some data and doesn’t care about the results.

There’s also another angle how to look at this: in asyncpg we have Connection.executemany() that allows to batch a query with a series of arguments—essentially pipeline it. We do want to have that in EdgeDB eventually.

One way to add “executemany” would be to add a new protocol message that would receive an EdgeQL command(s) and a stream of arguments. This way in the future we can extend Connection.execute() to accept arguments and add Connection.executemany().

That new protocol flow will use a combination of Simple Query and Extended Query Postgres messages. The former is optimal for blocks of SQL without parameters, the latter can receive arguments. It’s all complicated by the fact that we need to add cancellation handling to all of that, making this quite a tricky thing to implement correctly. We’ll do that eventually, though.

Therefore I think that right now it’s better to do nothing. 😃 We can always extend our protocol with this “batch execute” message flow. We can at later point add arguments to Connection.execute().

Another thing to consider: adding all kinds of APIs to the Python driver means that we’d want to add them to all other supported languages — this significantly increases the surface for us. We don’t have that time. We should focus on the hosted version and releasing a stable 1.0.

In conclusion I’d say that the current API we have in Python is forward-compatible. Let’s keep it as is.

1st1 on Dec 1, 2019

@1st1 let’s consider alternative names for the “statement” and the “script” methods, because I’ve seen similar confusion from asyncpg users.

Some links from my slack conversation with @elprans:

There’s some confusion with execute() and kwargs but it doesn’t have enough traction IMO. Some asyncpg issues have 50 stars and 50 comments in them with people posting “+1” – this isn’t one of them.

The bigger issue, though, is that people don’t realize that they can use asyncpg.Connection.fetchval() to run INSERT queries. That one is real and I forgot about it; thanks for reminding.

This isn’t new though, we had multiple discussions about INSERT / fetch() mental mismatch without any good ideas how to fix it. Let’s brainstorm, I guess.

I’ll start. One way would be to re-purpose execute():

Connection.fetchone() would become Connection.fetch() – fetch a singleton-result query. args & kwargs are supported.
Connection.fetchall() would become Connection.execute() – fetch all results of a qurey. args & kwargs are supported.
Connection.execute() would become Connection.run() or Connection.execute_script(). No args/kwargs are supported (at least for now).

1st1 on Nov 30, 2019

Thus, I think it would be worthwhile to briefly mention in the docs for Connection.execute() as a note at the bottom that if the results are desired, queryone() or query() should be used instead.

Absolutely, would appreciate PRs.

Quick question though: should there be some form of deprecation process or anything to smooth the transition from fetchone() -> queryone() and fetchall() -> query()? My initial assumption would be that backwards API compatibility isn’t as much of a concern for EdgeDB with it currently being in alpha, but I’d like to be certain; especially coming from experience w/ CPython where it’s a primary concern.

Of course, the change is backwards compatible. The old fetch*() API is going to be available for some time, albeit with deprecation warnings.

1st1 on Jul 15, 2020

You’re probably better at estimating this specific thing. If you think this is not hard enough I’m okay with that.

Let’s keep the issue open in the meantime until arguments are implemented.

tailhook on Dec 1, 2019