symfony: [Cache] PdoTrait clear method uncanny error handling

This is not really an issue, but an edge case, reproducible with 4.2, and probably higher versions too.

Today I stumble a bug which at first seemed to be arbiratry and non reproducible, our CI was randomly failing with PgSQL transactions errors such as:

Doctrine\DBAL\Driver\PDOException: SQLSTATE[25P02]: In failed sql transaction: 7 ERROR: current transaction is aborted, commands ignored until end of transaction block

This was mainly failing during tests. After an hour or so trying to reproduce it, I finally found a simple way to reproduce it which was to wipe out the database schema, running doctrine migrations to recreate, then run the failing test (in real life, not really an hour, just exaggerating a bit because really, it’s not a easy one and some developers could literally loose days stumbling upon errors like this).

After an extra hour of debugging (we are working in messenger based application, some messages are processed synchronously and some are not) I finally reached the error, which was not due to our code.

A bit of context first, our application has a wrapper around the bus that wraps up SQL transaction handling at the bus level, and cascading messages within the bus may honnor transactions all the way down, from the top level message to the most deepest one, it’s a business requirement and we have heavily tested transaction handling code behind that. In order to achieve this, we nest transactions using SQL SAVEPOINT, such as:

Top level messsage received:
  - SQL BEGIN
    - Top level message processing
    - second level message sent synchronously
      - Second level message receiced
         - SQL SAVEPOINT
            - Second leve message processing
         - if exception then
            - ROLLBACK TO SAVEPOINT
            - raise exception
         - else
            - DO NOTHING
      - end of second level message
  - another second level message sent synchronously
    - Second level message receiced
      - SQL SAVEPOINT
        - ... and do the same as upper
  - if exception then
    - ROLLBACK ALL
  - else
    - COMMIT
End of top level messages

In real life, that’s a bit more complicated, but this is enough to explain the problem we stumbled upon.

This whole algorithm works fine only at one conditions: SQL errors are uncatched by business code and catched by our custom dispatcher around the bus.

Now, let’s look what happens when SQL errors are raised by PgSQL but kept silent by some PHP code, let’s assume that we had the following PHP code within the second level message processing:

try {
    $connection->execute("SELECT non_existing_method();");
} catch (\Throwable $e) {
    // Here be silent about the error.
}

What happens then is that the SQL transaction running atop of it is left in an erroneous state, but everything is silent. Now consider the another second level message (it could be anything, or business code that handles itself a transaction), during the SQL SAVEPOINT call, since the SQL transaction is in an invalid state, it creates an exception, which just says:

Doctrine\DBAL\Driver\PDOException: SQLSTATE[25P02]: In failed sql transaction: 7 ERROR: current transaction is aborted, commands ignored until end of transaction block

With SQL errors being handled either by:

  • catching exceptions in a BEGIN … END procedure, thus not leaving the SQL transaction in a broken state,
  • leting the exception pass, and let the wrapper catch it and ROLLBACK TO SAVEPOINT gracefully.

This bug would not trigger.

Real problem here is no the fact that it does fail, actually I’m glad it does, it proves the transaction handling rather strict and robust, but it is that the error was silently catched and almost impossible to detect, it’s not even within the same execution branch the later error arise and is show, which makes step debugging even more painful.

The code that broke the transaction is from Symfony\Component\Cache\Traits\PdoTrait:

    /**
     * {@inheritdoc}
     */
    protected function doClear($namespace)
    {
        $conn = $this->getConnection();

        /// ...

        try {
            $conn->exec($sql);
        } catch (TableNotFoundException $e) {
        }

        return true;
    }

Real error is that the cache table does not exists. Whereas this code is actually valid, in some sense, when called during transactions, especially using savepoints, it will create false negative, cause rollbacks to happen, or cause COMMIT operations to explode later, and make the whole absolutely impossible to debug properly for a human being.

Now I’m aware this is a very edgy case, reason it happens on our CI is because it works recreating a new environment each run, which cause the cache_items table to be missing every new run. But ideally, this could be improved.

Now considering the project I’m actually working on, it’s not a blocking nor critical bug, because we have many solutions to work around, the easiest one will be to add a new migration to create the missing table even before any test run: problem solved.

But regarding Symfony, this could be solved by either:

  • leting the exception pass, I’m aware this would be a real regression for many projects,

  • providing a finer method to know if table exist, run it at least once to avoid raising exceptions and leaving the SQL state broken, but it would be a performance regression, doing an extra SQL query on every HTTP request for every PDO cache registered,

  • documenting in the cache guide that tables can and should be created manually to avoid this kind of errors,

  • write a SQL procedure such as:

    BEGIN
        DELETE FROM cache_items -- ... conditions
    EXCEPTION
        WHEN
            undefined_table
        THEN
            CREATE TABLE -- ...table definition
    end;
    

    but sadly this is PL/PgSQL code, and each driver will need its own variant.

This is not really a bug report, but it’s something, in my opinion, that would need some thoughts and could be improved.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 21 (17 by maintainers)

Most upvoted comments

Before: why don’t we use createTable in a try/catch in the warmer?

That’s what I wanted to do, but errors are not normalized, using a specific method on the adapter, we can keep SQL DBAL vs PDO logic within whenever possible, and SQL business isolated from the rest of the world, even the warmup phase itself.