graphql-engine: "table or event-trigger not found in schema cache" error in logs

Just upgraded from 1.2.2 to 1.3.0, noticed this following error message popping up more than once:

graphql-engine {"type":"event-trigger","timestamp":"2020-07-24T23:37:43.617+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}

How do we go about debugging this error / what would be the cause? Hasura still functions normally, but the log message spams almost every second or minute.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 18 (9 by maintainers)

Commits related to this issue

Most upvoted comments

This has happened to us on multiple occasions specifically when we delete event triggers. Hasura seems to go crazy with it until we clear the event logs.

To diagnose I recommend the following:

select  trigger_name, archived, count(*) from hdb_catalog.event_log
group by trigger_name, archived;

There should be no event triggers listed that have been removed or look out of place.

If there are then clear them individually to avoid clearing out too much data that you might need from other event triggers. Replace <event_trigger_name>:

DELETE FROM hdb_catalog.event_invocation_logs
  WHERE event_id IN (
    SELECT id FROM hdb_catalog.event_log
    WHERE trigger_name = '<event_trigger_name>' );

DELETE FROM hdb_catalog.event_log
  WHERE trigger_name = '<event_trigger_name>';

These have fixed the issue for our case.

It’d be great for the Hasura team to understand what’s really causing this but as you mentioned it seems related to infinite loops of retries or something.

Hi @tirumaraiselvan - I work with @petecorreia and I have a little more detail to this issue which might help you track things down; specifically I think there is perhaps two separate but related bugs here that combine to cause this issue.

Before detailing the two issues we found, a little context on our setup:

  • we have two Hasura replicas connected to the same Cloud SQL DB
  • we use the cli-migrations/v2 image when deploying updates to run migrations and to update metadata because we only enable the graphql API for security purposes.
  • our Hasura instances are currently logging ~30 error messages per second caused by invalid event triggers and maxing out available connections to the DB which causes other processes to fail

Issue 1 - Metadata not applying properly

On investigating our DB which is currently suffering from this issue via psql we noticed that a set of triggers that should no longer exist were still present on some tables despite no longer being in the metadata files. It’s not clear to us how this has happened. The process we followed to remove the triggers was to remove them via the Hasura console, commit the changes to tables.yml and then deploy an updated docker image containing the updated metadata using the cli-migrations/v2 which theoretically should apply that metadata before starting Hasura.

It seems at some point this metadata application on deploy failed, and then once this had happened Hasura wasn’t able to detect that the triggers had not been removed (I understand this may be difficult because there may also be user created triggers within the DB).

So at this point we have the unfortunate situation that Postgres still has triggers attached to a table, but from Hasura’s perspective these triggers no longer exist.

Issue 2 - Event invocations for missing triggers does not update tries on the hdb_catalog.event_log table

When inspecting the event_log table, looking at some of the invalid calls I noticed the following issue: when Hasura sees an event for an invalid trigger it does not seem to update the tries field on the event_log table. This means that each event never hit the max_retries value so Hasura keeps trying these bogus events forever.

-------------------------------------------------------------------------------------------------------------
id            | 6fe105d7-085e-447c-ab15-5a7c7eee9125
schema_name   | public
table_name    | <redacted_table_name>
trigger_name  | <redacted-event-trigger-name>
payload       | {"op": "DELETE", "data": {...}": {"x-hasura-role": "admin"}}
delivered     | f
error         | f
tries         | 0
created_at    | 2020-10-05 13:00:22.745178
locked        | f
next_retry_at | 2020-10-05 15:50:23.507937
archived      | f

In this example entry from the hdb_catalog.event_table you can see that this event was created at 13:00 and Hasura has been retrying this event every minute since then, but tries remains at 0.

I also note that the error field is false when perhaps that should also be true.

Thanks in advance for your help

@tirumaraiselvan We’ve noticed a more serious issue related to these errors. It’s now clear that these errors are leading to an overload of DB connections from Hasura.

It can lead to failures in deployments as a new instance can struggle to connect to the DB, getting rejected due to max connections. We use kubernetes with replicated instances and rolling updates so we run into this often unfortunately.

You can see evidence of this in these two charts:

Screenshot 2020-09-30 at 14 38 44 Screenshot 2020-09-30 at 14 38 58

DB Connections match up perfectly to the amount of these errors per second.

The gap in the middle is when we cleared up the errors as mentioned in previous comment (#5461)

Improving the logs is definitely a step in the right direction but ideally this would be fixed?

Hey folks, while we roll out a fix for this in the next few days (v1.3.3). You can run this query to clear any invalid postgres triggers:

curl -d'{"type": "run_sql", "args": { "sql": "select 1", "check_metadata_consistency": true}}' localhost:8080/v1/query

Note that you may need to clear the events which might have been created due to these invalid triggers:

    WITH valid_trigger_names AS (
        SELECT DISTINCT name FROM hdb_catalog.event_triggers
    ), invalid_event_ids AS (
        SELECT id FROM hdb_catalog.event_log
        WHERE trigger_name NOT IN (SELECT * FROM valid_trigger_names)
    ), delete_event_invocations AS (
        DELETE FROM hdb_catalog.event_invocation_logs
        WHERE event_id IN (SELECT * FROM invalid_event_ids)
    )
    DELETE FROM hdb_catalog.event_log
    WHERE trigger_name NOT IN (SELECT * FROM valid_trigger_names);

Courtesy: https://github.com/hasura/graphql-engine/issues/5461#issuecomment-706095505

Another way is to run this SQL, before any metadata apply:

CREATE TEMP TABLE invalid_triggers AS
WITH valid_event_triggers AS (
  SELECT name FROM hdb_catalog.event_triggers
),
  archive_invalid_events AS (
  UPDATE hdb_catalog.event_log set archived = 't'
  WHERE trigger_name NOT IN (select name from valid_event_triggers)
)
SELECT routine_name FROM information_schema.routines
WHERE routine_type='FUNCTION' AND specific_schema='hdb_views' AND routine_name NOT IN (
  SELECT 'notify_hasura_' || name || '_INSERT' FROM valid_event_triggers
  UNION
  SELECT 'notify_hasura_' || name || '_UPDATE' FROM valid_event_triggers
  UNION
  select 'notify_hasura_' || name || '_DELETE' FROM valid_event_triggers
);

DO $$ DECLARE
r RECORD;
BEGIN
  FOR r IN (SELECT routine_name from invalid_triggers) LOOP
    EXECUTE 'DROP FUNCTION IF EXISTS hdb_views.' || quote_ident(r.routine_name) || ' CASCADE';
  END LOOP;
END $$;

Unfortunately, the improved error msg in https://github.com/hasura/graphql-engine/pull/5718 didn’t get included in v1.3.2. We will definitely try to incorporate it in the next one.

Meanwhile, pls check the comment https://github.com/hasura/graphql-engine/issues/5461#issuecomment-664463460 for getting to know the offending event trigger.

@tirumaraiselvan this error still appears after updating to 1.3.2

2020-09-07T23:39:06.453708315Z {"type":"event-trigger","timestamp":"2020-09-07T23:39:06.086+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:39:07.454725277Z {"type":"event-trigger","timestamp":"2020-09-07T23:39:06.086+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:39:07.454784718Z {"type":"event-trigger","timestamp":"2020-09-07T23:39:06.086+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:40:06.592350211Z {"type":"event-trigger","timestamp":"2020-09-07T23:40:06.204+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:40:06.592429803Z {"type":"event-trigger","timestamp":"2020-09-07T23:40:06.204+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:40:07.593581597Z {"type":"event-trigger","timestamp":"2020-09-07T23:40:06.204+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:41:06.748399829Z {"type":"event-trigger","timestamp":"2020-09-07T23:41:06.327+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:41:06.748460254Z {"type":"event-trigger","timestamp":"2020-09-07T23:41:06.327+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:41:07.749948355Z {"type":"event-trigger","timestamp":"2020-09-07T23:41:06.327+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:42:06.893702941Z {"type":"event-trigger","timestamp":"2020-09-07T23:42:06.455+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:42:06.893762517Z {"type":"event-trigger","timestamp":"2020-09-07T23:42:06.455+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:42:07.895079037Z {"type":"event-trigger","timestamp":"2020-09-07T23:42:06.455+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:43:07.038807559Z {"type":"event-trigger","timestamp":"2020-09-07T23:43:06.597+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:43:07.038866520Z {"type":"event-trigger","timestamp":"2020-09-07T23:43:06.597+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:43:08.040231802Z {"type":"event-trigger","timestamp":"2020-09-07T23:43:06.597+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:44:07.180090676Z {"type":"event-trigger","timestamp":"2020-09-07T23:44:06.742+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:44:07.180130289Z {"type":"event-trigger","timestamp":"2020-09-07T23:44:06.742+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:44:08.181308236Z {"type":"event-trigger","timestamp":"2020-09-07T23:44:06.742+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:45:07.319800084Z {"type":"event-trigger","timestamp":"2020-09-07T23:45:06.867+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:45:07.319856215Z {"type":"event-trigger","timestamp":"2020-09-07T23:45:06.867+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:45:08.320483837Z {"type":"event-trigger","timestamp":"2020-09-07T23:45:06.867+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:46:07.526498516Z {"type":"event-trigger","timestamp":"2020-09-07T23:46:06.970+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:46:07.526577878Z {"type":"event-trigger","timestamp":"2020-09-07T23:46:06.970+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:46:08.528339850Z {"type":"event-trigger","timestamp":"2020-09-07T23:46:06.970+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:47:07.691373407Z {"type":"event-trigger","timestamp":"2020-09-07T23:47:07.107+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:47:07.691451154Z {"type":"event-trigger","timestamp":"2020-09-07T23:47:07.107+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:47:08.692429740Z {"type":"event-trigger","timestamp":"2020-09-07T23:47:07.107+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:48:07.846698088Z {"type":"event-trigger","timestamp":"2020-09-07T23:48:07.246+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:48:08.848335482Z {"type":"event-trigger","timestamp":"2020-09-07T23:48:07.246+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:48:08.848402382Z {"type":"event-trigger","timestamp":"2020-09-07T23:48:07.246+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:49:07.999542997Z {"type":"event-trigger","timestamp":"2020-09-07T23:49:07.387+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:49:09.000834633Z {"type":"event-trigger","timestamp":"2020-09-07T23:49:07.387+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:49:09.000902762Z {"type":"event-trigger","timestamp":"2020-09-07T23:49:07.387+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:50:08.172744735Z {"type":"event-trigger","timestamp":"2020-09-07T23:50:07.501+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:50:09.174826859Z {"type":"event-trigger","timestamp":"2020-09-07T23:50:07.501+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
2020-09-07T23:50:09.174894688Z {"type":"event-trigger","timestamp":"2020-09-07T23:50:07.501+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}