graphql-engine: "table or event-trigger not found in schema cache" error in logs
Just upgraded from 1.2.2 to 1.3.0 and noticed the following error message popping up repeatedly:
graphql-engine {"type":"event-trigger","timestamp":"2020-07-24T23:37:43.617+0000","level":"error","detail":{"path":"$","error":"table or event-trigger not found in schema cache","code":"unexpected"}}
How do we go about debugging this error, and what would be the cause? Hasura still functions normally, but the message is logged almost every second or minute.
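As a first debugging step, it can help to measure how often the error is actually logged. The following is a minimal sketch, assuming each log line has the shape shown above (an optional prefix followed by one JSON object); the function name `count_schema_cache_errors` is hypothetical and not part of Hasura.

```python
import json
from collections import Counter

# Error text copied from the log line quoted in this issue.
TARGET = "table or event-trigger not found in schema cache"


def count_schema_cache_errors(lines):
    """Count matching event-trigger errors per minute.

    Assumes each line is an optional prefix (e.g. "graphql-engine ")
    followed by a single JSON object, as in the sample log line.
    """
    per_minute = Counter()
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # no JSON payload on this line
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        detail = entry.get("detail")
        if (
            entry.get("type") == "event-trigger"
            and entry.get("level") == "error"
            and isinstance(detail, dict)
            and detail.get("error") == TARGET
        ):
            # Truncate the ISO timestamp to minute resolution.
            per_minute[entry.get("timestamp", "")[:16]] += 1
    return per_minute
```

Feeding it the container logs (for example via `sys.stdin`) gives a per-minute count that can be compared before and after any cleanup.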
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 18 (9 by maintainers)
Commits related to this issue
- Fix event trigger cleanup on deletion via replace_metadata (fix #5461) — committed to lexi-lambda/graphql-engine by lexi-lambda 4 years ago
- Merge branch 'master' into sync-event-triggers-gh-5461 — committed to lexi-lambda/graphql-engine by tirumaraiselvan 4 years ago
- Merge branch 'master' into sync-event-triggers-gh-5461 — committed to lexi-lambda/graphql-engine by kodiakhq[bot] 4 years ago
- Fix the way replace_metadata drops event triggers (fix #5461) (#6137) https://github.com/hasura/graphql-engine/pull/6137 — committed to hasura/graphql-engine by lexi-lambda 4 years ago
- Fix the way replace_metadata drops event triggers (fix #5461) (#6137) https://github.com/hasura/graphql-engine/pull/6137 — committed to codingkarthik/graphql-engine by lexi-lambda 4 years ago
This has happened to us on multiple occasions specifically when we delete event triggers. Hasura seems to go crazy with it until we clear the event logs.
To diagnose I recommend the following:
There should be no event triggers listed that have been removed or look out of place.
If there are, clear them individually to avoid clearing out data that you might still need from other event triggers. Replace <event_trigger_name>:
These fixed the issue in our case.
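The two checks described above (list which trigger names still have events, then clear events for one stale trigger) can be sketched as follows. This is an illustration, not the exact queries from this comment. It assumes `hdb_catalog.event_log` has `trigger_name` and `delivered` columns, and it uses an in-memory SQLite database as a stand-in so the example runs without a Hasura installation. All function names are hypothetical.

```python
import sqlite3


def open_stand_in_catalog():
    """Create an in-memory stand-in for hdb_catalog.event_log.

    Assumption: only the columns needed for this sketch are modelled.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS hdb_catalog")
    conn.execute(
        "CREATE TABLE hdb_catalog.event_log ("
        " id TEXT PRIMARY KEY, trigger_name TEXT,"
        " delivered INTEGER, error INTEGER, tries INTEGER)"
    )
    return conn


def pending_events_by_trigger(conn):
    """Return {trigger_name: undelivered event count}."""
    rows = conn.execute(
        "SELECT trigger_name, COUNT(*) FROM hdb_catalog.event_log "
        "WHERE NOT delivered GROUP BY trigger_name ORDER BY trigger_name"
    )
    return dict(rows.fetchall())


def clear_events_for_trigger(conn, trigger_name):
    """Delete events for one trigger only; returns the number of rows removed."""
    # sqlite3 uses "?" placeholders; a Postgres driver such as psycopg uses "%s".
    cur = conn.execute(
        "DELETE FROM hdb_catalog.event_log WHERE trigger_name = ?",
        (trigger_name,),
    )
    conn.commit()
    return cur.rowcount
```

Any name returned by `pending_events_by_trigger` that no longer exists as an event trigger is a candidate for `clear_events_for_trigger`. Deleting per trigger, rather than truncating the table, keeps the events of triggers that are still in use.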
It’d be great for the Hasura team to understand what’s really causing this, but as you mentioned, it seems related to an infinite loop of retries or something similar.
Hi @tirumaraiselvan - I work with @petecorreia and I have a little more detail on this issue which might help you track things down; specifically, I think there are perhaps two separate but related bugs here that combine to cause this issue.
Before detailing the two issues we found, a little context on our setup:
We use the cli-migrations/v2 image when deploying updates to run migrations and to update metadata, because we only enable the GraphQL API for security purposes.

Issue 1 - Metadata not applying properly
On investigating our DB, which is currently suffering from this issue, via psql we noticed that a set of triggers that should no longer exist were still present on some tables despite no longer being in the metadata files. It’s not clear to us how this has happened. The process we followed to remove the triggers was to remove them via the Hasura console, commit the changes to tables.yml and then deploy an updated docker image containing the updated metadata using cli-migrations/v2, which theoretically should apply that metadata before starting Hasura.

It seems at some point this metadata application on deploy failed, and then once this had happened Hasura wasn’t able to detect that the triggers had not been removed (I understand this may be difficult because there may also be user created triggers within the DB).
So at this point we have the unfortunate situation that Postgres still has triggers attached to a table, but from Hasura’s perspective these triggers no longer exist.
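The mismatch described here (triggers present in Postgres but absent from the metadata) can be checked with a simple set difference. The following is a minimal sketch under one loud assumption: that Hasura v1.x names the Postgres triggers it creates `notify_hasura_<event_trigger_name>_<INSERT|UPDATE|DELETE>`. Verify that against your own database before relying on it. The function name is hypothetical; the inputs would come from the Postgres catalog and from the event trigger names in tables.yml.

```python
import re

# Assumption: naming pattern of Postgres triggers created by Hasura v1.x.
HASURA_TRIGGER = re.compile(
    r"^notify_hasura_(?P<name>.+)_(?:INSERT|UPDATE|DELETE)$"
)


def orphaned_postgres_triggers(pg_trigger_names, metadata_trigger_names):
    """Return Hasura-style Postgres triggers whose event trigger is not in metadata.

    Triggers that do not match the assumed pattern (e.g. user-created ones)
    are ignored, since they cannot be attributed to Hasura.
    """
    known = set(metadata_trigger_names)
    orphans = []
    for pg_name in pg_trigger_names:
        match = HASURA_TRIGGER.match(pg_name)
        if match and match.group("name") not in known:
            orphans.append(pg_name)
    return sorted(orphans)
```

Matching only on the assumed prefix is a deliberate choice: it avoids flagging user-created triggers, which is the difficulty mentioned above.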
Issue 2 - Event invocations for missing triggers do not update tries on the hdb_catalog.event_log table

When inspecting the event_log table, looking at some of the invalid calls, I noticed the following issue: when Hasura sees an event for an invalid trigger it does not seem to update the tries field on the event_log table. This means that each event never hits the max_retries value, so Hasura keeps trying these bogus events forever.

In this example entry from the hdb_catalog.event_log table you can see that this event was created at 13:00 and Hasura has been retrying this event every minute since then, but tries remains at 0.

I also note that the error field is false when perhaps that should also be true.

Thanks in advance for your help
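To illustrate why an un-incremented tries counter keeps an event alive, here is a toy model of the retry loop described in Issue 2. It is not Hasura's implementation (the engine is written in Haskell); the function name, parameters and the `attempt_cap` safety limit are all hypothetical.

```python
def attempts_before_giving_up(max_retries, count_failed_lookup, attempt_cap=1000):
    """Simulate how often one event for a missing trigger is picked up.

    count_failed_lookup=True models a processor that records a try even
    when the trigger cannot be found; False models the reported behaviour,
    where the error is logged but tries stays at 0.
    attempt_cap only exists so the simulation terminates.
    """
    tries = 0
    attempts = 0
    while tries < max_retries and attempts < attempt_cap:
        attempts += 1
        # The trigger lookup fails here:
        # "table or event-trigger not found in schema cache"
        if count_failed_lookup:
            tries += 1
    return attempts
```

With the failed lookup counted, the event is abandoned after max_retries attempts. Without it, the loop ends only at the artificial cap, which corresponds to the once-a-minute retries with tries stuck at 0 described above.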
@tirumaraiselvan We’ve noticed a more serious issue related to these errors. It’s now clear that these errors are leading to an overload of DB connections from Hasura.
It can lead to failed deployments, as a new instance can struggle to connect to the DB and get rejected because the maximum number of connections has been reached. We use Kubernetes with replicated instances and rolling updates, so unfortunately we run into this often.
You can see evidence of this in these two charts:
The number of DB connections matches up perfectly with the number of these errors per second.
The gap in the middle is when we cleared up the errors as mentioned in the previous comment (#5461)
Improving the logs is definitely a step in the right direction but ideally this would be fixed?
Hey folks, while we roll out a fix for this in the next few days (v1.3.3), you can run this query to clear any invalid Postgres triggers:
Note that you may need to clear the events which might have been created due to these invalid triggers:
Courtesy: https://github.com/hasura/graphql-engine/issues/5461#issuecomment-706095505
Another way is to run this SQL before any metadata apply:

Unfortunately, the improved error msg in https://github.com/hasura/graphql-engine/pull/5718 didn’t get included in v1.3.2. We will definitely try to incorporate it in the next one.
Meanwhile, please check the comment https://github.com/hasura/graphql-engine/issues/5461#issuecomment-664463460 to identify the offending event trigger.
@tirumaraiselvan this error still appears after updating to 1.3.2