graphql-engine: Migrations drop and recreate hdb_views dozens of times, causing max_locks_per_transaction to be exceeded
Updating from v1.0.0-beta.6 to v1.0.0-beta.7 results in this error when running sudo docker-compose up -d: ERROR: manifest for hasura/graphql-engine:v1.0.0-beta.7 not found: manifest unknown: manifest unknown
Updating from v1.0.0-beta.6 to v1.0.0-beta.8 results in 502 Bad Gateway in the browser when accessing the console. The UI fails too because all GraphQL calls get a 502 response.
Updating from v1.0.0-beta.6 to v1.0.0-beta.9 or v1.0.0-beta.10 results in the same error.
Reverting to v1.0.0-beta.6 instantly works after sudo docker-compose up -d.
What am I doing wrong?
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 43 (16 by maintainers)
@lexi-lambda please have yourself a beer - you made my day!
I can confirm that my app is working fine with v1.1.0-beta.2, without changing `max_locks_per_transaction`. Thanks a lot for your hard work!
@barbalex To be honest, I am not certain whether the root cause of the issue you’re seeing is this change or something else… but I realized it doesn’t actually matter, because we’re getting rid of `hdb_views` for insert permissions entirely! See #3598.

@lexi-lambda I did:
then replaced `v1.0.0-beta6` with `pull3394-67093178`
then
and it works 😄
Yes, my apologies—I was hoping to leave a comment on this thread yesterday, but one more unexpected issue came up that led me to hold off.
The good news: I have been working on a fix for this in #3394, and I think it basically works. It would be great if either of you could try the experimental build in https://github.com/hasura/graphql-engine/pull/3394#issuecomment-566198192 and let me know if it resolves your problem. I would have liked for this change to go into v1.0.0, but it’s a large change, and there are some outstanding subtleties, so I’ve been hoping to have some people try it out before merging it.
The bad news: the change should work fine, but there are some lingering performance issues that seem to stem primarily from a poor interaction with the parallel GC on machines where the number of cores the OS reports is larger than the number of cores `graphql-engine` should reasonably be using. For example, on a Heroku free dyno, `nproc` reports 8, so `graphql-engine` currently defaults to running on 8 cores. That is not a good choice, however, as Heroku free-tier dynos are shared, and this seems to create a significant performance hit.

I am still looking into the appropriate solution for that, but in the meantime, if you want to try the build, consider restricting the number of cores `graphql-engine` uses manually. The easiest way to do that is to set the `GHCRTS=-N<x>` environment variable, replacing `<x>` with the number of cores you’d like it to run on. Setting `GHCRTS=-N1` is a particularly conservative choice, since it disables parallelism completely, but it will certainly mitigate the pathological behavior.

As a final point of note, the performance of running the migrations is still not good: on the database you sent me, they take 15-20 seconds. However, they do eventually finish, and since migrating is a one-time cost, I haven’t worried about it too much yet. There are ways we can improve that number much further over time; it’s just a matter of work.
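If you run graphql-engine in Docker, the environment variable can be set directly on the container. A minimal sketch, assuming a typical standalone setup from this thread (the image tag and connection string are placeholders; adapt them to your docker-compose.yml):

```shell
# Restrict the GHC runtime to one core before starting the container.
# GHCRTS is read by any GHC-compiled binary, including graphql-engine.
docker run -d \
  -e GHCRTS='-N1' \
  -e HASURA_GRAPHQL_DATABASE_URL='postgres://user:pass@host:5432/db' \
  hasura/graphql-engine:v1.0.0-beta.10
```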
I’m not completely certain. It might help, but it might not, since even squashed migrations could trigger the issue (since they still have individual calls to things like `track_table`, IIUC?). If absolutely necessary, it would probably work to drop the catalog information and reapply the metadata in batches, so that there aren’t too many individual query operations in each `bulk` batch (and therefore not too many query operations in a single transaction).

Hopefully I’ll have a less awkward solution available soon. I’ll update this issue once I have a development build available for testing.
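For anyone attempting the batching workaround described above: the metadata API accepts a `bulk` query type whose `args` run together, so splitting one giant `bulk` into several smaller requests keeps the per-transaction lock count down. A rough sketch, where the endpoint, admin secret, and table names are placeholders:

```shell
# Apply metadata in small bulk batches instead of one large one,
# so each request acquires fewer locks in its transaction.
curl -s http://localhost:8080/v1/query \
  -H 'Content-Type: application/json' \
  -H 'X-Hasura-Admin-Secret: myadminsecret' \
  -d '{
    "type": "bulk",
    "args": [
      { "type": "track_table", "args": { "schema": "public", "name": "authors" } },
      { "type": "track_table", "args": { "schema": "public", "name": "articles" } }
    ]
  }'
```

Repeat with the next batch of operations until all of the metadata has been reapplied.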
Thanks for this very good and transparent information.
So I know that if I want to update, or if I run into any problems before the issue is solved, I will have to migrate the db to a virtual droplet so I can increase `max_locks_per_transaction`.
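For reference, on a self-managed Postgres (e.g. on a droplet) the setting can be raised like this; 128 is an arbitrary example value, and the parameter only takes effect after a server restart:

```shell
# Double the lock-table size (the PostgreSQL default is 64).
psql -U postgres -c "ALTER SYSTEM SET max_locks_per_transaction = 128;"
# The new value is applied only on postmaster restart:
sudo systemctl restart postgresql
```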