readthedocs.org: Template errors result in long development/production response times
When a template that was included via {% include %}
raises an exception, the request response time explodes. I originally noticed this in local development, but I believe that in https://github.com/readthedocs/readthedocs.org/issues/11061 we might have confirmed that this is a bug in production as well (so far maybe only affecting the beta instance).
In local development, I noticed:
- Postgres container eats up 100% CPU doing what looks like repeated selects on the project table
- The browser takes 60s+ to respond, or times out
- The Docker logs for web container even take about 30-60s to output the stack trace
There is nothing special about my set up though, and my database is 27M.
In debugging https://github.com/readthedocs/readthedocs.org/issues/11061 I noticed:
- The database query takes 20 times longer to execute
- The response time is over 10s, but is not all DB time
- Profiling shows most of the time spent in logging and related modules
- I was not able to confirm where code time was spent in production transactions
To reproduce:
- Open a template that is included in another template via
{% include %}
- Throw a template exception from the included template
- Remove
load i18n
from the template but add/retain a translated string - Access a missing attribute, ie
project.foo.bar
- Remove
- Load the view that would trigger the template
- Notice long response time and long query time in django debug toolbar
About this issue
- Original URL
- State: open
- Created 7 months ago
- Comments: 32 (32 by maintainers)
Commits related to this issue
- Logging: disable nice formatter This is an example to show that disabling the nice formatter, reduces the response time and SQL queries executed when there is a `TemplateSyntaxError`. See https://gi... — committed to readthedocs/readthedocs.org by humitos 4 months ago
- Requirements: update `structlog` and `django-structlog` Related: * https://github.com/readthedocs/readthedocs.org/issues/10954 * https://github.com/readthedocs/readthedocs.org/issues/10933 — committed to readthedocs/readthedocs.org by humitos 4 months ago
Yes, I’m getting the Cloudflare 502 error page and it takes some time to display that page. However, I can’t say for sure that’s more time that just loading the dashboard when it works, since it takes ~6s from my side to get the initial HTML page.
Settings diff I’m using:
Theme diff:
With this configuration and running
supervisorctl restart all
I’m getting a 502. However, I’m not seeing anything in the logs under/var/log/readthedocs/*
. Note that I’m able to access https://beta.readthedocs.org/support/ for example, which is a page that doesn’t depend on theproject_list.html
file --so, the instance is up and running 👍🏼I found the issue. It’s what I guessed it could be in https://github.com/readthedocs/readthedocs.org/issues/10954#issuecomment-1856227684
Disabling the “nice logging output”, I get a similar CPU times and only 12 SQL queries more when it returns 500 (I suppose these are performed to show the traceback in the Django 500 debug page).
I pushed this changes to https://github.com/readthedocs/readthedocs.org/commit/592c108af4cf7678421c00b02692081bc06a020c to show how to disable this nice logger.
I found this discussion, and hoped it was related, but so far nothing there seems to help:
https://forum.djangoproject.com/t/changing-queryset-repr-to-avoid-unexpected-queries/21272/22?page=2
I’ve disabled debug toolbar, logging, and debug mode entirely. This must be something related to
include
or local vars there.Yet another clue:
projects/project_dashboard.html
) and removeload i18n
, the response is very quickprojects/partials/project_list.html
) and removeload i18n
, I notice the bug and the database is hit with looped queries.