readthedocs.org: Template errors result in long development/production response times

When a template that was included via {% include %} raises an exception, the request response time explodes. I originally noticed this in local development, but I believe that in https://github.com/readthedocs/readthedocs.org/issues/11061 we might have confirmed that this is a bug in production as well (so far maybe only affecting the beta instance).

In local development, I noticed:

  • Postgres container eats up 100% CPU doing what looks like repeated selects on the project table
  • The browser takes 60s+ to respond, or times out
  • The Docker logs for web container even take about 30-60s to output the stack trace

There is nothing special about my set up though, and my database is 27M.

In debugging https://github.com/readthedocs/readthedocs.org/issues/11061 I noticed:

  • The database query takes 20 times longer to execute
  • The response time is over 10s, but is not all DB time
  • Profiling shows most of the time spent in logging and related modules
  • I was not able to confirm where code time was spent in production transactions

To reproduce:

  • Open a template that is included in another template via {% include %}
  • Throw a template exception from the included template
    • Remove load i18n from the template but add/retain a translated string
    • Access a missing attribute, ie project.foo.bar
  • Load the view that would trigger the template
  • Notice long response time and long query time in django debug toolbar

About this issue

  • Original URL
  • State: open
  • Created 7 months ago
  • Comments: 32 (32 by maintainers)

Commits related to this issue

Most upvoted comments

Just to clarify, when you say 502, are you noticing a long response time too?

Yes, I’m getting the Cloudflare 502 error page and it takes some time to display that page. However, I can’t say for sure that’s more time that just loading the dashboard when it works, since it takes ~6s from my side to get the initial HTML page.

Settings diff I’m using:

docs@web-ext-theme-i-0ae3319726327bdbb(org):~/checkouts/readthedocs.org$ diff -u readthedocs/settings/managed.py.orig readthedocs/settings/managed.py
--- readthedocs/settings/managed.py.orig	2024-03-05 11:12:44.246182303 +0000
+++ readthedocs/settings/managed.py	2024-03-05 11:13:21.966164143 +0000
@@ -443,33 +443,6 @@
             'formatter': 'newrelic',
         }
 
-        # Sentry logging
-        logging['formatters']['sentry'] ={
-            "()": structlog.stdlib.ProcessorFormatter,
-            "processors": [
-                structlog.stdlib.ProcessorFormatter.remove_processors_meta,
-                SentryProcessor(
-                    event_level=ERROR,
-                    as_context=False,
-                    tag_keys="__all__",
-                    ignore_loggers=[
-                        'django_structlog.celery.receivers',
-                        'django_structlog.middlewares.request',
-                        'django.security.DisallowedHost',
-                        'celery.worker.strategy',
-                        'celery.worker.consumer.mingle',
-                        'celery.worker.consumer.connection',
-                    ],
-                ),
-            ],
-            "foreign_pre_chain": shared_processors,
-         }
-        logging['handlers']['sentry'] = {
-            'level': 'ERROR',
-            'class': 'logging.handlers.SysLogHandler',
-            'formatter': 'sentry',
-        }
-
         logging['handlers']['syslog'] = {
             'level': 'INFO',
             'class': 'logging.handlers.SysLogHandler',
@@ -481,7 +454,7 @@
         }
         logging['loggers'] = {
             '': {
-                'handlers': ['console', 'syslog', 'sentry', 'newrelic'],
+                'handlers': ['console', 'syslog', 'newrelic'],
                 'level': 'DEBUG',
             },
             'django_structlog.celery.receivers': {
docs@web-ext-theme-i-0ae3319726327bdbb(org):~/checkouts/readthedocs.org$

Theme diff:

docs@web-ext-theme-i-0ae3319726327bdbb(org):~/checkouts/ext-theme$ git diff
WARNING: terminal is not fully functional
Press RETURN to continue 
diff --git a/readthedocsext/theme/templates/projects/partials/project_list.html b/readthedocsext/theme/templates/projects/partials/project_list.html
index 86a22f3..0e3d716 100644
--- a/readthedocsext/theme/templates/projects/partials/project_list.html
+++ b/readthedocsext/theme/templates/projects/partials/project_list.html
@@ -1,6 +1,5 @@
 {% extends "includes/crud/table_list.html" %}
 
-{% load i18n %}
 {% load humanize %}
 
 {% load privacy_tags %}
docs@web-ext-theme-i-0ae3319726327bdbb(org):~/checkouts/ext-theme$

With this configuration and running supervisorctl restart all I’m getting a 502. However, I’m not seeing anything in the logs under /var/log/readthedocs/*. Note that I’m able to access https://beta.readthedocs.org/support/ for example, which is a page that doesn’t depend on the project_list.html file --so, the instance is up and running 👍🏼

I found the issue. It’s what I guessed it could be in https://github.com/readthedocs/readthedocs.org/issues/10954#issuecomment-1856227684

Disabling the “nice logging output”, I get a similar CPU times and only 12 SQL queries more when it returns 500 (I suppose these are performed to show the traceback in the Django 500 debug page).

I pushed this changes to https://github.com/readthedocs/readthedocs.org/commit/592c108af4cf7678421c00b02692081bc06a020c to show how to disable this nice logger.

I found this discussion, and hoped it was related, but so far nothing there seems to help:

https://forum.djangoproject.com/t/changing-queryset-repr-to-avoid-unexpected-queries/21272/22?page=2

I’ve disabled debug toolbar, logging, and debug mode entirely. This must be something related to include or local vars there.

Yet another clue:

  • If I edit a top level template (projects/project_dashboard.html) and remove load i18n, the response is very quick
  • If I edit an include from this top level file (projects/partials/project_list.html) and remove load i18n, I notice the bug and the database is hit with looped queries.