apm-agent-python: trace.id is not available in fastapi exception handler

Describe the bug: We have fastapi framework, and we add apm_client for starlette to the application, we want to generate trace.id to correlate logs when an exception happens. I have structlog package for the logging and import elasticapm.handlers.structlog.structlog_processor. We set exception handlers in fastapi application for our excpected handlers and also one for Exception class to capture the ones that we do not expect. I can get trace.id in child exceptions but not in the exception handler that is comparing to root.

My goal was to have log Exception types and have log correlation in them.

I am not sure if it is bug or feature request🤷‍♂️ To Reproduce

  1. create apm agent and fastapi app, add the apm agent to application
from fastapi import FastAPI
from elasticapm.contrib.starlette import make_apm_client
from elasticapm.contrib.starlette import ElasticAPM
apm = make_apm_client(config=config)
app = FastAPI()
app.add_middleware(ElasticAPM, client=apm)
  1. configure structlog
import structlog
from elasticapm.handlers.structlog import structlog_processor
def processor(logger, level, log_dict):
    return json.dumps(log_dict)

structlog.configure(processors=[structlog_processor, processor])

  1. create an child exceptions and root and child exception
class AuthException(Exception):
    def __ini__(self):
        super().__init__()


@app.exception_handler(Exception)
async def fallback_handler(request , exc: Exception):
    getLogger().info("Uncaught exception during request handling happend")
    return JSONResponse({"detail": "ROOT EXCEPTION"}, status_code=500)

@app.exception_handler(AuthException)
async def validation_exception_handler(request, exc: AuthException):
    getLogger().info(f"Request Validation exception happend{sys.exc_info()}")
    return JSONResponse({"detail": "CHILD EXCEPTION"}, status_code=404)
  1. create an endpoint to send request to see if logging formatter/processor works or not
@app.post("/")
async def post(body: Any = Body(...)):
    if body == 1: raise AuthException("child exception")
    if body == 2: raise ZeroDivisionError("root exception")
    return JSONResponse({"Response": "test SUCCESS"}, status_code=200)

Environment (please complete the following information)

  • OS: Win and Linux
  • Python version: 3.7
  • Framework and version [e.g. Django 2.1]: fastapi 0.63
  • APM Server version: _
  • Agent version: 6.*

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I have a POC in #1674 but it’s not working yet. My code isn’t being hit, still trying to figure out why.

@magichair I’m not sure how we missed this in our 8.6 planning, but I’m going to take a look. 😃

Edit: Oh, wait, just saw your edit. That sounds plausible. I’m still going to look into this, though, we should support handling the base Exception.

@basepi I’m wondering if it has something to do with how the .end_transaction() occurs. Perhaps in FastAPI / event loop async there is some race condition where the transaction is ended before we get to these access logs and other logs. It seems strange to me that the finally could occur before, but I also don’t know enough about event loop and how FastAPI might be short circuiting or prioritizing things on the loop if it’s an exception (just a theoretical, nothing to back up that claim). https://github.com/elastic/apm-agent-python/blob/master/elasticapm/contrib/starlette/__init__.py#L185

I’ll point out that the capture_exception that is sent to APM’s Error page does have the trace.id and transaction id - https://github.com/elastic/apm-agent-python/blob/master/elasticapm/contrib/starlette/__init__.py#L176-L178 maybe the final raise ( https://github.com/elastic/apm-agent-python/blob/master/elasticapm/contrib/starlette/__init__.py#L183 ) here goes off back to the event loop and the finally to end the transaction occurs immediately before FastAPI and it’s logging can take over and log the exception and pass it through the middlewares.

I’m not sure if it’s helpful, but maybe there is some historical context around these comments in the Flask module - https://github.com/elastic/apm-agent-python/blob/master/elasticapm/contrib/flask/__init__.py#L199-L201 perhaps they too ran into some race condition problems and had to defer the end_transaction until later. I’m not sure how that maps in a FastAPI world yet, but I figure it might help.

Sorry for the slow response here.

No worries whatsoever. We’re all doing the best we can. I’m happy to help in any way I can.

Generally, the problem is ordering the middlewares in Starlette. https://github.com/encode/starlette/blob/master/starlette/applications.py#L83

ElasticAPM middleware is a “user middleware”. The middlewares in Starlette are like wrappers over wrappers.

the ordering by response/exception processing is:

  1. Endpoint method controlled by decorators @app.post(..), @app.get(..),…
  2. ExceptionMiddleware controlled by exception_handler decorator where the argument is not Exception
  3. User middlewares (e.g. ElasticAPM middleware)
  4. ServerErrorMiddleware controlled by exception_handler(Exception) decorator (argument must be Exception or 500)

In the ordering above, you can see that ElasticAPM is under the ExceptionMiddleware. The ExceptionMiddleware overwrites the raised exceptions to JSONResponse (in your case) and the tracking information of the exception under is lost.

It might not be easy to fix the problem in this project, it may be easier to add a constructor argument to overwrite ExceptionMiddleware and ServerErrorMiddleware or define own the middleware ordering in Starlette.