apm-agent-python: trace.id is not available in fastapi exception handler
Describe the bug: We have fastapi framework, and we add apm_client for starlette to the application, we want to generate trace.id to correlate logs when an exception happens. I have structlog package for the logging and import elasticapm.handlers.structlog.structlog_processor
. We set exception handlers in fastapi application for our excpected handlers and also one for Exception
class to capture the ones that we do not expect. I can get trace.id in child exceptions but not in the exception handler that is comparing to root.
My goal was to have log Exception types and have log correlation in them.
I am not sure if it is bug or feature request🤷♂️ To Reproduce
- create apm agent and fastapi app, add the apm agent to application
from fastapi import FastAPI
from elasticapm.contrib.starlette import make_apm_client
from elasticapm.contrib.starlette import ElasticAPM
apm = make_apm_client(config=config)
app = FastAPI()
app.add_middleware(ElasticAPM, client=apm)
- configure structlog
import structlog
from elasticapm.handlers.structlog import structlog_processor
def processor(logger, level, log_dict):
return json.dumps(log_dict)
structlog.configure(processors=[structlog_processor, processor])
- create an child exceptions and root and child exception
class AuthException(Exception):
def __ini__(self):
super().__init__()
@app.exception_handler(Exception)
async def fallback_handler(request , exc: Exception):
getLogger().info("Uncaught exception during request handling happend")
return JSONResponse({"detail": "ROOT EXCEPTION"}, status_code=500)
@app.exception_handler(AuthException)
async def validation_exception_handler(request, exc: AuthException):
getLogger().info(f"Request Validation exception happend{sys.exc_info()}")
return JSONResponse({"detail": "CHILD EXCEPTION"}, status_code=404)
- create an endpoint to send request to see if logging formatter/processor works or not
@app.post("/")
async def post(body: Any = Body(...)):
if body == 1: raise AuthException("child exception")
if body == 2: raise ZeroDivisionError("root exception")
return JSONResponse({"Response": "test SUCCESS"}, status_code=200)
Environment (please complete the following information)
- OS: Win and Linux
- Python version: 3.7
- Framework and version [e.g. Django 2.1]: fastapi 0.63
- APM Server version: _
- Agent version: 6.*
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 15 (8 by maintainers)
I have a POC in #1674 but it’s not working yet. My code isn’t being hit, still trying to figure out why.
@magichair I’m not sure how we missed this in our 8.6 planning, but I’m going to take a look. 😃
Edit: Oh, wait, just saw your edit. That sounds plausible. I’m still going to look into this, though, we should support handling the base
Exception
.@basepi I’m wondering if it has something to do with how the
.end_transaction()
occurs. Perhaps in FastAPI / event loop async there is some race condition where the transaction is ended before we get to these access logs and other logs. It seems strange to me that thefinally
could occur before, but I also don’t know enough about event loop and how FastAPI might be short circuiting or prioritizing things on the loop if it’s an exception (just a theoretical, nothing to back up that claim). https://github.com/elastic/apm-agent-python/blob/master/elasticapm/contrib/starlette/__init__.py#L185I’ll point out that the
capture_exception
that is sent to APM’s Error page does have the trace.id and transaction id - https://github.com/elastic/apm-agent-python/blob/master/elasticapm/contrib/starlette/__init__.py#L176-L178 maybe the finalraise
( https://github.com/elastic/apm-agent-python/blob/master/elasticapm/contrib/starlette/__init__.py#L183 ) here goes off back to the event loop and thefinally
to end the transaction occurs immediately before FastAPI and it’s logging can take over and log the exception and pass it through the middlewares.I’m not sure if it’s helpful, but maybe there is some historical context around these comments in the Flask module - https://github.com/elastic/apm-agent-python/blob/master/elasticapm/contrib/flask/__init__.py#L199-L201 perhaps they too ran into some race condition problems and had to defer the
end_transaction
until later. I’m not sure how that maps in a FastAPI world yet, but I figure it might help.No worries whatsoever. We’re all doing the best we can. I’m happy to help in any way I can.
Generally, the problem is ordering the middlewares in Starlette. https://github.com/encode/starlette/blob/master/starlette/applications.py#L83
ElasticAPM middleware is a “user middleware”. The middlewares in Starlette are like wrappers over wrappers.
the ordering by response/exception processing is:
@app.post(..)
,@app.get(..)
,…ExceptionMiddleware
controlled byexception_handler
decorator where the argument is notException
ServerErrorMiddleware
controlled byexception_handler(Exception)
decorator (argument must beException
or500
)In the ordering above, you can see that ElasticAPM is under the ExceptionMiddleware. The ExceptionMiddleware overwrites the raised exceptions to
JSONResponse
(in your case) and the tracking information of the exception under is lost.It might not be easy to fix the problem in this project, it may be easier to add a constructor argument to overwrite
ExceptionMiddleware
andServerErrorMiddleware
or define own the middleware ordering in Starlette.