cls-hooked: Context occasionally gets corrupted
Hello!
Trying to debug a pretty tricky issue. We use this library to thread a distributed traceId down to our logs and shared HTTP library across many microservices. In rare cases, we're seeing logs and requests hop from one traceId to another. In essence, this code:
```js
const oldTrace = clsHooked.getNamespace('DEFAULT_NAMESPACE').get('context').traceId;
const res = await httpClient.get(someUrl);
const newTrace = clsHooked.getNamespace('DEFAULT_NAMESPACE').get('context').traceId;
if (oldTrace !== newTrace) {
  console.log('~~ TRACE ID MISMATCH: ', oldTrace, newTrace);
  process.exit(1);
}
```
This check fails pretty consistently, and I'm trying to figure out why. I'm still grappling with how async_hooks and cls-hooked work in general, but my understanding is that older or non-standard promise libraries or custom thenables (as mentioned in #37) can cause this to happen.
Any advice on tracking down exactly what’s happening?
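In case it's useful to anyone else digging into this, here is the kind of bare-bones async_hooks tracer I've been using to poke at it. It's only a debugging sketch, not a fix: it logs every async resource as it is created and entered, so you can see which resource type actually runs the continuation after the HTTP call and whether its trigger chain leads back to your own request or to something shared (a pooled socket, a cached promise, etc.).

```js
const asyncHooks = require('async_hooks');
const fs = require('fs');

asyncHooks.createHook({
  init(asyncId, type, triggerAsyncId) {
    // fs.writeSync instead of console.log: console.log is itself async and
    // would recursively trigger these hooks.
    fs.writeSync(1, `init   ${type} id=${asyncId} trigger=${triggerAsyncId} eid=${asyncHooks.executionAsyncId()}\n`);
  },
  before(asyncId) {
    fs.writeSync(1, `before id=${asyncId}\n`);
  },
  after(asyncId) {
    fs.writeSync(1, `after  id=${asyncId}\n`);
  },
}).enable();
```

Logging `asyncHooks.executionAsyncId()` immediately before and after the `await`, and lining those ids up against the tracer output, can help narrow down where the context hand-off goes wrong.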
Has anyone found a resolution to this? We appear to be experiencing the same issue as described here. Still running some tests to see if I can triage the issue further, but the description sounds exactly like what we see - data that is leaking between requests.
+1 to this issue.
We are currently using cls-hooked with Express. We have it as a middleware and set a few request-specific values, such as client and tenant IDs, which are used to route the connection to the correct database.
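For context, the setup looks roughly like the sketch below. This is simplified, and the namespace name, header names, and route are illustrative rather than our exact code:

```js
const express = require('express');
const cls = require('cls-hooked');

const ns = cls.createNamespace('request-context'); // illustrative name
const app = express();

app.use((req, res, next) => {
  // Bind the request/response emitters so their event handlers run inside
  // this request's context.
  ns.bindEmitter(req);
  ns.bindEmitter(res);

  ns.run(() => {
    ns.set('clientId', req.header('x-client-id'));
    ns.set('tenantId', req.header('x-tenant-id'));
    next();
  });
});

app.get('/things/:id', (req, res) => {
  // Downstream code reads the tenant ID from the namespace instead of having
  // it passed through every call, and uses it to pick the database connection.
  const tenantId = ns.get('tenantId');
  res.json({ tenantId, id: req.params.id });
});

app.listen(3000);
```

Since everything downstream resolves the tenant from the namespace, a leaked context means the query goes to the wrong database.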
We noticed our service was generating 404s (from validation on database queries) because connections were being routed to the wrong databases. As part of debugging, we wrote scripts that load the server with GETs using valid data while quickly swapping between two different tenants, so the connections should be routed to different databases.
Within 30 seconds or less we would get a 404, and inspecting the logs showed the request using the tenant ID from the previous request.
We have not made any real inroads on fixing this bug, apart from the scripts that reliably reproduce it. We see no problems under low concurrency.
We are running Node 12.13.0.
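For anyone who wants to try reproducing this, the load scripts were along these lines. The port, path, and header name are illustrative, and the server is assumed to echo back the tenant it resolved (as in the middleware sketch above):

```js
const http = require('http');

const TENANTS = ['tenant-a', 'tenant-b'];

function get(tenantId) {
  return new Promise((resolve, reject) => {
    const req = http.get(
      { host: 'localhost', port: 3000, path: '/things/1', headers: { 'x-tenant-id': tenantId } },
      (res) => {
        let body = '';
        res.on('data', (chunk) => (body += chunk));
        res.on('end', () => resolve({ tenantId, body }));
      }
    );
    req.on('error', reject);
  });
}

async function run() {
  for (let i = 0; ; i++) {
    // Keep 50 requests in flight at a time, alternating tenants.
    const batch = Array.from({ length: 50 }, (_, j) => get(TENANTS[(i + j) % 2]));
    const results = await Promise.all(batch);
    for (const { tenantId, body } of results) {
      if (!body.includes(tenantId)) {
        console.error('MISMATCH: asked for', tenantId, 'got', body);
        process.exit(1);
      }
    }
  }
}

run();
```

Under that kind of load the mismatch shows up within seconds for us; with a single tenant or low concurrency it never does.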