envoy: FileSystemHTTPCache crash
Note: Envoy’s security team was contacted first, and permission to file a public issue for this crash was granted:
Since the HTTP Cache filter is still a work in progress and not ready for production use, the issue can be fixed in the open. Please open an issue for the problem described above in https://github.com/envoyproxy/envoy.
Description:
We use an xDS management server to dynamically generate configurations and have Envoy (which we run as an edge proxy) pull them. Since enabling this filter, we have noticed that the Envoy process crashes completely whenever a new configuration is loaded and a response for an HTTP request arrives around the same time. The “new configuration” can be as simple as a route change (i.e. an RDS update).
Repro steps:
We don’t have clear reproduction steps for this issue; as mentioned, it appears to be a race condition in the HTTP cache filter, triggered when a response passes through it while Envoy is simultaneously updating its configuration. The Docker image debug-dev-9c1dcef98fe3982e26ec9e24ccd19de39862949d is being used, but it should be possible to reproduce on both earlier and later versions.
Admin and Stats Output:
N/A
Config:
Not included because the full configuration is over 20 MB; the only relevant details should be:
- I’m using the FileSystemHTTPCache filter for all responses passing through Envoy
- Envoy gets its configuration via an xDS management server
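For reference, a minimal sketch of the relevant http_filters entry is below. The cache path and thread count are illustrative assumptions, not values from our actual configuration:

```yaml
# Hypothetical minimal http_filters entry; cache_path and thread_count
# are illustrative assumptions, not taken from the real (20 MB) config.
http_filters:
- name: envoy.filters.http.cache
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.cache.v3.CacheConfig
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.http.cache.file_system_http_cache.v3.FileSystemHttpCacheConfig
      manager_config:
        thread_pool:
          thread_count: 1
      cache_path: /var/cache/envoy
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

Note that the backtrace below points into the AsyncFiles thread pool (AsyncFileManagerThreadPool::worker), which is the component the manager_config above configures.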
Logs:
Call Stack:
{"log":"addr2line: DWARF error: could not find abbrev number 190518\n","stream":"stderr","time":"2023-08-14T19:26:54.500126192Z"}
{"log":"addr2line: DWARF error: could not find abbrev number 190518\n","stream":"stderr","time":"2023-08-14T19:26:54.550418586Z"}
{"log":"addr2line: DWARF error: could not find abbrev number 190518\n","stream":"stderr","time":"2023-08-14T19:26:54.601255321Z"}
{"log":"addr2line: DWARF error: could not find abbrev number 190518\n","stream":"stderr","time":"2023-08-14T19:26:54.653538255Z"}
{"log":"[2023-08-14 19:26:54.379][123][debug][cache_filter] [source/extensions/filters/http/cache/cache_filter.cc:98] [Tags: \"ConnectionId\":\"5\",\"StreamId\":\"510665027642276531\"] CacheFilter::decodeHeaders starting lookup\n","stream":"stdout","time":"2023-08-14T19:27:02.778483393Z"}
{"log":"[2023-08-14 19:26:54.379][38][critical][backtrace] [./source/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x55f0d43f9570\n","stream":"stdout","time":"2023-08-14T19:27:02.778499784Z"}
{"log":"[2023-08-14 19:26:54.379][38][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):\n","stream":"stdout","time":"2023-08-14T19:27:02.778502874Z"}
{"log":"[2023-08-14 19:26:54.379][38][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: 9c1dcef98fe3982e26ec9e24ccd19de39862949d/1.28.0-dev/Clean/RELEASE/BoringSSL\n","stream":"stdout","time":"2023-08-14T19:27:02.778505314Z"}
{"log":"[2023-08-14 19:26:54.380][38][critical][backtrace] [./source/server/backtrace.h:96] #0: __restore_rt [0x7fd40395b420]-\u003e[0x29e32e5c7420] ??:0\n","stream":"stdout","time":"2023-08-14T19:27:02.778507784Z"}
{"log":"[2023-08-14 19:26:54.387][38][critical][backtrace] [./source/server/backtrace.h:96] #1: Envoy::Extensions::Common::AsyncFiles::AsyncFileActionWithResult\u003c\u003e::execute() [0x55f0d5d5610a]-\u003e[0x9c210a] snapshot.cc:?\n","stream":"stdout","time":"2023-08-14T19:27:02.778510524Z"}
{"log":"[2023-08-14 19:26:54.393][38][critical][backtrace] [./source/server/backtrace.h:96] #2: Envoy::Extensions::Common::AsyncFiles::AsyncFileManagerThreadPool::resolveActions() [0x55f0d5d57b29]-\u003e[0x9c3b29] snapshot.cc:?\n","stream":"stdout","time":"2023-08-14T19:27:02.778518394Z"}
{"log":"[2023-08-14 19:26:54.402][38][critical][backtrace] [./source/server/backtrace.h:96] #3: Envoy::Extensions::Common::AsyncFiles::AsyncFileManagerThreadPool::worker() [0x55f0d5d57a2e]-\u003e[0x9c3a2e] snapshot.cc:?\n","stream":"stdout","time":"2023-08-14T19:27:02.778521044Z"}
{"log":"[2023-08-14 19:26:54.412][38][critical][backtrace] [./source/server/backtrace.h:96] #4: std::__1::__thread_proxy\u003c\u003e() [0x55f0d5d58cb1]-\u003e[0x9c4cb1] snapshot.cc:?\n","stream":"stdout","time":"2023-08-14T19:27:02.778523694Z"}
{"log":"[2023-08-14 19:26:54.413][38][critical][backtrace] [./source/server/backtrace.h:96] #5: start_thread [0x7fd40394f609]-\u003e[0x29e32e5bb609] ??:0\n","stream":"stdout","time":"2023-08-14T19:27:02.778526254Z"}
About this issue
- State: open
- Created 10 months ago
- Comments: 21 (14 by maintainers)
Still in my queue, but super busy.
Keep on nagging, stalebot.