==26076==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6250008cc918 at pc 0x000000e096e6 bp 0x2b88973a8600 sp 0x2b88973a85f0
READ of size 8 at 0x6250008cc918 thread T2 ([ET_NET 0])
#0 0xe096e5 in Ptr<ProxyMutex>::operator bool() const ../../include/tscore/Ptr.h:112
#1 0xe096e5 in Continuation::handleEvent(int, void*) /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:189
#2 0xe096e5 in read_signal_and_update /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:83
#3 0xe0e7fa in UnixNetVConnection::mainEvent(int, Event*) /usr/local/src/trafficserver/iocore/net/UnixNetVConnection.cc:1148
#4 0xde3cef in Continuation::handleEvent(int, void*) /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:190
#5 0xde3cef in InactivityCop::check_inactivity(int, Event*) /usr/local/src/trafficserver/iocore/net/UnixNet.cc:85
#6 0xf290ac in Continuation::handleEvent(int, void*) /usr/local/src/trafficserver/iocore/eventsystem/I_Continuation.h:190
#7 0xf290ac in EThread::process_event(Event*, int) /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:136
#8 0xf2b9ad in EThread::execute_regular() /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:249
#9 0xf2cd71 in EThread::execute() /usr/local/src/trafficserver/iocore/eventsystem/UnixEThread.cc:338
#10 0xf2717a in spawn_thread_internal /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:92
#11 0x2b8891a17e64 in start_thread (/lib64/libpthread.so.0+0x7e64)
#12 0x2b889274d88c in clone (/lib64/libc.so.6+0xfe88c)
Address 0x6250008cc918 is a wild pointer.
SUMMARY: AddressSanitizer: heap-buffer-overflow ../../include/tscore/Ptr.h:112 in Ptr<ProxyMutex>::operator bool() const
Shadow bytes around the buggy address:
0x0c4a801118d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a801118e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a801118f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a80111900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a80111910: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c4a80111920: fa fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a80111930: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a80111940: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a80111950: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a80111960: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c4a80111970: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Thread T2 ([ET_NET 0]) created by T0 ([TS_MAIN]) here:
#0 0x2b888f0c1a7f in pthread_create (/lib64/libasan.so.4+0x37a7f)
#1 0xf282de in ink_thread_create ../../include/tscore/ink_thread.h:159
#2 0xf282de in Thread::start(char const*, void*, unsigned long, std::function<void ()> const&) /usr/local/src/trafficserver/iocore/eventsystem/Thread.cc:109
#3 0xf366ff in EventProcessor::spawn_event_threads(int, int, unsigned long) /usr/local/src/trafficserver/iocore/eventsystem/UnixEventProcessor.cc:392
#4 0xf37935 in EventProcessor::start(int, unsigned long) /usr/local/src/trafficserver/iocore/eventsystem/UnixEventProcessor.cc:455
#5 0x4c05c8 in main traffic_server/traffic_server.cc:1982
#6 0x2b8892671504 in __libc_start_main (/lib64/libc.so.6+0x22504)
==26076==ABORTING
After another day of staring and tweaking, I think I found the issue, and it is indeed caused by commit dac14897b3c395c30b55cf2796cd19b0b80fd3c9. With that commit, the read_vio's ndone is no longer updated, so for POST bodies we rely entirely on the client correctly setting the end stream flag. The asserts were catching cases where the origin tunnel consumer received WRITE_COMPLETE but the user agent tunnel producer never received READ_COMPLETE. Instead, the producer sat idle until it received the inactivity timeout. In that sequence the server_session was not cleaned up correctly, which left a read_vio pointing back to the deleted state machine. That dangling pointer is what the ASan report above trips over in read_signal_and_update.
I have a branch that updates ndone and sends READ_COMPLETE once read_vio.ntodo() is 0 or the end stream flag is set. I’m leaving it running for now, and I need to back out some of my asserts before putting up a PR tomorrow.
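To make the intended behavior concrete, here is a rough sketch of the completion check described above. It is not the code on the branch: SketchVIO, ReadEvent, and on_body_data are hypothetical stand-ins invented for illustration, and only the ndone/ntodo() bookkeeping mirrors the names used in this comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; NOT the real Traffic Server types.
enum class ReadEvent { None, ReadReady, ReadComplete };

struct SketchVIO {
  int64_t nbytes = 0; // total bytes the reader asked for
  int64_t ndone  = 0; // bytes delivered to the reader so far

  int64_t ntodo() const { return nbytes - ndone; }
};

// Account for one received chunk of request body and decide which event
// the reader should be signaled with.
inline ReadEvent
on_body_data(SketchVIO &vio, int64_t chunk_len, bool end_stream)
{
  assert(chunk_len >= 0);

  // The bookkeeping step that went missing: without it ntodo() never
  // reaches 0, and completion depends solely on the end stream flag.
  vio.ndone += chunk_len;

  // Complete when everything requested has arrived or the peer ended the
  // stream. "<= 0" rather than "== 0" is a defensive choice of this sketch.
  if (vio.ntodo() <= 0 || end_stream) {
    return ReadEvent::ReadComplete;
  }
  return chunk_len > 0 ? ReadEvent::ReadReady : ReadEvent::None;
}
```

With nbytes set to 10, a 4-byte chunk yields ReadReady and the following 6-byte chunk yields ReadComplete even if the end stream flag never arrives, which is the case that previously fell through to the inactivity timeout.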