memcached: segfault when SASL is enabled and using the Java spymemcached client
When using the Java spymemcached client configured with invalid credentials, we're seeing segfaults, even with a single client and a single-threaded memcached server. This seems to mirror the experience of the users in this old issue: https://code.google.com/archive/p/memcached/issues/278
With valid credentials in the client, we do not see any segfaults. We can reproduce this up through memcached 1.5.7, and have tested on both CentOS 6 and Ubuntu 18.04.
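For reference, the client setup that triggers this for us is roughly the following sketch (host, user, and password are placeholders; the essential parts are the binary protocol and a PLAIN AuthDescriptor carrying bad credentials):

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionFactoryBuilder.Protocol;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.auth.AuthDescriptor;
import net.spy.memcached.auth.PlainCallbackHandler;

public class BadAuthRepro {
    public static void main(String[] args) throws Exception {
        // PLAIN mechanism with a deliberately wrong password, so every
        // SASL exchange with the server fails.
        AuthDescriptor auth = new AuthDescriptor(
                new String[] { "PLAIN" },
                new PlainCallbackHandler("testuser", "wrong-password"));

        MemcachedClient client = new MemcachedClient(
                new ConnectionFactoryBuilder()
                        .setProtocol(Protocol.BINARY) // SASL requires the binary protocol
                        .setAuthDescriptor(auth)
                        .build(),
                AddrUtil.getAddresses("127.0.0.1:11211"));

        // Issuing ops keeps the failing auth exchange going; spymemcached
        // retries authentication in the background on its own.
        try {
            for (int i = 0; i < 1000; i++) {
                try {
                    client.get("key" + i);
                } catch (Exception ignored) {
                    // auth failures surface as operation exceptions
                }
            }
        } finally {
            client.shutdown();
        }
    }
}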
When running under valgrind, it outputs:
==23422== Thread 6:
==23422== Invalid read of size 8
==23422== at 0x528C89E: sasl_server_step (server.c:1430)
==23422== by 0x1150B2: complete_nread (memcached.c:2146)
==23422== by 0x119FD7: drive_machine (memcached.c:5567)
==23422== by 0x4E40F8B: event_base_loop (event.c:1350)
==23422== by 0x1227CA: worker_libevent (thread.c:385)
==23422== by 0x549EAA0: start_thread (pthread_create.c:301)
==23422== by 0xABBD6FF: ???
==23422== Address 0x10 is not stack'd, malloc'd or (recently) free'd
==23422==
==23422==
==23422== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==23422== Access not within mapped region at address 0x10
==23422== at 0x528C89E: sasl_server_step (server.c:1430)
==23422== by 0x1150B2: complete_nread (memcached.c:2146)
==23422== by 0x119FD7: drive_machine (memcached.c:5567)
==23422== by 0x4E40F8B: event_base_loop (event.c:1350)
==23422== by 0x1227CA: worker_libevent (thread.c:385)
==23422== by 0x549EAA0: start_thread (pthread_create.c:301)
==23422== by 0xABBD6FF: ???
==23422== If you believe this happened as a result of a stack
==23422== overflow in your program's main thread (unlikely but
==23422== possible), you can try to increase the size of the
==23422== main thread stack using the --main-stacksize= flag.
==23422== The main thread stack size used in this run was 8388608.
==23422==
==23422== HEAP SUMMARY:
==23422== in use at exit: 3,395,838 bytes in 216 blocks
==23422== total heap usage: 3,372 allocs, 3,156 frees, 14,194,509 bytes allocated
==23422==
==23422== LEAK SUMMARY:
==23422== definitely lost: 0 bytes in 0 blocks
==23422== indirectly lost: 0 bytes in 0 blocks
==23422== possibly lost: 5,328 bytes in 9 blocks
==23422== still reachable: 3,390,510 bytes in 207 blocks
==23422== suppressed: 0 bytes in 0 blocks
==23422== Rerun with --leak-check=full to see details of leaked memory
Here is a log with -vvv:
>12 Writing an error: Auth failure.
>12 Writing bin response:
>12 0x81 0x21 0x00 0x00
>12 0x00 0x00 0x00 0x20
>12 0x00 0x00 0x00 0x0d
>12 0x00 0x00 0x00 0x19
>12 0x00 0x00 0x00 0x00
>12 0x00 0x00 0x00 0x00
12: going from conn_nread to conn_mwrite
12: going from conn_mwrite to conn_new_cmd
12: going from conn_new_cmd to conn_waiting
12: going from conn_waiting to conn_read
12: going from conn_read to conn_closing
<12 connection closed.
12: going from conn_closing to conn_closed
<12 new auto-negotiating client connection
12: going from conn_new_cmd to conn_waiting
12: going from conn_waiting to conn_read
12: going from conn_read to conn_parse_cmd
12: Client using the binary protocol
<12 Read binary protocol data:
<12 0x80 0x22 0x00 0x05
<12 0x00 0x00 0x00 0x00
<12 0x00 0x00 0x00 0x2a
<12 0x00 0x00 0x00 0x1a
<12 0x00 0x00 0x00 0x00
<12 0x00 0x00 0x00 0x00
authenticated() in cmd 0x22 is true
12: going from conn_parse_cmd to conn_nread
mech: ``PLAIN'' with 37 bytes of data
Segmentation fault (core dumped)
The bit that seems suspicious to me is:
12: going from conn_closing to conn_closed
<12 new auto-negotiating client connection
The connection was closed, and then a new command on the same connection went through the new-connection flow, so I'm guessing this is a use-after-free. That would also fit the crash itself: the request being processed is opcode 0x22 (SASL Step), and valgrind shows sasl_server_step faulting on address 0x10, which looks like a dereference of per-connection SASL state that was torn down when the connection closed.
To remedy this internally, I made a temporary fix by adding this code to the conn_new function:
} else if (c->state == conn_closed || c->state == conn_closing) {
    /* Drop commands that arrive for a connection that is already
     * being torn down, instead of running them through the state
     * machine against freed state. */
    if (settings.verbose) {
        fprintf(stderr, "received command on closed/closing connection sfd=%d\n", c->sfd);
    }
    return NULL;
}
(On this line: https://github.com/memcached/memcached/blob/1.5.7/memcached.c#L559)
That change has eliminated the segfaults in all of our stress testing. It feels a little dirty, though, and I'm not sure whether it has any negative implications.
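For context, our stress testing is essentially several of these misconfigured clients hammering the server in parallel. A rough sketch of the driver, reusing the BadAuthRepro setup above (the thread and op counts are arbitrary):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionFactoryBuilder.Protocol;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.auth.AuthDescriptor;
import net.spy.memcached.auth.PlainCallbackHandler;

public class SaslStress {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            pool.submit(() -> {
                try {
                    MemcachedClient client = new MemcachedClient(
                            new ConnectionFactoryBuilder()
                                    .setProtocol(Protocol.BINARY)
                                    .setAuthDescriptor(new AuthDescriptor(
                                            new String[] { "PLAIN" },
                                            new PlainCallbackHandler("testuser", "wrong-password")))
                                    .build(),
                            AddrUtil.getAddresses("127.0.0.1:11211"));
                    for (int i = 0; i < 10000; i++) {
                        try {
                            // each op races against the background re-auth
                            // that follows the server closing the connection
                            client.get("key" + i);
                        } catch (Exception ignored) {
                        }
                    }
                    client.shutdown();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}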
Please let me know if you need any more information.
Thanks! Paul
About this issue
- State: closed
- Created 6 years ago
- Comments: 18 (13 by maintainers)
Nice, thanks for the report! I don't have a Java toolchain handy; any chance you could get a GDB backtrace from a memcached-debug binary that crashed? That might give me slightly more context than the valgrind output.
Otherwise I'll tweak, build, or port your repro case myself.