memcached: segfault when SASL enabled and using the Java spymemcached client

When the Java spymemcached client is configured with invalid credentials, we see segfaults, even with a single client and a single-threaded memcached server. This seems to mirror the experience of the users in this old issue: https://code.google.com/archive/p/memcached/issues/278

When using valid credentials in the client, we do not see any segfaults. We experience this up through memcached 1.5.7. We’ve tested on both CentOS 6 and Ubuntu 18.04.
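For context, here is a minimal sketch of the kind of client setup we use to hit this. The host, port, credentials, and loop count below are placeholders rather than a verbatim copy of our code; the relevant parts are the binary protocol, PLAIN SASL auth, and deliberately wrong credentials:

    import java.util.concurrent.TimeUnit;

    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.ConnectionFactoryBuilder;
    import net.spy.memcached.MemcachedClient;
    import net.spy.memcached.auth.AuthDescriptor;
    import net.spy.memcached.auth.PlainCallbackHandler;

    public class SaslRepro {
        public static void main(String[] args) throws Exception {
            // Deliberately wrong credentials -- with valid credentials we do not see the segfault.
            AuthDescriptor auth = new AuthDescriptor(
                    new String[] { "PLAIN" },
                    new PlainCallbackHandler("baduser", "badpass"));

            MemcachedClient client = new MemcachedClient(
                    new ConnectionFactoryBuilder()
                            .setProtocol(ConnectionFactoryBuilder.Protocol.BINARY)
                            .setAuthDescriptor(auth)
                            .build(),
                    AddrUtil.getAddresses("127.0.0.1:11211"));

            // The SASL handshake fails with bad credentials; we just keep issuing
            // operations so the client keeps talking to the server while it
            // (re)negotiates auth.
            for (int i = 0; i < 100; i++) {
                try {
                    client.set("key" + i, 60, "value").get(1, TimeUnit.SECONDS);
                } catch (Exception ignored) {
                    // expected to fail while unauthenticated
                }
            }
            client.shutdown();
        }
    }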

When running memcached under valgrind, it outputs:

==23422== Thread 6:
==23422== Invalid read of size 8
==23422==    at 0x528C89E: sasl_server_step (server.c:1430)
==23422==    by 0x1150B2: complete_nread (memcached.c:2146)
==23422==    by 0x119FD7: drive_machine (memcached.c:5567)
==23422==    by 0x4E40F8B: event_base_loop (event.c:1350)
==23422==    by 0x1227CA: worker_libevent (thread.c:385)
==23422==    by 0x549EAA0: start_thread (pthread_create.c:301)
==23422==    by 0xABBD6FF: ???
==23422==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==23422==
==23422==
==23422== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==23422==  Access not within mapped region at address 0x10
==23422==    at 0x528C89E: sasl_server_step (server.c:1430)
==23422==    by 0x1150B2: complete_nread (memcached.c:2146)
==23422==    by 0x119FD7: drive_machine (memcached.c:5567)
==23422==    by 0x4E40F8B: event_base_loop (event.c:1350)
==23422==    by 0x1227CA: worker_libevent (thread.c:385)
==23422==    by 0x549EAA0: start_thread (pthread_create.c:301)
==23422==    by 0xABBD6FF: ???
==23422==  If you believe this happened as a result of a stack
==23422==  overflow in your program's main thread (unlikely but
==23422==  possible), you can try to increase the size of the
==23422==  main thread stack using the --main-stacksize= flag.
==23422==  The main thread stack size used in this run was 8388608.
==23422==
==23422== HEAP SUMMARY:
==23422==     in use at exit: 3,395,838 bytes in 216 blocks
==23422==   total heap usage: 3,372 allocs, 3,156 frees, 14,194,509 bytes allocated
==23422==
==23422== LEAK SUMMARY:
==23422==    definitely lost: 0 bytes in 0 blocks
==23422==    indirectly lost: 0 bytes in 0 blocks
==23422==      possibly lost: 5,328 bytes in 9 blocks
==23422==    still reachable: 3,390,510 bytes in 207 blocks
==23422==         suppressed: 0 bytes in 0 blocks
==23422== Rerun with --leak-check=full to see details of leaked memory

Here is a log with -vvv:

>12 Writing an error: Auth failure.
>12 Writing bin response:
>12   0x81 0x21 0x00 0x00
>12   0x00 0x00 0x00 0x20
>12   0x00 0x00 0x00 0x0d
>12   0x00 0x00 0x00 0x19
>12   0x00 0x00 0x00 0x00
>12   0x00 0x00 0x00 0x00
12: going from conn_nread to conn_mwrite
12: going from conn_mwrite to conn_new_cmd
12: going from conn_new_cmd to conn_waiting
12: going from conn_waiting to conn_read
12: going from conn_read to conn_closing
<12 connection closed.
12: going from conn_closing to conn_closed
<12 new auto-negotiating client connection
12: going from conn_new_cmd to conn_waiting
12: going from conn_waiting to conn_read
12: going from conn_read to conn_parse_cmd
12: Client using the binary protocol
<12 Read binary protocol data:
<12    0x80 0x22 0x00 0x05
<12    0x00 0x00 0x00 0x00
<12    0x00 0x00 0x00 0x2a
<12    0x00 0x00 0x00 0x1a
<12    0x00 0x00 0x00 0x00
<12    0x00 0x00 0x00 0x00
authenticated() in cmd 0x22 is true
12: going from conn_parse_cmd to conn_nread
mech:  ``PLAIN'' with 37 bytes of data
Segmentation fault (core dumped)

The bit that seems suspicious to me is:

12: going from conn_closing to conn_closed
<12 new auto-negotiating client connection

The connection was closed, and then a new command on the same connection object went through the new-connection flow, so I'm guessing this was a use-after-free.

To remedy this issue internally, I made a temporary fix by adding this code to the conn_new() function:

    } else if (c->state == conn_closed || c->state == conn_closing) {
        if (settings.verbose) {
            fprintf(stderr, "received command on closed/closing connection sfd=%d\n", c->sfd);
        }
        /* drop the command rather than reusing the stale connection object */
        return NULL;
    }

(On this line: https://github.com/memcached/memcached/blob/1.5.7/memcached.c#L559)

That change has remedied the segfaults in all of our stress testing. It seems a little dirty, though, and I'm not sure whether there are any negative implications.

Please let me know if you need any more information.

Thanks! Paul

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (13 by maintainers)

Most upvoted comments

Nice, thanks for the report! I don't have a Java toolchain handy; any chance you could get a GDB backtrace from a memcached-debug binary that crashed? That might give me slightly more context than the valgrind output.

Else I’ll tweak or build or port your repro case.