memcached: segfault when SASL enabled and using the Java spymemcached client

When the Java spymemcached client is configured with invalid credentials, we see segfaults, even with a single client and a single-threaded memcached server. This seems to mirror the experience of the users in this old issue: https://code.google.com/archive/p/memcached/issues/278

When using valid credentials in the client, we do not see any segfaults. We experience this up through memcached 1.5.7. We’ve tested on both CentOS 6 and Ubuntu 18.04.
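For context, here is a minimal sketch of the kind of client setup we use to hit this. The host, port, credentials, and loop count below are placeholders rather than a verbatim copy of our code; the relevant parts are the binary protocol, PLAIN SASL auth, and deliberately wrong credentials:

    import java.util.concurrent.TimeUnit;

    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.ConnectionFactoryBuilder;
    import net.spy.memcached.MemcachedClient;
    import net.spy.memcached.auth.AuthDescriptor;
    import net.spy.memcached.auth.PlainCallbackHandler;

    public class SaslRepro {
        public static void main(String[] args) throws Exception {
            // Deliberately wrong credentials -- with valid credentials we do not see the segfault.
            AuthDescriptor auth = new AuthDescriptor(
                    new String[] { "PLAIN" },
                    new PlainCallbackHandler("baduser", "badpass"));

            MemcachedClient client = new MemcachedClient(
                    new ConnectionFactoryBuilder()
                            .setProtocol(ConnectionFactoryBuilder.Protocol.BINARY)
                            .setAuthDescriptor(auth)
                            .build(),
                    AddrUtil.getAddresses("127.0.0.1:11211"));

            // The SASL handshake fails with bad credentials; we just keep issuing
            // operations so the client keeps talking to the server while it
            // (re)negotiates auth.
            for (int i = 0; i < 100; i++) {
                try {
                    client.set("key" + i, 60, "value").get(1, TimeUnit.SECONDS);
                } catch (Exception ignored) {
                    // expected to fail while unauthenticated
                }
            }
            client.shutdown();
        }
    }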

When running memcached under valgrind, it outputs:

==23422== Thread 6:
==23422== Invalid read of size 8
==23422==    at 0x528C89E: sasl_server_step (server.c:1430)
==23422==    by 0x1150B2: complete_nread (memcached.c:2146)
==23422==    by 0x119FD7: drive_machine (memcached.c:5567)
==23422==    by 0x4E40F8B: event_base_loop (event.c:1350)
==23422==    by 0x1227CA: worker_libevent (thread.c:385)
==23422==    by 0x549EAA0: start_thread (pthread_create.c:301)
==23422==    by 0xABBD6FF: ???
==23422==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==23422==
==23422==
==23422== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==23422==  Access not within mapped region at address 0x10
==23422==    at 0x528C89E: sasl_server_step (server.c:1430)
==23422==    by 0x1150B2: complete_nread (memcached.c:2146)
==23422==    by 0x119FD7: drive_machine (memcached.c:5567)
==23422==    by 0x4E40F8B: event_base_loop (event.c:1350)
==23422==    by 0x1227CA: worker_libevent (thread.c:385)
==23422==    by 0x549EAA0: start_thread (pthread_create.c:301)
==23422==    by 0xABBD6FF: ???
==23422==  If you believe this happened as a result of a stack
==23422==  overflow in your program's main thread (unlikely but
==23422==  possible), you can try to increase the size of the
==23422==  main thread stack using the --main-stacksize= flag.
==23422==  The main thread stack size used in this run was 8388608.
==23422==
==23422== HEAP SUMMARY:
==23422==     in use at exit: 3,395,838 bytes in 216 blocks
==23422==   total heap usage: 3,372 allocs, 3,156 frees, 14,194,509 bytes allocated
==23422==
==23422== LEAK SUMMARY:
==23422==    definitely lost: 0 bytes in 0 blocks
==23422==    indirectly lost: 0 bytes in 0 blocks
==23422==      possibly lost: 5,328 bytes in 9 blocks
==23422==    still reachable: 3,390,510 bytes in 207 blocks
==23422==         suppressed: 0 bytes in 0 blocks
==23422== Rerun with --leak-check=full to see details of leaked memory

Here is a log with -vvv:

>12 Writing an error: Auth failure.
>12 Writing bin response:
>12   0x81 0x21 0x00 0x00
>12   0x00 0x00 0x00 0x20
>12   0x00 0x00 0x00 0x0d
>12   0x00 0x00 0x00 0x19
>12   0x00 0x00 0x00 0x00
>12   0x00 0x00 0x00 0x00
12: going from conn_nread to conn_mwrite
12: going from conn_mwrite to conn_new_cmd
12: going from conn_new_cmd to conn_waiting
12: going from conn_waiting to conn_read
12: going from conn_read to conn_closing
<12 connection closed.
12: going from conn_closing to conn_closed
<12 new auto-negotiating client connection
12: going from conn_new_cmd to conn_waiting
12: going from conn_waiting to conn_read
12: going from conn_read to conn_parse_cmd
12: Client using the binary protocol
<12 Read binary protocol data:
<12    0x80 0x22 0x00 0x05
<12    0x00 0x00 0x00 0x00
<12    0x00 0x00 0x00 0x2a
<12    0x00 0x00 0x00 0x1a
<12    0x00 0x00 0x00 0x00
<12    0x00 0x00 0x00 0x00
authenticated() in cmd 0x22 is true
12: going from conn_parse_cmd to conn_nread
mech:  ``PLAIN'' with 37 bytes of data
Segmentation fault (core dumped)

The bit that seems suspicious to me is:

12: going from conn_closing to conn_closed
<12 new auto-negotiating client connection

The connection was closed, and then a new command on the same connection object went through the new-connection flow, so I'm guessing this was a use-after-free.

To remedy this issue internally, I made a temporary fix by adding this code to the conn_new() function:

    } else if (c->state == conn_closed || c->state == conn_closing) {
        if (settings.verbose) {
            fprintf(stderr, "received command on closed/closing connection sfd=%d\n", c->sfd);
        }
        /* drop the command rather than reusing the stale connection object */
        return NULL;
    }

(On this line: https://github.com/memcached/memcached/blob/1.5.7/memcached.c#L559)

That change has remedied the segfaults in all of our stress testing. It seems a little dirty, though, and I'm not sure whether there are any negative implications.

Please let me know if you need any more information.

Thanks! Paul

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (13 by maintainers)

Most upvoted comments

Nice, thanks for the report! I don't have a Java toolchain handy; any chance you could get a GDB backtrace from a memcached-debug binary that crashed? That might give me slightly more context than the valgrind output.

Else I’ll tweak or build or port your repro case.