mediasoup: OpenSSL send buffer growing without bounds in DtlsTransport (worker memory leak)

Bug Report

We discovered that the memory usage of Mediasoup Workers grows over time and goes down only when clients disconnect. We reproduced the problem with msc-node/werift clients that create only a data producer and then send a lot of messages (3000/s). The problem also occurs at slower send rates. Note that no consumers are needed for the problem to occur.

Cause

I’ve tracked the problem down to the following. When data is sent over the WebRTC channel from the client to the server, the server sends ACK messages back. These are passed to OpenSSL to be encrypted. Version 3.13.0 (in the ‘flatbuffers’ change) changed the way the OpenSSL data is handled. These changes cause the buffer that holds the data coming out of OpenSSL (DtlsTransport.sslBioToNetwork, a BIO_s_mem) to keep growing.

It looks like this bug was introduced due to a misunderstanding of how BIO_set_callback_ex operates. As far as I can see from the documentation, it “can be used for debugging purposes to trace operations on a BIO or to modify its operation.” Mediasoup currently uses it to spy on the ‘write’ operations OpenSSL performs on the outgoing buffer (DtlsTransport.sslBioToNetwork). The observed data is then sent out over the network. However, nothing removes the data from the buffer afterwards, so the buffer grows with every write, resulting in the leak.
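The mechanism described above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyMemBuffer and SimulateTraffic are hypothetical names, not OpenSSL or Mediasoup code. The toy buffer behaves like a memory BIO in one respect only: writes append bytes, and bytes leave only when something reads them. An observer that merely looks at each write does not consume anything.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory BIO: writes append, reads consume.
class ToyMemBuffer {
public:
    using WriteObserver = std::function<void(const uint8_t*, size_t)>;

    void SetWriteObserver(WriteObserver observer) { observer_ = std::move(observer); }

    // Appends the data, then lets the observer look at it (without consuming it).
    void Write(const uint8_t* data, size_t len) {
        storage_.insert(storage_.end(), data, data + len);
        if (observer_) observer_(data, len);
    }

    // Copies up to len bytes into out and removes them from the buffer.
    size_t Read(uint8_t* out, size_t len) {
        const size_t n = std::min(len, storage_.size());
        std::copy(storage_.begin(), storage_.begin() + n, out);
        storage_.erase(storage_.begin(), storage_.begin() + n);
        return n;
    }

    size_t Pending() const { return storage_.size(); }

private:
    WriteObserver observer_;
    std::vector<uint8_t> storage_;
};

// Writes `packets` packets of `packetSize` bytes. The observer "sends" each one.
// If `drain` is true, the buffer is also read out after every write.
// Returns the number of bytes still held by the buffer at the end.
size_t SimulateTraffic(size_t packets, size_t packetSize, bool drain) {
    ToyMemBuffer toNetwork;
    size_t bytesSent = 0;
    toNetwork.SetWriteObserver([&](const uint8_t*, size_t len) { bytesSent += len; });

    const std::vector<uint8_t> packet(packetSize, 0xAB);
    std::vector<uint8_t> scratch(packetSize);
    for (size_t i = 0; i < packets; ++i) {
        toNetwork.Write(packet.data(), packet.size());
        if (drain) {
            while (toNetwork.Read(scratch.data(), scratch.size()) > 0) {
            }
        }
    }
    assert(bytesSent == packets * packetSize);  // every packet was observed once
    return toNetwork.Pending();
}
```

In this model, SimulateTraffic(3000, 100, false) leaves 300000 bytes pending after one second's worth of traffic at the rate from the report, while SimulateTraffic(3000, 100, true) leaves 0. The real buffer is only released when the transport is destroyed, which would match the observation that memory drops when clients disconnect.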

Workaround

A workaround is to go back to version 3.12.16, which does not have the problem. We’ll be doing this for now.

Fix?

I don’t know anything about OpenSSL, but I could get the problem to go away by placing (void)BIO_reset(this->sslBioToNetwork); at the end of DtlsTransport::SendDtlsData. However, I do have some doubts about what is going on with the callback: the documentation says it’s called twice for every operation, so perhaps we’re now sending the data twice? That would have to be investigated.
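To make the "called twice" concern concrete, here is a toy sketch, not Mediasoup code. ToyBio, SendStats and SimulateWrites are hypothetical. The two constants mirror the values of OpenSSL's BIO_CB_WRITE and BIO_CB_RETURN; OpenSSL invokes the callback once before the operation and once after it, with BIO_CB_RETURN OR-ed into the operation code on the second call. Whether the actual Mediasoup callback already filters on that flag is not something this sketch checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Same numeric values as OpenSSL's BIO_CB_WRITE and BIO_CB_RETURN.
constexpr int kOpWrite = 0x03;
constexpr int kOpReturn = 0x80;

// Hypothetical buffer that calls its callback before and after every write.
class ToyBio {
public:
    using Callback = std::function<void(int oper, const uint8_t* data, size_t len)>;

    void SetCallback(Callback callback) { callback_ = std::move(callback); }

    void Write(const uint8_t* data, size_t len) {
        if (callback_) callback_(kOpWrite, data, len);              // before the write
        storage_.insert(storage_.end(), data, data + len);
        if (callback_) callback_(kOpWrite | kOpReturn, data, len);  // after the write
    }

    void Reset() { storage_.clear(); }
    size_t Pending() const { return storage_.size(); }

private:
    Callback callback_;
    std::vector<uint8_t> storage_;
};

struct SendStats {
    size_t sends;    // how many times the callback "sent" data
    size_t pending;  // bytes left in the buffer at the end
};

// Performs `writes` writes of 100 bytes each.
// filterPostOp: only send on the post-operation call.
// resetAfterSend: clear the buffer once each write has completed.
SendStats SimulateWrites(size_t writes, bool filterPostOp, bool resetAfterSend) {
    ToyBio bio;
    size_t sends = 0;
    bio.SetCallback([&](int oper, const uint8_t*, size_t) {
        if (filterPostOp && oper != (kOpWrite | kOpReturn)) return;
        ++sends;
    });

    const std::vector<uint8_t> record(100, 0x17);
    for (size_t i = 0; i < writes; ++i) {
        bio.Write(record.data(), record.size());
        if (resetAfterSend) bio.Reset();
    }
    return SendStats{sends, bio.Pending()};
}
```

In this model, an unfiltered callback sends each record twice and leaves everything in the buffer, while filtering on the post-operation code and resetting after each completed write gives one send per record and an empty buffer. The reset is done after the write returns rather than inside the callback, which seems the safer place in a real implementation as well.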

If there are no OpenSSL experts around, I can try doing a PR. However, it will have to be reviewed and tested well, as I’d mostly be programming in the dark, and that’s dangerous when encryption and networking are involved.

More Info

The PR that introduced the problem (3.13.0): Worker: Make DTLS fragment stay within MTU size range (https://github.com/versatica/mediasoup/pull/1156, based on https://github.com/versatica/mediasoup/pull/1143 by @vpnts-se).

Note that this is not a ‘true’ memory leak, as the memory is freed when the client disconnects.

About this issue

  • State: closed
  • Created 4 months ago
  • Comments: 22 (17 by maintainers)

Most upvoted comments

@pnts-se don’t worry, I’m doing it. Thanks a lot.