ws: Per-Message Deflate Memory Leak

Sorry for not following the issue template; this way is easier to explain the issue. I'm happy to elaborate more.

Description

In a production server, after running for a long time or after a mass disconnection, I noticed a memory leak related to zlib and WebSockets.

This happens on all servers, but it was extreme on one of our servers, which had a mass disconnect from 2.5k connected users down to 50 connected users. A memory dump on this server showed:

  • We have 87k instances of the WebSocket class
  • The AsyncLimiter queue has thousands of items

I managed to reproduce locally via:

  • a socket.io server (this is my setup) with per-message deflate enabled
  • connected 200 users simultaneously and disconnected them all at once. Memory stayed the same; after a manual garbage collection and a two-minute wait, a memory dump showed the server still had WebSocket instances in memory.
  • connected again with 20 users and now I can't connect: no responses from the server, the server is stuck. (A minimal client sketch for this scenario follows this list.)
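For reference, here is a rough sketch of the client side of that repro. The port, the 5-second settle delay, and forcing the websocket transport are my assumptions, not part of the original setup:

const ioClient = require('socket.io-client');

// Open 200 connections against a socket.io server started with
// per-message deflate enabled, e.g.:
//   require('socket.io')(3000, { perMessageDeflate: true });
const sockets = [];
for (let i = 0; i < 200; i++) {
  sockets.push(ioClient('http://localhost:3000', { transports: ['websocket'] }));
}

// After the connections settle, disconnect them all at once, then take a
// heap snapshot and look for retained WebSocket instances.
setTimeout(() => {
  for (const socket of sockets) socket.disconnect();
}, 5000);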

Root cause

The AsyncLimiter has 10 pending jobs that never finish. This happens as the result of a race: calling close on the inflate/deflate stream means any further flush callbacks are never invoked.

Example:

inflate.write(...);
inflate.close(); // happens asynchronously
inflate.flush(() => { /* never called */ });
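The race can be shown with Node's zlib directly (ws wraps zlib.createInflateRaw() under the hood). This self-contained sketch reproduces the hanging callback described above, at least on the Node 10 line reported here:

const zlib = require('zlib');

const inflate = zlib.createInflateRaw();

inflate.write(Buffer.alloc(1));
inflate.close(); // the handle is torn down asynchronously
inflate.flush(() => {
  // With the close() above in flight, this callback never fires, so
  // anything awaiting it (e.g. a job in the async limiter) hangs forever.
  console.log('flushed');
});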

I made some patches in the code to make sure the callback is called, but I didn't like the solution of attaching a callback to the inflate/deflate objects, since the object can handle multiple operations at the same time and we could have further races.

In my opinion, we should decouple the inflate/deflate streams from per-message deflate and create a class with an internal queue that calls into inflate/deflate; on cleanup it will flush the queue. This way we can solve the issues I wrote above.
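A rough sketch of that idea (the class and method names here are hypothetical, not actual ws internals): every operation registers itself in the queue, and cleanup fails whatever close() would otherwise leave hanging.

const zlib = require('zlib');

class InflateQueue {
  constructor() {
    this._stream = zlib.createInflateRaw();
    this._pending = new Set(); // callbacks still waiting on a flush
    this._closed = false;
  }

  // Write a chunk and invoke `callback` exactly once: with null when the
  // flush completes, or with an Error if cleanup() runs first.
  writeAndFlush(chunk, callback) {
    if (this._closed) return callback(new Error('stream is closed'));

    const settle = (err) => {
      if (!this._pending.delete(settle)) return; // already settled
      callback(err);
    };
    this._pending.add(settle);

    this._stream.write(chunk);
    this._stream.flush(() => settle(null));
  }

  cleanup() {
    this._closed = true;
    this._stream.close();
    // close() prevents further flush callbacks from firing, so fail the
    // pending jobs instead of leaving them queued forever.
    for (const settle of [...this._pending]) {
      settle(new Error('stream closed while a flush was pending'));
    }
  }
}

With something like this, every queued job completes either with data or with an error, so the async limiter can always drain.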

I can submit a pull request for the above approach.

Reproducible in:

  • ws version: 6.4.1
  • socket.io version: 2.2.0
  • Node.js version(s): v10.15.3
  • OS version(s): Mac (local) / Linux (production)

Memory dump screenshot

[Screenshot: Screen Shot 2019-08-11 at 16 12 25]

The picture above is a memory dump from the server that had the mass disconnect from 2.5k to 50 sockets:

  • 88,992 instances of PerMessageDeflate
  • 178,012 instances of WebSocket
  • Array / Queue: the AsyncLimiter queue

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 26 (25 by maintainers)


Most upvoted comments

@phiresky the scenario I described in the PR is the same as yours; please try upgrading to the latest version and report back.

Makes sense, thanks. I'll open a PR later!

Cool, @lpinca! I will make a PR.

I started re-making my old PR. The question is what I should pass as arguments to the callback; passing data there is, I think, incorrect. I think I should pass an Error. Does that make sense?

They should not have the issue described here because close() should never be called on inflate streams while data is being decompressed, due to how the WebSocket is closed.