engine.io: CORS pre-flight breaks socket.io behind load balancer

I ran into an issue on our servers. We are running socket.io v1.0.6 on multiple server instances behind a load balancer. For polling, the requests go through the ELB with sticky sessions turned on. Our real-time service is on a subdomain, and because of CORS pre-flight requests, socket.io breaks. Here is what happens on the client when the polling transport is used:

  1. A socket.io handshake POST request occurs. The response comes back valid with an sid, and the headers include the AWS ELB cookie.
  2. Next, a pre-flight OPTIONS request is made by the browser. The ELB cookie is not included by the browser here. As a result, the OPTIONS request is routed to a potentially different server which will not recognize the sid in the query string.
  3. When the request is routed to the wrong server, socket.io responds with a 400 HTTP status code and a "Session ID unknown" error.
  4. Since the pre-flight request fails, the browser also fails the actual GET polling request, and tries to redo the handshake from the beginning.
  5. Possibly due to the headers being sent, the browser sends the OPTIONS pre-flight request fairly regularly as opposed to doing it only once, so this cycle repeats over and over.

The fix on our end currently is to respond to all OPTIONS requests with a 200 and all the usual Access-Control-Allow-… headers the browser knows and loves. We do this before they even get to socket.io in our nginx config.

Now, engine.io appears to already handle this case here: https://github.com/Automattic/engine.io/blob/master/lib/transports/polling-xhr.js#L40

However, that check is only reached if the sid is valid here: https://github.com/Automattic/engine.io/blob/master/lib/server.js#L180

which it isn’t, of course. I can submit a PR but I’d like to know how you guys think it’d be best to handle this. AFAIK, if a request method is OPTIONS, we can make the assumption that we are polling. But, since we don’t have a valid sid to look up a client by, this might mean moving fairly transport-specific logic into server.js which sounds less than ideal.
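To make the ordering problem concrete, here is a minimal sketch. It is not engine.io's actual code: `clients`, `verify`, `handleSidFirst` and `handleOptionsFirst` are hypothetical stand-ins for the sid lookup in server.js and the pre-flight handling in the polling transport.

```javascript
// Hypothetical stand-ins: `clients` mimics the server's sid -> session map,
// and `verify` mimics the early check that rejects unknown sids.
const clients = new Map([['abc123', { transport: 'polling' }]]);

function verify(req) {
  if (req.query.sid && !clients.has(req.query.sid)) {
    return { status: 400, message: 'Session ID unknown' };
  }
  return null; // no error
}

// Ordering described above: the sid check runs first, so a pre-flight that
// lands on the wrong server never reaches the CORS handling.
function handleSidFirst(req) {
  const err = verify(req);
  if (err) return err;
  if (req.method === 'OPTIONS') return { status: 200, message: 'preflight ok' };
  return { status: 200, message: 'handled by transport' };
}

// Alternative ordering: answer OPTIONS before looking at the sid at all.
function handleOptionsFirst(req) {
  if (req.method === 'OPTIONS') return { status: 200, message: 'preflight ok' };
  const err = verify(req);
  if (err) return err;
  return { status: 200, message: 'handled by transport' };
}

// A pre-flight routed to a server that does not know the sid:
const preflight = { method: 'OPTIONS', query: { sid: 'unknown-sid', transport: 'polling' } };
const sidFirst = handleSidFirst(preflight);
const optionsFirst = handleOptionsFirst(preflight);
console.log(sidFirst.status, optionsFirst.status);
```

The second ordering is the one that would pull transport-specific logic up into the server, which is the trade-off raised above.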

Thoughts?

About this issue

  • Original URL
  • State: closed
  • Created 10 years ago
  • Reactions: 6
  • Comments: 43 (25 by maintainers)

Most upvoted comments

I’m trying to reproduce the issue, but so far the connection actually seems stable.

I’m using https://github.com/socketio/engine.io/tree/a63c7b787c54b3a47da7f355826bf2770139c62b.

var app = require('express')();
var cors = require('cors');

app.options('*', cors({
  origin: true,
  methods: 'POST',
  allowedHeaders: ['Content-Type'],
  credentials: true,
}));

var server = require('http').Server(app);
var io = require('socket.io')(server, {
  handlePreflightRequest: false
});

...

For anyone else with this issue, the following code will fix it:

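module.exports = function(srv) {
  // Detach the existing request listeners (including socket.io's) so that
  // OPTIONS requests can be intercepted before they reach them.
  var listeners = srv.listeners('request').slice(0);
  srv.removeAllListeners('request');
  srv.on('request', function(req, res) {
    if(req.method === 'OPTIONS' && req.url.indexOf('/socket.io') === 0) {
      // Answer the pre-flight directly, without any sid lookup.
      var headers = {};
      if (req.headers.origin) {
        // Credentialed requests need an explicit origin rather than '*'.
        headers['Access-Control-Allow-Credentials'] = 'true';
        headers['Access-Control-Allow-Origin'] = req.headers.origin;
      } else {
        headers['Access-Control-Allow-Origin'] = '*';
      }

      headers['Access-Control-Allow-Headers'] = 'origin, content-type, accept';
      res.writeHead(200, headers);
      res.end();
    } else {
      // Everything else goes to the original listeners unchanged.
      listeners.forEach(function(fn) {
        fn.call(srv, req, res);
      });
    }
  });
};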

Apply it after socket.io has attached to the server, e.g.:

// handleOptions is the function exported by the module above
io.listen(server);
handleOptions(server);

This is one of the best issue descriptions I’ve read in a while. Thanks for taking the time.

Can you send me the complete headers of the request that yields OPTIONS? Ideally we wouldn’t need the pre-flight. I suspect you’re sending binary data, which results in a Content-Type switch?

Requests that do not need pre-flight according to MDN are:

A simple cross-site request is one that:

  • Only uses GET, HEAD or POST.
  • If POST is used to send data to the server, the Content-Type of the data sent to the server with the HTTP POST request is one of application/x-www-form-urlencoded, multipart/form-data, or text/plain.
  • Does not set custom headers with the HTTP Request (such as X-Modified, etc.)
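As a rough illustration of those criteria, the sketch below classifies a request as needing a pre-flight or not. It is a simplification based on the quoted rules: `needsPreflight` is a hypothetical helper, the list of headers treated as non-custom is an assumption for this example, and the Content-Type check is applied to any method rather than to POST only. The real browser rules have more cases.

```javascript
// Simplified check based on the criteria quoted above; not a full
// implementation of the browser's CORS rules.
const SIMPLE_METHODS = ['GET', 'HEAD', 'POST'];
const SIMPLE_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
];
// Headers treated as non-custom in this sketch (an assumption for illustration).
const NON_CUSTOM_HEADERS = ['accept', 'accept-language', 'content-language', 'content-type'];

function needsPreflight(method, headers) {
  if (!SIMPLE_METHODS.includes(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers || {})) {
    const lower = name.toLowerCase();
    if (!NON_CUSTOM_HEADERS.includes(lower)) return true;
    if (lower === 'content-type') {
      // Ignore parameters such as "; charset=UTF-8" when comparing.
      const mime = String(value).split(';')[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.includes(mime)) return true;
    }
  }
  return false;
}

console.log(needsPreflight('POST', { 'Content-Type': 'text/plain;charset=UTF-8' })); // false
console.log(needsPreflight('POST', { 'Content-Type': 'application/octet-stream' })); // true
```

The second call corresponds to the binary-data case suspected above: a POST with application/octet-stream falls outside the simple Content-Type list.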

Before proceeding with the OPTIONS fix, I want to make sure it’s happening for a good reason.

The flow I currently see is:

  1. GET /socket.io/?EIO=3&transport=polling&t=...
  2. GET /socket.io/?EIO=3&transport=polling&t=...&sid=...
  3. OPTIONS /socket.io/?EIO=3&transport=polling&t=...&sid=...
  4. POST /socket.io/?EIO=3&transport=polling&t=...&sid=...

In 1., the session is created: socket.io assigns a sid and ALB creates a new cookie. In 2., we send a new request with the sid provided by socket.io and the cookie sent by ALB. In 3., we send an OPTIONS request before the POST request used in the ping process, but it seems that OPTIONS requests do not send the cookies (https://fetch.spec.whatwg.org/#cors-protocol-and-credentials: "Note that even so, a CORS-preflight request never includes credentials."). Because of that, ALB sends a new cookie, which might associate the client with the same server or with a new one. In 4., if the cookie links the user to a new server, the ping request arrives at a server without the session, and we have to re-establish the connection from 1.
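The sequence can be reproduced with a toy model. This is only a sketch: `createBalancer` is a made-up round-robin balancer with a sticky cookie, not ALB's real target selection, and it assumes (as described above) that the cookie issued on the pre-flight response replaces the earlier one.

```javascript
// Toy model only: a round-robin balancer with a sticky cookie. It is NOT
// how ALB picks targets; the names here are made up for illustration.
function createBalancer(serverCount) {
  let next = 0;
  return function route(cookie) {
    // A request carrying a sticky cookie returns to the same server.
    if (cookie !== undefined) return { server: cookie, setCookie: undefined };
    // Otherwise pick the next server and issue a new sticky cookie.
    const server = next;
    next = (next + 1) % serverCount;
    return { server, setCookie: server };
  };
}

function simulate() {
  const route = createBalancer(2);
  const jar = {}; // stand-in for the browser's cookie jar
  const trace = [];

  function send(method) {
    // Pre-flight (OPTIONS) requests never carry credentials such as cookies.
    const cookie = method === 'OPTIONS' ? undefined : jar.sticky;
    const res = route(cookie);
    // Assumption: the new cookie replaces the one that pointed at the
    // server holding the session.
    if (res.setCookie !== undefined) jar.sticky = res.setCookie;
    trace.push({ method, server: res.server });
  }

  ['GET', 'GET', 'OPTIONS', 'POST'].forEach(send);
  return trace;
}

const trace = simulate();
console.log(trace.map((t) => t.method + ' -> server ' + t.server).join('\n'));
```

In this model the two GETs reach server 0, where the session lives, while the OPTIONS request and the POST that follows it both reach server 1.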

All of that is happening because:

The only way to make the polling transport work with ALB in its current state would be to avoid the preflight requests.

However, it should work fine with any load balancer that uses a consistent hashing algorithm, or that does not set a new cookie on preflight requests.
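A minimal sketch of why consistent hashing sidesteps the problem: the target is derived from a key that is present on every request, so a cookie-less OPTIONS request still lands on the same server. The server names, the `buildRing` and `pickServer` helpers, and the choice of the client IP as the key are assumptions for illustration, not a description of any particular load balancer.

```javascript
const crypto = require('crypto');

// Map a string to a 32-bit unsigned integer position on the ring.
function hash(key) {
  return crypto.createHash('sha256').update(key).digest().readUInt32BE(0);
}

// Build a hash ring with several virtual points per server.
function buildRing(servers, pointsPerServer = 50) {
  const ring = [];
  for (const server of servers) {
    for (let i = 0; i < pointsPerServer; i++) {
      ring.push({ point: hash(server + '#' + i), server });
    }
  }
  return ring.sort((a, b) => a.point - b.point);
}

// Pick the first ring point at or after the key's hash, wrapping around.
function pickServer(ring, key) {
  const h = hash(key);
  const entry = ring.find((e) => e.point >= h) || ring[0];
  return entry.server;
}

const ring = buildRing(['app-1', 'app-2', 'app-3']); // hypothetical servers
// Assumption: the balancer hashes something every request carries,
// such as the client IP, so no cookie is needed.
const clientIp = '203.0.113.7';
const forGet = pickServer(ring, clientIp);
const forOptions = pickServer(ring, clientIp);
const forPost = pickServer(ring, clientIp);
console.log(forGet === forOptions && forOptions === forPost); // true
```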

Basically, the issue hasn’t changed much since the original post. #484 just makes it fail at the POST request instead of the OPTIONS request.

Hello,

I’m trying to set up socket.io servers behind AWS ALB. The stickiness is using cookies as stated in http://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#sticky-sessions

Application Load Balancers support load balancer-generated cookies only. The name of the cookie is AWSALB. The contents of these cookies are encrypted using a rotating key. You cannot decrypt or modify load balancer-generated cookies.

Because the web application and the ALB are not on the same domain, OPTIONS requests are sent during the handshake phase for long-polling. However, the OPTIONS requests don’t have cookies attached to them, so they are routed according to the ALB distribution method instead of being sticky. OPTIONS requests are only handled late, inside the polling-xhr transport, after an earlier verification step that fails because the session is not valid on all the servers. All of that leads to the handshake failure.

While I agree that the aforementioned points should be addressed to avoid doing OPTIONS requests, OPTIONS requests should not trigger all the described logic.

RFC 2616 says:

The OPTIONS method represents a request for information about the communication options available on the request/response chain identified by the Request-URI. This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.

From my understanding, OPTIONS should only specify the headers for CORS allowing subsequent requests to be sent. It SHOULD NOT:

I’ve come up with a way to bypass the verification steps and delegate the processing of the OPTIONS requests to the transport earlier: https://github.com/MiLk/engine.io/commit/c8012eb24b20dda7b9a679cde8629361cadf2f12. I’m not convinced that’s the best way to handle it. Maybe handling the OPTIONS requests directly in the listener is a better idea, as proposed by other people in this thread.

I’m willing to spend more time on this issue. What are your views on that?

I’ll explore the other solution, which is to avoid the CORS preflight by changing the Content-Type, and see where that takes me.

Edit: I’ve added an option to let the application handle OPTIONS requests instead of engine.io. https://github.com/socketio/engine.io/pull/484

We should fix this. OPTIONS preflights can happen when doing CORS stuff, and this behavior is really annoying for an end user to deal with. It basically renders this module unusable behind the Amazon ELB even when sticky sessions are on. I do not think this has anything to do with careless coding; it happens under the simplest uses of this module.

Using 2.0.3 on the server and the following on the client, we successfully serve WebSockets and long polling with a HA setup in production behind ALB (about 2k conn / server).

    "socket.io": {
      "version": "1.6.0",
      "resolved": "https://registry.npmjs.org/socket.io/-/socket.io-1.6.0.tgz",
      "integrity": "sha1-PkDZMmN+a9kjmBslyvfFPoO24uE=",
      "requires": {
        "debug": "2.3.3",
        "engine.io": "1.8.0",
        "has-binary": "0.1.7",
        "object-assign": "4.1.0",
        "socket.io-adapter": "0.5.0",
        "socket.io-client": "1.6.0",
        "socket.io-parser": "2.3.1"
      }
    }

We haven’t made any changes on the server since last August (except a Node version upgrade).

The setup is what I described earlier in https://github.com/socketio/engine.io/issues/279#issuecomment-309203357

What I meant is that adding Access-Control-Allow-Methods: GET, POST would remove the OPTIONS request in step 3, right?

  1. OPTIONS /socket.io/?EIO=3&transport=polling&t=... (allow GET and POST, not only GET)
  2. GET /socket.io/?EIO=3&transport=polling&t=...
  3. GET /socket.io/?EIO=3&transport=polling&t=...&sid=...
  4. POST /socket.io/?EIO=3&transport=polling&t=...&sid=...

+1. Big time priority.

On Mon Dec 01 2014 at 3:17:05 AM Mark Mokryn notifications@github.com wrote:

So basically what I think we need is:

  1. OPTIONS turned off by default on engine.io server side
  2. By default forceBase64 for XHR if client detects CORS scenario (WS is of course fine)
  3. Add a forceBinaryXHRCors flag (default=false) which will disable (2) above, thus the browser will emit OPTIONS requests, and the developer will need to turn on the OPTIONS flag server-side.

So basically, by default all works smoothly and no OPTIONS as most people would probably prefer. If people insist on using octet-stream then they should manually configure this, but it should be supported. I really don’t think we should just send binary data blindly in CORS scenarios.
