Testem Chrome Timeouts

Hey,

as I mention in https://github.com/testem/testem/issues/1015, we frequently get testem failures in CI when running against Chrome (we use testem through Ember CLI). Our test runs either fail to start (as if testem can’t find the browser) or fail to finish (all tests have run and pass, but the browser window sits there until it eventually times out). Usually the only message we get is that testem.js has not loaded.

As the issue is intermittent, it can be hard to reproduce. I’ve tried to recreate it in a repo by forking Travis CI’s web repo (which uses Ember CLI) and adding ember-exam to it (Ember Exam just makes it easy to use testem’s parallel feature). With a single test run it would probably take a while to hit a timeout, but if you run ember exam --split=9 --parallel you should start seeing timeouts within a few runs.

I’ve generated some (pretty verbose) logs here.

Generally on startup I’ve seen the following errors:

not ok 323 Chrome - error
    ---
        message: >
            Error: Browser failed to connect within 30s. testem.js not loaded?
            Stderr:
             2016-11-19 00:42:01.877 Google Chrome[28575:622538] NSWindow warning: adding an unknown subview: <FullSizeContentView: 0x7fa45b8022a0>. Break on NSLog to debug.
            2016-11-19 00:42:01.878 Google Chrome[28575:622538] Call stack:
            (
                "+callStackSymbols disabled for performance reasons"
            )


        Log: |
            { type: 'error',
              text: 'Error: Browser failed to connect within 30s. testem.js not loaded?' }
            { type: 'error',
              text: '2016-11-19 00:42:01.877 Google Chrome[28575:622538] NSWindow warning: adding an unknown subview: <FullSizeContentView: 0x7fa45b8022a0>. Break on NSLog to debug.\n2016-11-19 00:42:01.878 Google Chrome[28575:622538] Call stack:\n(\n    "+callStackSymbols disabled for performance reasons"\n)\n' }
    ...

On timeouts at the end I’ve seen:

  engine:ws writing "3" +1ms
  engine:polling closing +649ms
  engine:polling transport not writable - buffering orderly close +0ms
  socket.io:client client close with reason ping timeout +0ms
  socket.io:socket closing socket - reason ping timeout +0ms
  engine:polling closing +8ms
  engine:polling transport not writable - buffering orderly close +1ms
  socket.io:client client close with reason ping timeout +0ms
  socket.io:socket closing socket - reason ping timeout +0ms
  engine:polling closing +654ms
  engine:polling transport not writable - buffering orderly close +0ms
  socket.io:client client close with reason ping timeout +0ms
  socket.io:socket closing socket - reason ping timeout +0ms
  engine:polling closing +78ms
  engine:polling transport not writable - buffering orderly close +0ms
  socket.io:client client close with reason ping timeout +0ms
  socket.io:socket closing socket - reason ping timeout +0ms
  engine:polling closing +5s
  engine:polling transport not writable - buffering orderly close +0ms
  socket.io:client client close with reason ping timeout +0ms
  socket.io:socket closing socket - reason ping timeout +0ms
  engine:polling closing +2s
  engine:polling transport not writable - buffering orderly close +0ms
  socket.io:client client close with reason ping timeout +0ms
  socket.io:socket closing socket - reason ping timeout +0ms
not ok 205 Chrome - error
    ---
        message: >
            Error: Browser disconnected
            Stderr:
             2016-11-19 00:49:10.060 Google Chrome[28965:629220] NSWindow warning: adding an unknown subview: <FullSizeContentView: 0x7fa7a07af8a0>. Break on NSLog to debug.
            2016-11-19 00:49:10.061 Google Chrome[28965:629220] Call stack:
            (
                "+callStackSymbols disabled for performance reasons"
            )


        Log: |
            { type: 'error', text: 'Error: Browser disconnected' }
            { type: 'error',
              text: '2016-11-19 00:49:10.060 Google Chrome[28965:629220] NSWindow warning: adding an unknown subview: <FullSizeContentView: 0x7fa7a07af8a0>. Break on NSLog to debug.\n2016-11-19 00:49:10.061 Google Chrome[28965:629220] Call stack:\n(\n    "+callStackSymbols disabled for performance reasons"\n)\n' }
    ...
  socket.io:client client close with reason transport close +8s
  socket.io:socket closing socket - reason transport close +0ms

Other things to note:

  • We’ve been experiencing this for a while now, across multiple versions of Chrome, Node and testem. We’re currently running the latest Chrome, testem 1.13 and Node 7.
  • We have a large codebase (~150K LOC), so we’re not sure whether that amplifies the issue.

/cc @stefanpenner (as we chatted briefly about this)

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 12
  • Comments: 133 (60 by maintainers)

Most upvoted comments

This is how I fixed the timeout issue with Chrome 59+ (written as a testem.js, since the inline comments are not valid in plain JSON):

module.exports = {
  framework: 'qunit',
  test_page: 'tests/index.html?hidepassed',
  disable_watching: true,
  launch_in_ci: ['Chrome'],
  launch_in_dev: ['Chrome'],
  browser_start_timeout: 60,
  browser_args: {
    Chrome: [
      '--disable-gpu',
      '--disable-web-security', // optional since Chrome 60+
      '--headless',
      '--incognito', // optional
      '--no-sandbox', // optional since Chrome 60+
      '--remote-debugging-address=0.0.0.0',
      '--remote-debugging-port=9222'
    ]
  }
};

We are using ember-exam and are also seeing similar issues on our TeamCity build server, though the error message is slightly different.

[Chrome - error] Error: Browser failed to connect within 30s. testem.js not loaded?
Stderr: 
 [0419/061938.039012:WARNING:audio_manager.cc(321)] Multiple instances of AudioManager detected
[0419/061938.039065:WARNING:audio_manager.cc(278)] Multiple instances of AudioManager detected
[0419/061938.371223:ERROR:socket_posix.cc(126)] bind() returned an error, errno=98: Address already in use
[0419/061938.371271:ERROR:devtools_http_handler.cc(221)] Cannot start http server for devtools. Stop devtools.

We’re seeing Error: Browser disconnected in our Ember tests and the above does not help. 😭 Is this a different issue?

@stefanpenner I’ll see if I can carve out some time this week to put something together

No improvements here, unfortunately.

I realise this issue can arise in many different ways, but just in case it helps anyone: our app had a few hyperlinks that pointed outside the Ember app, and clicking them during a test would navigate away from QUnit. Fixing that helped solve the browser timeout issues.

@stefanpenner my app is a reproduction (https://github.com/NullVoxPopuli/emberclear/), but I’m not sure where this is happening. I have event listeners in multiple places. The only way I can run my tests at a reasonable pace is in headless mode.

@scalvert I only have about 250 tests (including ESLint), but I need to create a lot of data using Mirage. I just started running into this today.

I can confirm that what @amk221 describes was the main issue for us as well. A QUnit test helper that throws an error and fails the test when any form of navigation occurs would be sweet.

🎉 Using QUnit test isolation and the stack traces it provides in the console, I believe I have tracked my issues down to some leaky async in ember-cli-loading-slider. 🤞

I was able to get mine fixed by moving to the latest version of ember-gestures (1.1.1). There was a memory leak there that was causing Chrome to run out of memory, and thus the socket disconnected.

@NullVoxPopuli The latest version of ember-qunit (4.2.0) added test isolation validation that may help with this: https://github.com/emberjs/ember-qunit/blob/master/TEST_ISOLATION_VALIDATION.md

@step2yeung hey there, I do get the browser timeout exceeded error every time, but it reports a different test as the problem on every run. It seems to be arbitrary.

I noticed headless Chrome runs tests much slower than regular Chrome. Further research led me to https://github.com/Codeception/CodeceptJS/issues/561.

Adding --proxy-server='direct://' and --proxy-bypass-list=* to the browser_args list in the testem configuration seems to have solved the initial timeout problem for me.

Our theory was that some of our tests block the main thread for some period of time. While such a test is executing, socket.io’s heartbeat is waiting for a reply; if the reply does not reach the socket in time, the socket disconnects.

The disconnecting socket emits a disconnect event, which testem handles in ondisconnect(); testem then waits browser_disconnect_timeout amount of time before throwing the Browser Timeout Exceeded error.

After submitting #1248, which makes socket.io’s heartbeat timeout configurable in testem’s config, we set the heartbeat timeout to a value larger than 5s (the default socket.io uses) and have seen an improvement, although it hasn’t totally eliminated the timeouts.

Happy to make the 100MB a default again, as DoS by malicious clients shouldn’t really be an issue for testem.

We’ve traced this down to these changes https://github.com/socketio/engine.io/compare/3.5.0...3.6.0

Specifically: https://github.com/socketio/engine.io/commit/58e274c437e9cbcf69fd913c813aad8fbd253703

It appears the new default of 1MB instead of 100MB for maxHttpBufferSize is not enough for many test setups. We’ll need to either set this config in testem or provide a way for users to pass engine.io/socket.io config through the testem config.

We also had the same issue using faker to generate URLs, which would invariably be external. Overriding this behavior also helped stabilize our tests.

This ultimately turned out to be a memory issue. Our Node instance was leaking the Ember app container. After fixing that, upgrading worked. Related: https://github.com/emberjs/data/issues/5061

Is it possible this could also be triggered by memory? We have an Ember test suite with over 3000 tests, several hundred of which are acceptance tests. We’ve been seeing this timeout error sporadically when running the whole suite in a single Travis instance, but splitting the suite across multiple Travis runners seems to have helped.

I’d like to chime in and mention that I started seeing the same error @allthesignals mentioned (“Browser timeout exceeded: 10s”) after adding a single window.addEventListener(...) call in my app. I’ve tried setting browser_disconnect_timeout to a large value, but it still fails. Also, for me the error is thrown only when running the entire test suite.

With https://github.com/testem/testem/pull/1283 we have corrected the error message shown when the total test suite time has been reached.

Released as v2.9.1.

Some subset of your timeouts are due to that, and this error message should help track those down.

I also noticed that setting JOBS=2 on CircleCI completely cleared up the issue I had.

I thought this only impacted builds, but apparently the problem can leak into tests somehow.

I encountered the “Lost UI shared context” variant of this today after upgrading a 3.0 app to 3.2(ish), running on CircleCI (2.0), also with ember-exam.

I tried many of the suggestions in this thread; what did work was removing the partition/parallel option from ember-exam.

UPDATE: Putting ember-exam back to normal but setting the JOBS environment variable to 1 also works (https://github.com/ember-cli/ember-cli/blob/master/docs/build-concurrency.md)

Second UPDATE: The same failure has recurred, but it no longer happens consistently.

Ember 3.2.0, ember-cli 3.2.0-beta.2, testem 2.8.2

Upgrading ember-cli and testem to latest releases didn’t help.

But I’ve found the cause of my issue! It was setting the number of ember-exam partitions (parallel test runs) to a number larger than the number of CPU cores available on the CI machine. I assumed that extra partitions would be queued (I still think they should be).

+1 to @stefanpenner’s suggestion. @astronomersiva I am totally open to that 😃

@step2yeung, I can confirm this. I have yet to see a failure after trying this out. I’ll see if there are any failures over the coming days.

I have set it equal to the browser timeout itself to be on the safe side. I believe @stefanpenner’s idea is correct: we should be setting this automatically based on the browser timeout. I can work on it if you’re okay with that.

EDIT: I am seeing failures now.

After submitting #1248, which makes socket.io’s heartbeat timeout configurable in testem’s config, we set the heartbeat timeout to a value larger than 5s (the default socket.io uses) and have seen an improvement, although it hasn’t totally eliminated the timeouts.

Should we increase the heartbeat timeout automatically when the primary timeout config is changed? Something like 80% of the primary timeout, unless the heartbeat timeout has its own config?
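The 80% heuristic is simple enough to state in code. This is a hypothetical helper sketching the idea, not testem’s actual implementation:

```javascript
// Hypothetical helper: derive the socket heartbeat timeout from the
// primary browser timeout unless the user configured one explicitly.
// Sketch of the heuristic discussed above, not testem's real code.
function heartbeatTimeout(browserTimeout, explicitHeartbeatTimeout) {
  if (explicitHeartbeatTimeout !== undefined) {
    return explicitHeartbeatTimeout;
  }
  // 80% of the primary timeout leaves headroom for the disconnect
  // handling to fire before the browser timeout itself does.
  return Math.round(browserTimeout * 0.8);
}

module.exports = heartbeatTimeout;
```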

@lolmaus Chrome 60 was released today, try it. 🤓

It would be great to get the Chrome debug console output in those cases, maybe using the remote debugger or somehow manually.

It looks like in some cases the initial connect takes >30s, which can have multiple causes: unparsable JS, slow JS parsing, a failed websocket, a testem client bug, etc.

Sadly there isn’t much that can be done before connecting (as there is obviously no connection yet), but maybe the remote debugging protocol could provide some access to the console output.

Hi.

We have the same problem. Chrome just opens the window and the tab spinner keeps running, but it seems it cannot load the page from localhost:33062/921/tests/index.html. We’re using ember-cli too.

We have set the timeout to 180 seconds, but still no dice. 😭

Firefox, on the other hand, just works, but Chrome starts only 1 time in 50. Chrome only shows the Xlib: extension "RANDR" missing on display warning message.

I used x11vnc to look at the Xvfb display via SSH, and now Chrome immediately displays the “Aw, Snap! Something went wrong while loading the page” error message. I’ve tried versions 54 and 55.

When I reload the page and open the devtools, I can see that loading the 10.6 MiB of page resources takes 2.5 minutes (whereas Firefox only takes a few seconds). So, after setting the timeout to a ridiculous 300 seconds, Chrome actually starts to run the tests…