testem: Testem Chrome Timeouts
Hey,
as I mention in https://github.com/testem/testem/issues/1015 we frequently get testem failures in CI when running against the Chrome (we use Testem through Ember CLI). Our test runs either fail to start (as if testem can’t find the browser) or fail to finish (all tests have run and pass, but the browser window sits there and it eventually times out). It usually just tells us testem.js has not loaded.
As the issue is intermittent it can be hard to reproduce. I’ve tried to recreate it in a repo by simply forking Travis CI’s web repo (uses Ember CLI) and adding ember-exam to it (Ember Exam just allows you to easily use Testem’s parallel feature). If running a single test run it’d probably take some time to get a timeout, but if you run ember exam --split=9 --parallel you should start getting some timeouts in a few runs.
I’ve generated some (pretty verbose) logs here.
Generally on startup I’ve seen the following errors:
not ok 323 Chrome - error
---
message: >
Error: Browser failed to connect within 30s. testem.js not loaded?
Stderr:
2016-11-19 00:42:01.877 Google Chrome[28575:622538] NSWindow warning: adding an unknown subview: <FullSizeContentView: 0x7fa45b8022a0>. Break on NSLog to debug.
2016-11-19 00:42:01.878 Google Chrome[28575:622538] Call stack:
(
"+callStackSymbols disabled for performance reasons"
)
Log: |
{ type: 'error',
text: 'Error: Browser failed to connect within 30s. testem.js not loaded?' }
{ type: 'error',
text: '2016-11-19 00:42:01.877 Google Chrome[28575:622538] NSWindow warning: adding an unknown subview: <FullSizeContentView: 0x7fa45b8022a0>. Break on NSLog to debug.\n2016-11-19 00:42:01.878 Google Chrome[28575:622538] Call stack:\n(\n "+callStackSymbols disabled for performance reasons"\n)\n' }
...
On timeouts at the end I’ve seen
engine:ws writing "3" +1ms
engine:polling closing +649ms
engine:polling transport not writable - buffering orderly close +0ms
socket.io:client client close with reason ping timeout +0ms
socket.io:socket closing socket - reason ping timeout +0ms
engine:polling closing +8ms
engine:polling transport not writable - buffering orderly close +1ms
socket.io:client client close with reason ping timeout +0ms
socket.io:socket closing socket - reason ping timeout +0ms
engine:polling closing +654ms
engine:polling transport not writable - buffering orderly close +0ms
socket.io:client client close with reason ping timeout +0ms
socket.io:socket closing socket - reason ping timeout +0ms
engine:polling closing +78ms
engine:polling transport not writable - buffering orderly close +0ms
socket.io:client client close with reason ping timeout +0ms
socket.io:socket closing socket - reason ping timeout +0ms
engine:polling closing +5s
engine:polling transport not writable - buffering orderly close +0ms
socket.io:client client close with reason ping timeout +0ms
socket.io:socket closing socket - reason ping timeout +0ms
engine:polling closing +2s
engine:polling transport not writable - buffering orderly close +0ms
socket.io:client client close with reason ping timeout +0ms
socket.io:socket closing socket - reason ping timeout +0ms
not ok 205 Chrome - error
---
message: >
Error: Browser disconnected
Stderr:
2016-11-19 00:49:10.060 Google Chrome[28965:629220] NSWindow warning: adding an unknown subview: <FullSizeContentView: 0x7fa7a07af8a0>. Break on NSLog to debug.
2016-11-19 00:49:10.061 Google Chrome[28965:629220] Call stack:
(
"+callStackSymbols disabled for performance reasons"
)
Log: |
{ type: 'error', text: 'Error: Browser disconnected' }
{ type: 'error',
text: '2016-11-19 00:49:10.060 Google Chrome[28965:629220] NSWindow warning: adding an unknown subview: <FullSizeContentView: 0x7fa7a07af8a0>. Break on NSLog to debug.\n2016-11-19 00:49:10.061 Google Chrome[28965:629220] Call stack:\n(\n "+callStackSymbols disabled for performance reasons"\n)\n' }
...
socket.io:client client close with reason transport close +8s
socket.io:socket closing socket - reason transport close +0ms
Other things to note:
- we’ve been experiencing this for a while now, through multiple versions of Chrome, Node and Testem. Currently running latest Chrome, Testem 1.13 and Node 7.
- We have a large codebase (~150K LOC) so not sure if that amplifies the issue or not.
/cc @stefanpenner (as we chatted briefly about this)
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 12
- Comments: 133 (60 by maintainers)
Commits related to this issue
- Try testem config from https://github.com/testem/testem/issues/1021 — committed to m-basov/ember-django-adapter by m-basov 7 years ago
- chore(ember): update ember and ember-data to 2.17 and fix deprecations * update ember and ember-data to 2.17 and fix deprecations * Update test config * Use latest Chrome on CI * Use script from G... — committed to dustinfarris/ember-django-adapter by m-basov 7 years ago
- chore(ci): Increase browser start timeout We're frequently seeing an error about testem failing to connect to the browser (Chrome) within the 30 second default timeout in our CI builds. This increas... — committed to HeroicEric/data by HeroicEric 5 years ago
- chore(ci): Increase browser start timeout (#6592) We're frequently seeing an error about testem failing to connect to the browser (Chrome) within the 30 second default timeout in our CI builds. ... — committed to emberjs/data by HeroicEric 5 years ago
- chore(ci): Increase browser start timeout (#6592) We're frequently seeing an error about testem failing to connect to the browser (Chrome) within the 30 second default timeout in our CI builds. ... — committed to emberjs/data by HeroicEric 5 years ago
- chore(ci): Increase browser start timeout (#6592) We're frequently seeing an error about testem failing to connect to the browser (Chrome) within the 30 second default timeout in our CI builds. ... — committed to emberjs/data by HeroicEric 5 years ago
- chore(ci): Increase browser start timeout (#6592) We're frequently seeing an error about testem failing to connect to the browser (Chrome) within the 30 second default timeout in our CI builds. ... — committed to emberjs/data by HeroicEric 5 years ago
- fix(): testem browser start timeout https://github.com/fabricjs/fabric.js/pull/8344#issuecomment-1279641265 https://github.com/testem/testem/issues/1021 — committed to luizzappa/fabric.js-bbox by ShaMan123 2 years ago
This is how I fixed timeout issue with Chrome 59+:
We are using
ember examand also seeing similar issues with our Team City build server. The error message is slightly different though.We’re seeing
Error: Browser disconnectedin our Ember tests and the above does not help. 😭 Different issue?@stefanpenner I’ll see if I can carve out some time this week to put something together
No improvements here, unfortunately.
I realise this issue can arise in many different ways. But, just in case this helps anyone - our app had a few hyperlinks that pointed to outside the Ember app, which when clicked would leave Qunit. Fixing that helped solve the browser timeout issues.
@stefanpenner my app is a reproduction https://github.com/NullVoxPopuli/emberclear/ but, I’m not sure where this is happening. I have event listeners in multiple places. The only way I can run my tests at a reasonable pace is in headless
@scalvert I only have about 250 tests (including ESLint) but I need to create a lot of data using mirage. Just started running into this today
I can confirm that what @amk221 describes was the main issue for us a well. A QUnit test helper that throws an error and fails the test, when any form of navigation occurs, would be sweet.
🎉 Using qunit test isolation and the stack traces it provides in the console I believe I have tracked my issues down to some leaky async in ember-cli-loading-slider. 🤞
Was able to get mine fixed by moving to the latest version of ember-gestures (1.1.1) There was a memory leak there that was causing Chrome to run out of memory and thus the socket disconnected.
@NullVoxPopuli The latest version of ember-qunit (4.2.0) added test-isolation that may help with this: https://github.com/emberjs/ember-qunit/blob/master/TEST_ISOLATION_VALIDATION.md
@step2yeung hey there, I do get the
browser timeout exceedederror everytime, but it states a different test as the problem every run. seems to be arbitraryNoticed headless chrome to run tests much slower than regular chrome. Further research led me to https://github.com/Codeception/CodeceptJS/issues/561.
Adding
--proxy-server='direct://'and--proxy-bypass-list=*to the list of browser_args in the testem configuration seems to have solved the initial timeout problem for me.Our theory was some of our tests is blocking the main thread for some period of time. When such a test is executing, while the socket.io’s heartbeat is waiting for a reply, it could cause the socket to disconnect since a reply for the heartbeat does not reach the socket in time.
The socket disconnecting will emit a disconnect event, which testem handles in
ondisconnect(), and waits forbrowser_disconnect_timeoutamount of time until throwingBrowser Timeout Exceedederror.After submitting #1248 , which makes socketio’s heartbeat timeout configurable in testem’s config. We set the heartbeat timeout to a value larger than 5s (which is the default timeout value socketio sets) and have seen an improvement although not totally eliminating the timeouts.
Happy to make the 100MB a default again as DoS by malicious clients shouldn’t really be an issue for testem.
We’ve traced this down to these changes https://github.com/socketio/engine.io/compare/3.5.0...3.6.0
Specifically: https://github.com/socketio/engine.io/commit/58e274c437e9cbcf69fd913c813aad8fbd253703
It appears the new default of 1MB instead of 100MB for maxBufferSize is not enough for many test setups. We’ll need to either set this config in testem or provide a way for users to pass engine.io/socket.io config from the testem config.
We also had the same issue using faker to generate URLs, which would invariably be external. Overriding this behavior also helped stabilize our tests.
This ultimately turned out to be a memory issue. Our node instance was leaking the Ember App container. After fixing, upgrading worked. Related to this - https://github.com/emberjs/data/issues/5061
Is it possible this could be also triggered by memory? We have an Ember test suite with over 3000 tests, and several hundred of them are acceptance tests. We’ve been seeing this timeout error sporadically when running the whole suite in a single travis instance, but splitting the suite into multiple travis runners seems to have helped.
I’d like to chime in and mention I started seeing the same error that @allthesignals mentioned (“Browser timeout exceeded: 10s”) after adding a single
window.addEventListener(...);in my app…I’ve tried to setbrowser_disconnect_timeoutto a large value but still fails. Also, for me the error is thrown only when running the entire test suite.With https://github.com/testem/testem/pull/1283 we have correct the error message for when the total test suite time has been reached…
Released as v2.9.1.
Some subset of you timeouts are due to that, and this error message should help track those down.
I also noticed by turning
JOBS=2On circleci cleared up the issue I had completely.I thought this only impacted build but apparently the problem can leak into tests somehow.
I encountered the
Lost UI shared contextvariant of this today after upgrading a 3.0 app to 3.2(ish), when running on CircleCI (2.0), also withember-exam.I tried many of the suggestions in this thread, and what did work was removing the
partition/paralleloption toember-exam.UPDATE: Putting
ember-examback to normal but setting theJOBSenvironment variable to 1 also works (https://github.com/ember-cli/ember-cli/blob/master/docs/build-concurrency.md)Second UPDATE: The same failure has reoccured, but it no longer happens consistently.
Ember 3.2.0 ember-cli 3.2.0-beta.2 testem 2.8.2
Upgrading
ember-cliandtestemto latest releases didn’t help.But I’ve found the cause of my issue! It is setting the amount of
ember-exampartitions (parallel test runs) to a number larger than the amount of CPU cores available on the CI. I assumed that extra partitions would be queued (I still think they should).+1 to @stefanpenner suggestion @astronomersiva I am totally open to that 😃
@step2yeung , can confirm this. I am yet to see a failure after trying this out. Will see if there are any failures over the coming days.
I have set it equal to the browser timeout itself to be on the safe side. I believe @stefanpenner’s idea is correct. We should be setting this automatically based on the browser timeout. I can work on it if you guys are okay with it.
EDIT: I am seeing failures now.
Should we increase the heartbeat timeout, automatically when the primary timeout config is changed? Something like 80% the primary timeout, unless the heartbeat timeout has its own config?
@lolmaus Chrome 60 was released today, try it. 🤓
It would be great to get the chrome debug console output in those cases, maybe using the remote debugger are somehow manually.
Looks like in some cases the initial connect takes >30s with can have multiple reasons: unparsable js, slow js parsing, failed websocket, testem client bug, etc.
Sadly there isn’t much, which can be done before connecting (as there is obviously no connection), but maybe the remote debug protocol could provide some access to the console output
Hi.
We have the same problem. Chrome just open the window and the tab spinner is running, but it seems it cannot load the page from localhost:33062/921/tests/index.html. We’re using ember-cli too.
We have set the timeout to 180 seconds, but still no dice. 😭
Firefox on the other hand just works, but Chrome starts only 1/50 times. Chrome only show the
Xlib: extension "RANDR" missing on display warningmessage.I used x11vnc to have a look at the Xvfb display via SSH, and now chrome immediately displays the “Aw Snap, something went wrong while loading the page” error message. I’ve tried version 54 and 55.
When I reload the page and open the devtools, I can see that loading the 10.6 MiB resources of the page takes 2.5 minutes to finish (whereas firefox only takes a few seconds). So, after setting the timeout to ridiculous 300 seconds, chrome actually starts to run the tests…