testcafe: Unstable error "Unable to establish one or more of the specified browser connections" after Chrome update (v83)
What is the Test Scenario?
TestCafe tests in TeamCity against headless Chrome in parallel.
What is the Current behavior?
After the Chrome update to v83, TestCafe tests intermittently (about half the time) fail to start with the following error:
GeneralError: Unable to establish one or more of the specified browser connections. This can be caused by network issues or remote device failure.
at BrowserSet._waitConnectionsOpened (E:\BuildAgent\work\6d1660a0b0f4fce5\testcafe\node_modules\testcafe\src\runner\browser-set.ts:91:30)
at E:\BuildAgent\work\6d1660a0b0f4fce5\testcafe\node_modules\testcafe\src\runner\browser-set.ts:114:35
at processTicksAndRejections (internal/process/task_queues.js:94:5)
at Bootstrapper._getBrowserConnections (E:\BuildAgent\work\6d1660a0b0f4fce5\testcafe\node_modules\testcafe\src\runner\bootstrapper.ts:215:16)
at async Promise.all (index 0)
at Bootstrapper._bootstrapParallel (E:\BuildAgent\work\6d1660a0b0f4fce5\testcafe\node_modules\testcafe\src\runner\bootstrapper.ts:391:38)
at Bootstrapper.createRunnableConfiguration (E:\BuildAgent\work\6d1660a0b0f4fce5\testcafe\node_modules\testcafe\src\runner\bootstrapper.ts:424:42) {
code: 'E1004',
data: []
}
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! test.testcafe@1.0.0 test:teamcity
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the se.test.testcafe@1.0.0 test:teamcity script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\TC_BuildService\AppData\Roaming\npm-cache\_logs\2020-06-16T13_04_02_492Z-debug.log
Process exited with code 1
Process exited with code 1 (Step: Tests (Command Line))
Tests (Command Line) failed
Before the update, these errors were very rare.
It happens not only in TeamCity but also on local runs (less often).
Using chrome:headless:userProfile and/or chrome:headless --no-sandbox didn’t help.
What is the web application and TestCafe test code?
Parameters:
- target browser: chrome:headless
- concurrency level: 6
- hostname: localhost
- port1: 1337
- port2: 1338
- skipJsErrors: false
- skipUncaughtErrors: true
TestCafe runner code:
const createTestCafe = require('testcafe');
const path = require('path');
const config = require('./.testcafe.config');

let testcafe = null;

createTestCafe(config.hostname, config.port1, config.port2)
    .then(tc => {
        testcafe = tc;

        const runner = testcafe
            .createRunner()
            .browsers(config.browsers)
            .src(config.src)
            .concurrency(config.concurrency)
            .reporter(config.reporter);

        return runner.run({
            ...config.runnerOptions,
            quarantineMode: true
        });
    })
    .then(function(failedCount) {
        testcafe.close();
        process.exit(failedCount ? 1 : 0);
    })
    .catch(function(error) {
        console.error(error);
        // testcafe is still null if createTestCafe itself rejected
        if (testcafe) testcafe.close();
        process.exit(1);
    });
Custom configuration file:
module.exports = {
    hostname: 'localhost',
    port1: 1337,
    port2: 1338,
    browsers: 'chrome:headless', // read by the runner as config.browsers
    src: './tests/*.js',
    concurrency: 6,
    reporter: 'teamcity',
    screenshots: {
        fullPage: false,
        takeOnFails: false
    },
    runnerOptions: {
        skipJsErrors: false,
        skipUncaughtErrors: true,
        pageLoadTimeout: 15000,
        selectorTimeout: 6000,
        assertionTimeout: 6000
    }
};
Environment details:
- testcafe version: 1.8.6
- node.js version: 12.14.1
- browser name and version: Chrome 83.0.4103.106 / Windows 10
- platform and version: Microsoft Windows Server 2016 Standard
- TeamCity: 2019.2.4 (build 72059)
- testcafe-reporter-teamcity: 1.0.10
Comments
I’m sorry I can’t provide a public link or a stable repro.
Please tell me which parameters I should pay attention to and which test configuration I should try. Could it really be related to the last Chrome update?
If you need additional info I’m happy to provide it.
About this issue
- State: closed
- Created 4 years ago
- Comments: 21 (8 by maintainers)
I’ve identified the root cause in my case. If CPU usage on my machine exceeds 70-80% this error occurs. In case of lower CPU usage TestCafe works as expected.
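If CPU saturation is the trigger, one possible mitigation is to cap the concurrency level relative to the number of cores on the build agent. The sketch below is only an illustration: the helper name `cappedConcurrency` and the "leave one core free" rule are assumptions, not TestCafe features or documented guidance.

```javascript
const os = require('os');

// Hypothetical helper: limit the requested concurrency so that at least
// `reservedCores` cores stay free for the Node.js runner and the OS.
// `cores` defaults to the logical core count reported by Node.js.
function cappedConcurrency(requested, reservedCores = 1, cores = os.cpus().length) {
    const usable = Math.max(1, cores - reservedCores);
    return Math.max(1, Math.min(requested, usable));
}

module.exports = { cappedConcurrency };
```

With the runner shown earlier, this could be used as `.concurrency(cappedConcurrency(config.concurrency))`. On a 4-core agent a requested level of 6 would become 3; whether that is enough to keep CPU usage below the 70-80% range has to be measured on the actual machine.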
Hello @bryg217,
I’m glad this is helpful. Further feedback on whether the updates will solve the problem with stability would be much appreciated.
Hello, @bryg217,
Quarantine mode doesn’t retry the connection when it is not established within the timeout. This mode is designed for retrying unstable tests, so it won’t help with an unstable browser connection.
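Since quarantine mode does not cover connection failures, a retry would have to live in the user's own runner script. The following is a minimal sketch of such a wrapper, not a TestCafe feature: `retryOnCode` and its options are hypothetical names, and the only detail taken from this thread is the error code `E1004` shown in the log above.

```javascript
// Hypothetical helper: re-run an async task when it rejects with a given
// error code, waiting `delayMs` between attempts. Any other error, or a
// failure on the last attempt, is rethrown unchanged.
async function retryOnCode(task, { code = 'E1004', attempts = 3, delayMs = 5000 } = {}) {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await task(attempt);
        } catch (error) {
            if (error.code !== code || attempt === attempts) throw error;
            await new Promise(resolve => setTimeout(resolve, delayMs));
        }
    }
}

module.exports = { retryOnCode };
```

The task passed in would most safely wrap the whole sequence (create the TestCafe instance, run, close), because the thread does not establish whether a runner can be reused after a failed bootstrap. This only masks the instability; it does not address its cause.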
We changed the error message so that it provides more information in the following PR: Improve the message shown by “Unable to establish connection” error. In the same PR, we also introduced the ability to specify the timeout within which a browser connection should be established. The changes from the PR are not yet released or reflected in our documentation, but you can already test them and see if increasing the default timeout with the --browser-init-timeout flag resolves the stability problem. You can install it by running the following command:
It is not a CPU or RAM problem in my case. The issue also happens when the machine is not overloaded. I also don’t want to run tests with minimal concurrency, because I would like my tests to finish in a reasonable time. Even if it worked, it would be just a workaround, not a real answer to the problem.
Hi. Same for me. I constantly face the “Unable to establish one or more of the specified browser connections. This can be caused by network issues or remote device failure” error. Please help me find a workaround.
Even with DEBUG=hammerhead*,testcafe* and with aggressively debug() enhanced source, we were not able to discern why chrome is not processing the IDLE_PAGE content. If any chrome wizards lurk in here: we attempted to use the TestCafe (TC) “custom” testcafe browser, multiplex the chromium output, and capture chrome.log & stdio from the foreground chrome process, but it didn’t prove super useful because we were unable to get log content from tabs/windows: https://stackoverflow.com/questions/66926607/how-can-i-run-chromium-in-the-foreground-and-capture-native-and-webview-logs . If we can reliably get tab level, network level, and window level events from the chromium cli & emitted logs, such information could perhaps improve robustness in the TC workflow.
The TestCafe initialization process can improve robustness, now, by changing the synchronization mechanisms executed between the browser process & node runner.
Current process:
A more robust approach would be:
Currently, various effects are executed & coordinated by careful alignment of assets. Success is achieved by optimistically expecting that each downstream, uncontrolled effect executes successfully. It is dangerous to pass ownership of initialization control flow to chrome, and from chrome to the embedded webapp, as TC does not have hooks into either of these systems by the time runInitScripts is called. TC launches chrome, passes a URL, & 🤞 wishes both chrome and the downstream web-app the best of luck. What if TC managed the whole init process, vs implicitly marshaling that responsibility to these other (generally reliable, but currently failing) entities?
Current process (pseudo):
Possible future:
We are finding this error on the daily in chrome as well. We’ve gone nuts and added debug(…) statements everywhere 😃
The HTML idle page document either doesn’t make it to chrome (unlikely), or the on-page javascript is periodically failing, which prevents TC from bootstrapping itself.
I’d like to find a way to capture the local chrome output emitted by localChrome.start(…). Anyone know if this is feasible?
I think we have found the reason.
@AndreyBelym
We were moving TestCafe to run on Google Cloud Run, and we were always using a concurrency of 1 (just one browser at a time). It was strange to us that even a single browser was failing with this error.
We have tried:
Nothing helped.
But… after a bit of googling we found that Puppeteer has a similar issue. Googling for a solution for Puppeteer failed.
However, our engineer started writing down everything that differed between good and bad starts, and he found that the ports were consistently the same in failed runs.
So we have hardcoded the port like this:
After hardcoding it like that - everything works perfectly.
Maybe you could add a feature that checks whether a port is available before trying to start a process on that port?
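Until such a check exists in TestCafe itself, a similar pre-flight test can be done in the runner script with Node's built-in `net` module. This is a minimal sketch under assumptions: `isPortFree` and `assertPortsFree` are hypothetical helper names, and the thread does not show which port was hardcoded, so the ports to probe are up to the caller.

```javascript
const net = require('net');

// Resolves to true if `port` can be bound on `host`, and to false if the
// bind fails because the port is in use or access is denied.
function isPortFree(port, host = '127.0.0.1') {
    return new Promise((resolve, reject) => {
        const probe = net.createServer();

        probe.once('error', err => {
            if (err.code === 'EADDRINUSE' || err.code === 'EACCES') resolve(false);
            else reject(err);
        });
        probe.once('listening', () => {
            // Release the port again before reporting it as free.
            probe.close(() => resolve(true));
        });
        probe.listen(port, host);
    });
}

// Hypothetical pre-flight check: fail fast with a clear message instead of
// waiting for the browser connection timeout.
async function assertPortsFree(ports, host = '127.0.0.1') {
    for (const port of ports) {
        if (!(await isPortFree(port, host)))
            throw new Error(`Port ${port} is already in use on ${host}`);
    }
}

module.exports = { isPortFree, assertPortsFree };
```

For the runner above, this could be called as `assertPortsFree([config.port1, config.port2])` before `createTestCafe(config.hostname, config.port1, config.port2)`. The check is inherently racy: another process can still take the port between the probe and the real bind, so it narrows the failure window rather than closing it.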