selenium: [šŸ› Bug]: Hub hangs with "Unable to find session" after a while

What happened?

I’ve set up a Hub and 2 Nodes on three different Ubuntu servers. The Hub is on Ubuntu 16.04.7 LTS and the two nodes on Ubuntu 20.04.4 LTS. Selenium is driven by a Java program using standard libraries.

All works fine for a while (an hour or so), then suddenly the Hub stops responding. The symptoms are:

  • when connecting to http://myhub:4444/ui/index.html I get the normal ā€œSelenium Gridā€ page but it is stuck on the ā€œLoadingā€¦ā€ message
  • the hub output shows many messages containing this:
"exception.message": "Unable to execute request for an existing session: Unable to find session with ID: 98d1a9efa8b787398b29c2d6c5f2ddab
Build info: version: '4.1.2', revision: '9a5a329c5a'
System info: host: 'controller', ip: 'xx.xx.xx.xx', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-210-generic', java.version: '1.8.0_292'
Driver info: driver.version: unknown"
  • the java program gets the exception shown in the ā€œRelevant log outputā€ box.

If I restart the Hub, all goes back to normal: I don’t have to restart either the nodes or the Java program.

Command to start the hub:

java -jar selenium-server-4.1.2.jar hub --config hub.toml

Content of hub.toml:

[server]
port = 4444
registration-secret = "xxxxxx"
[sessionqueue]
session-request-timeout = 300

Command to start the node:

java -jar selenium-server-4.1.2.jar node --config node.toml

Content of node.toml:

[server]
port = 5555
registration-secret = "xxxxxx"
[node]
hub = "xx.xx.xx.xx"
override-max-sessions = true
max-sessions = 8
session-timeout = 60
detect-drivers = false
[[node.driver-configuration]]
display-name = "Chrome Latest"
stereotype = "{\"browserName\": \"chrome\"}"
webdriver-executable = '/mypath/chromedriver'

Output of top (last line is the hub):

top - 15:11:10 up 22:00,  3 users,  load average: 0.20, 0.23, 0.19
Tasks: 208 total,   1 running, 207 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.5 us,  1.2 sy,  0.0 ni, 82.3 id,  0.2 wa,  0.0 hi,  1.7 si,  2.2 st
KiB Mem :  8175040 total,  1773696 free,  3951596 used,  2449748 buff/cache
KiB Swap:  8388604 total,  8348196 free,    40408 used.  3875604 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  908 tomcat8   20   0 4055844 1.196g  17808 S  21.9 15.3 115:44.34 java
 1183 mysql     20   0 1670676 528960  15472 S   2.3  6.5  25:39.78 mysqld
15065 root      20   0 4685516 346108  19128 S   2.0  4.2  11:59.94 java

If I restart just the hub, all goes back to normal: nodes register and I see the queue growing in the Selenium Grid web console.

How can we reproduce the issue?

I know you need to reproduce the problem but I can't share my whole application: it's not just a script and it needs a lot of work to be set up (tomcat, mysql, user data etc.)

Relevant log output

org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '4.1.2', revision: '9a5a329c5a'
System info: host: 'controller', ip: 'xx.xx.xx.xx', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-210-generic', java.version: '1.8.0_292'
Driver info: net.yadaframework.selenium.YadaSeleniumUtil$1
Command: [null, newSession {capabilities=[Capabilities {browserName: chrome, goog:chromeOptions: {args: [user-data-dir=/srv/xxx/..., user-agent=Mozilla/5.0 (Win...], extensions: []}, proxy: {httpProxy: yy.yy.yy.yy:9092, proxyType: manual, sslProxy: yy.yy.yy.yy:9092}}], desiredCapabilities=Capabilities {browserName: chrome, goog:chromeOptions: {args: [user-data-dir=/srv/xxx/..., user-agent=Mozilla/5.0 (Win...], extensions: []}, proxy: Proxy(manual, http=yy.yy.yy...}}]
Capabilities {}
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:576)
        at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:245)
        at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:161)
        at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:143)
        at net...
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.UncheckedIOException: java.io.IOException: Stream closed
        at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:80)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)
        at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
        at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
        at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)
        at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
        at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
        at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)
        at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
        at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
        at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)
        at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
        at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
        at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:110)
        at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
        at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:102)
        at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:84)
        at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:62)
        at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:156)
        at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:558)
        ... 17 common frames omitted
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:159)
        at java.io.BufferedInputStream.available(BufferedInputStream.java:410)
        at java.io.FilterInputStream.available(FilterInputStream.java:168)
        at java.io.PushbackInputStream.available(PushbackInputStream.java:277)
        at io.netty.handler.stream.ChunkedStream.isEndOfInput(ChunkedStream.java:82)
        at io.netty.handler.stream.ChunkedStream.readChunk(ChunkedStream.java:109)
        at io.netty.handler.stream.ChunkedStream.readChunk(ChunkedStream.java:36)
        at io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:229)
        at io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:131)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
        at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
        at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:311)
        at org.asynchttpclient.netty.request.body.NettyInputStreamBody.write(NettyInputStreamBody.java:77)
        at org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:433)
        at org.asynchttpclient.netty.channel.NettyConnectListener.writeRequest(NettyConnectListener.java:80)
        at org.asynchttpclient.netty.channel.NettyConnectListener.onSuccess(NettyConnectListener.java:156)
        at org.asynchttpclient.netty.channel.NettyChannelConnector$1.onSuccess(NettyChannelConnector.java:92)
        at org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:26)
        at org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
        at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:300)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 common frames omitted

Operating System

Ubuntu 16.04.7 LTS

Selenium version

Java 1.8.0_292, Selenium 4.1.2.

What are the browser(s) and version(s) where you see this issue?

Chrome 98.0.4758.102

What are the browser driver(s) and version(s) where you see this issue?

ChromeDriver 98.0.4758.80

Are you using Selenium Grid?

4.1.2

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 20 (5 by maintainers)

Commits related to this issue

Most upvoted comments

Here is a jar I generated with this fix, you can try it while the release is done.

selenium-server-4.1.3-df41b58162.zip

I upgraded my grid to 4 and now have a similar issue. After about an hour of running tests, I’ll start getting ā€œCould not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.ā€ when trying to get a new chrome instance. When this happens, I notice that the hub api stops responding to my requests when I call the graphql endpoint to get status details. 4.1.3 did not fix the issue for me.

We have similar issues. I could track it down to the grid nodes. It looks like after doing some work and the in the idle phase, internal messages are sent regularly, which either are empty or already consumed. This leads to this stack-trace:

Name: pool-3-thread-64 State: TIMED_WAITING on java.io.PipedInputStream@5772c3f2 Total blocked: 0 Total waited: 60

Stack trace: java.base@17.0.2/java.lang.Object.wait(Native Method) java.base@17.0.2/java.io.PipedInputStream.read(PipedInputStream.java:326) java.base@17.0.2/java.io.PipedInputStream.read(PipedInputStream.java:377) java.base@17.0.2/java.io.InputStream.read(InputStream.java:218) app//com.google.common.io.ByteStreams.copy(ByteStreams.java:112) app//org.openqa.selenium.remote.http.Contents$MemoizedSupplier.get(Contents.java:158) locked org.openqa.selenium.remote.http.Contents$MemoizedSupplier@334449ae app//org.openqa.selenium.remote.http.Contents$MemoizedSupplier.get(Contents.java:141) app//org.openqa.selenium.remote.http.Contents.bytes(Contents.java:75) app//org.openqa.selenium.remote.http.Contents.string(Contents.java:92) app//org.openqa.selenium.remote.http.Contents.string(Contents.java:96) app//org.openqa.selenium.grid.node.CustomLocatorHandler.execute(CustomLocatorHandler.java:110) app//org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336) app//org.openqa.selenium.remote.http.Route.execute(Route.java:68) app//org.openqa.selenium.grid.node.Node.execute(Node.java:240) app//org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336) app//org.openqa.selenium.remote.http.Route.execute(Route.java:68) app//org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35) app//org.openqa.selenium.remote.AddWebDriverSpecHeaders$$Lambda$434/0x0000000800e1e298.execute(Unknown Source) app//org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44) app//org.openqa.selenium.remote.ErrorFilter$$Lambda$433/0x0000000800e1e068.execute(Unknown Source) app//org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64) app//org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44) app//org.openqa.selenium.remote.ErrorFilter$$Lambda$433/0x0000000800e1e068.execute(Unknown Source) app//org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64) app//org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44) app//org.openqa.selenium.netty.server.SeleniumHandler$$Lambda$430/0x0000000800e1da08.run(Unknown Source) java.base@17.0.2/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) java.base@17.0.2/java.util.concurrent.FutureTask.run(FutureTask.java:264) java.base@17.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) java.base@17.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) java.base@17.0.2/java.lang.Thread.run(Thread.java:833)

From the screenshot, you can see there are many thready named ā€˜pool-3-thread-XX’. We have set our sessionTimeout to 180 seconds, which matches the interval a thread is added.

selenium-grid-node-thread-leak

selenium-grid-node-thread-leak2

Eventually, the grid node has to many handles open and blocks/hangs. Restarting the node solves the issue temporarly.

We’re using Selenium Grid 4.1.2 on a Windows 10 system