selenium: [🐛 Bug]: Timeout exception thrown randomly with RemoteWebDriver

What happened?

When executing tests at scale there are some random failures with read timeouts (seems to wait for 3 mins). We are observing this primarily when clicking an element but have also observed it during navigate to page. Seems to be a reoccurrence of https://github.com/SeleniumHQ/selenium/issues/9528

We are using dockerized grid v4.10 with the hub/node model and we confirmed that the docker process starts with jdk-http-client

java -Dwebdriver.http.factory=jdk-http-client -jar /opt/selenium/selenium-server.jar --ext /opt/selenium/selenium-http-jdk-client.jar hub --session-request-timeout 3600 --session-retry-interval 15 --relax-checks true --bind-host false --config /opt/selenium/config.toml

How can we reproduce the issue?

Unfortunately similar to https://github.com/SeleniumHQ/selenium/issues/9528 we are unable to find a consistent repro for this. Seems flaky and rerunning the test passes but we are still seeing this happen daily.

Relevant log output

We use an internal framework to run selenium tests and the DEBUG logs indicate that a 500 was returned during the click event. We check to see if the element is enabled and clickable (which passes) but the subsequent click seems to fail.


DEBUG    2023-08-15 19:24:02.928  selenium.webdriver.remote.remote_connection:remote_connection.py:303 -> POST http://selenium-four-hub-nomad.service.eng.consul:4444/session/ab89759c17aac47ed60f6d6b9a8b6882/element {"using": "xpath", "value": "//*[@class='extension-name'][text()='JSON']"}
DEBUG    2023-08-15 19:24:02.943  urllib3.connectionpool:connectionpool.py:473 -> http://selenium-four-hub-nomad.service.eng.consul:4444 "POST /session/ab89759c17aac47ed60f6d6b9a8b6882/element HTTP/1.1" 200 95
DEBUG    2023-08-15 19:24:02.943  selenium.webdriver.remote.remote_connection:remote_connection.py:319 -> Remote response: status=200 | data={"value":{"element-6066-11e4-a52e-4f735466cecf":"3AFE27AB794812F985C270183415B5A4_element_21"}} | headers=HTTPHeaderDict({'content-length': '95', 'content-type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
DEBUG    2023-08-15 19:24:02.944  selenium.webdriver.remote.remote_connection:remote_connection.py:346 -> Finished Request
DEBUG    2023-08-15 19:24:02.944  selenium.webdriver.remote.remote_connection:remote_connection.py:303 -> GET http://selenium-four-hub-nomad.service.eng.consul:4444/session/ab89759c17aac47ed60f6d6b9a8b6882/element/3AFE27AB794812F985C270183415B5A4_element_21/enabled {"id": "3AFE27AB794812F985C270183415B5A4_element_21"}
DEBUG    2023-08-15 19:24:02.961  urllib3.connectionpool:connectionpool.py:473 -> http://selenium-four-hub-nomad.service.eng.consul:4444 "GET /session/ab89759c17aac47ed60f6d6b9a8b6882/element/3AFE27AB794812F985C270183415B5A4_element_21/enabled HTTP/1.1" 200 14
DEBUG    2023-08-15 19:24:02.962  selenium.webdriver.remote.remote_connection:remote_connection.py:319 -> Remote response: status=200 | data={"value":true} | headers=HTTPHeaderDict({'content-length': '14', 'content-type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
DEBUG    2023-08-15 19:24:02.962  selenium.webdriver.remote.remote_connection:remote_connection.py:346 -> Finished Request
DEBUG    2023-08-15 19:24:02.963  selenium.webdriver.remote.remote_connection:remote_connection.py:303 -> POST http://selenium-four-hub-nomad.service.eng.consul:4444/session/ab89759c17aac47ed60f6d6b9a8b6882/element {"using": "xpath", "value": "//*[@class='extension-name'][text()='JSON']"}
DEBUG    2023-08-15 19:24:02.978  urllib3.connectionpool:connectionpool.py:473 -> http://selenium-four-hub-nomad.service.eng.consul:4444 "POST /session/ab89759c17aac47ed60f6d6b9a8b6882/element HTTP/1.1" 200 95
DEBUG    2023-08-15 19:24:02.979  selenium.webdriver.remote.remote_connection:remote_connection.py:319 -> Remote response: status=200 | data={"value":{"element-6066-11e4-a52e-4f735466cecf":"3AFE27AB794812F985C270183415B5A4_element_21"}} | headers=HTTPHeaderDict({'content-length': '95', 'content-type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
DEBUG    2023-08-15 19:24:02.979  selenium.webdriver.remote.remote_connection:remote_connection.py:346 -> Finished Request
DEBUG    2023-08-15 19:24:02.980  selenium.webdriver.remote.remote_connection:remote_connection.py:303 -> POST http://selenium-four-hub-nomad.service.eng.consul:4444/session/ab89759c17aac47ed60f6d6b9a8b6882/element/3AFE27AB794812F985C270183415B5A4_element_21/click {"id": "3AFE27AB794812F985C270183415B5A4_element_21"}
DEBUG    2023-08-15 19:27:03.006  urllib3.connectionpool:connectionpool.py:473 -> http://selenium-four-hub-nomad.service.eng.consul:4444 "POST /session/ab89759c17aac47ed60f6d6b9a8b6882/element/3AFE27AB794812F985C270183415B5A4_element_21/click HTTP/1.1" 500 3517
DEBUG    2023-08-15 19:27:03.007  selenium.webdriver.remote.remote_connection:remote_connection.py:319 -> Remote response: status=500 | data={
  "value": {
    "error": "script timeout",
    "message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.10.0', revision: 'c14d967899'\nSystem info: os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-477.13.1.el8_8.x86_64', java.version: '11.0.19'\nDriver info: driver.version: unknown",
    "stacktrace": "java.lang.RuntimeException: Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.10.0', revision: 'c14d967899'\nSystem info: os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-477.13.1.el8_8.x86_64', java.version: '11.0.19'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:160)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:384)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:347)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:87)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:347)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:347)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:347)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:347)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:377)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:54)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:91)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:125)\n\t... 26 more\n"
  }
} | headers=HTTPHeaderDict({'content-length': '3517', 'Cache-Control': 'none', 'Content-Type': 'application/json; charset=utf-8'})
DEBUG    2023-08-15 19:27:03.008  selenium.webdriver.remote.remote_connection:remote_connection.py:346 -> Finished Request


### Operating System

Linux

### Selenium version

4.10

### What are the browser(s) and version(s) where you see this issue?

Chrome v.114.0.5735.133

### What are the browser driver(s) and version(s) where you see this issue?

v.114.0.5735.133

### Are you using Selenium Grid?

4.10

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 40 (29 by maintainers)

Commits related to this issue

Most upvoted comments

@pujagani i think i have some code to provoke this, i will create a new issue for the click retry on timeout.

closing based on 5f3f7d096a75700e055effbe13cbc22117014e2c, e8c77b8e08f74905a049805922b57d8a299e2338 & #12838

Yes, I think so. So now that it is narrowed down, I am going to try to fix our code tomorrow.

I also observed that if the ErrorFilter is added then the response contains the exception details. So the error is not swallowed.

HttpHandler handler =
        new RetryRequest()
            .andThen(new ErrorFilter())
            .andFinally(
                (HttpRequest request) -> {
                  throw new UnsupportedOperationException();
                });
    HttpResponse response = handler.execute(new HttpRequest(GET, "/"));

@titusfortner i just pushed the timeout less version in https://github.com/SeleniumHQ/selenium/commit/e8c77b8e08f74905a049805922b57d8a299e2338. So #12368 must not care about this timeout any more.

@pujagani here is some code to provoke this, the fallback (aka null) is taken as the operation itself failing.

  @Test
  void unexpectedExceptionsFallthrough() {
    HttpHandler handler = new RetryRequest().andFinally((HttpRequest request) -> {
      throw new UnsupportedOperationException();
    });

    Assertions.assertThrows(UnsupportedOperationException.class, () -> handler.execute(new HttpRequest(GET, "/")));
  }

Yes, thanks for the hint. I have a change prepared to completly remove this timeout. We can check every minute and if it has not been used for some time, if not check the session is alive or not. If the session is alive let the client alive, if not close it.

But lets see if this issue is resolved with the increased timeout first.

Thank you for the update. Due to the traffic we run on the grid I’ll try capture FINE level logs during the weekend and attach them here.