OpenSearch: [CI] o.o.cluster.remote.test.RemoteClustersIT.testHAProxyModeConnectionWorks multiple failures

Multiple PR test failures; the most recent is the following:

./gradlew ':qa:remote-clusters:integTest' --tests "org.opensearch.cluster.remote.test.RemoteClustersIT.testHAProxyModeConnectionWorks" -Dtests.seed=403F055E1F14E391 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-TN -Dtests.timezone=Africa/Conakry -Druntime.java=17
2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([403F055E1F14E391:4749D095249295CC]:0)
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.opensearch.cluster.remote.test.RemoteClustersIT.testHAProxyModeConnectionWorks(RemoteClustersIT.java:125)
  1> [2021-12-10T20:14:09,766][INFO ][o.o.c.r.t.RemoteClustersIT] [testProxyModeConnectionWorks] before test
  1> [2021-12-10T20:14:10,385][INFO ][o.o.c.r.t.RemoteClustersIT] [testProxyModeConnectionWorks] Configuring remote cluster [opensearch-2:9300]
  1> [2021-12-10T20:14:10,487][INFO ][o.o.c.r.t.RemoteClustersIT] [testProxyModeConnectionWorks] Connection info: org.opensearch.client.cluster.RemoteConnectionInfo@688b190
  1> [2021-12-10T20:14:10,693][INFO ][o.o.c.r.t.RemoteClustersIT] [testProxyModeConnectionWorks] after test
  1> [2021-12-10T20:14:10,737][INFO ][o.o.c.r.t.RemoteClustersIT] [testSniffModeConnectionFails] before test
  1> [2021-12-10T20:14:11,294][INFO ][o.o.c.r.t.RemoteClustersIT] [testSniffModeConnectionFails] Configuring remote cluster [opensearch-2:9300]
  1> [2021-12-10T20:14:11,363][INFO ][o.o.c.r.t.RemoteClustersIT] [testSniffModeConnectionFails] Connection info: org.opensearch.client.cluster.RemoteConnectionInfo@5aef8603
  1> [2021-12-10T20:14:11,500][INFO ][o.o.c.r.t.RemoteClustersIT] [testSniffModeConnectionFails] after test
  2> NOTE: leaving temporary files on disk at: /var/CITOOL/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/search/qa/remote-clusters/build/testrun/integTest/temp/org.opensearch.cluster.remote.test.RemoteClustersIT_403F055E1F14E391-001
  2> NOTE: test params are: codec=Lucene87, sim=Asserting(RandomSimilarity(queryNorm=false): {}), locale=ar-TN, timezone=Africa/Conakry
  2> NOTE: Linux 5.4.0-1045-aws amd64/Eclipse Adoptium 17.0.1 (64-bit)/cpus=72,threads=1,free=451487832,total=536870912
  2> NOTE: All tests run in this JVM: [RemoteClustersIT]

Note:

        RemoteConnectionInfo rci = cluster1Client().cluster().remoteInfo(new RemoteInfoRequest(), RequestOptions.DEFAULT).getInfos().get(0);
        logger.info("Connection info: {}", rci);
        assertTrue(rci.isConnected());

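The RemoteConnectionInfo log line is uninformative: it prints only the default Object.toString() output (for example, RemoteConnectionInfo@688b190), so the connection state is not visible in the log. (TODO: add toString support for logging.)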
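A minimal sketch of what the toString TODO could look like. ConnectionInfoSketch and its four fields are hypothetical stand-ins chosen for illustration; this is not the real org.opensearch.client.cluster.RemoteConnectionInfo, whose fields and accessors may differ.

```java
// Hypothetical stand-in for the client-side connection info object.
// The field set here is an assumption made for illustration only.
final class ConnectionInfoSketch {
    private final String clusterAlias;
    private final String address;
    private final boolean connected;
    private final int numSocketsConnected;

    ConnectionInfoSketch(String clusterAlias, String address, boolean connected, int numSocketsConnected) {
        this.clusterAlias = clusterAlias;
        this.address = address;
        this.connected = connected;
        this.numSocketsConnected = numSocketsConnected;
    }

    boolean isConnected() {
        return connected;
    }

    // Without this override, logging the object prints only ClassName@hash.
    @Override
    public String toString() {
        return "ConnectionInfoSketch{"
            + "clusterAlias='" + clusterAlias + '\''
            + ", address='" + address + '\''
            + ", connected=" + connected
            + ", numSocketsConnected=" + numSocketsConnected
            + '}';
    }
}
```

With an override like this, a statement such as logger.info("Connection info: {}", rci) would show the connection state directly instead of an object hash.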

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

The failure occurs because the number of connected sockets is 0. However, the RemoteConnectionInfo carrying this value is obtained through a transport request, so the reason the sockets did not connect is unknown. Because the failure is not reproducible, #5667 adds logging for the case where there are no connected sockets, printing the cluster health at the time of failure. If the flaky failure occurs again, the extra log output should help identify the root cause.
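The general idea of attaching diagnostics to the failing check can be sketched as follows. This is not the change made in #5667, which is not shown here; DiagnosticAssertSketch, assertConnected, and the clusterHealth supplier are hypothetical names. The supplier is evaluated only on failure, so a passing run pays no cost for gathering the extra information.

```java
import java.util.function.Supplier;

// Hypothetical helper: fails with extra context when no sockets are connected.
final class DiagnosticAssertSketch {
    private DiagnosticAssertSketch() {}

    static void assertConnected(int numSocketsConnected, Supplier<String> clusterHealth) {
        if (numSocketsConnected == 0) {
            // Fetch diagnostics lazily, only when the check actually fails.
            throw new AssertionError(
                "no connected sockets; cluster health at failure: " + clusterHealth.get());
        }
    }
}
```

In a real test, the supplier would wrap whatever call retrieves the cluster health; here it is left as a plain string supplier so the sketch stays independent of any client API.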