trino: Flaky test `BaseFailureRecoveryTest.testParallel`: "There should be no remaining tmp_trino tables that are queryable"

https://github.com/trinodb/trino/actions/runs/4279764588/jobs/7451040205

Error:    TestBigQueryQueryFailureRecoveryTest>BaseFailureRecoveryTest.testParallel:218->BaseFailureRecoveryTest.testInsert:235->BaseFailureRecoveryTest.testTableModification:333->BaseFailureRecoveryTest.testTableModification:338->BaseFailureRecoveryTest.testNonSelect:367->BaseFailureRecoveryTest.lambda$testNonSelect$10:367 
Expecting throwable message:
  <"[There should be no remaining tmp_trino_ab7ec455_* tables. They are: [tmp_trino_ab7ec455_517f28a7]] 
Expecting:
 <1>
to be equal to:
 <0>
but was not.">
to contain:
  <"This error is injected by the failure injection service">
but did not.

Throwable that failed the check:

java.lang.AssertionError: [There should be no remaining tmp_trino_ab7ec455_* tables. They are: [tmp_trino_ab7ec455_517f28a7]] 
Expecting:
 <1>
to be equal to:
 <0>
but was not.
	at io.trino.testing.BaseFailureRecoveryTest$FailureRecoveryAssert.execute(BaseFailureRecoveryTest.java:531)
	at io.trino.testing.BaseFailureRecoveryTest$FailureRecoveryAssert.executeActual(BaseFailureRecoveryTest.java:496)
	at io.trino.testing.BaseFailureRecoveryTest$FailureRecoveryAssert.executeActualNoRetries(BaseFailureRecoveryTest.java:481)
	at io.trino.testing.BaseFailureRecoveryTest$FailureRecoveryAssert.lambda$failsWithoutRetries$12(BaseFailureRecoveryTest.java:673)
	at io.trino.testing.BaseFailureRecoveryTest$FailureRecoveryAssert.failsWithoutRetries(BaseFailureRecoveryTest.java:673)
	at io.trino.testing.BaseFailureRecoveryTest$FailureRecoveryAssert.failsAlways(BaseFailureRecoveryTest.java:664)
	at io.trino.testing.BaseFailureRecoveryTest.testNonSelect(BaseFailureRecoveryTest.java:367)
	at io.trino.testing.BaseFailureRecoveryTest.testTableModification(BaseFailureRecoveryTest.java:338)
	at io.trino.testing.BaseFailureRecoveryTest.testTableModification(BaseFailureRecoveryTest.java:333)
	at io.trino.testing.BaseFailureRecoveryTest.testInsert(BaseFailureRecoveryTest.java:235)
	at io.trino.testing.BaseFailureRecoveryTest$ParallelTestRunnable.run(BaseFailureRecoveryTest.java:850)
	at io.trino.testing.BaseFailureRecoveryTest.testParallel(BaseFailureRecoveryTest.java:218)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
	at org.testng.internal.TestMethodWithDataProviderMethodWorker.call(TestMethodWithDataProviderMethodWorker.java:75)
	at org.testng.internal.TestMethodWithDataProviderMethodWorker.call(TestMethodWithDataProviderMethodWorker.java:14)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

About this issue

  • State: closed
  • Created a year ago
  • Comments: 42 (42 by maintainers)

Most upvoted comments

Still encountering this after the latest fix: https://github.com/trinodb/trino/actions/runs/5004142646/jobs/8966655659?pr=17212

io.trino.plugin.oracle.TestOracleQueryFailureRecoveryTest.testParallel[testCreateTable](9)  Time elapsed: 54.428 s  <<< FAILURE!
org.junit.ComparisonFailure: 
[There should be no remaining tmp_trino tables that are queryable. They are:
	For queryId [20230517_144752_00198_biqit] (prefix [tmp_trino_9d29747f_]) remaining tables: [tmp_trino_9d29747f_06f5762c]
		With errors: [
			Expecting code to raise a throwable.]] 
Expecting value to be true but was false expected:<[tru]e> but was:<[fals]e>
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at io.trino.testing.BaseFailureRecoveryTest.checkTemporaryTables(BaseFailureRecoveryTest.java:463)
	at io.trino.testing.BaseFailureRecoveryTest$FailureRecoveryAssert.cleansUpTemporaryTables(BaseFailureRecoveryTest.java:624)
	at io.trino.testing.BaseFailureRecoveryTest.testNonSelect(BaseFailureRecoveryTest.java:409)
	at io.trino.testing.BaseFailureRecoveryTest.testTableModification(BaseFailureRecoveryTest.java:347)
	at io.trino.testing.BaseFailureRecoveryTest.testTableModification(BaseFailureRecoveryTest.java:342)
	at io.trino.testing.BaseFailureRecoveryTest.testCreateTable(BaseFailureRecoveryTest.java:236)
	at io.trino.testing.BaseFailureRecoveryTest$ParallelTestRunnable.run(BaseFailureRecoveryTest.java:910)
	at io.trino.testing.BaseFailureRecoveryTest.testParallel(BaseFailureRecoveryTest.java:227)

In the meantime, would it make sense to mark TestOracleQueryFailureRecoveryTest.testParallel as @Flaky? @losipiuk @mwd410

Good idea. @mwd410, please send the PR.

Let me reopen because the test is still failing on master.

Error:  Failures: 
Error:    TestOracleQueryFailureRecoveryTest>BaseFailureRecoveryTest.testParallel:227->BaseFailureRecoveryTest.testExplainAnalyze:320->BaseFailureRecoveryTest.testTableModification:342->BaseFailureRecoveryTest.testTableModification:347->BaseFailureRecoveryTest.testNonSelect:409->BaseFailureRecoveryTest.checkTemporaryTables:463 [There should be no remaining tmp_trino tables that are queryable. They are:
	For queryId [20230607_221201_00339_7dv8e] (prefix [tmp_trino_68459c86_]) remaining tables: [tmp_trino_68459c86_f31eaf34]
		With errors: [
			Expecting code to raise a throwable.]] 
Expecting value to be true but was false expected:<[tru]e> but was:<[fals]e>

https://github.com/trinodb/trino/actions/runs/5203676867/jobs/9388677568

There was hope it would get fixed with https://github.com/trinodb/trino/pull/17575. cc: @mwd410

Apparently it did not help. We are still getting:

io.trino.spi.TrinoException: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

	at io.trino.plugin.oracle.OracleClient.dropTable(OracleClient.java:321)
	at io.trino.plugin.jdbc.BaseJdbcClient.rollbackCreateTable(BaseJdbcClient.java:1050)
	at io.trino.plugin.jdbc.ForwardingJdbcClient.rollbackCreateTable(ForwardingJdbcClient.java:249)
	at io.trino.plugin.jdbc.jmx.StatisticsAwareJdbcClient.lambda$rollbackCreateTable$36(StatisticsAwareJdbcClient.java:330)
	at io.trino.plugin.jdbc.jmx.JdbcApiStats.wrap(JdbcApiStats.java:47)
	at io.trino.plugin.jdbc.jmx.StatisticsAwareJdbcClient.rollbackCreateTable(StatisticsAwareJdbcClient.java:330)
	at io.trino.plugin.jdbc.CachingJdbcClient.rollbackCreateTable(CachingJdbcClient.java:392)
	at io.trino.plugin.jdbc.CachingJdbcClient.rollbackCreateTable(CachingJdbcClient.java:392)
	at io.trino.plugin.jdbc.DefaultJdbcMetadata.lambda$beginInsert$20(DefaultJdbcMetadata.java:853)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at io.trino.plugin.jdbc.DefaultJdbcMetadata.rollback(DefaultJdbcMetadata.java:841)
	at io.trino.plugin.jdbc.JdbcTransactionManager.rollback(JdbcTransactionManager.java:64)
	at io.trino.plugin.jdbc.JdbcConnector.rollback(JdbcConnector.java:109)
	at io.trino.metadata.CatalogTransaction.abort(CatalogTransaction.java:93)
	at io.trino.metadata.CatalogMetadata.safeAbort(CatalogMetadata.java:167)
	at io.trino.metadata.CatalogMetadata.abort(CatalogMetadata.java:161)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
	at io.trino.$gen.Trino_testversion____20230601_084025_3091.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.sql.SQLException: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

	at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:629)
	at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:563)
	at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1230)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:771)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:298)
	at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:511)
	at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:122)
	at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:1199)
	at oracle.jdbc.driver.OracleStatement.executeSQLStatement(OracleStatement.java:1819)
	at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1471)
	at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:2504)
	at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:2459)
	at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:327)
	at oracle.ucp.jdbc.proxy.oracle$1ucp$1jdbc$1proxy$1oracle$1StatementProxy$2oracle$1jdbc$1internal$1OracleStatement$$$Proxy.execute(Unknown Source)
	at io.trino.plugin.jdbc.BaseJdbcClient.execute(BaseJdbcClient.java:1211)
	at io.trino.plugin.oracle.OracleClient.dropTable(OracleClient.java:318)
	... 24 more
	Suppressed: java.lang.RuntimeException: Query: DROP TABLE "TRINO_TEST"."TMP_TRINO_EC95E300_4B157D1C" PURGE
		at io.trino.plugin.jdbc.BaseJdbcClient.execute(BaseJdbcClient.java:1214)
		... 25 more
Caused by: Error : 54, Position : 24, Sql = DROP TABLE "TRINO_TEST"."TMP_TRINO_EC95E300_4B157D1C" PURGE, OriginalSql = DROP TABLE "TRINO_TEST"."TMP_TRINO_EC95E300_4B157D1C" PURGE, Error Msg = ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

	at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:636)
	... 39 more

From a cursory glance, this appears to be due to Oracle’s “Recycle Bin” mechanism. I have opened a PR to address this issue, where I also give further explanation.

@findepi @ebyhr

https://github.com/trinodb/trino/actions/runs/4553222890/jobs/8033462647?pr=16784

[INFO] 
Error:  Failures: 
Error:    TestBigQueryTaskFailureRecoveryTest>BaseFailureRecoveryTest.testParallel:228->BaseFailureRecoveryTest.testExplainAnalyze:321->BaseFailureRecoveryTest.testTableModification:343->BaseFailureRecoveryTest.testTableModification:348->BaseFailureRecoveryTest.testNonSelect:389 » QueryFailed
Error:    TestBigQueryTaskFailureRecoveryTest>BaseFailureRecoveryTest.testParallel:228->BaseFailureRecoveryTest.testInsert:245->BaseFailureRecoveryTest.testTableModification:343->BaseFailureRecoveryTest.testTableModification:348->BaseFailureRecoveryTest.testNonSelect:398 » QueryFailed
[INFO] 

This is unrelated to this BaseFailureRecoveryTest.testParallel issue. It looks like an infra issue talking to BigQuery. Logged https://github.com/trinodb/trino/issues/16803. cc @ebyhr

This will be a problem with any system that doesn’t have a strongly consistent information_schema, such as MySQL, SingleStore, and maybe others.

I think that since this is specific to the remote database, we need to handle it with some abstraction that is implemented per remote database (connector). In cases where information_schema is very unreliable, we may have to accept skipping those assertions. Hopefully assertEventually will be a good enough solution for a few of them.

Yeah, I agree. Maybe we can just expose a protected assertNoTmpTablesLeftBehind() with a default implementation and override it where needed.
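
To make the proposal above concrete, here is a minimal sketch of an overridable check with a strict default and an "eventually" variant for connectors whose metadata lags. It is not the actual Trino code: the class names, the `listTmpTables` helper, and the timeout and poll-interval parameters are all hypothetical, and the retry loop is hand-written only to keep the example self-contained (a real test would more likely use the existing assertEventually helper).

```java
import java.time.Duration;
import java.util.List;

// Hypothetical stand-in for the base test class; not the real BaseFailureRecoveryTest.
abstract class TmpTableCheckSketch
{
    // Hypothetical helper: returns the temporary tables currently visible for the prefix.
    protected abstract List<String> listTmpTables(String prefix);

    // Default implementation: strict, immediate check.
    protected void assertNoTmpTablesLeftBehind(String prefix)
    {
        List<String> remaining = listTmpTables(prefix);
        if (!remaining.isEmpty()) {
            throw new AssertionError(
                    "There should be no remaining " + prefix + "* tables. They are: " + remaining);
        }
    }
}

// Hypothetical override for a connector whose table listing is only eventually consistent.
abstract class EventuallyConsistentTmpTableCheckSketch
        extends TmpTableCheckSketch
{
    private final Duration timeout;
    private final Duration pollInterval;

    protected EventuallyConsistentTmpTableCheckSketch(Duration timeout, Duration pollInterval)
    {
        this.timeout = timeout;
        this.pollInterval = pollInterval;
    }

    @Override
    protected void assertNoTmpTablesLeftBehind(String prefix)
    {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                // Reuse the strict check; succeed as soon as it passes once.
                super.assertNoTmpTablesLeftBehind(prefix);
                return;
            }
            catch (AssertionError e) {
                // Give up and surface the last failure once the deadline has passed.
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting for temporary tables to disappear", e);
            }
        }
    }
}
```

A connector test with strongly consistent metadata would keep the default, while one with a lagging information_schema would extend the second class. Skipping the assertion entirely would amount to overriding the method with an empty body.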