trino: Adding column in SQL Server connector sometimes fails even though column has been added successfully

On SQLServer:

Error:  io.trino.plugin.sqlserver.TestSqlServerConnectorTest.testAddColumnConcurrently  Time elapsed: 1.245 s  <<< FAILURE!
java.lang.AssertionError: Unexpected concurrent add column failure
	at io.trino.testing.BaseConnectorTest.verifyConcurrentAddColumnFailurePermissible(BaseConnectorTest.java:2912)
	at io.trino.testing.BaseConnectorTest.lambda$testAddColumnConcurrently$25(BaseConnectorTest.java:2877)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	Suppressed: java.lang.RuntimeException: Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
		at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:122)
		at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:86)
		at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:514)
		at io.trino.testing.BaseConnectorTest.lambda$testAddColumnConcurrently$25(BaseConnectorTest.java:2871)
		... 4 more
	Caused by: io.trino.spi.TrinoException: Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
		at io.trino.plugin.jdbc.BaseJdbcClient.getColumns(BaseJdbcClient.java:283)
		at io.trino.plugin.jdbc.ForwardingJdbcClient.getColumns(ForwardingJdbcClient.java:90)
		at io.trino.plugin.jdbc.jmx.StatisticsAwareJdbcClient.lambda$getColumns$4(StatisticsAwareJdbcClient.java:110)
		at io.trino.plugin.jdbc.jmx.JdbcApiStats.wrap(JdbcApiStats.java:35)
		at io.trino.plugin.jdbc.jmx.StatisticsAwareJdbcClient.getColumns(StatisticsAwareJdbcClient.java:110)
		at io.trino.plugin.jdbc.CachingJdbcClient.lambda$getColumns$3(CachingJdbcClient.java:161)
		at io.trino.collect.cache.EmptyCache.get(EmptyCache.java:66)
		at io.trino.plugin.jdbc.CachingJdbcClient.get(CachingJdbcClient.java:700)
		at io.trino.plugin.jdbc.CachingJdbcClient.getColumns(CachingJdbcClient.java:161)
		at io.trino.plugin.jdbc.CachingJdbcClient.lambda$getColumns$3(CachingJdbcClient.java:161)
		at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4868)
		at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533)
		at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2282)
		at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
		at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
		at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
		at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
		at io.trino.collect.cache.EvictableCache.get(EvictableCache.java:108)
		at io.trino.plugin.jdbc.CachingJdbcClient.get(CachingJdbcClient.java:700)
		at io.trino.plugin.jdbc.CachingJdbcClient.getColumns(CachingJdbcClient.java:161)
		at io.trino.plugin.jdbc.DefaultJdbcMetadata.getColumnHandles(DefaultJdbcMetadata.java:625)
		at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.getColumnHandles(ClassLoaderSafeConnectorMetadata.java:288)
		at io.trino.metadata.MetadataManager.getColumnHandles(MetadataManager.java:485)
		at io.trino.execution.AddColumnTask.execute(AddColumnTask.java:97)
		at io.trino.execution.AddColumnTask.execute(AddColumnTask.java:55)
		at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:145)
		at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:243)
		at io.trino.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
		at io.trino.$gen.Trino_testversion____20220524_132750_5.run(Unknown Source)
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
		at java.base/java.lang.Thread.run(Thread.java:829)
	Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
		at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:265)
		at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1676)
		at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:615)
		at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:537)
		at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7785)
		at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3793)
		at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:268)
		at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:242)
		at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:456)
		at com.microsoft.sqlserver.jdbc.SQLServerDatabaseMetaData.getColumns(SQLServerDatabaseMetaData.java:665)
		at io.trino.plugin.jdbc.BaseJdbcClient.getColumns(BaseJdbcClient.java:301)
		at io.trino.plugin.jdbc.BaseJdbcClient.getColumns(BaseJdbcClient.java:235)
		... 31 more
Caused by: [CIRCULAR REFERENCE: io.trino.spi.TrinoException: Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.]

https://github.com/trinodb/trino/runs/6572998375?check_suite_focus=true

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 30 (30 by maintainers)

Commits related to this issue

Most upvoted comments

Just note that this may or may not fix the issue.

This issue was created on May 25, 2022. It appears to have been fixed with #12695 (Jun 6, 2022), but the issue is not closed.

#15225 was merged into master a two weeks ago. The bug reports for this issue have also started appearing frequently in the last two weeks. This also confirms it.

@guiyanakuang Big thanks, now it’s clear, I missed this 3rd call (read after alter), I thought that it’s failing on first read and was wonder how column still could be added.

Yes, there are three places where a deadlock exception may be thrown.

  1. https://github.com/trinodb/trino/blob/4d83020f0cdb20d25ad5951794ea17e2079975ca/core/trino-main/src/main/java/io/trino/execution/AddColumnTask.java#L96
  2. https://github.com/trinodb/trino/blob/4d83020f0cdb20d25ad5951794ea17e2079975ca/core/trino-main/src/main/java/io/trino/metadata/MetadataManager.java#L744
  3. https://github.com/trinodb/trino/blob/4d83020f0cdb20d25ad5951794ea17e2079975ca/core/trino-main/src/main/java/io/trino/metadata/MetadataManager.java#L748
No. Order of call Jdbc api Does the test fail when a deadlock exception is thrown (Y/N)
1 1 read No. No alter has been executed. This throws a deadlock, which is caught and returned Option.empty(). As expected.
2 2 alter No. Alter execution throws an exception and the test case returns Option.empty(). No fields have been added. As expected.
3 3 read Yes. Alter has been executed,. After this an exception is thrown and the test case returns Option.empty(). The field has actually been added. Not as expected.

And this column is still added, so cannot understand how it could be. Maybe some retries on engine level.

@vlad-lyutenko. So the test will actually only fail if the code deadlocks in the third place. Because in the second place we have successfully executed alter. Sqlserver does not retry.

Some flow execution from logs https://github.com/trinodb/trino/actions/runs/4091562653/jobs/7055747852:

  1. Create table and submit tasks:
2023-02-04T06:44:57.610-0600	INFO	TestNGInvoker-testAddColumnConcurrently()	io.trino.testing.BaseConnectorTest	Starting add columns for table test_add_columnks9wqhln9v
2023-02-04T06:44:57.617-0600	INFO	TestNGInvoker-testAddColumnConcurrently()	io.trino.testing.BaseConnectorTest	All alter tasks submitted, futures size: 5 for table test_add_columnks9wqhln9v


2023-02-04T06:44:57.618-0600	INFO	pool-266-thread-1	io.trino.testing.BaseConnectorTest	About to add column: col0 to table test_add_columnks9wqhln9v
2023-02-04T06:44:57.623-0600	INFO	pool-266-thread-2	io.trino.testing.BaseConnectorTest	About to add column: col1 to table test_add_columnks9wqhln9v
2023-02-04T06:44:57.627-0600	INFO	pool-266-thread-3	io.trino.testing.BaseConnectorTest	About to add column: col2 to table test_add_columnks9wqhln9v
2023-02-04T06:44:57.633-0600	INFO	pool-266-thread-5	io.trino.testing.BaseConnectorTest	About to add column: col4 to table test_add_columnks9wqhln9v
2023-02-04T06:44:57.633-0600	INFO	pool-266-thread-4	io.trino.testing.BaseConnectorTest	About to add column: col3 to table test_add_columnks9wqhln9v
  1. Execution for col0 - successfull
2023-02-04T06:44:57.883-0600	INFO	dispatcher-query-10	io.trino.event.QueryMonitor	TIMELINE: Query 20230204_124457_00426_279z4 :: FINISHED :: elapsed 261ms :: planning 0ms :: waiting 0ms :: scheduling 261ms :: running 0ms :: finishing 261ms :: begin 2023-02-04T06:44:57.621-06:00 :: end 2023-02-04T06:44:57.882-06:00
2023-02-04T06:44:57.883-0600	INFO	pool-266-thread-1	io.trino.testing.BaseConnectorTest	Column col0 added for table test_add_columnks9wqhln9v
  1. Execution for col2 and col4 fails inside getColumns:
2023-02-04T06:44:59.239-0600	INFO	dispatcher-query-0	io.trino.event.QueryMonitor	TIMELINE: Query 20230204_124457_00429_279z4 :: FAILED (JDBC_ERROR) :: elapsed 1597ms :: planning 1597ms :: waiting 0ms :: scheduling 0ms :: running 0ms :: finishing 0ms :: begin 2023-02-04T06:44:57.641-06:00 :: end 2023-02-04T06:44:59.238-06:00
2023-02-04T06:44:59.240-0600	ERROR	pool-266-thread-3	io.trino.testing.BaseConnectorTest	Original exception during submitting alter column col2 for table test_add_columnks9wqhln9v:
Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
	at io.trino.plugin.jdbc.BaseJdbcClient.getColumns(BaseJdbcClient.java:347)
	... 32 more
2023-02-04T06:44:59.240-0600	WARN	pool-266-thread-3	io.trino.testing.BaseConnectorTest	Return fail value for column :col2 table test_add_columnks9wqhln9v


2023-02-04T06:44:59.241-0600	ERROR	pool-266-thread-5	io.trino.testing.BaseConnectorTest	Original exception during submitting alter column col4 for table test_add_columnks9wqhln9v:
Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
	at io.trino.plugin.jdbc.BaseJdbcClient.getColumns(BaseJdbcClient.java:347)
	... 32 more
2023-02-04T06:44:59.241-0600	WARN	pool-266-thread-5	io.trino.testing.BaseConnectorTest	Return fail value for column :col4 table test_add_columnks9wqhln9v
  1. Now attempt to insert data in col2, which failed before to be added:
2023-02-04T06:44:59.261-0600	WARN	TestNGInvoker-testAddColumnConcurrently()	io.trino.testing.BaseConnectorTest	About to insert data for failed column :col2 table test_add_columnks9wqhln9v
2023-02-04T06:44:59.261-0600	INFO	dispatcher-query-0	io.trino.event.QueryMonitor	TIMELINE: Query 20230204_124457_00430_279z4 :: FINISHED :: elapsed 1618ms :: planning 0ms :: waiting 0ms :: scheduling 1618ms :: running 0ms :: finishing 1618ms :: begin 2023-02-04T06:44:57.642-06:00 :: end 2023-02-04T06:44:59.260-06:00
2023-02-04T06:44:59.339-0600	INFO	dispatcher-query-12	io.trino.event.QueryMonitor	TIMELINE: Query 20230204_124459_00431_279z4 :: FAILED (COLUMN_NOT_FOUND) :: elapsed 73ms :: planning 73ms :: waiting 0ms :: scheduling 0ms :: running 0ms :: finishing 0ms :: begin 2023-02-04T06:44:59.265-06:00 :: end 2023-02-04T06:44:59.338-06:00
2023-02-04T06:44:59.340-0600	ERROR	TestNGInvoker-testAddColumnConcurrently()	io.trino.testing.BaseConnectorTest	Exception during insert into column col2 for table test_add_columnks9wqhln9v:
io.trino.testing.QueryFailedException: Insert column name does not exist in target table: col2
  1. Now attempt to insert data in col4, which failed before to be added, but successfully added data:
2023-02-04T06:44:59.340-0600	WARN	TestNGInvoker-testAddColumnConcurrently()	io.trino.testing.BaseConnectorTest	About to insert data for failed column :col4 table test_add_columnks9wqhln9v
2023-02-04T06:44:59.846-0600	INFO	TestNGInvoker-testAddColumnConcurrently()	io.trino.testing.BaseConnectorTest	!!!!!!!!!!!! Data inserted for failed column col4 for table test_add_columnks9wqhln9v

Can we mark the test @Flaky or disable it while the investigation is ongoing? cc @kokosing

@vlad-lyutenko is working on this. Thank you Vlad!