azure-functions-durable-extension: Update Storage SDKs to fix deadlock

Per @ConnorMcMahon, there is a deadlock issue with the Azure Storage SDKs. We found that these deadlocks can leave an application stuck to the point where it fails to process Durable Functions triggers. Because of this, whenever any storage operation takes longer than 2 minutes, we shut down the worker with the following exception:

The operation * did not complete in '00:02:00'. Terminating the process to mitigate potential deadlock.

The only real way around this is to update Durable Functions to use a newer version of the Storage SDK. This is a rather large undertaking, and we have no timeframe for when that will happen.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 5
  • Comments: 22 (2 by maintainers)

Most upvoted comments

I’ve moved on to a new company and project and don’t recall enough of the details. The iterations were in a tight loop that, among other things, calculated a hash, which drove up CPU usage, so I’d say the interval between iterations was short.
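
For illustration only, here is a minimal sketch of the kind of workload described above, assuming a Durable Functions 2.x activity written in C#. The function name, iteration count, and choice of SHA-256 are hypothetical and not taken from the original report.

```csharp
// Hypothetical example of a CPU-bound activity that hashes data in a tight loop.
using System;
using System.Security.Cryptography;
using System.Text;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class HashActivity
{
    [FunctionName("ComputeIterativeHash")]
    public static string ComputeIterativeHash([ActivityTrigger] string seed)
    {
        using var sha = SHA256.Create();
        byte[] data = Encoding.UTF8.GetBytes(seed);

        // Tight, CPU-bound loop with no awaits or yields: the kind of pattern
        // reported to drive up CPU and contribute to the Storage SDK deadlock.
        for (int i = 0; i < 1_000_000; i++)
        {
            data = sha.ComputeHash(data);
        }

        return Convert.ToBase64String(data);
    }
}
```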

This issue is still on the team’s backlog. The primary issue that tracks the storage SDK upgrade is here: https://github.com/Azure/durabletask/issues/516.

I have been investigating the storage SDK deadlocks and found that one specific circumstance that seems to trigger them is running a tight spin loop on a thread after an await (azure/azure-sdk-for-net#16825). The same may well happen when running other CPU-intensive work.

My current workaround is to insert Thread.Sleep(10) into the spin loop; that seems to prevent the deadlock for whatever reason. Perhaps putting a similar sleep before your CPU-intensive function could help? This is just a guess, of course; as of now I have no idea why the deadlock actually happens.
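
As a rough illustration, and not code from the linked issue, here is a minimal sketch of the problematic pattern and the workaround, assuming a spin loop that busy-waits on a flag after awaiting a storage operation. The class, method, and field names are hypothetical.

```csharp
// Sketch of a spin loop after an await, with the Thread.Sleep(10) workaround.
using System.Threading;
using System.Threading.Tasks;

public static class SpinLoopWorkaround
{
    private static volatile bool _flag;

    public static async Task WaitForFlagAsync(Task storageOperation)
    {
        // The deadlock was reported when a tight spin loop runs on the thread
        // that resumes after an await like this one.
        await storageOperation;

        // Busy-wait spin loop: the problematic pattern.
        while (!_flag)
        {
            // Workaround from the comment above: sleep briefly each iteration
            // so other continuations (e.g. the Storage SDK's) can make progress.
            Thread.Sleep(10);
        }
    }

    public static void SetFlag() => _flag = true;
}
```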