fedimint: `UnitSaver::flush` is using wrong `block_on`?

While investigating something entirely different I’ve noticed that UnitSaver::flush is using futures::executor::block_on which I don’t think is entirely right thing to do when running in tokio Runtime (and thus everything hanged when tokio::task::yield_now was added to the mix). Unfortunately these kinds of issues require at least 200 IQ points to solve so I could use some help. 😄

About this issue

  • Original URL
  • State: closed
  • Created 5 months ago
  • Comments: 26 (26 by maintainers)

Most upvoted comments

Just published fedimint-aleph-bft, last step is to switch fedimint over to it.

Oh, so did I read it right that the old alephbft implementation was calling a blocking io trait without offloading it from the executor?

yes, too many function colors. makes my head spin async(aleph-bft) => sync(UnitSaver::flush) => async(db transaction) => sync (rocksdb)

yep https://github.com/fedimint/fedimint/commit/28b036126d361325bd9c167b96b3ad9e7cae2a1c fixes CI for https://github.com/fedimint/fedimint/pull/3989

This might be random, and generally the whole problem is only exposed by #3995, so I would hope/expect the proper fix to #3995 (at least stop it from hanging alephbft tasks as I described). If it isn’t … something might still be wrong.

I need to find the caller alephbft code and check how exactly is it calling blocking Write::flush operations. I suspect if offloaded these to blocking thread, possibly via own block_in_place call. So we’re now in a situation where alepbft offloaded already ,we are running on blocking threads and trying to async again … which might required something special to work. Or just causes tokio bug.