legion: Hang at start up on multinode runs

If HTR is compiled on the latest version of master, it hangs at startup when executed on multiple nodes. I think that the top-level task does not even start its execution. The backtraces obtained on a two-node execution are contained in the attached files bt_0.log bt_1.log

The backtraces are produced on sapling but the problem reproduces on every system that I have tried so far.

@elliottslaughter, can you please add this issue to #1032?

About this issue

  • Original URL
  • State: closed
  • Created 4 months ago
  • Comments: 23 (6 by maintainers)

Most upvoted comments

cudaipc-fix fixes the issue. Thanks for working on it