legion: Non-deterministic hangs while running HTR
I am running a 45-node simulation on Lassen using HTR and the 11/24 Legion commit f4f80752 of control_replication. The runtime hangs non-deterministically, usually after ~10-100 time steps. Attached are two backtraces taken during one of these hangs, taken several minutes apart. Legion is compiled with CXXFLAGS='-g' and the GasNet version is 2021.9.0. HTR is run with DEBUG=1.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 65 (33 by maintainers)
@elliottslaughter can you add this issue to #1032? I do not know if this is Realm or Legion related but it is definitely top priority,