legion: equivalence sets: `Assertion 'active_once' failed.`
I have a program that looks like this:
auto aPart, bPart, cPart = /* create some partitions of A, B, C */;
assert(is_complete, is_disjoint for all aPart, bPart, cPart);
auto aPart2, bPart2, cPart2 = /* create different partitions of A, B, C */;
for (...) {
  index launch t1 (aPart, bPart, cPart)  /* write_only privileges */
  index launch t2 (aPart, bPart, cPart)  /* read_only privileges */
  index launch t3 (aPart2 /* read_write */, bPart2 /* read_only */, cPart2 /* read_only */)
}
On some iterations on 1 node, I see a warning that a subtask of t3 is using uninitialized data for a subregion of aPart2. That seems impossible, since t1 writes over all of aPart, which is a complete and disjoint partition of the same region A.
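To spell out why the warning looks impossible, here is a minimal sketch in plain C++ (not the Legion API). It models a partition as a list of half-open index ranges over a 1-D region. The names `Range`, `Partition`, `coverage`, `is_complete`, `is_disjoint`, and `reads_only_initialized` are hypothetical and exist only for this illustration. It shows that once every subregion of a complete partition has been written, any other partition of the same region can only read initialized elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a subregion is a half-open range [lo, hi) of a
// 1-D region, and a partition is a list of such subregions.
struct Range { std::size_t lo, hi; };
using Partition = std::vector<Range>;

// Count how many subregions of `p` cover each element of the region.
std::vector<int> coverage(const Partition& p, std::size_t size) {
  std::vector<int> count(size, 0);
  for (const Range& r : p)
    for (std::size_t i = r.lo; i < r.hi && i < size; ++i)
      ++count[i];
  return count;
}

// Complete: every element is covered by at least one subregion.
bool is_complete(const Partition& p, std::size_t size) {
  for (int c : coverage(p, size))
    if (c < 1) return false;
  return true;
}

// Disjoint: no element is covered by more than one subregion.
bool is_disjoint(const Partition& p, std::size_t size) {
  for (int c : coverage(p, size))
    if (c > 1) return false;
  return true;
}

// Simulate a write through every subregion of `writes` (like t1),
// then check that every element touched through `reads` (like t3)
// has been initialized.
bool reads_only_initialized(const Partition& writes,
                            const Partition& reads,
                            std::size_t size) {
  std::vector<bool> initialized(size, false);
  for (const Range& r : writes)
    for (std::size_t i = r.lo; i < r.hi && i < size; ++i)
      initialized[i] = true;
  for (const Range& r : reads)
    for (std::size_t i = r.lo; i < r.hi && i < size; ++i)
      if (!initialized[i]) return false;
  return true;
}
```

With a complete `writes` partition, `reads_only_initialized` holds for any `reads` partition of the same region, so a warning about uninitialized data in a subregion of aPart2 points at the runtime's tracking rather than at the program.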
In debug mode, I see this error:
cannonMM-cuda: /home/rohany/taco-ctrl-rep-bug/legion/legion/runtime/legion/legion_analysis.cc:9905: virtual void Legion::Internal::EquivalenceSet::notify_active(Legion::Internal::ReferenceMutator*): Assertion `active_once' failed.
Signal 6 received by node 0, process 1104584 (thread 7f8d14077000) - obtaining backtrace
Signal 6 received by process 1104584 (thread 7f8d14077000) at: stack trace: 20 frames
[0] = /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f8d279b13c0]
[1] = /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7f8d25c3f18b]
[2] = /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7f8d25c1e859]
[3] = /lib/x86_64-linux-gnu/libc.so.6(+0x25729) [0x7f8d25c1e729]
[4] = /lib/x86_64-linux-gnu/libc.so.6(+0x36f36) [0x7f8d25c2ff36]
[5] = bin/cannonMM-cuda(Legion::Internal::EquivalenceSet::notify_active(Legion::Internal::ReferenceMutator*)+0x42) [0x5558cec17690]
[6] = bin/cannonMM-cuda(Legion::Internal::DistributedCollectable::add_valid_reference(Legion::Internal::ReferenceMutator*)+0xaa) [0x5558cebe28ec]
[7] = bin/cannonMM-cuda(Legion::Internal::DistributedCollectable::add_base_valid_ref(Legion::Internal::ReferenceSource, Legion::Internal::ReferenceMutator*, int)+0x91) [0x5558ce38209f]
[8] = bin/cannonMM-cuda(Legion::Internal::VersionManager::initialize_nonexclusive_virtual_analysis(AVXTLBitMask<512u> const&, Legion::Internal::FieldMaskSet<Legion::Internal::EquivalenceSet> const&, std::set<Legion::Internal::RtEvent, std::less<Legion::Internal::RtEvent>, std::allocator<Legion::Internal::RtEvent> >&)+0x181) [0x5558cec4bbb7]
[9] = bin/cannonMM-cuda(Legion::Internal::RegionNode::initialize_nonexclusive_virtual_analysis(unsigned int, AVXTLBitMask<512u> const&, Legion::Internal::FieldMaskSet<Legion::Internal::EquivalenceSet> const&, std::set<Legion::Internal::RtEvent, std::less<Legion::Internal::RtEvent>, std::allocator<Legion::Internal::RtEvent> >&)+0x4c) [0x5558ce60fbc4]
[10] = bin/cannonMM-cuda(Legion::Internal::InnerContext::initialize_region_tree_contexts(std::vector<Legion::RegionRequirement, std::allocator<Legion::RegionRequirement> > const&, std::vector<Legion::Internal::VersionInfo, Legion::Internal::AlignedAllocator<Legion::Internal::VersionInfo> > const&, std::vector<Legion::Internal::EquivalenceSet*, std::allocator<Legion::Internal::EquivalenceSet*> > const&, std::vector<Legion::Internal::ApUserEvent, std::allocator<Legion::Internal::ApUserEvent> > const&, std::set<Legion::Internal::RtEvent, std::less<Legion::Internal::RtEvent>, std::allocator<Legion::Internal::RtEvent> >&, std::set<Legion::Internal::RtEvent, std::less<Legion::Internal::RtEvent>, std::allocator<Legion::Internal::RtEvent> >&)+0x499) [0x5558cecf084f]
[11] = bin/cannonMM-cuda(Legion::Internal::SingleTask::launch_task(bool)+0x118f) [0x5558ce489565]
[12] = bin/cannonMM-cuda(Legion::Internal::Runtime::legion_runtime_task(void const*, unsigned long, void const*, unsigned long, Realm::Processor)+0xc94) [0x5558ce708f18]
[13] = bin/cannonMM-cuda(+0x2210588) [0x5558cf1b2588]
[14] = bin/cannonMM-cuda(+0x22772b7) [0x5558cf2192b7]
[15] = bin/cannonMM-cuda(+0x227c43a) [0x5558cf21e43a]
[16] = bin/cannonMM-cuda(+0x227a3cb) [0x5558cf21c3cb]
[17] = bin/cannonMM-cuda(+0x2282164) [0x5558cf224164]
[18] = bin/cannonMM-cuda(+0x2290440) [0x5558cf232440]
[19] = /lib/x86_64-linux-gnu/libc.so.6(+0x5e660) [0x7f8d25c57660]
A reproducer is available on sapling at /home/rohany/taco-ctrl-rep-bug/build. Run:
bin/cannonMM-cuda -n 20000 -gx 1 -gy 1 -dm:exact_region -tm:untrack_valid_regions -ll:ocpu 1 -ll:othr 10 -ll:csize 1500 -ll:util 4 -dm:replicate 1 -ll:gpu 4 -ll:fsize 15000 -ll:bgwork 12 -ll:bgnumapin 1
About this issue
- State: closed
- Created 3 years ago
- Comments: 21 (21 by maintainers)
Build with `make VERBOSE=1` and check the actual build commands. CMake only picks up certain variables on the first configuration, so make sure it’s a clean build too.