legion: assertion failure in debug mode
When running on a single node in debug mode, I see this assertion failure:
mttkrp-cuda: /home/rohany/taco-ctrl-rep-bug/legion/legion/runtime/legion/runtime.cc:2548: Legion::Internal::FutureInstance::FutureInstance(const void*, size_t, Legion::Memory, Legion::Internal::ApEvent, Legion::Internal::Runtime*, bool, bool, bool, Legion::Internal::PhysicalInstance, void (*)(void*, size_t), Legion::Processor, Legion::Internal::RtEvent): Assertion `size > 0' failed.
Signal 6 received by node 0, process 3943112 (thread 7fd73403a000) - obtaining backtrace
Signal 6 received by process 3943112 (thread 7fd73403a000) at: stack trace: 21 frames
[0] = /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fd747cd63c0]
[1] = /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7fd745f6418b]
[2] = /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7fd745f43859]
[3] = /lib/x86_64-linux-gnu/libc.so.6(+0x25729) [0x7fd745f43729]
[4] = /lib/x86_64-linux-gnu/libc.so.6(+0x36f36) [0x7fd745f54f36]
[5] = bin/mttkrp-cuda(Legion::Internal::FutureInstance::FutureInstance(void const*, unsigned long, Realm::Memory, Legion::Internal::ApEvent, Legion::Internal::Runtime*, bool, bool, bool, Realm::RegionInstance, void (*)(void*, unsigned long), Realm::Processor, Legion::Internal::RtEvent)+0x148) [0x56168c98ca90]
[6] = bin/mttkrp-cuda(Legion::Internal::MemoryManager::create_future_instance(Legion::Internal::Operation*, unsigned long long, Legion::Internal::ApEvent, unsigned long, bool)+0x5a7) [0x56168c9b0b43]
[7] = bin/mttkrp-cuda(Legion::Internal::FutureImpl::request_application_instance(Realm::Memory, Legion::Internal::SingleTask*, unsigned long long, unsigned int, Legion::Internal::ApUserEvent, unsigned long)+0x579) [0x56168c98677f]
[8] = bin/mttkrp-cuda(Legion::Internal::SingleTask::finalize_map_task_output(Legion::Mapping::Mapper::MapTaskInput&, Legion::Mapping::Mapper::MapTaskOutput&, Legion::Internal::MustEpochOp*, std::vector<Legion::Internal::InstanceSet, std::allocator<Legion::Internal::InstanceSet> >&)+0x949) [0x56168c77e979]
[9] = bin/mttkrp-cuda(Legion::Internal::SingleTask::invoke_mapper(Legion::Internal::MustEpochOp*)+0x184) [0x56168c7831e8]
[10] = bin/mttkrp-cuda(Legion::Internal::SingleTask::map_all_regions(Legion::Internal::MustEpochOp*, Legion::Internal::TaskOp::DeferMappingArgs const*)+0x367) [0x56168c784d5f]
[11] = bin/mttkrp-cuda(Legion::Internal::IndividualTask::perform_mapping(Legion::Internal::MustEpochOp*, Legion::Internal::TaskOp::DeferMappingArgs const*)+0x3f) [0x56168c790431]
[12] = bin/mttkrp-cuda(Legion::Internal::SingleTask::trigger_mapping()+0x2d5) [0x56168c77d2ff]
[13] = bin/mttkrp-cuda(Legion::Internal::Runtime::legion_runtime_task(void const*, unsigned long, void const*, unsigned long, Realm::Processor)+0x45a) [0x56168ca020e0]
[14] = bin/mttkrp-cuda(+0x217ce4a) [0x56168d4a7e4a]
[15] = bin/mttkrp-cuda(+0x21e3ae1) [0x56168d50eae1]
[16] = bin/mttkrp-cuda(+0x21e8c64) [0x56168d513c64]
[17] = bin/mttkrp-cuda(+0x21e6bf5) [0x56168d511bf5]
[18] = bin/mttkrp-cuda(+0x21ee98e) [0x56168d51998e]
[19] = bin/mttkrp-cuda(+0x21fcc6a) [0x56168d527c6a]
[20] = /lib/x86_64-linux-gnu/libc.so.6(+0x5e660) [0x7fd745f7c660]
To reproduce the error, go to /home/rohany/taco-ctrl-rep-bug/build/ and run ./runner.sh. The binary is compiled against a debug build of Legion.
In my code, there is a loop like this:
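Future f;  // empty on the first iteration
for (...) {
  TaskLauncher launcher(...);
  // Pass the previous task's future to the next launch, if there is one.
  if (f.valid()) {
    launcher.add_future(f);
  }
  f = runtime->execute_task(ctx, launcher);
}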
When I remove this tracking of the future (i.e. the loop just launches the tasks without passing the previous future), the code succeeds. However, I don’t think the pattern itself is incorrect: I’ve done the same in a test application with no errors. I suspect there is some interaction between this and other parts of the system that I don’t understand.
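To make the intended dependence pattern concrete, here is a minimal sketch of the same loop written with the C++ standard library instead of Legion. It is only an analogy: `step` and `run_chain` are hypothetical names, `std::shared_future` stands in for Legion's `Future`, and the sketch does not reproduce the assertion. It only shows that each launch is meant to wait on the result of the previous one.

```cpp
#include <future>

// Hypothetical stand-in for a task body: it receives the previous task's
// future (if any) and returns how many tasks in the chain have completed.
int step(std::shared_future<int> prev) {
    // Waiting on the previous future enforces the ordering that
    // launcher.add_future(f) is meant to express in the loop above.
    int completed = prev.valid() ? prev.get() : 0;
    return completed + 1;
}

// Launches n chained tasks and returns the value held by the last future.
int run_chain(int n) {
    std::shared_future<int> f;  // default-constructed: not valid, like the initial Future
    for (int i = 0; i < n; ++i) {
        // The current f is copied into the new task before f is overwritten
        // with the future of the task that was just launched.
        f = std::async(std::launch::async, step, f).share();
    }
    return f.valid() ? f.get() : 0;
}
```

Calling `run_chain(4)` launches four tasks, each waiting on its predecessor, and returns 4. In this analogy every future carries a non-empty value. Whether the futures in the failing program do is not something the report states.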
About this issue
- State: closed
- Created 3 years ago
- Comments: 25 (25 by maintainers)
Commits related to this issue
- legion: fix for #1123 — committed to StanfordLegion/legion by lightsighter 3 years ago
I started work on the fix tonight. It’s unclear when I’ll be able to get it done.
Never mind, you can ignore the previous comment; I see what is going wrong.