legion: assertion failure in debug mode

When running on a single node in debug mode, I see this assertion failure:

mttkrp-cuda: /home/rohany/taco-ctrl-rep-bug/legion/legion/runtime/legion/runtime.cc:2548: Legion::Internal::FutureInstance::FutureInstance(const void*, size_t, Legion::Memory, Legion::Internal::ApEvent, Legion::Internal::Runtime*, bool, bool, bool, Legion::Internal::PhysicalInstance, void (*)(void*, size_t), Legion::Processor, Legion::Internal::RtEvent): Assertion `size > 0' failed.
Signal 6 received by node 0, process 3943112 (thread 7fd73403a000) - obtaining backtrace
Signal 6 received by process 3943112 (thread 7fd73403a000) at: stack trace: 21 frames
  [0] = /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fd747cd63c0]
  [1] = /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7fd745f6418b]
  [2] = /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7fd745f43859]
  [3] = /lib/x86_64-linux-gnu/libc.so.6(+0x25729) [0x7fd745f43729]
  [4] = /lib/x86_64-linux-gnu/libc.so.6(+0x36f36) [0x7fd745f54f36]
  [5] = bin/mttkrp-cuda(Legion::Internal::FutureInstance::FutureInstance(void const*, unsigned long, Realm::Memory, Legion::Internal::ApEvent, Legion::Internal::Runtime*, bool, bool, bool, Realm::RegionInstance, void (*)(void*, unsigned long), Realm::Processor, Legion::Internal::RtEvent)+0x148) [0x56168c98ca90]
  [6] = bin/mttkrp-cuda(Legion::Internal::MemoryManager::create_future_instance(Legion::Internal::Operation*, unsigned long long, Legion::Internal::ApEvent, unsigned long, bool)+0x5a7) [0x56168c9b0b43]
  [7] = bin/mttkrp-cuda(Legion::Internal::FutureImpl::request_application_instance(Realm::Memory, Legion::Internal::SingleTask*, unsigned long long, unsigned int, Legion::Internal::ApUserEvent, unsigned long)+0x579) [0x56168c98677f]
  [8] = bin/mttkrp-cuda(Legion::Internal::SingleTask::finalize_map_task_output(Legion::Mapping::Mapper::MapTaskInput&, Legion::Mapping::Mapper::MapTaskOutput&, Legion::Internal::MustEpochOp*, std::vector<Legion::Internal::InstanceSet, std::allocator<Legion::Internal::InstanceSet> >&)+0x949) [0x56168c77e979]
  [9] = bin/mttkrp-cuda(Legion::Internal::SingleTask::invoke_mapper(Legion::Internal::MustEpochOp*)+0x184) [0x56168c7831e8]
  [10] = bin/mttkrp-cuda(Legion::Internal::SingleTask::map_all_regions(Legion::Internal::MustEpochOp*, Legion::Internal::TaskOp::DeferMappingArgs const*)+0x367) [0x56168c784d5f]
  [11] = bin/mttkrp-cuda(Legion::Internal::IndividualTask::perform_mapping(Legion::Internal::MustEpochOp*, Legion::Internal::TaskOp::DeferMappingArgs const*)+0x3f) [0x56168c790431]
  [12] = bin/mttkrp-cuda(Legion::Internal::SingleTask::trigger_mapping()+0x2d5) [0x56168c77d2ff]
  [13] = bin/mttkrp-cuda(Legion::Internal::Runtime::legion_runtime_task(void const*, unsigned long, void const*, unsigned long, Realm::Processor)+0x45a) [0x56168ca020e0]
  [14] = bin/mttkrp-cuda(+0x217ce4a) [0x56168d4a7e4a]
  [15] = bin/mttkrp-cuda(+0x21e3ae1) [0x56168d50eae1]
  [16] = bin/mttkrp-cuda(+0x21e8c64) [0x56168d513c64]
  [17] = bin/mttkrp-cuda(+0x21e6bf5) [0x56168d511bf5]
  [18] = bin/mttkrp-cuda(+0x21ee98e) [0x56168d51998e]
  [19] = bin/mttkrp-cuda(+0x21fcc6a) [0x56168d527c6a]
  [20] = /lib/x86_64-linux-gnu/libc.so.6(+0x5e660) [0x7fd745f7c660]

To reproduce the error, go to /home/rohany/taco-ctrl-rep-bug/build/ and run ./runner.sh. The binary is compiled against a debug build of Legion.

In my code, there is a loop like this:

Future f;
for (...) {
  TaskLauncher launcher(...);
  if (f.valid()) {
    launcher.add_future(f);
  }
  f = runtime->execute_task(ctx, launcher);
}

When I remove this tracking of the future (i.e. the loop just launches the tasks), the code succeeds. However, I don’t think the pattern itself is incorrect, since I’ve done the same in a test application with no errors. I suspect there is some interaction between this and other parts of the system that I don’t understand.
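
For readers unfamiliar with the pattern, here is a minimal standalone sketch of the dependency chain the loop builds. It does not use Legion: the `sketch::Future`, `sketch::TaskLauncher`, `sketch::execute_task`, and `sketch::run_chain` names are hypothetical stand-ins written for illustration only, and the "tasks" run immediately rather than asynchronously. It shows only that the first launch carries no future and every later launch carries exactly one.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a future: invalid until a task produces it.
struct Future {
  bool has_value;
  int value;
  Future() : has_value(false), value(0) {}
  explicit Future(int v) : has_value(true), value(v) {}
  bool valid() const { return has_value; }
};

// Hypothetical stand-in for a launcher that collects future dependencies.
struct TaskLauncher {
  std::vector<Future> futures;
  void add_future(const Future& f) { futures.push_back(f); }
};

// "Runs" a task immediately; the result is the task's depth in the chain.
Future execute_task(const TaskLauncher& launcher) {
  int depth = 0;
  for (const Future& dep : launcher.futures) depth = dep.value + 1;
  return Future(depth);
}

// Launches n tasks and records how many futures each launch depended on.
std::vector<std::size_t> run_chain(int n) {
  std::vector<std::size_t> deps_per_launch;
  Future f;  // starts out invalid, as in the loop above
  for (int i = 0; i < n; ++i) {
    TaskLauncher launcher;
    if (f.valid()) launcher.add_future(f);
    deps_per_launch.push_back(launcher.futures.size());
    f = execute_task(launcher);
  }
  return deps_per_launch;
}

}  // namespace sketch
```

Calling `sketch::run_chain(4)` yields one entry per launch, with zero dependencies for the first launch and one for each launch after it. The sketch says nothing about why the real runtime hits the assertion.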

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (25 by maintainers)

Most upvoted comments

I started work on the fix tonight. It’s unclear when I’ll be able to get it done.

Never mind, you can ignore the previous comment; I see what is going wrong.