autograph: Tests fail on Mac OS Monterey, Rust 1.57
Tests fail to finish on M1 Mac
$ cargo test device_new --features device_tests
running 1 test
test device::tests::device_new has been running for over 60 seconds
error: test failed, to rerun pass '--lib'
Caused by:
process didn't exit successfully: `/Users/rjzak/Downloads/autograph/target/debug/deps/autograph-868587c6365604da device_new` (signal: 9, SIGKILL: kill)
$ cargo test --features "full device_tests"
test device::buffer::tests::device_buffer_copy_from_slice has been running for over 60 seconds
test device::buffer::tests::device_buffer_serde has been running for over 60 seconds
test device::buffer::tests::fill_bf16 has been running for over 60 seconds
test device::buffer::tests::fill_f16 has been running for over 60 seconds
test device::buffer::tests::fill_f32 has been running for over 60 seconds
test device::buffer::tests::fill_f64 has been running for over 60 seconds
test device::buffer::tests::fill_i16 has been running for over 60 seconds
test device::buffer::tests::fill_i32 has been running for over 60 seconds
error: test failed, to rerun pass '--lib'
Caused by:
process didn't exit successfully: `/Users/rjzak/Downloads/autograph/target/debug/deps/autograph-aa9dbc5e89ab94bc` (signal: 9, SIGKILL: kill)
$ uname -a
Darwin macmini.local 21.1.0 Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:24 PDT 2021; root:xnu-8019.41.5~1/RELEASE_ARM64_T8101 arm64
$ rustc --version
rustc 1.57.0 (f1edd0429 2021-11-29)
About this issue
- State: closed
- Created 3 years ago
- Comments: 37 (20 by maintainers)
Ok so I found some potential issues. This really needs unit tests because there are at least a few different common setups for host / device memory. Typical discrete GPUs have a big DEVICE_LOCAL heap, a small DEVICE_LOCAL | CPU_VISIBLE heap, and a CPU_VISIBLE | COHERENT heap that may also be CPU_CACHED. Integrated GPUs and the M1 may be laid out differently, with more DEVICE_LOCAL | CPU_VISIBLE memory in addition to memory that is DEVICE_LOCAL alone. The impl was intended to handle this, but without tests it got neglected. There should be tests for known configs, since this is necessary not only to run on the M1 but also on iGPUs and mobile chips.
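To illustrate the kind of selection logic and tests meant here, a minimal sketch follows. It is not the actual autograph implementation: the flag constants are plain-integer stand-ins named after the properties above, `select_memory_type` is a hypothetical helper, and the heap layouts used to exercise it are illustrative rather than queried from real hardware. The idea is to treat CPU_CACHED as a preference and fall back to any memory type that meets the hard requirements.

```rust
// Hypothetical stand-ins for the memory property flags discussed above.
// These are plain integers for the sketch, not the real gfx-hal types.
pub const DEVICE_LOCAL: u32 = 1 << 0;
pub const CPU_VISIBLE: u32 = 1 << 1;
pub const COHERENT: u32 = 1 << 2;
pub const CPU_CACHED: u32 = 1 << 3;

/// Returns the index of a memory type that has every `required` property.
/// A type that also has every `preferred` property wins if one exists;
/// otherwise the first type meeting only the requirements is used.
pub fn select_memory_type(types: &[u32], required: u32, preferred: u32) -> Option<usize> {
    let has = |props: u32, mask: u32| props & mask == mask;
    types
        .iter()
        .position(|&props| has(props, required | preferred))
        .or_else(|| types.iter().position(|&props| has(props, required)))
}

/// Illustrative layout in the style of a discrete GPU (assumed, not measured).
pub fn discrete_layout() -> Vec<u32> {
    vec![
        DEVICE_LOCAL,
        DEVICE_LOCAL | CPU_VISIBLE | COHERENT,
        CPU_VISIBLE | COHERENT,
        CPU_VISIBLE | COHERENT | CPU_CACHED,
    ]
}

/// Illustrative unified-memory layout with no CPU_CACHED type (assumed, not measured).
pub fn unified_layout() -> Vec<u32> {
    vec![DEVICE_LOCAL, DEVICE_LOCAL | CPU_VISIBLE | COHERENT]
}
```

With this shape, host memory is requested as `select_memory_type(&types, CPU_VISIBLE | COHERENT, CPU_CACHED)`. Passing CPU_CACHED as a hard requirement instead returns `None` on the unified layout, which is the kind of failure a per-config unit test would catch.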
Making some progress!! The “device new” test worked, so I also ran all of the tests. 145 passed, 1 failed, 2 ignored.
Well that’s encouraging. I’m trying to figure out how to use ash-molten to do the static linking. It seemed like it worked, but the function I used returned None for some reason. All the extensions are exposed in the API, though, so it will be possible to select impls based on those rather than guessing based on platform.
Ok so that didn’t actually work as expected. Anyway, I was pretty sure that it was the atomic ops. Metal does support some atomic operations, but not atomic_or (used in the GLSL impls for storing bf16) or atomic_compare_exchange (Metal has atomic_compare_exchange_weak, which should work fine). So I think if I use atomic_compare_exchange_weak it will work.
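As a rough illustration of why the weak compare-exchange is enough, here is a CPU-side analogue in Rust using `std::sync::atomic`. This is a sketch, not the shader code: `fetch_or_via_cas` and `store_u16_in_word` are hypothetical names, and packing two 16-bit values into one 32-bit word is an assumption about how a bf16 store would be laid out. A weak compare-exchange may fail spuriously, so both functions retry in a loop with the freshly observed value.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Emulates an atomic OR with a compare_exchange_weak retry loop.
/// Returns the value the word held before the OR was applied.
pub fn fetch_or_via_cas(word: &AtomicU32, bits: u32) -> u32 {
    let mut current = word.load(Ordering::Relaxed);
    loop {
        match word.compare_exchange_weak(
            current,
            current | bits,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(previous) => return previous,
            // Spurious failure or a concurrent write: retry with the observed value.
            Err(actual) => current = actual,
        }
    }
}

/// Stores a 16-bit value (for example the bits of a bf16) into the low
/// (`half == 0`) or high (`half == 1`) half of a 32-bit word, leaving the
/// other half untouched.
pub fn store_u16_in_word(word: &AtomicU32, half: u32, value: u16) {
    let shift = (half & 1) * 16;
    let mask = 0xFFFFu32 << shift;
    let mut current = word.load(Ordering::Relaxed);
    loop {
        let updated = (current & !mask) | ((value as u32) << shift);
        match word.compare_exchange_weak(
            current,
            updated,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return,
            Err(actual) => current = actual,
        }
    }
}
```

Writing the whole half-word through the loop, rather than clearing it and then OR-ing the new bits in, means a concurrent reader never observes a half that has been cleared but not yet written.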
However, it turns out that gfx-hal (the base API that I used to abstract over the 3 backends) uses spirv_cross to compile SPIR-V to HLSL, but for whatever reason uses naga by default to compile for Metal. Naga is a Rust cross-compilation tool, but it doesn’t support everything yet, and does not appear to support atomic operations at all. So I should be able to use spirv_cross instead, which should fix the issue. At the very least it should be able to parse the SPIR-V, as it has worked on DX12, but there may still be limitations or some things that don’t translate to Metal. So I’m working on validating this at shader compile time to catch issues like this. I should be able to check that the shaders will compile to Metal and HLSL on any platform, including testing on CI as well.
But hopefully with a few small changes everything should work correctly without separate impls.
Eventually I would like to fix the issue with shader compilation at runtime poisoning the device. It would potentially be nicer if a failed compile were simply blocked instead. That way you could try different versions with different capabilities / extensions, which would be more flexible.
Awesome! I think I found the issue. Can you pull the changes and retry please? It looks like I requested CPU_CACHED for host memory when that is not always available.