iree: Sliding window test failing on NVIDIA GPU
bazel test //integrations/tensorflow/e2e:e2e_tests_sliding_window_test__tf__iree_vulkan
Output:
[ RUN ] SlidingWindowTest.test_slidingwindow
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=9, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11648] RAW: Returned from existing block #0
2020-07-24 21:55:48.253200: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:118] None of the MLIR optimization passes are enabled (registered 1)
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=10, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11744] RAW: Created new block Size=8388608
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=9, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11648] RAW: Returned from existing block #0
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=7, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11744] RAW: Created new block Size=8388608
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=7, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11648] RAW: Returned from existing block #0
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=7
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=7
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=9
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=10
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=10, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11648] RAW: Returned from existing block #0
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=9, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11648] RAW: Returned from existing block #0
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=7, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11648] RAW: Returned from existing block #0
[vk_mem_alloc.h : 16824] RAW: vmaCreateBuffer
[vk_mem_alloc.h : 14433] RAW: AllocateMemory: MemoryTypeIndex=7, AllocationCount=1, Size=256
[vk_mem_alloc.h : 11648] RAW: Returned from existing block #0
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=7
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=7
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=10
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=9
SystemContext driver=<pyiree.rt.binding.HalDriver object at 0x7fd750f6bf80>
INFO:tensorflow:time(__main__.SlidingWindowTest.test_slidingwindow): 0.07s
I0724 21:55:48.291506 140564544591680 test_util.py:1973] time(__main__.SlidingWindowTest.test_slidingwindow): 0.07s
[ OK ] SlidingWindowTest.test_slidingwindow
----------------------------------------------------------------------
Ran 2 tests in 2.087s
OK (skipped=1)
[vk_mem_alloc.h : 16934] RAW: vmaDestroyBuffer
[vk_mem_alloc.h : 11934] RAW: Freed from MemoryTypeIndex=9
*** Received signal 11 ***
*** BEGIN MANGLED STACK TRACE ***
/usr/local/lib/python3.6/dist-packages/tensorflow/python/../libtensorflow_framework.so.2(+0x10c561d)[0x7fd759ea861d]
/lib/x86_64-linux-gnu/libc.so.6(+0x3efd0)[0x7fd7bb634fd0]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fd7bb3e0fa0]
/usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0(+0xaca5)[0x7fd658313ca5]
/usr/lib/x86_64-linux-gnu/libEGL.so.1(eglReleaseThread+0x8b)[0x7fd6f01c56fb]
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0(+0x9cdb7)[0x7fd750120db7]
*** END MANGLED STACK TRACE ***
*** Begin stack trace ***
tensorflow::CurrentStackTrace()
pthread_mutex_lock
eglReleaseThread
*** End stack trace ***
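I haven’t looked into why; it is possibly a regression somewhere. Note that the test itself passes: the crash (signal 11) only happens after the run finishes, in pthread_mutex_lock under eglReleaseThread. See https://source.cloud.google.com/results/invocations/af857e58-a9a0-48af-a4d4-c1d273501c55/targets/iree%2Fgcp_ubuntu%2Fbazel%2Flinux%2Fx86-turing%2Fintegrations%2Fpresubmit/log for the buildbot run.
Disabling this test for now so we can land the Bazel + Cloud GPU CI check.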
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (17 by maintainers)
OK, I have disabled the build and will look into whether we can exclude only some of the TF tests from GPU. Note that we do still have real GPU coverage for everything buildable on the CMake side.
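One possible way to exclude a single test from the GPU run, rather than the whole build, is a conditional skip inside the Python test itself. This is only a sketch: it assumes a hypothetical IREE_TEST_ON_NVIDIA_GPU_CI environment variable that the NVIDIA GPU runners would set (it is not an existing IREE or Bazel flag), and it uses plain unittest instead of IREE's actual TF test utilities.

```python
import os
import unittest

# Hypothetical environment variable, assumed to be set to "1" only on the
# NVIDIA GPU CI runners. It is not an existing IREE or Bazel flag.
GPU_CI_ENV_VAR = "IREE_TEST_ON_NVIDIA_GPU_CI"


def running_on_nvidia_gpu_ci():
    """Returns True when the (hypothetical) GPU CI marker is set to "1"."""
    return os.environ.get(GPU_CI_ENV_VAR) == "1"


class SlidingWindowTest(unittest.TestCase):

    # skipIf evaluates its condition once, when the class is defined.
    @unittest.skipIf(running_on_nvidia_gpu_ci(),
                     "Segfaults in eglReleaseThread at exit on NVIDIA GPU CI")
    def test_slidingwindow(self):
        # Placeholder body standing in for the real test logic.
        self.assertTrue(True)
```

Because the skip condition is evaluated at import time, the variable would have to be set in the CI job's environment before the test module is loaded; everywhere else the test keeps running as before.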
I can confirm that manually hacking -fsanitize=address into iree_copts.cmake works. It would be nice to fully support that (and to have a CI run with it enabled).