godot: Vulkan: GPU Timeout on MacOS [tested multiple hardware]

Godot version

4.0.dev (branched from commit 880a0177d12463b612268afe95bd3d8dd565bf52), with @Zylann’s terrain module on top

System information

Tested on multiple machines:

  • macOS 12.4, Intel i9
  • macOS 12.2, Intel i7

Issue description

The game crashes when a sufficiently complex scene is loaded.

There are no UBSan/ASan errors anymore either (I spent an entire day fixing them all; patch incoming soon ™️).

I tried MVK_ALLOW_METAL_EVENTS=1, as suggested by some search results, but it only gave me an additional 2 seconds after the scene had loaded before the GPU timeout.

We’re having this on ARM processors too. I will try a bisect tomorrow, but any help is appreciated.

This is for the Mirror.

Symptoms: the entire app hangs, there is no display output anymore, and we see infinite swap issues or a crash with VK_ERROR_DEVICE_LOST, etc.

2022-10-14 00:20:24.210364+0100 godot.macos.opt.tools.x86_64[4888:74403] Execution of the command buffer was aborted due to an error during execution. Caused GPU Timeout Error (00000002:kIOAccelCommandBufferCallbackErrorTimeout)
[mvk-error] VK_ERROR_DEVICE_LOST: Command buffer 0x7fe954f59a00 "vkQueueSubmit CommandBuffer on Queue 0-0" execution failed (code 2): Caused GPU Timeout Error (00000002:kIOAccelCommandBufferCallbackErrorTimeout)
2022-10-14 00:20:24.210533+0100 godot.macos.opt.tools.x86_64[4888:74403] ERROR:  - Message Id Number: 0 | Message Id Name: 
	VK_ERROR_DEVICE_LOST: Command buffer 0x7fe954f59a00 "vkQueueSubmit CommandBuffer on Queue 0-0" execution failed (code 2): Caused GPU Timeout Error (00000002:kIOAccelCommandBufferCallbackErrorTimeout)
	Objects - 1
		Object[0] - VK_OBJECT_TYPE_QUEUE, Handle 105553183017928
at: _debug_messenger_callback (drivers/vulkan/vulkan_context.cpp:171)
ERROR:  - Message Id Number: 0 | Message Id Name: 
	VK_ERROR_DEVICE_LOST: Command buffer 0x7fe954f59a00 "vkQueueSubmit CommandBuffer on Queue 0-0" execution failed (code 2): Caused GPU Timeout Error (00000002:kIOAccelCommandBufferCallbackErrorTimeout)
	Objects - 1
		Object[0] - VK_OBJECT_TYPE_QUEUE, Handle 105553183017928

Steps to reproduce

I dug around a lot and honestly don’t know how to reproduce this outside of our codebase, due to its complexity.

Minimal reproduction project

Unable to provide one, as I am unsure how to reproduce this outside our game.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 24 (23 by maintainers)

Most upvoted comments

RevoluPowered and I debugged this a little further today. We think that the issue is not reproducible on the mobile renderer, which points to the root issue being in either the scene shader or a compute shader. We analyzed typical draw calls in the Xcode debugger and did not see anything out of the ordinary (although a lot of debug info is lost between Vulkan -> MoltenVK -> Metal, so I am not confident in what we saw).

My best guess is that something is breaking in the cluster building, resulting in pathological loops forming in the scene shader.

Next steps:

  1. Force disable lighting by adding #define MODE_UNSHADED to the top of the clustered rendering shader. See if this impacts the crash.
  2. Force disable lighting, decals, and reflection probes by commenting out the 3 for loops over cluster data of the form for (uint i = item_from; i < item_to; i++) {. See if this impacts the crash.
  3. Compare the results from the cluster builder on macOS in Xcode to the results on a non-macOS computer for a similar scene (i.e. validate the output of the cluster builder manually).
  4. Same as 3, except force disable subgroups on the non-macOS computer.
  5. ~Pare down the clustered renderer by removing compute shader dispatches one by one until the problem goes away~
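
To make the "pathological loops" guess concrete, here is a small CPU-side sketch in Python of how the bounds of the for (uint i = item_from; i < item_to; i++) loops are derived. It mirrors the cluster_get_item_range helper shown in the patch later in this thread; the loop_iterations helper and the sample packed words are assumptions added purely for illustration, and this is not a confirmed diagnosis.

```python
def cluster_get_item_range(item_min_max):
    """Python model of the shader helper; 32-bit unsigned math is emulated with masks."""
    item_min_max &= 0xFFFFFFFF
    item_min = item_min_max & 0xFFFF   # low 16 bits: first item index
    item_max = item_min_max >> 16      # high 16 bits: one past the last item index
    item_from = item_min >> 5          # first 32-bit mask word to visit
    # item_max == 0 means "no elements", as noted in the shader comment
    item_to = 0 if item_max == 0 else ((item_max - 1) >> 5) + 1
    return item_min, item_max, item_from, item_to


def loop_iterations(item_min_max):
    """Hypothetical helper: how many times the shader's for loop would run."""
    _, _, item_from, item_to = cluster_get_item_range(item_min_max)
    return max(0, item_to - item_from)


# A plausible packed word: items 40..69 live in mask words 1 and 2.
print(cluster_get_item_range((70 << 16) | 40))
# A made-up corrupted word: every fragment would walk 2048 mask words.
print(loop_iterations(0xFFFF0000))
```

If the cluster builder ever wrote garbage into cluster_buffer, a word like the second one would make every fragment iterate over thousands of mask words, which is the kind of behaviour that comparing the builder output across platforms (steps 3 and 4) should expose.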

Fixes:

  • https://github.com/godotengine/godot/pull/67912 - important if you like your debugger to work
  • https://github.com/godotengine/godot/pull/67913 - important if you don’t want msaa to randomly crash on you
  • https://github.com/godotengine/godot/pull/67915 - the most important one

A patch to fix the issue is incoming; we found that the cause was the subgroup support in MoltenVK.

Patch to correct the literal types so that they are unsigned (the GLSL compiler complains that they are int, not uint):

@@ -689,11 +687,11 @@ vec4 fog_process(vec3 vertex) {
 	return vec4(fog_color, fog_amount);
 }
 
 void cluster_get_item_range(uint p_offset, out uint item_min, out uint item_max, out uint item_from, out uint item_to) {
 	uint item_min_max = cluster_buffer.data[p_offset];
-	item_min = item_min_max & 0xFFFF;
+	item_min = item_min_max & 0xFFFFu;
 	item_max = item_min_max >> 16;
 
 	item_from = item_min >> 5;
 	item_to = (item_max == 0) ? 0 : ((item_max - 1) >> 5) + 1; //side effect of how it is stored, as item_max 0 means no elements
 }
@@ -956,13 +954,13 @@ void fragment_shader(in SceneData scene_data) {
 			uint merged_mask = mask;
 #endif
 
 			while (merged_mask != 0) {
 				uint bit = findMSB(merged_mask);
-				merged_mask &= ~(1 << bit);
+				merged_mask &= ~(1u << bit);
 #ifdef USE_SUBGROUPS
-				if (((1 << bit) & mask) == 0) { //do not process if not originally here
+				if (((1u << bit) & mask) == 0) { //do not process if not originally here
 					continue;
 				}
 #endif
 				uint decal_index = 32 * i + bit;
 
@@ -1417,13 +1415,13 @@ void fragment_shader(in SceneData scene_data) {
 			uint merged_mask = mask;
 #endif
 
 			while (merged_mask != 0) {
 				uint bit = findMSB(merged_mask);
-				merged_mask &= ~(1 << bit);
+				merged_mask &= ~(1u << bit);
 #ifdef USE_SUBGROUPS
-				if (((1 << bit) & mask) == 0) { //do not process if not originally here
+				if (((1u << bit) & mask) == 0) { //do not process if not originally here
 					continue;
 				}
 #endif
 				uint reflection_index = 32 * i + bit;
 
@@ -1773,13 +1771,13 @@ void fragment_shader(in SceneData scene_data) {
 #endif
 
 			float shadow = 1.0;
 #ifndef SHADOWS_DISABLED
 			if (i < 4) {
-				shadow = float(shadow0 >> (i * 8) & 0xFF) / 255.0;
+				shadow = float(shadow0 >> (i * 8u) & 0xFFu) / 255.0;
 			} else {
-				shadow = float(shadow1 >> ((i - 4) * 8) & 0xFF) / 255.0;
+				shadow = float(shadow1 >> ((i - 4u) * 8u) & 0xFFu) / 255.0;
 			}
 
 			shadow = shadow * directional_lights.data[i].shadow_opacity + 1.0 - directional_lights.data[i].shadow_opacity;
 #endif
 
@@ -1837,13 +1835,13 @@ void fragment_shader(in SceneData scene_data) {
 			uint merged_mask = mask;
 #endif
 
 			while (merged_mask != 0) {
 				uint bit = findMSB(merged_mask);
-				merged_mask &= ~(1 << bit);
+				merged_mask &= ~(1u << bit);
 #ifdef USE_SUBGROUPS
-				if (((1 << bit) & mask) == 0) { //do not process if not originally here
+				if (((1u << bit) & mask) == 0) { //do not process if not originally here
 					continue;
 				}
 #endif
 				uint light_index = 32 * i + bit;
 
@@ -1908,13 +1906,13 @@ void fragment_shader(in SceneData scene_data) {
 			uint merged_mask = mask;
 #endif
 
 			while (merged_mask != 0) {
 				uint bit = findMSB(merged_mask);
-				merged_mask &= ~(1 << bit);
+				merged_mask &= ~(1u << bit);
 #ifdef USE_SUBGROUPS
-				if (((1 << bit) & mask) == 0) { //do not process if not originally here
+				if (((1u << bit) & mask) == 0) { //do not process if not originally here
 					continue;
 				}
 #endif
 
 				uint light_index = 32 * i + bit;
@@ -2063,11 +2061,11 @@ void fragment_shader(in SceneData scene_data) {
 
 			float sRed = floor((cRed / pow(2.0f, exps - B - N)) + 0.5f);
 			float sGreen = floor((cGreen / pow(2.0f, exps - B - N)) + 0.5f);
 			float sBlue = floor((cBlue / pow(2.0f, exps - B - N)) + 0.5f);
 			//store as 8985 to have 2 extra neighbour bits
-			uint light_rgbe = ((uint(sRed) & 0x1FF) >> 1) | ((uint(sGreen) & 0x1FF) << 8) | (((uint(sBlue) & 0x1FF) >> 1) << 17) | ((uint(exps) & 0x1F) << 25);
+			uint light_rgbe = ((uint(sRed) & 0x1FFu) >> 1) | ((uint(sGreen) & 0x1FFu) << 8) | (((uint(sBlue) & 0x1FFu) >> 1) << 17) | ((uint(exps) & 0x1Fu) << 25);
 
 			imageStore(emission_grid, grid_pos, uvec4(light_rgbe));
 			imageStore(emission_aniso_grid, grid_pos, uvec4(light_aniso));
 		}
 	}
@@ -2097,12 +2095,12 @@ void fragment_shader(in SceneData scene_data) {
 
 #ifdef MODE_RENDER_VOXEL_GI
 	if (bool(instances.data[instance_index].flags & INSTANCE_FLAGS_USE_VOXEL_GI)) { // process voxel_gi_instances
 		uint index1 = instances.data[instance_index].gi_offset & 0xFFFF;
 		uint index2 = instances.data[instance_index].gi_offset >> 16;
-		voxel_gi_buffer.x = index1 & 0xFF;
-		voxel_gi_buffer.y = index2 & 0xFF;
+		voxel_gi_buffer.x = index1 & 0xFFu;
+		voxel_gi_buffer.y = index2 & 0xFFu;
 	} else {
 		voxel_gi_buffer.x = 0xFF;
 		voxel_gi_buffer.y = 0xFF;
 	}
 #endif
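
For readers who want to experiment with the loop touched by the patch above, here is a minimal Python model of the findMSB walk over a 32-bit mask word. It assumes that, with USE_SUBGROUPS, merged_mask is some combination of the masks of several invocations (that part is not shown in the patch), and the names find_msb and items_in_word are made up for this sketch. Python integers are unbounded, so the model masks to 32 bits explicitly where the shader relies on uint.

```python
def find_msb(value):
    """Index of the highest set bit of a 32-bit unsigned value; -1 if the value is 0."""
    return (value & 0xFFFFFFFF).bit_length() - 1


def items_in_word(mask, merged_mask, word_index):
    """Walk merged_mask from the highest bit down and collect the item indices
    that are also set in this invocation's own mask."""
    indices = []
    merged_mask &= 0xFFFFFFFF
    while merged_mask != 0:
        bit = find_msb(merged_mask)
        # Clear the bit we just found, staying within 32 bits.
        merged_mask &= ~(1 << bit) & 0xFFFFFFFF
        if ((1 << bit) & mask) == 0:
            continue  # do not process if not originally here
        indices.append(32 * word_index + bit)
    return indices


# Own mask has bits 3 and 1; the merged mask also carries bit 2 from elsewhere.
print(items_in_word(0b1010, 0b1110, 1))
```

Each iteration clears exactly one bit of merged_mask, so this model always terminates after at most 32 iterations per word; if the loop misbehaves on the GPU, the mask data or the subgroup path would be the first places to look.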

I noticed something potentially bad. We don’t lock the version of the Metal API we use, as far as I can see. Metal 3 has vastly different features compared to Metal 2.

I posted this question in the Godot rendering chat, so I am putting it here too:

Is there any way to lock Apple dependencies to specific versions? For Metal rendering, for example.
I am curious because right now it is set by the version of Xcode we install, which MoltenVK then uses and adopts.
It could get hairy because Metal 3 has support for RT and other features, which transparently become available.
My assumption is that we can't include them in our repository to ensure a specific version is used, right?
The Xcode beta/non-beta essentially flips support for lots of features and OS versions too.
That is why I think it could be an issue: we are consuming a random API version with a different surface on each machine.
So I'll probably try to find a way to lock this explicitly, unless anybody has suggestions on how to do this?

Our bug in this case could also be caused by a machine using an outdated version of Metal. I don’t believe that is the issue, but I am going to try testing with a newer OS and Xcode version.

I spent a few more hours debugging this and eventually found that if shader validation in Xcode is disabled, the crash also occurs when running under Xcode.

This leads me to believe that the timeout IS inside the shader: some part of the shader is triggering undefined behaviour and thus causing the GPU timeout.

I will write up how to configure the frame debugger properly once I can get the “Show in source” button resolving to MoltenVK.

I have a theory that the reason we can’t reproduce this on non-Intel Macs is that the issue is hidden by the higher performance of the M1 Mac. Those machines have roughly a 30% uplift, which might be why they cannot reproduce it, since it is timing-critical.

We time out when something broken is sent to the GPU. I enabled Vulkan shader validation and it doesn’t report many issues.

My first port of call is a seemingly random blank texture that is sent to the compute shader; it is 256 MB and never accessed by the GPU:

Screenshot 2022-10-19 at 16 47 36

Another cause for concern is that non-unique textures are sent to the GPU more than once and left alive.

Screenshot 2022-10-19 at 16 49 22

We could potentially reduce these to a single texture for less memory consumption. Happens quite a lot.
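
As a rough illustration of what "reduce these to a single texture" could look like, here is a Python sketch of a content-keyed cache. Everything in it is hypothetical (the TextureCache class, the integer handles standing in for GPU uploads, the format string); it is not how Godot's rendering device manages textures, just the general deduplication idea.

```python
import hashlib


class TextureCache:
    """Hypothetical cache: identical texture data maps to one shared handle."""

    def __init__(self):
        self._handles = {}
        self._next_handle = 1

    def get_or_create(self, width, height, fmt, pixels):
        # Key on dimensions, format and a digest of the pixel data.
        key = (width, height, fmt, hashlib.sha256(pixels).hexdigest())
        handle = self._handles.get(key)
        if handle is None:
            handle = self._next_handle  # stand-in for a real GPU upload
            self._next_handle += 1
            self._handles[key] = handle
        return handle

    def unique_count(self):
        return len(self._handles)


cache = TextureCache()
blank_a = cache.get_or_create(4, 4, "RGBA8", bytes(64))
blank_b = cache.get_or_create(4, 4, "RGBA8", bytes(64))
print(blank_a == blank_b, cache.unique_count())
```

Hashing large textures on every request has its own cost, so a real implementation would more likely key on a resource ID or hash once at import time.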

Setting some to use a volatile texture might be a good idea.

We also seem to bind too many objects at once and duplicate bindings; perhaps a logic error is present:

Screenshot 2022-10-19 at 16 50 40

These are just my initial observations from the past day.

I did more investigation into the issue with the Xcode frame debugger and found two relevant issues with the same error:

  • https://github.com/KhronosGroup/MoltenVK/issues/602
  • https://github.com/KhronosGroup/MoltenVK/issues/836

With the Xcode frame debugger attached, I could not reproduce the crash.

However, both of those issues are pertinent and show exactly the same symptoms and log output.

I believe the sensitivity to the lighting configuration is a symptom of the problem, not the actual issue. The actual issue seems to be that we cause some code in the compute shaders to time out.