cglm: Bug in glm_mat4_inv_sse2
PREFACE: I completely lack both the linear algebra and SIMD skills to fix this; I merely note the issue’s existence!
I was calculating the inverse transpose of the model matrix to properly transform normals for my lighting shader, and noted that the inverse transpose was wrong when I calculated on the CPU with cglm. I compared the results with calculating it in the shader (OpenGL) and in GLM, and found that the issue occurred in glm_mat4_inv, when using SSE (thus following the glm_mat4_inv_sse2 path.
When I manually commented out the SSE path and forced the non-SIMD path in glm_mat4_inv, the calculations came out correct, matching OpenGL shader and GLM results.
Here is the code I used to calculate a sample model matrix, and its inverse transpose, along with the code to print out resulting matrices at each step (NOTE: I’ve flipped/corrected? the mat4 printing (#288), so the matrices print out column-major, to match glm’s output which I was comparing. I also increased precision to 6 decimal places, again to match glm’s output):
mat4 model;
mat4 inverse;
mat4 transpose;
// Create an identity matrix for the model
glm_mat4_identity(model);
fprintf(stderr, "Identity matrix:\n");
glm_mat4_print(model, stderr);
// Translate the model matrix
glm_translate(model, (vec3){2.0f, 5.0f, -15.0f});
fprintf(stderr, "Translated matrix:\n");
glm_mat4_print(model, stderr);
// Apply a rotation to the model matrix in the x and y dimensions.
glm_rotate(model, glm_rad(45.0f), (vec3){1.0f, 0.5f, 0.0f});
fprintf(stderr, "Model matrix:\n");
glm_mat4_print(model, stderr);
// Invert the matrix, and store in inverse
glm_mat4_inv(model, inverse);
fprintf(stderr, "Inverse matrix:\n");
glm_mat4_print(inverse, stderr);
// Transpose the inverse matrix, and store in transpose
glm_mat4_transpose_to(inverse, transpose);
fprintf(stderr, "Transpose matrix:\n");
glm_mat4_print(transpose, stderr);
And here are the resulting values.
Identity matrix:
| 1.000000 0.000000 0.000000 0.000000 |
| 0.000000 1.000000 0.000000 0.000000 |
| 0.000000 0.000000 1.000000 0.000000 |
| 0.000000 0.000000 0.000000 1.000000 |
Translated matrix:
| 1.000000 0.000000 0.000000 0.000000 |
| 0.000000 1.000000 0.000000 0.000000 |
| 0.000000 0.000000 1.000000 0.000000 |
| 2.000000 5.000000 -15.000000 1.000000 |
Model matrix:
| 0.941421 0.117157 -0.316228 0.000000 |
| 0.117157 0.765685 0.632456 0.000000 |
| 0.316228 -0.632456 0.707107 0.000000 |
| 2.000000 5.000000 -15.000000 1.000000 |
Inverse matrix:
| 0.967995 0.120464 0.325154 -0.000000 |
| -0.120464 -0.787298 0.650308 -0.000000 |
| -0.325154 0.650308 0.727066 -0.000000 |
| 7.415617 -5.577195 -13.507222 -1.028227 |
Transpose matrix:
| 0.967995 -0.120464 -0.325154 7.415617 |
| 0.120464 -0.787298 0.650308 -5.577195 |
| 0.325154 0.650308 0.727066 -13.507222 |
| -0.000000 -0.000000 -0.000000 -1.028227 |
When I forced the non-SIMD path, I got the following results:
Identity matrix:
| 1.000000 0.000000 0.000000 0.000000 |
| 0.000000 1.000000 0.000000 0.000000 |
| 0.000000 0.000000 1.000000 0.000000 |
| 0.000000 0.000000 0.000000 1.000000 |
Translated matrix:
| 1.000000 0.000000 0.000000 0.000000 |
| 0.000000 1.000000 0.000000 0.000000 |
| 0.000000 0.000000 1.000000 0.000000 |
| 2.000000 5.000000 -15.000000 1.000000 |
Model matrix:
| 0.941421 0.117157 -0.316228 0.000000 |
| 0.117157 0.765685 0.632456 0.000000 |
| 0.316228 -0.632456 0.707107 0.000000 |
| 2.000000 5.000000 -15.000000 1.000000 |
Inverse matrix:
| 0.941421 0.117157 0.316228 -0.000000 |
| 0.117157 0.765685 -0.632456 0.000000 |
| -0.316228 0.632456 0.707107 -0.000000 |
| -7.212046 5.424091 13.136425 1.000000 |
Transpose matrix:
| 0.941421 0.117157 -0.316228 -7.212046 |
| 0.117157 0.765685 0.632456 5.424091 |
| 0.316228 -0.632456 0.707107 13.136425 |
| -0.000000 0.000000 -0.000000 1.000000 |
For comparison, here’s the glm code I used:
// Create an identity matrix for the model.
glm::mat4 model = glm::mat4(1.0f);
fprintf(stderr, "Identity matrix:\n");
fprintf(stderr, "%s\n", glm::to_string(model).c_str());
// Translate the matrix
model = glm::translate(model, glm::vec3(2.0f, 5.0f, -15.0f));
fprintf(stderr, "Translated matrix:\n");
fprintf(stderr, "%s\n", glm::to_string(model).c_str());
// Rotate the matrix along the x and y axes
model = glm::rotate(model, glm::radians(45.0f), glm::vec3(1.0f, 0.5f, 0.0f));
fprintf(stderr, "Model matrix:\n");
fprintf(stderr, "%s\n", glm::to_string(model).c_str());
// Invert the model matrix
glm::mat4 inverse = glm::inverse(model);
fprintf(stderr, "Inverse matrix:\n");
fprintf(stderr, "%s\n", glm::to_string(inverse).c_str());
// Transpose the inverted matrix.
glm::mat4 transpose = glm::transpose(inverse);
fprintf(stderr, "Transpose matrix:\n");
fprintf(stderr, "%s\n", glm::to_string(transpose).c_str());
And here are the glm results (which match the non-SIMD path of glm_mat4_inv, slight formatting differences notwithstanding):
Identity matrix:
| 1.000000, 0.000000, 0.000000, 0.000000 |
| 0.000000, 1.000000, 0.000000, 0.000000 |
| 0.000000, 0.000000, 1.000000, 0.000000 |
| 0.000000, 0.000000, 0.000000, 1.000000 |
Translated matrix:
| 1.000000, 0.000000, 0.000000, 0.000000 |
| 0.000000, 1.000000, 0.000000, 0.000000 |
| 0.000000, 0.000000, 1.000000, 0.000000 |
| 2.000000, 5.000000, -15.000000, 1.000000 |
Model matrix:
| 0.941421, 0.117157, -0.316228, 0.000000 |
| 0.117157, 0.765685, 0.632456, 0.000000 |
| 0.316228, -0.632456, 0.707107, 0.000000 |
| 2.000000, 5.000000, -15.000000, 1.000000 |
Inverse matrix:
| 0.941421, 0.117157, 0.316228, -0.000000 |
| 0.117157, 0.765685, -0.632456, 0.000000 |
| -0.316228, 0.632456, 0.707107, -0.000000 |
| -7.212046, 5.424091, 13.136425, 1.000000 |
Transpose matrix:
| 0.941421, 0.117157, -0.316228, -7.212046 |
| 0.117157, 0.765685, 0.632456, 5.424091 |
| 0.316228, -0.632456, 0.707107, 13.136425 |
| -0.000000, 0.000000, -0.000000, 1.000000 |
So pretty clearly, something goes wonky and wrong in the SSE path for glm_mat4_inv.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 18 (9 by maintainers)
@deadwanderer @gottfriedleibniz many thanks, that PR is merged.
@gottfriedleibniz I created a PR for this, it would be nice to get a review: https://github.com/recp/cglm/pull/291
@deadwanderer is it possible to test that PR with
/fp:fastto validate that is working as expected (previous tests)?thanks