vision: ONNX CI workflow is broken
Since the 5th of May, our CI workflow for ONNX has been broken (commit 970ba3555794d163daca0ab95240d21e3035c304). Looking at the warnings emitted by the failing tests:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[...]
AssertionError: The values for attribute 'shape' do not match: torch.Size([1, 4]) != torch.Size([0, 4]).
Two models are affected: `faster_rcnn` and `mask_rcnn`. To reproduce, run:
pytest test/test_onnx.py -k "test_faster_rcnn"
pytest test/test_onnx.py -k "test_mask_rcnn"
I believe a recent patch to primtorch might be the offender here. cc @neginraoof @seemethere @mruberry
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 17 (10 by maintainers)
Commits related to this issue
- [ONNX] Relax node constraint for onnx shape inference (#77379) None as input is legal per ONNX spec for representing optional inputs. For [example](https://github.com/onnx/onnx/blob/main/docs/Operato... — committed to pytorch/pytorch by BowenBao 2 years ago
- [ONNX] Relax node constraint for onnx shape inference (#77379) (#77379) Summary: None as input is legal per ONNX spec for representing optional inputs. For [example](https://github.com/onnx/onnx/blob... — committed to pytorch/pytorch by BowenBao 2 years ago
After some painful bisection, I finally found the real offender: pytorch/pytorch#73284. After seeing that the PR title contains the phrase ONNX and our ONNX tests are failing, I have no idea how I missed that when looking at the PRs 🤦
Can confirm @BowenBao’s assessment. You can verify it yourself by looking at the `nightly` branch. The cutoff for today (2022-05-13) was pytorch/pytorch@65f71c0cbeb080c13e927d37b0d23d39bac6f092. Taking that knowledge to the `master` branch, we can verify that pytorch/pytorch@a812c4cd96d94d51627d2af290ae87de34169ec0 was three commits late.
It will make its way into tomorrow’s nightly. I’ll retest and close this if the fix worked.
Hi @pmeier, if my understanding is correct, it appears my fix has not been included in yesterday’s nightly yet.
5-13 nightly https://github.com/pytorch/pytorch/commit/44bf440b53cc9641086ca9462f8cab52d7dbfaa3, head is https://github.com/pytorch/pytorch/commit/65f71c0cbeb080c13e927d37b0d23d39bac6f092, the fix commit is https://github.com/pytorch/pytorch/commit/a812c4cd96d94d51627d2af290ae87de34169ec0 after it.
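For anyone who wants to repeat this check locally: one way to see whether a fix commit is already contained in a given nightly is to ask git whether the fix is an ancestor of the nightly's cutoff commit. Below is a minimal sketch, assuming a local clone of pytorch/pytorch with both commits fetched. The helper name `is_ancestor` and its `run` parameter are made up for illustration; only `git merge-base --is-ancestor` itself is a real git command.

```python
import subprocess


def is_ancestor(repo_dir, maybe_ancestor, descendant, run=subprocess.run):
    """Return True if `maybe_ancestor` is reachable from `descendant`.

    Hypothetical helper: wraps `git merge-base --is-ancestor`, which exits
    with 0 when the first commit is an ancestor of the second, 1 when it is
    not, and another status on errors (e.g. an unknown commit hash).
    `run` is injectable so the logic can be exercised without a real clone.
    """
    result = run(
        ["git", "merge-base", "--is-ancestor", maybe_ancestor, descendant],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"git merge-base failed: {result.stderr.strip()}")


# Example call (the clone path is an assumption, adjust to your checkout):
# fix = "a812c4cd96d94d51627d2af290ae87de34169ec0"
# cutoff = "65f71c0cbeb080c13e927d37b0d23d39bac6f092"
# print(is_ancestor("path/to/pytorch", fix, cutoff))
```

If the fix landed after the cutoff, as described above, the call would be expected to return `False` for that nightly and `True` once a later nightly includes it.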
We’ll know when the fresh `torch` nightly drops, which is around UTC+0 10:00. I’ll report back.
Fix has been merged in pytorch master. @datumbox please let us know if this fixes torchvision CI.
My bad, I linked the wrong PR 🤦 Sorry for the noise. ~It should have been pytorch/pytorch#76875~ See https://github.com/pytorch/vision/issues/5971#issuecomment-1124310367
It does seem like primTorch would be to blame because we also use the “prim” or “prims” prefix, but we don’t have a prim::Constant or any C++ code