onnxruntime: [Performance] Why is dynamic shape not supported with the CoreML provider, while CoreML 2+ supports it?
Describe the issue
When I try to run an ONNX model with a dynamic shape (even on a single axis) on the CoreML backend of ORT 1.13.1, on a recent machine (MacBook Air M1 with macOS 12.5), I get the following warning:
[W:onnxruntime:, helper.cc:61 IsInputSupported] Dynamic shape is not supported for now, for input:input
And as a result the model runs on the CPU instead of CoreML.
Why can’t ORT run ONNX models with dynamic shapes on the CoreML backend? According to https://apple.github.io/coremltools/mlmodel/Format/Model.html, CoreML has supported dynamic shapes since version 2, which was released with macOS 10.14, more than 4 years ago.
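For context, the gating behaviour can be sketched in plain Python. This is an illustrative re-implementation of the check behind the `IsInputSupported` warning, not the actual ORT code; the function name and the shape encoding (a string or `None` standing in for a symbolic ONNX dimension) are assumptions:

```python
def is_input_supported(shape):
    """Mimic the CoreML EP input check: reject any input with a dynamic dim.

    A dim counts as 'dynamic' when the ONNX graph stores a symbolic name
    (modelled here as a string or None) instead of a concrete integer.
    Illustrative sketch only, not ORT's helper.cc logic verbatim.
    """
    return all(isinstance(dim, int) and dim > 0 for dim in shape)

# A fully fixed shape is accepted by the CoreML EP...
print(is_input_supported([1, 3, 224, 224]))        # True
# ...but a single dynamic axis causes fallback to the CPU EP.
print(is_input_supported(["batch", 3, 224, 224]))  # False
```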
To reproduce
Use ORT 1.13.1 macOS arm64 in a C++ project running in arm64 mode. Try to infer an ONNX model with at least one dynamic axis: the warning will show up and model inference will fall back to the CPU.
Urgency
No response
Platform
Mac
OS Version
12.5
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.13.1
ONNX Runtime API
C++
Architecture
ARM64
Execution Provider
CoreML
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 20 (8 by maintainers)
From what I understand, CoreML allows 3 approaches for dynamic shapes:
- a set of predetermined shapes
- bounded ranges
- unbounded ranges
Support for any of these modes would be a significant improvement over the single-fixed-shape situation we have currently. I don’t have any personal preference as long as I can at least play with a couple of different shapes.
I suppose the easiest would be to implement the unbounded range mode, which is a straightforward application of ONNX’s unbounded dimensions.
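To make the three modes concrete, here is a small pure-Python sketch of how each flexibility mode would constrain a candidate shape. This is illustrative only, not the CoreML API (in coremltools the same ideas are expressed via `EnumeratedShapes` and `RangeDim`); the function and the spec encodings are hypothetical:

```python
def shape_allowed(shape, mode, spec=None):
    """Check a candidate input shape against one of CoreML's three
    flexibility modes (illustrative re-implementation, not CoreML code)."""
    if mode == "enumerated":
        # spec: a list of fully concrete shapes, e.g. [[1,3,256,256], [1,3,512,512]]
        return list(shape) in [list(s) for s in spec]
    if mode == "bounded":
        # spec: per-dimension (lower, upper) bounds
        return all(lo <= d <= hi for d, (lo, hi) in zip(shape, spec))
    if mode == "unbounded":
        # any positive size on every axis is accepted
        return all(d >= 1 for d in shape)
    raise ValueError(f"unknown mode: {mode}")

shapes = [[1, 3, 256, 256], [1, 3, 512, 512]]
print(shape_allowed([1, 3, 512, 512], "enumerated", shapes))  # True
print(shape_allowed([1, 3, 384, 384], "enumerated", shapes))  # False
print(shape_allowed([1, 3, 384, 384], "bounded",
                    [(1, 1), (3, 3), (256, 512), (256, 512)]))  # True
print(shape_allowed([8, 3, 1024, 1024], "unbounded"))  # True
```

The "unbounded" branch is the mode suggested above as the most direct mapping of ONNX’s unbounded dimensions.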
I added some checks to limit the shapes to at most rank 5 in the CoreML EP. That limit seems to be inherent to CoreML at the moment. ORT should no longer fail to load the model because of CoreML compilation errors due to that, but it limits the nodes that are assigned to the CoreML EP.
https://github.com/microsoft/onnxruntime/pull/17086
This SAM model has some shapes with rank greater than 5, so those nodes won’t be supported by the CoreML EP. Maybe the model can be updated to work around this limitation.
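The rank limit can be checked ahead of time. A hedged sketch (the helper below is hypothetical, not part of ORT) that partitions a model’s tensor shapes by CoreML’s rank-5 limit, to see which nodes would stay on the CPU EP:

```python
COREML_MAX_RANK = 5  # limit described above; inherent to CoreML at the moment

def partition_by_rank(named_shapes, max_rank=COREML_MAX_RANK):
    """Split {tensor_name: shape} into CoreML-eligible and CPU-fallback sets."""
    ok, fallback = {}, {}
    for name, shape in named_shapes.items():
        (ok if len(shape) <= max_rank else fallback)[name] = shape
    return ok, fallback

# Example: a rank-6 tensor (as found in some SAM graphs) stays on the CPU EP.
ok, fallback = partition_by_rank({
    "image": [1, 3, 1024, 1024],
    "attn":  [1, 8, 2, 64, 64, 32],
})
print(sorted(ok))        # ['image']
print(sorted(fallback))  # ['attn']
```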
Basic dynamic shape support is enabled in the CoreML EP now. Note that the CPU EP is also an option if performance is not a priority.
We’ll look into adding more op support. PRs are welcome too.
Seconding the above. We’re trying to run a transformer sentence embedding model using the CoreML Execution Provider. Without support for dynamic shapes we have to pad each sequence in a batch to the maximum model sequence length.
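The padding workaround mentioned above can be sketched as follows (plain Python for illustration; a real pipeline would use the tokenizer’s own padding utilities):

```python
def pad_batch(sequences, max_len, pad_id=0):
    """Pad every token sequence to the model's fixed maximum length and
    build the matching attention mask. This per-batch overhead is what a
    fixed-shape CoreML model imposes when dynamic shapes are unavailable."""
    batch, mask = [], []
    for seq in sequences:
        n = min(len(seq), max_len)
        batch.append(list(seq[:n]) + [pad_id] * (max_len - n))
        mask.append([1] * n + [0] * (max_len - n))
    return batch, mask

tokens, mask = pad_batch([[101, 7592, 102], [101, 102]], max_len=5)
print(tokens)  # [[101, 7592, 102, 0, 0], [101, 102, 0, 0, 0]]
print(mask)    # [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
```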
SAM models and their variants (FastSAM) are now widely used for semi-automated segmentation tasks. For instance, the AnyLabeling tool uses such models to assist semantic segmentation annotation, and this tool is quite popular (the notion of “production use case” is quite subjective). SAM models are also used in concrete applications such as medical image segmentation with SAMed.
Apart from transformer models, dynamic shapes are widely used in standard CNNs to allow different speed-accuracy trade-offs. For instance, one would use a CNN (say, for object detection) with input shapes of 256x256 or 512x512 depending on the need for real-time inference.
I also often use dynamic shapes to allow multiple batch sizes. This allows one to adapt to different loads dynamically, for instance on servers.
ONNX, TensorRT, CoreML and others support dynamic shapes because there is a demand for such a capability, I guess. Performance is not the priority, at least for me, so I think that basic support for dynamic shapes with CoreMLExecutionProvider is enough, for instance using unbounded ranges only. This avoids crashes at runtime and improves interoperability.
The model is public; it’s one from the AnyLabeling software. They simply repackaged the public SAM model as far as I know.
No, it’s supposed to work with any network, at least networks using convolutions: https://coremltools.readme.io/docs/flexible-inputs
In my use case (audio processing) I adapt the input shape based both on the rendering type (real-time preview: smaller input shape for more dynamic feedback; offline render: longer input shape to reduce border artefacts) and on the sample rate (to avoid tiling excessively where the spectrum contains nothing).