anomalib: [Bug]: Patchcore exported ONNX file is not usable
Describe the bug
Hi all! Has anyone tried running inference with the PatchCore ONNX model exported from anomalib, using onnxruntime for example?
The model is apparently buggy, as it requests an enormous amount of memory (I haven't been able to run it on an 80 GB machine on CPU, for example). The error I keep getting is:
onnxruntime/onnxruntime/core/framework/bfc_arena.cc:342 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Failed to allocate memory for requested buffer of size 288601669632.
The problematic layer appears to be this Sub node.
Does anyone have a clue how to fix this, or is there a workaround?
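For reference, this is roughly how the exported model is being loaded; a minimal sketch, assuming the export is named model.onnx and the model was trained at 224x224 (adjust path, shape and provider to your setup):

```python
# Minimal onnxruntime inference sketch. The file name "model.onnx" and the
# 1x3x224x224 input shape are assumptions; adjust them to your export.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})  # this call triggers the BFCArena allocation error
print([output.shape for output in outputs])
```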
Dataset
Folder
Model
PatchCore
Steps to reproduce the behavior
1 - Install Anomalib
2 - Train a PatchCore model
3 - Try to run inference with the exported ONNX model
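A hedged sketch of steps 2 and 3 via the Python API, mirroring what tools/train.py does in anomalib 0.4.x; the helper names below are what I believe the 0.4.x API exposes and may differ in your version:

```python
# Hedged sketch of training a PatchCore model and exporting ONNX (anomalib 0.4.x).
# get_configurable_parameters / get_datamodule / get_model / get_callbacks are
# assumed to match the helpers used by tools/train.py; adjust if they differ.
from pytorch_lightning import Trainer

from anomalib.config import get_configurable_parameters
from anomalib.data import get_datamodule
from anomalib.models import get_model
from anomalib.utils.callbacks import get_callbacks

config = get_configurable_parameters(model_name="patchcore")  # loads the YAML shown below
datamodule = get_datamodule(config)
model = get_model(config)
callbacks = get_callbacks(config)  # includes the export callback when export_mode is set

trainer = Trainer(**config.trainer, callbacks=callbacks)
trainer.fit(model=model, datamodule=datamodule)
# With optimization.export_mode: onnx, the exported file ends up under project.path;
# step 3 is then loading that file with onnxruntime, as in the snippet above.
```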
OS information
OS information:
- OS: [e.g. Ubuntu 20.04]
- Python version: [e.g. 3.8.10]
- Anomalib version: [e.g. 0.4.0]
- PyTorch version: [e.g. 1.9.0]
- CUDA/cuDNN version: [e.g. 11.4]
- GPU models and configuration: [1x A100]
Expected behavior
The exported ONNX model from PatchCore should run inference with a reasonable memory footprint; instead, it is unusable.
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
0.4.0
Configuration YAML
model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive # options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: onnx # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
Logs
onnxruntime/onnxruntime/core/framework/bfc_arena.cc:342 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Failed to allocate memory for requested buffer of size 288601669632.
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 17 (6 by maintainers)
I thought the same and tried that, but it still allocated the wrong amount of memory. My model with the patch applied seems to produce the same results as the PyTorch version, but I need to fix something in the training itself. The patch seems to resolve my issue, so my investigation stops here for now, since I have limited time to work on this. I hope someone can pick it up and turn it into a proper fix.
I'm running into the same issue and am trying to debug what is happening. I noticed that my memory bank is shaped [48000, 1536] (images are shaped [400, 400]) and that the amount of memory it tries to allocate is 737280000000 bytes. It probably isn't a coincidence that 48000 * 1536 = 73728000. I would expect it to allocate 48000 * 1536 * 4 bytes though, as the memory bank has dtype float32, not 48000 * 1536 * 10000 bytes. I'm trying to find out where this issue comes from … I'll let you know if I find anything, but wanted to share my findings in the meantime.
Hello @hgaiser, thank you very much for your answer. @jasonvanzelm just provided a new solution idea, which you could also try. I am testing it now and hope the modification will work on the first attempt.
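As a quick sanity check of @hgaiser's numbers above, the requested allocation is exactly 10000 times larger than a float32 memory bank of that shape should need, which lines up with the (2500, 48000, 1536) float32 tensor discussed further down the thread:

```python
# Sanity check of the numbers quoted above: expected float32 memory bank size
# versus the allocation reported by onnxruntime.
n_memory, dim = 48000, 1536

expected_bytes = n_memory * dim * 4          # float32 memory bank: 294_912_000 (~0.3 GB)
observed_bytes = 737_280_000_000             # from the onnxruntime error message

print(expected_bytes)
print(observed_bytes // (n_memory * dim))    # 10_000 == 2500 patches * 4 bytes
```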
Numerous methods are provided in Anomalib, and among the models that can be trained in a single pass, PatchCore seems to perform well. But when I checked the related material against the original paper, I found that PaDiM with a Wide ResNet-50 backbone also performs well. Strangely, though, the model cannot be trained correctly after I set the backbone of PaDiM to wide_resnet50_2; you can refer to #1045 for more details. I wonder if you can provide some ideas for this problem, thank you very much!
Good to hear that it works. We’ll see what to do from here on and try to implement this fix and test it. Thanks for all the input 😃
I’ve narrowed it down quite a bit to this line:
https://github.com/openvinotoolkit/anomalib/blob/main/src/anomalib/models/patchcore/torch_model.py#L191
It seems that in the ONNX representation (and apparently also OpenVINO?), the input for
cdistis shaped1, 48000, 1536(called%onnx::Sub_770in ONNX), whereas it is shaped48000, 1536according to pytorch. The other input (%/Reshape_output_0)is shaped2500, 1536, but seems to get unsqueezed to2500, 1, 1536(presumably to match the other input). The calculated output shape ofcdistis then2500, 48000, 1536, which is way too large.I believe the
Sub_770tensor should be squeezed so that it doesn’t have three dimensions … but not sure where this happens. I’m not entirely sure on the details, but at the moment I have this diff (thanks to https://github.com/openvinotoolkit/anomalib/issues/440#issuecomment-1191184221):I haven’t yet checked if the output of the ONNX model is correct, but at the very least it runs. I will check the output tomorrow.
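A small self-contained illustration of that blow-up (shapes shrunk so the snippet actually runs; the full-size difference tensor is only sized on paper):

```python
import torch

# Shapes from the comment above: 2500 patch embeddings vs a 48000-entry memory bank.
n_patches, n_memory, dim = 2500, 48000, 1536

# What torch.cdist computes at inference time: an (n_patches, n_memory) distance matrix.
# Tiny stand-in shapes so this runs anywhere.
embedding = torch.randn(8, dim)
memory_bank = torch.randn(16, dim)
print(torch.cdist(embedding, memory_bank, p=2.0).shape)  # torch.Size([8, 16])

# What the exported graph ends up doing: because the memory bank keeps a leading
# dimension, the decomposed Sub node broadcasts (2500, 1, 1536) against
# (1, 48000, 1536) and materialises the full (2500, 48000, 1536) difference tensor.
broadcast_bytes = n_patches * n_memory * dim * 4  # float32
print(broadcast_bytes)                            # 737_280_000_000 bytes (~737 GB)
```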
I've encountered the same issue.
My current workaround is to either use the .ckpt file with the PyTorch Lightning interpreter, or to extract the memory_bank as a numpy array, define a custom PatchCoreModel wrapper, and manually load & store the memory_bank as a tensor.
I was only able to run the ONNX file on a server with 64 GB RAM when using an input size of 224 and a sampling ratio of 0.01, which gives ~6.7k entries (shape 6700x1536) in the memory_bank.
Using different Docker images or interpreters made no difference (OpenVINO or ONNX Runtime). I tried these with a 3090: https://github.com/microsoft/onnxruntime/blob/main/dockerfiles/Dockerfile.cuda and nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04
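A rough sketch of the checkpoint workaround mentioned above; the checkpoint path is a placeholder and the "model.memory_bank" state_dict key is my assumption of where the buffer lives, so inspect your own checkpoint's keys first:

```python
# Rough sketch of extracting the memory bank from a PatchCore .ckpt file.
# The path is a placeholder and the "model.memory_bank" key is an assumption;
# print(ckpt["state_dict"].keys()) on your checkpoint to find the right one.
import numpy as np
import torch

ckpt = torch.load("path/to/patchcore/model.ckpt", map_location="cpu")
memory_bank = ckpt["state_dict"]["model.memory_bank"].cpu().numpy()
np.save("memory_bank.npy", memory_bank)   # e.g. shape (6700, 1536) for ratio 0.01

# A custom wrapper can later reload it and register it back as a tensor:
restored = torch.from_numpy(np.load("memory_bank.npy"))
print(restored.shape, restored.dtype)
```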
Hello, I'm not sure what the problem is, but I can reproduce this. It definitely isn't normal that the model requests 288 GB of memory, but I'm not entirely sure whether that node is the only problem.