anomalib: Can't run on Mac M1 - Cannot convert a MPS Tensor to float64

Describe the bug

I get a “Cannot convert a MPS Tensor to float64” error when running the train.py script on a Mac M1.

It seems that the Mac GPU backend (MPS) can’t handle 64-bit tensors. I am unsure where or how to cast the tensor properly, but from what I can tell the data is moved to the device in lightning_fabric/utilities/apply_func.py. I tried changing the line there (~line 95) to “data_output = data.type(torch.float32).to(device, **kwargs)”, but this does not work. Looking forward to any help 😃

Regards JI

Dataset

MVTec

Model

PADiM

Steps to reproduce the behavior

On a Mac M1 / Apple Silicon:

  • Install as described in the how-to guide
  • Load the dataset and put it in the correct folder
  • Run the train.py (

OS information

  • OS: macOS Ventura 13.5
  • Python version: 3.10.13
  • Anomalib version: 1.0dev
  • PyTorch version: 2.1.1
  • GPU models and configuration: MPS

Expected behavior

A working training run, so I can get going with this cool lib.

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

1.0dev

Configuration YAML

dataset:
  name: mvtec
  format: mvtec
  path: ./datasets/MVTec
  category: bottle
  task: segmentation
  train_batch_size: 32
  eval_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  normalization_method: min_max # options: [none, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: True # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: torch, onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle

Logs

python tools/train.py --config src/anomalib/models/padim/custom_config.yaml
/Users/justiniszatt/Desktop/Programming/python/anomalib_env/anomalib/src/anomalib/config/config.py:280: UserWarning: config.project.unique_dir is set to False. This does not ensure that your results will be written in an empty directory and you may overwrite files.
  warn(
Global seed set to 42
2023-12-08 17:43:18,916 - anomalib.data - INFO - Loading the datamodule
2023-12-08 17:43:18,917 - anomalib.data.utils.transform - INFO - No config file has been provided. Using default transforms.
2023-12-08 17:43:18,917 - anomalib.data.utils.transform - INFO - No config file has been provided. Using default transforms.
2023-12-08 17:43:18,917 - anomalib.models - INFO - Loading the model.
2023-12-08 17:43:18,917 - anomalib.models.components.base.anomaly_module - INFO - Initializing PadimLightning model.
/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `PrecisionRecallCurve` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
2023-12-08 17:43:18,920 - anomalib.models.components.feature_extractors.timm - WARNING - FeatureExtractor is deprecated. Use TimmFeatureExtractor instead. Both FeatureExtractor and TimmFeatureExtractor will be removed in a future release.
2023-12-08 17:43:19,209 - timm.models.helpers - INFO - Loading pretrained weights from url (https://download.pytorch.org/models/resnet18-5c106cde.pth)
2023-12-08 17:43:19,294 - anomalib.utils.loggers - INFO - Loading the experiment logger(s)
2023-12-08 17:43:19,294 - anomalib.utils.callbacks - INFO - Loading the callbacks
/Users/justiniszatt/Desktop/Programming/python/anomalib_env/anomalib/src/anomalib/utils/callbacks/__init__.py:153: UserWarning: Export option: None not found. Defaulting to no model export
  warnings.warn(f"Export option: {config.optimization.export_mode} not found. Defaulting to no model export")
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - GPU available: True (mps), used: True
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - IPU available: False, using: 0 IPUs
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - HPU available: False, using: 0 HPUs
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_predict_batches=1.0)` was configured so 100% of the batches will be used..
2023-12-08 17:43:19,324 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
2023-12-08 17:43:19,324 - anomalib - INFO - Training the model.
2023-12-08 17:43:19,327 - anomalib.data.mvtec - INFO - Found the dataset.
/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `ROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:613: UserWarning: Checkpoint directory results/padim/mvtec/bottle/run/weights/lightning exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py:183: UserWarning: `LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer
  rank_zero_warn(
2023-12-08 17:43:19,428 - pytorch_lightning.callbacks.model_summary - INFO - 
  | Name                  | Type                     | Params
-------------------------------------------------------------------
0 | image_threshold       | AnomalyScoreThreshold    | 0     
1 | pixel_threshold       | AnomalyScoreThreshold    | 0     
2 | model                 | PadimModel               | 2.8 M 
3 | image_metrics         | AnomalibMetricCollection | 0     
4 | pixel_metrics         | AnomalibMetricCollection | 0     
5 | normalization_metrics | MinMax                   | 0     
-------------------------------------------------------------------
2.8 M     Trainable params
0         Non-trainable params
2.8 M     Total params
11.131    Total estimated model params size (MB)
Epoch 0:   0%|                                           | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/anomalib/tools/train.py", line 79, in <module>
    train(args)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/anomalib/tools/train.py", line 64, in train
    trainer.fit(model=model, datamodule=datamodule)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 187, in advance
    batch = next(data_fetcher)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
    return self.fetching_function()
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 275, in fetching_function
    return self.move_to_device(batch)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 294, in move_to_device
    batch = self.batch_to_device(batch)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 261, in batch_to_device
    batch = self.trainer._call_strategy_hook("batch_to_device", batch, dataloader_idx=0)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1494, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 273, in batch_to_device
    return model._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 342, in _apply_batch_transfer_handler
    batch = self._call_batch_hook("transfer_batch_to_device", batch, device, dataloader_idx)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 330, in _call_batch_hook
    return trainer_method(hook_name, *args)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1356, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/core/hooks.py", line 632, in transfer_batch_to_device
    return move_data_to_device(batch, device)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/lightning_fabric/utilities/apply_func.py", line 102, in move_data_to_device
    return apply_to_collection(batch, dtype=_TransferableDataType, function=batch_to)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 72, in apply_to_collection
    return _apply_to_collection_slow(
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 96, in _apply_to_collection_slow
    return function(data, *args, **kwargs)
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/lib/python3.10/site-packages/lightning_fabric/utilities/apply_func.py", line 95, in batch_to
    data_output = data.to(device, **kwargs)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
Epoch 0:   0%|          | 0/10 [00:17<?, ?it/s]

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created 7 months ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

You could try changing accelerator in the trainer section of the config from auto to cpu; maybe that solves the issue.

If you want to solve the visualisation issue, add matplotlib.use('Agg') at the top of the visualizer.py file. There is some issue on macOS, which is why you need to specify this matplotlib backend. It worked for me; I hope it works for you too.
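A minimal sketch of the backend switch described above, assuming matplotlib and NumPy are installed. It selects the non-interactive Agg backend and then checks that the rendered buffer has exactly the size the canvas reports, which is the condition the reshape in visualizer.py depends on. The figure here is a stand-in for illustration, not anomalib's own visualization code.

```python
import matplotlib

matplotlib.use("Agg")  # select the non-interactive backend before importing pyplot

import matplotlib.pyplot as plt
import numpy as np

# Stand-in figure; anomalib's visualizer builds its own.
fig, ax = plt.subplots(figsize=(5, 2))
ax.plot([0, 1], [0, 1])
fig.canvas.draw()

# Size the canvas reports, in pixels.
width, height = fig.canvas.get_width_height()

# Rendered pixels as an (height, width, 4) RGBA array; drop alpha to get RGB.
rgb = np.asarray(fig.canvas.buffer_rgba())[..., :3].copy()
plt.close(fig)

print(matplotlib.get_backend(), rgb.shape, (height, width, 3))
```

With Agg, the buffer shape and the reported canvas size should agree, so a reshape to (height, width, 3) has the right number of elements.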

@jahad9819jjj, yeah, you are right, the problem was not fully resolved. I’ve also spotted this and created #1644. Once it is merged, it should hopefully be OK 😃

I had the same visualization error when I tried to log image results. I ran the same code in a Linux environment, and everything works just fine.

If you change the config file as follows, you should be able to save some image results.

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: False # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: simple

I am using macOS 12.3.1 with miniconda env

Hello blaz-r,

And first of all thanks for the tip! That was exactly the hint I needed 😉

While it made the model run for a bit, I got a follow-up crash with a reshape problem:

    return visualization.generate()
  File "/Users/justiniszatt/Desktop/Programming/python/anomalib_env/anomalib/src/anomalib/post_processing/visualizer.py", line 287, in generate
    img = img.reshape(self.figure.canvas.get_width_height()[::-1] + (3,))
ValueError: cannot reshape array of size 15000000 into shape (500,2500,3)

I already played around with the visualisation part of the config but had no luck so far, so any tips here would also be appreciated.
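The numbers in that ValueError are worth a quick check. The sketch below only does the arithmetic: it compares the size of the rendered buffer with the size the target shape expects. Reading the resulting factor as a 2x HiDPI (Retina) scale per axis is an assumption that fits the macOS-only behaviour reported in this thread; the traceback itself does not confirm it. The function name buffer_scale is hypothetical.

```python
import math


def buffer_scale(buffer_size: int, width: int, height: int, channels: int = 3) -> float:
    """Return the per-axis factor by which a rendered buffer exceeds width x height."""
    expected = width * height * channels  # elements the reshape target needs
    return math.sqrt(buffer_size / expected)


# Values taken from the error message: size 15000000, target shape (500, 2500, 3).
expected_size = 500 * 2500 * 3
scale = buffer_scale(15_000_000, width=2500, height=500)

print(expected_size)  # 3750000
print(scale)          # 2.0
```

The buffer holds four times as many elements as the reshape expects, i.e. twice as many pixels along each axis.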

Regarding MPS, it might still be worth investigating further, as the processing times are suboptimal on CPU. I tried to convert some of the tensors to float32 but had no luck at all.
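For anyone who wants to experiment with the float32 cast, here is a minimal sketch, assuming PyTorch is installed. It walks a nested batch and replaces only float64 tensors with float32 copies, leaving everything else untouched. The helper name and the batch keys are hypothetical, and the log does not say which batch entry is float64. In PyTorch Lightning, the on_before_batch_transfer(batch, dataloader_idx) hook of a LightningModule runs before the batch is moved to the device, so a helper like this could be called from there; whether that is the right place in anomalib is an assumption.

```python
import torch


def cast_float64_to_float32(batch):
    """Recursively replace float64 tensors in a nested batch with float32 copies."""
    if isinstance(batch, torch.Tensor):
        # Only double-precision tensors are converted; other dtypes are kept.
        return batch.to(torch.float32) if batch.dtype == torch.float64 else batch
    if isinstance(batch, dict):
        return {key: cast_float64_to_float32(value) for key, value in batch.items()}
    if isinstance(batch, list):
        return [cast_float64_to_float32(item) for item in batch]
    if isinstance(batch, tuple):
        return tuple(cast_float64_to_float32(item) for item in batch)
    return batch  # strings, paths, and other non-tensor values pass through unchanged


# Hypothetical batch layout for illustration; the real anomalib batch keys may differ.
example = {
    "image": torch.zeros(2, 3, 4, 4, dtype=torch.float32),
    "score": torch.zeros(2, dtype=torch.float64),
    "label": torch.tensor([0, 1]),
    "image_path": ["a.png", "b.png"],
}
converted = cast_float64_to_float32(example)
print({k: v.dtype for k, v in converted.items() if isinstance(v, torch.Tensor)})
```

Casting on the CPU side, before the transfer, avoids asking the MPS backend for a dtype it does not support.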