ivadomed: NotImplementedError: Got <class 'NoneType'>, but numpy array, torch tensor, or caffe2 blob name are expected.

Issue description

I’m training a ‘softseg’ network and got this error:

Terminal output

2022-12-20 11:20:54.181 | INFO     | ivadomed.training:train:122 - Initialising model's weights from scratch.
2022-12-20 11:20:56.595 | INFO     | ivadomed.training:train:138 - Scheduler parameters: {'name': 'CosineAnnealingLR', 'base_lr': 1e-05, 'max_lr': 0.001}
2022-12-20 11:20:56.597 | INFO     | ivadomed.training:train:163 - Selected Loss: AdapWingLoss
2022-12-20 11:20:56.597 | INFO     | ivadomed.training:train:164 - 	with the parameters: []
Training:   2%|███▍                                                                                                                                                                      | 1/50 [00:00<?, ?it/s]/home/GRAMES.POLYMTL.CA/p101317/code/ivadomed/ivadomed/transforms.py:304: RuntimeWarning: invalid value encountered in divide
  data_out = (sample - sample.mean()) / sample.std()
2022-12-20 11:22:05.643 | INFO     | ivadomed.training:train:238 - Epoch 1 training loss: nan.	Dice training loss: nan.
Epoch 1 training loss: nan.	Dice training loss: nan.
Training:   2%|███▍                                                                                                                                                                      | 1/50 [01:45<?, ?it/s]
Traceback (most recent call last):
  File "/home/GRAMES.POLYMTL.CA/p101317/.conda/envs/ivadomed/bin/ivadomed", line 33, in <module>
    sys.exit(load_entry_point('ivadomed', 'console_scripts', 'ivadomed')())
  File "/home/GRAMES.POLYMTL.CA/p101317/code/ivadomed/ivadomed/main.py", line 623, in run_main
    run_command(context=context,
  File "/home/GRAMES.POLYMTL.CA/p101317/code/ivadomed/ivadomed/main.py", line 457, in run_command
    best_training_dice, best_training_loss, best_validation_dice, best_validation_loss = imed_training.train(
  File "/home/GRAMES.POLYMTL.CA/p101317/code/ivadomed/ivadomed/training.py", line 304, in train
    writer.add_scalars('Validation/Metrics', metrics_dict, epoch)
  File "/home/GRAMES.POLYMTL.CA/p101317/.conda/envs/ivadomed/lib/python3.8/site-packages/torch/utils/tensorboard/writer.py", line 403, in add_scalars
    fw.add_summary(scalar(main_tag, scalar_value),
  File "/home/GRAMES.POLYMTL.CA/p101317/.conda/envs/ivadomed/lib/python3.8/site-packages/torch/utils/tensorboard/summary.py", line 249, in scalar
    scalar = make_np(scalar)
  File "/home/GRAMES.POLYMTL.CA/p101317/.conda/envs/ivadomed/lib/python3.8/site-packages/torch/utils/tensorboard/_convert_np.py", line 24, in make_np
    raise NotImplementedError(
NotImplementedError: Got <class 'NoneType'>, but numpy array, torch tensor, or caffe2 blob name are expected.

Config file

{
    "command": "train",
    "gpu_ids": [
        5
    ],
    "path_output": "model_seg_lesion_mp2rage_20221220_112014",
    "model_name": "model_seg_lesion_mp2rage",
    "debugging": true,
    "log_file": "log",
    "object_detection_params": {
        "object_detection_path": null,
        "safety_factor": [
            1.0,
            1.0,
            1.0
        ],
        "gpu_ids": 5,
        "path_output": "model_seg_lesion_mp2rage_20221220_112014"
    },
    "wandb": {
        "wandb_api_key": "9095e2bc9e4ab445d478c9c8a81759ae908be8c6",
        "project_name": "basel-mp2rage-lesion",
        "group_name": "3D",
        "run_name": "run-1",
        "log_grads_every": 100
    },
    "loader_parameters": {
        "path_data": [
            "/home/GRAMES.POLYMTL.CA/p101317/data_nvme_p101317/data_seg_mp2rage_20221217_170634/data_processed_lesionseg"
        ],
        "subject_selection": {
            "n": [],
            "metadata": [],
            "value": []
        },
        "target_suffix": [
            "_lesion-manualHaris"
        ],
        "extensions": [
            ".nii.gz"
        ],
        "roi_params": {
            "suffix": null,
            "slice_filter_roi": null
        },
        "contrast_params": {
            "training_validation": [
                "UNIT1"
            ],
            "testing": [
                "UNIT1"
            ],
            "balance": {}
        },
        "slice_filter_params": {
            "filter_empty_mask": true,
            "filter_empty_input": true
        },
        "patch_filter_params": {
            "filter_empty_mask": false,
            "filter_empty_input": false
        },
        "slice_axis": "axial",
        "multichannel": false,
        "soft_gt": true,
        "is_input_dropout": false,
        "bids_validate": true
    },
    "split_dataset": {
        "fname_split": null,
        "random_seed": 42,
        "split_method": "participant_id",
        "data_testing": {
            "data_type": null,
            "data_value": []
        },
        "balance": null,
        "train_fraction": 0.6,
        "test_fraction": 0.2
    },
    "training_parameters": {
        "batch_size": 16,
        "loss": {
            "name": "AdapWingLoss"
        },
        "training_time": {
            "num_epochs": 50,
            "early_stopping_patience": 50,
            "early_stopping_epsilon": 0.001
        },
        "scheduler": {
            "initial_lr": 0.001,
            "lr_scheduler": {
                "name": "CosineAnnealingLR",
                "base_lr": 1e-05,
                "max_lr": 0.001
            }
        },
        "balance_samples": {
            "applied": false,
            "type": "gt"
        },
        "mixup_alpha": null,
        "transfer_learning": {
            "retrain_model": null,
            "retrain_fraction": 1.0,
            "reset": true
        }
    },
    "default_model": {
        "name": "Unet",
        "dropout_rate": 0.3,
        "bn_momentum": 0.1,
        "depth": 3,
        "is_2d": true,
        "final_activation": "relu"
    },
    "uncertainty": {
        "epistemic": false,
        "aleatoric": false,
        "n_it": 0
    },
    "postprocessing": {
        "remove_noise": {
            "thr": -1
        },
        "keep_largest": {},
        "binarize_prediction": {
            "thr": 0.5
        },
        "uncertainty": {
            "thr": -1,
            "suffix": "_unc-vox.nii.gz"
        },
        "fill_holes": {},
        "remove_small": {
            "unit": "vox",
            "thr": 3
        }
    },
    "evaluation_parameters": {
        "object_detection_metrics": true,
        "target_size": {
            "unit": "vox",
            "thr": [
                20,
                100
            ]
        },
        "overlap": {
            "unit": "vox",
            "thr": 3
        }
    },
    "transformation": {
        "Resample": {
            "hspace": 1,
            "wspace": 1,
            "dspace": 1
        },
        "RandomReverse": {
            "applied_to": [
                "im",
                "gt"
            ],
            "dataset_type": [
                "training"
            ]
        },
        "RandomAffine": {
            "degrees": 10,
            "scale": [
                0.2,
                0.2,
                0.2
            ],
            "translate": [
                0.2,
                0.2,
                0.2
            ],
            "applied_to": [
                "im",
                "gt"
            ],
            "dataset_type": [
                "training"
            ]
        },
        "CenterCrop": {
            "size": [
                32,
                32,
                128
            ]
        },
        "NormalizeInstance": {
            "applied_to": [
                "im"
            ]
        }
    },
    "FiLMedUnet": {
        "applied": false,
        "metadata": "contrasts",
        "film_layers": [
            0,
            1,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0
        ]
    },
    "Modified3DUNet": {
        "applied": true,
        "length_3D": [
            32,
            32,
            32
        ],
        "stride_3D": [
            32,
            32,
            4
        ],
        "attention": false,
        "n_filters": 8
    },
    "training_sha256": {
        "sub-P173_UNIT1.nii.gz": "d7faf510ebc0f77d4bceb61c2fd1e37e7aaf4ea50f55e905ce4ccbaf8085ae2f",
        "sub-P057_UNIT1.nii.gz": "29ef9080c42db1979e976ef10fb72cb8e5b35f26f30980719c5efdb8e15997fc",
        "sub-P089_UNIT1.nii.gz": "383f3f135f6a4b9fb6c89d4316317f04db135d5add9ce64612f514dfeca3825d",
        "sub-P199_UNIT1.nii.gz": "2661ce225413dd659874a3aab63869a9c071e91628e68309c50a628b50a5c49a",
        "sub-P001_UNIT1.nii.gz": "275c4ddf275775fa011a44b03e481acc518b49a864d6f8c1d86f803c850009d4",
        "sub-P188_UNIT1.nii.gz": "b14c01f6a398edaced39afc4314dee4d90f974348ac3e43fcb145286df9e1612",
        "sub-P030_UNIT1.nii.gz": "de196f14aa686741ad4cb2e48f6898848bf6ba9bd5cdb61216f329a503190770",
        "sub-P022_UNIT1.nii.gz": "3033d6d6a7b64f0bda62671a2dd5aa8e1ed5690bab7d40bf653c0115416c6e56",
        "sub-P040_UNIT1.nii.gz": "df98742eb2ca7d2d0fce70f46b3bd6d3de15c2ba5d537bbebe36353bf90ff2a6",
        "sub-P063_UNIT1.nii.gz": "94b59476d65efb78c7e89edf905a0473863c662d0d8445c13e3711003ce69ab9",
        "sub-P183_UNIT1.nii.gz": "21817290c565177c048f3203b71a62efa9563c34ab953c11b7ef566a2a87cd5f",
        "sub-P068_UNIT1.nii.gz": "a88f72ad61a87c2c0e22276536e8ae2e531f835f2fab485712c4ec63def3d4aa",
        "sub-P153_UNIT1.nii.gz": "4dc6af0665ff2eefbf39958531b5f19b839b3adf86f55d07d90928a740529ac6",
        "sub-P081_UNIT1.nii.gz": "f52152f37ddb9ab69b43c0b42429003c1dd36355fa2f4c708901e45fed873b88",
        "sub-P003_UNIT1.nii.gz": "42b1e26a9ac14dcbf2b9efd0e13a0b9dec5b58aae5b8247aa03a9909bb5c8227",
        "sub-P179_UNIT1.nii.gz": "b1d2566e618553f90451bc8e9530260609c76a8a3d13f394f42eab4819165ba4",
        "sub-P110_UNIT1.nii.gz": "1f50c8e50fb1054c979e4e892cf0990177fa4d3fb9e5e50345621e7feef6fb3a",
        "sub-P035_UNIT1.nii.gz": "4cd4ccf42a2e1d86032115c929d3caf671cb6876509a41e5020cd0b58f48c924",
        "sub-P101_UNIT1.nii.gz": "2606d1fdf15259d6790c9b74259716c910cef7921c65572993d60b17e32c50e5",
        "sub-P095_UNIT1.nii.gz": "e201ab6c4498988170528213c104dfeb55e743b470178712da8a6d692e6d21b3",
        "sub-P133_UNIT1.nii.gz": "d356d78fb82e3e712a808eccb681a3f759985333e06fe6767b4006d4fe49006b",
        "sub-P029_UNIT1.nii.gz": "946f2dbf028bac08f02042751bf53bd301d452b6c7b6db0d89d8e67491b21777",
        "sub-P067_UNIT1.nii.gz": "b86783c8f475cad23b3f414aa6e163a97d533b72b5f251f719a988a5345b9c0e",
        "sub-P249_UNIT1.nii.gz": "9128cb1b4d47ea76005bc99422647a9a0be88da1415740f97a6819cde8621196",
        "sub-P080_UNIT1.nii.gz": "ce751c30ab5ea3f9c3f82eb79730c9131823f7325cea652c823f0ddaaa502d8c",
        "sub-P193_UNIT1.nii.gz": "871c200cce0c3b6105f6a6e9ca6e4bc129b1c1cf2a956e357a1e62b08fd276ea",
        "sub-P039_UNIT1.nii.gz": "1034813df5c641a83a0aaa3613f016ff3c50e66008394cd41cb09554fd282cbe",
        "sub-P162_UNIT1.nii.gz": "5ca25c14d41072811b35e3034c5b6ca13d5121649e56839883151a5a13936ee3",
        "sub-P048_UNIT1.nii.gz": "a9a20aa2723ae2503cb55d12204a29ba45f7a65770ac3645b34fa20b0ee5fb64",
        "sub-P200_UNIT1.nii.gz": "b80015c38d156bedb602b0f4ccdf58534378657cb6bc494796006d826e0b90c6",
        "sub-P123_UNIT1.nii.gz": "95d8a172b13fd640f47566f736f5060b5a341f9e89342d8a90155038612ac602",
        "sub-P012_UNIT1.nii.gz": "beac0203441d167b9fa513b1aa6fa62156743f55a6f4f3758cb80b6281f655b5",
        "sub-P074_UNIT1.nii.gz": "807ab79818f090b312b897c7b1ff2fca660006bb1586be907bd0c5dbcb0295ab",
        "sub-P092_UNIT1.nii.gz": "aa409241dbe1bcca90c843f42240aab275d90784b12841d0344932cd189f499b",
        "sub-P045_UNIT1.nii.gz": "e66e5ba2114de28c41dbbdc4e2e03f66eb8234267daee3075f47c7f3defd2ba7",
        "sub-P069_UNIT1.nii.gz": "425c5cf86a7accb6a6c447db8e4bbfac21b9b3586c87429df0654540ddf2ef64",
        "sub-P073_UNIT1.nii.gz": "6201cd9398560717e8395f087363fc59d84d0c796496a953f37cc68d17c06f8f",
        "sub-P097_UNIT1.nii.gz": "2492f8ed9c1cf2b1d9d187de53d9a16f36919615557ef0dcd419d643b97f05de",
        "sub-P185_UNIT1.nii.gz": "686e202f743e9b9e1a5a4305f37b3b3c88170cb6e72e915eaab971f7a3cfdcfc",
        "sub-P192_UNIT1.nii.gz": "f76121438526ff10ac1276b4a02746115eb5b48a37c0a22b68b8268eadd103d4",
        "sub-P007_UNIT1.nii.gz": "fdd254ccf7ae1c59debf1ff4978e25307a2aefb363fb44bb5f1528ea3545df9a",
        "sub-P117_UNIT1.nii.gz": "e85f4621ecfb2852b20d95fabc6ff3ea7fbf0748af0f3c065a0d210d635bd892",
        "sub-P176_UNIT1.nii.gz": "9ab90a46287cccb02834d564859307a09cb0aa7ac1b675a04124ead536a9d830",
        "sub-P050_UNIT1.nii.gz": "ca88ac95c80169df151a6db5c5eb6e660f568b6e0d1fddd4f078620e45331871",
        "sub-P141_UNIT1.nii.gz": "c4634fd7f7ff6c7a5bdf31068ecb9d4cb991d59e33510acfdbfcfef5afac46be",
        "sub-P144_UNIT1.nii.gz": "83a1e78105ab37d66e45025ae6264e90d7e7ee335dbc93371d785ea494dd6793",
        "sub-P024_UNIT1.nii.gz": "d9c4c29c1deb96a670948db8e2da063b005e0e30d51e2679ebaa49c42674a9fa",
        "sub-P163_UNIT1.nii.gz": "5cc52879f172e9f37865fa15f4d877b5a250649cb1126811a595701b1c5824f5",
        "sub-P051_UNIT1.nii.gz": "06d92792653132f348dd6f7becbf875c3eb13ee01b1b7583848bd069b2065cf4",
        "sub-P031_UNIT1.nii.gz": "20074cd10956ead4039413781bd0bdda0ff68bddd9c5a6d06e8fe8dcfeea4711",
        "sub-P161_UNIT1.nii.gz": "9e4a81c46c054bb39bf45b50671af45b7cb03f1028bac9ec368db3a19bd43d42",
        "sub-P042_UNIT1.nii.gz": "7022e4e9855b54b8850cef2b7c3249e4b786e18e30a3aba0d1b5eff7d9798cc0",
        "sub-P044_UNIT1.nii.gz": "28f17d374dea69d3a1c38db130a29909f594207178c760e02b33faa3b62871c8",
        "sub-P005_UNIT1.nii.gz": "dc3957affe67fc51fd09839c3e8f9bf7cd89d6ac70456fad4780f5b13edb1fd9",
        "sub-P078_UNIT1.nii.gz": "8f825e17e68f7a2f96b9290e94270d91d4a704adfa853152fb64a6a3830a4cc7",
        "sub-P105_UNIT1.nii.gz": "a37fffc33193437cbd28803ef295739186d7c8c2b8acc7b88b3a6e801b9396af",
        "sub-P149_UNIT1.nii.gz": "307bfafd210e120e69620677b7904b5821214ae26c7b3e87dd3aaba5a859adc4",
        "sub-P147_UNIT1.nii.gz": "5410e4e9f6886e38da86d5e0f01d5085eaa6c6546eec00a5dbc35d963221bb5b",
        "sub-P096_UNIT1.nii.gz": "ba296d68d8b94e0221f5b785d1afb1691af2fbaf9adfabdd21ff38f2e1400b39",
        "sub-P148_UNIT1.nii.gz": "7bc5464f6407072fcad6fb0f7a320b63bec54aa3ec6974eb311993642c574790",
        "sub-P241_UNIT1.nii.gz": "f224fd83c1d4612a3e070c8ea52a098850ab6e0b6c60e0654fbdcef0e4e1a07a",
        "sub-P187_UNIT1.nii.gz": "7af9f491f6aeead04829775c2fd4d17b2a52e7d88bfd5c0509464b4820066944",
        "sub-P049_UNIT1.nii.gz": "baa96097a131992d62a6bcb74d397607b7106694a2cb6560d9d536234b0912f1",
        "sub-P142_UNIT1.nii.gz": "e6b957fce9b91077e515b5935ff4fc53d8ae75bd917c07fbafc997d72dc44c6e",
        "sub-P052_UNIT1.nii.gz": "4019d9d6d8732e74863846feeb2df66a97d50caeeaf6fb68283cddc0381e2f7a",
        "sub-P055_UNIT1.nii.gz": "d33e55cb2361fab0fa4ddd85e23209004784db1b307cb69dee92fed3aedc5394",
        "sub-P243_UNIT1.nii.gz": "49cdc5011616019b3e72981685699bcf0666b584edc2e189b4a59d50ae2f45e7",
        "sub-P140_UNIT1.nii.gz": "5691e29530f45359cfb2b478ea0bcf528df81498edaf11d36011d52809e4df39",
        "sub-P021_UNIT1.nii.gz": "7a04b074dc304a83a12879e58092fa462d75d388fea0a7d9c3f672d962e9bf18",
        "sub-P104_UNIT1.nii.gz": "b4b50fd64c44c38ac0b1598b36cff22395b58e56306dd287561e91b252ded5b7",
        "sub-P124_UNIT1.nii.gz": "977bf31c8516eee05933daa2913d52cf2c634bd6351799386916029b1c505673",
        "sub-P138_UNIT1.nii.gz": "e1cc2ef1b61803a5aa633008dc11a84a93d9d4359fd5a0c2949f75de6c1beea2",
        "sub-P004_UNIT1.nii.gz": "7b9b6ac2376874cf5f80f4ce235c1388bcab3304422b2ded699e1c14732067c1",
        "sub-P085_UNIT1.nii.gz": "131655bdec44dfed232d3b345bde224f8786cf58df091a39ddf495bfc573d20e",
        "sub-P122_UNIT1.nii.gz": "c180696f478aec9bd48389b8e031fa0977d3bafbce46ce0015f4a4690817c7bf",
        "sub-P034_UNIT1.nii.gz": "1391480cbc5db75238395ad8bd8276b5cd301919c5087878805e1216da607705",
        "sub-P244_UNIT1.nii.gz": "a84e3d47896bd248a5a8ea92f8db1e137924278076fb25e6e67a48bc56d78bca",
        "sub-P151_UNIT1.nii.gz": "e13c0841add4ba4286411bb0faac384ecb7f8ecc17b89822289be95e7fb7a051",
        "sub-P134_UNIT1.nii.gz": "bf702ad78d674a6dd8e0557c4b2173c880666d898ce38afbd5d0c25e53095158",
        "sub-P194_UNIT1.nii.gz": "7b126553ac7d96c87c1207930d612f536031508b593d82efdd39a1ceb8d6c986",
        "sub-P182_UNIT1.nii.gz": "1e2dfcc6293faff4c8689ae7cf07dcf49b2a958ec84875a7bb145daae8f8275c",
        "sub-P010_UNIT1.nii.gz": "883743dd1d33c7aa4873cd01cfaa8c142660f4951661695a75e7da6030682fd4",
        "sub-P108_UNIT1.nii.gz": "4176a197454a48dbcbab5aff045828db750121d186da6fc31c924035edd271bb",
        "sub-P120_UNIT1.nii.gz": "b57bbfef8c1ffe3fa653ca096406ec74a112c43d0ae158ed18a0b0045ee9f8ec",
        "sub-P109_UNIT1.nii.gz": "78d47ae447d6d8963a833a481df09750163f35fd612742033aece096403e01e4",
        "sub-P116_UNIT1.nii.gz": "781d420cc3acbb8d1683a2e9f34fa83b9216e5b3fe39aab6dd71c45bb167bb58",
        "sub-P121_UNIT1.nii.gz": "177ccbff5a04d3c48f7e59c7d6d24f06340a8b8f1e681ac2d1054d34c64e1938",
        "sub-P072_UNIT1.nii.gz": "4932e5d4d1ec76b6b8d1dc622d9803abd5d8e235c84453502e8d004b83608ff5",
        "sub-P038_UNIT1.nii.gz": "c8dff094c06d7a2893a1fe585a5636f7836d9cceebfbc9a6c901f48b6370e0f0",
        "sub-P046_UNIT1.nii.gz": "8dbbbe33e810c8b3b3e29dfe763231ae5eb429383b095e4fd03cd298645606c3",
        "sub-P125_UNIT1.nii.gz": "dac49318d803625cfe2ec5fb29c3c5ca3623bb18b74d01d9e0cf50dbb512d6da",
        "sub-P084_UNIT1.nii.gz": "874e440642e9b4ac982bfafa10940f2a3698a0a7581ba41781b56778ea282186",
        "sub-P172_UNIT1.nii.gz": "242f5e4adaf85ad29c618dbdffee572c09a110eb1c83161db556c1eba030d1aa",
        "sub-P083_UNIT1.nii.gz": "3832285ebba279320e5c2186da045738caf69ff4c362dfa82bd4fe9fcca0b21f",
        "sub-P077_UNIT1.nii.gz": "8153218e02140d8384962dec49f0ee0ad28f387cb14aff7ab45a444375a0da60",
        "sub-P011_UNIT1.nii.gz": "ab1a746ba5d751445fd67a583cdf869ab014db13ef287d99a47a762e20fc6f34",
        "sub-P113_UNIT1.nii.gz": "0c0f493ec6376a07d87c2f1df6f81578695e81c8e646223408c31c1f570deede",
        "sub-P102_UNIT1.nii.gz": "d08509029f5d2900bd193f194491d1989bffb904d75a55f63097e4ea588904f0",
        "sub-P062_UNIT1.nii.gz": "0b2baceb4a2df76b3d579a5d075343904fbc41eb1ebbb7a1c3c34fab4cf26a48",
        "sub-P053_UNIT1.nii.gz": "6de0ffd47e6722fa582ead289478a14d89510a3bca18c36cfa44ca544e99d277",
        "sub-P126_UNIT1.nii.gz": "01071bef694b7cb8098cde73c8e9db5eac6d86cd6bbbfbac8db2ccc5c555387b",
        "sub-P019_UNIT1.nii.gz": "bd00a5592ae1158d7ef349f5fa4739e1e16f03538daee6a64d54a41e745200df",
        "sub-P033_UNIT1.nii.gz": "d0005f04e4670aaa20b5f8935e849879bb5ef61615af6f4e3e75f9c1ef62e80d",
        "sub-P159_UNIT1.nii.gz": "c1c565501a0252c5810a9081329d7c4c1240404fab6ecd4c2b17b9e22c5b274b",
        "sub-P197_UNIT1.nii.gz": "7cab9ed02bb185b014dc8c000372b7011afef2a223e282b58e2928930b203e8b",
        "sub-P169_UNIT1.nii.gz": "aae51a60ea1856d11f1b5a024d35d71c87bef882b2b763c4593cd85fce5ec763"
    }
}

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

I did some tests and have more information.

PR #1164, which filters empty 3D patches, is up to date and working as expected. I just need to add a few documentation points and it will be ready to go. But PR #1164 does not fix the present issue!

Why:

  • The patch_filter (2D or 3D) is applied during the initial loading of the volumes. It filters out every patch that is empty after Resample and CenterCrop.
  • The error happens in NormalizeInstance, which is applied much later, during data augmentation. In other words, a patch that is empty after data augmentation is not filtered out, and that ultimately leads to this issue.
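
To illustrate the second point, here is a small NumPy-only sketch (not ivadomed code). The helper `translate_zero_fill` is a hypothetical stand-in for the translation part of RandomAffine, and the patch contents are made up. It shows a patch that passes a load-time "not empty" check and still ends up empty once it is shifted:

```python
import numpy as np


def translate_zero_fill(patch, shift, axis=0):
    """Shift `patch` by `shift` voxels along `axis`; vacated voxels are set to 0.

    Hypothetical helper that only mimics a translation, not ivadomed's RandomAffine.
    """
    if shift == 0:
        return patch.copy()
    out = np.zeros_like(patch)
    src = [slice(None)] * patch.ndim
    dst = [slice(None)] * patch.ndim
    if shift > 0:
        src[axis] = slice(0, patch.shape[axis] - shift)
        dst[axis] = slice(shift, None)
    else:
        src[axis] = slice(-shift, None)
        dst[axis] = slice(0, patch.shape[axis] + shift)
    out[tuple(dst)] = patch[tuple(src)]
    return out


# A 32x32x32 patch whose only signal sits in the last 4 slices (edge of the volume).
patch = np.zeros((32, 32, 32), dtype=np.float32)
patch[:, :, 28:] = 1.0

# Load-time check: the patch is not empty, so a patch filter would keep it.
passes_load_time_filter = bool(patch.any())

# Augmentation: a shift of 6 voxels (about 0.2 * 32) pushes the signal out of the patch.
augmented = translate_zero_fill(patch, shift=6, axis=2)
empty_after_augmentation = not augmented.any()

print(passes_load_time_filter, empty_after_augmentation, float(augmented.std()))
```

The augmented patch has a standard deviation of 0, which is exactly the input that makes the division in NormalizeInstance invalid.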

Options to fix:

  1. Use a less “aggressive” data augmentation, in particular for the RandomAffine “scale” and “translate” parameters.
  2. Use a smaller CenterCrop to avoid “near” empty patches at the edge of the volume.
  3. As suggested before, we could look into adding a more explicit error for the user when that happens.
  4. Speaking of NormalizeInstance, I have the following idea:

Current NormalizeInstance: https://github.com/ivadomed/ivadomed/blob/79493de405e34eaeb43567af35c3816e2e317090/ivadomed/transforms.py#L302-L305

What if we only normalize when the std is different from 0? Like this:

        if sample.std() != 0:
            data_out = (sample - sample.mean()) / sample.std()
        else:
            data_out = sample

A patch with std==0 is either empty or a constant. Is there any issue with not normalizing those specific cases?
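
For discussion, a self-contained sketch of that guard combined with the more explicit warning from option 3. The function name `normalize_instance_safe` and the warning text are assumptions made for this example; this is not the current ivadomed implementation:

```python
import warnings

import numpy as np


def normalize_instance_safe(sample):
    """Zero-mean, unit-variance normalization that leaves constant inputs untouched.

    Hypothetical standalone function, written only to illustrate the proposed guard.
    """
    sample = np.asarray(sample, dtype=np.float64)
    std = sample.std()
    if std == 0:
        # Empty (all zeros) or constant patch: dividing by std would produce NaNs.
        warnings.warn(
            "NormalizeInstance: patch is empty or constant (std == 0); "
            "returning it without normalization.",
            RuntimeWarning,
        )
        return sample
    return (sample - sample.mean()) / std


normalized = normalize_instance_safe(np.array([1.0, 2.0, 3.0]))
untouched = normalize_instance_safe(np.zeros((4, 4, 4)))  # warns, no NaN
```

With this guard, an empty patch reaches the network as all zeros instead of all NaN, and the warning tells the user which transform encountered it.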

So my suggestion would be to output a more informative warning/error when that happens.

Yes, and that also seems like a good argument to revive the patch filter for 3D subvolumes in #1164. I can continue to investigate tomorrow.

I just continued investigating this, and the error was indeed raised due to an empty 3D patch only, as suspected. So the following happens:

Error happened with empty 3D patches, maybe related?

Yes I think so. Just before the error I get this warning, which points to an empty patch:

/home/mhbourget/code/ivadomed/ivadomed/transforms.py:304: RuntimeWarning: invalid value encountered in divide
  data_out = (sample - sample.mean()) / sample.std()

Since an empty patch contains only zeros, the division leads to a NaN array. As per NumPy’s default behaviour, this results in a warning and not an exception.
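
A quick way to check this behaviour outside of ivadomed, using plain NumPy. The patch shape is arbitrary. `np.errstate(invalid="raise")` is NumPy's standard way to turn the warning into an exception; whether ivadomed should do that is a separate question:

```python
import warnings

import numpy as np

empty_patch = np.zeros((4, 4, 4), dtype=np.float32)

# Default behaviour: 0 / 0 emits a RuntimeWarning and the result is NaN everywhere.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    normalized = (empty_patch - empty_patch.mean()) / empty_patch.std()

all_nan = bool(np.isnan(normalized).all())
warned = any(issubclass(w.category, RuntimeWarning) for w in caught)

# Opt-in strict behaviour: the same operation raises FloatingPointError.
raised = False
try:
    with np.errstate(invalid="raise"):
        (empty_patch - empty_patch.mean()) / empty_patch.std()
except FloatingPointError:
    raised = True

print(all_nan, warned, raised)
```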

But then, the error itself comes from tensorboard:

/home/mhbourget/venv-ivadomed-296/lib/python3.8/site-packages/torch/utils/tensorboard/summary.py:400: RuntimeWarning: invalid value encountered in cast
  tensor = (tensor * scale_factor).astype(np.uint8)
Training:  26%|██████████████████████████████████████████▋                                                                                                                         | 13/50 [04:16<13:11, 21.39s/it]
Traceback (most recent call last):
  File "/home/mhbourget/venv-ivadomed-296/bin/ivadomed", line 11, in <module>
    load_entry_point('ivadomed', 'console_scripts', 'ivadomed')()
  File "/home/mhbourget/code/ivadomed/ivadomed/main.py", line 623, in run_main
    run_command(context=context,
  File "/home/mhbourget/code/ivadomed/ivadomed/main.py", line 457, in run_command
    best_training_dice, best_training_loss, best_validation_dice, best_validation_loss = imed_training.train(
  File "/home/mhbourget/code/ivadomed/ivadomed/training.py", line 304, in train
    writer.add_scalars('Validation/Metrics', metrics_dict, epoch)
  File "/home/mhbourget/venv-ivadomed-296/lib/python3.8/site-packages/torch/utils/tensorboard/writer.py", line 403, in add_scalars
    fw.add_summary(scalar(main_tag, scalar_value),
  File "/home/mhbourget/venv-ivadomed-296/lib/python3.8/site-packages/torch/utils/tensorboard/summary.py", line 249, in scalar
    scalar = make_np(scalar)
  File "/home/mhbourget/venv-ivadomed-296/lib/python3.8/site-packages/torch/utils/tensorboard/_convert_np.py", line 24, in make_np
    raise NotImplementedError(
NotImplementedError: Got <class 'NoneType'>, but numpy array, torch tensor, or caffe2 blob name are expected.

This NaN array propagates through training until something finally fails on it. Although the failure surfaces much later, the first symptom is already visible in the training loss being reported as nan:

Training:   2%|███▍                                                                                                                                                                      | 1/50 [00:00<?, ?it/s]/home/GRAMES.POLYMTL.CA/p101317/code/ivadomed/ivadomed/transforms.py:304: RuntimeWarning: invalid value encountered in divide
  data_out = (sample - sample.mean()) / sample.std()
2022-12-20 11:22:05.643 | INFO     | ivadomed.training:train:238 - Epoch 1 training loss: nan.	Dice training loss: nan.
Epoch 1 training loss: nan.	Dice training loss: nan.
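
One possible way to fail earlier and with a clearer message is to validate the metrics before they are handed to the tensorboard writer. The sketch below is pure Python. `check_metrics` is a hypothetical helper and not part of ivadomed or PyTorch, and the hint in the error message is only a guess at the cause based on this thread:

```python
import math


def _is_invalid(value):
    """Return True for None, NaN, or anything that cannot be converted to float."""
    if value is None:
        return True
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError):
        return True


def check_metrics(metrics, epoch):
    """Raise a descriptive error if any metric is None or NaN (hypothetical helper)."""
    bad = {name: value for name, value in metrics.items() if _is_invalid(value)}
    if bad:
        raise ValueError(
            f"Epoch {epoch}: invalid metric values {bad}. "
            "A possible cause is NaNs produced upstream, e.g. an empty patch "
            "normalized with std == 0."
        )
    return metrics


# Usage idea: call check_metrics(metrics_dict, epoch) right before logging the metrics.
valid = check_metrics({"dice_score": 0.8}, epoch=1)
```

This does not fix the root cause; it only replaces the opaque NotImplementedError with a message that names the offending metrics.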

So my suggestion would be to output a more informative warning/error when that happens.

#1164 might take care of the issue in general, but this is a great suggestion to warn the users.

Error happened with empty 3D patches, maybe related?

Yes I think so. Just before the error I get this warning, which points to an empty patch:

/home/mhbourget/code/ivadomed/ivadomed/transforms.py:304: RuntimeWarning: invalid value encountered in divide
  data_out = (sample - sample.mean()) / sample.std()


Yup! Looks like it is related. When reducing the CenterCrop size (i.e., excluding the edge of the image containing zeros), there is no more error.

So my suggestion would be to output a more informative warning/error when that happens.

Error happened with empty 3D patches, maybe related?

@jcohenadad, I’ll take a look. If it’s an issue with data augmentation and 3D training, it may be related to #1213 and #1222. Are you on master?

I was able to reproduce the error on branch mhb/1213-fix-3d-data-augmentation from PR #1222. However, it happened during the 5th epoch of training on my first try, and happened on the 13th epoch on my second try, so it’s somewhat random?

In any case, it seems unrelated to the previous issue, so I’ll go ahead and merge #1222 and continue the investigation on this issue separately.

Maybe related to:

        "RandomReverse": {"applied_to": ["im", "gt"], "dataset_type": ["training"]},

Nope! Still getting the error.