kubeflow: Error when using PyTorch in Notebooks
Hey,
when using PyTorch (via fastai) in my notebook, I get the following error:
RuntimeError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/fastai/torch_core.py", line 117, in data_collate
    return torch.utils.data.dataloader.default_collate(to_data(batch))
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 207, in default_collate
    storage = batch[0].storage()._new_shared(numel)
  File "/opt/conda/lib/python3.6/site-packages/torch/storage.py", line 122, in _new_shared
    return cls._new_using_fd(size)
RuntimeError: unable to write to file </torch_1509_1969058146>
I read online that this might be solved by setting --ipc=host for Docker. Where can I set this? Or can this error be solved in another way?
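For anyone hitting the same traceback: the failing call (`_new_shared`) is where a DataLoader worker allocates shared memory, and containers often get a small /dev/shm (Docker's default is 64 MB). A minimal sketch to check how much is free from inside the notebook, assuming shared memory is mounted at /dev/shm; the helper name `shm_free_mib` is made up for this example:

```python
import os
import shutil


def shm_free_mib(path="/dev/shm"):
    """Return free space at `path` in MiB, or None if the path does not exist.

    Assumption: shared memory is mounted at /dev/shm, as on typical Linux containers.
    """
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free / (1024 * 1024)


if __name__ == "__main__":
    free = shm_free_mib()
    if free is None:
        print("/dev/shm not found on this system")
    else:
        print(f"/dev/shm free: {free:.1f} MiB")
```

If the number is tiny compared to the size of a batch, the shared memory limit is a likely cause.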
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (16 by maintainers)
@hno2 Thanks for figuring this out! I was going through the fastai course and hit this issue.
While we wait for #3102 to get merged, here’s a potential workaround:
1. Spawn a notebook using the notebook manager UI
2. Get the spec for the notebook
3. Edit the notebook spec
4. Add to volumes
5. Add to volume mounts
6. Apply your notebook
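The exact snippets for the steps above are not shown in this thread, so here is a rough sketch of what the edit could look like, written as a Python helper that patches a notebook spec loaded as a dict. It assumes the usual fix for this error: a memory-backed emptyDir mounted at /dev/shm. The volume name `dshm`, the function name `add_shm_volume`, and the `spec.template.spec` layout of the notebook resource are assumptions, so check them against your own spec before applying anything.

```python
import copy


def add_shm_volume(notebook, volume_name="dshm"):
    """Return a copy of a notebook spec with a memory-backed /dev/shm mount.

    Assumptions (not confirmed in this thread): the pod spec lives under
    spec.template.spec, and "dshm" is just an illustrative volume name.
    """
    patched = copy.deepcopy(notebook)
    pod_spec = patched["spec"]["template"]["spec"]

    # "Add to volumes": an emptyDir backed by memory.
    volumes = pod_spec.setdefault("volumes", [])
    if not any(v.get("name") == volume_name for v in volumes):
        volumes.append({"name": volume_name, "emptyDir": {"medium": "Memory"}})

    # "Add to volume mounts": mount it at /dev/shm in every container.
    for container in pod_spec.get("containers", []):
        mounts = container.setdefault("volumeMounts", [])
        if not any(m.get("mountPath") == "/dev/shm" for m in mounts):
            mounts.append({"name": volume_name, "mountPath": "/dev/shm"})

    return patched


# Hypothetical, heavily trimmed spec used only to show the shape of the result.
example = {"spec": {"template": {"spec": {"containers": [{"name": "notebook"}]}}}}
patched = add_shm_volume(example)
print(patched["spec"]["template"]["spec"]["volumes"])
print(patched["spec"]["template"]["spec"]["containers"][0]["volumeMounts"])
```

The same two entries can be added by hand while editing the spec; the helper only makes the shape of the change explicit and avoids adding the mount twice.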
FYI, I added this as pull request #3102.
I will close this issue once it is merged.