kubeflow: Error when using pytorch in Notebooks

Hey,

When using PyTorch (via fastai) in my notebook, I get the following error:

RuntimeError: Traceback (most recent call last):
 File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
   samples = collate_fn([dataset[i] for i in batch_indices])
 File "/opt/conda/lib/python3.6/site-packages/fastai/torch_core.py", line 117, in data_collate
   return torch.utils.data.dataloader.default_collate(to_data(batch))
 File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
   return [default_collate(samples) for samples in transposed]
 File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
   return [default_collate(samples) for samples in transposed]
 File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 207, in default_collate
   storage = batch[0].storage()._new_shared(numel)
 File "/opt/conda/lib/python3.6/site-packages/torch/storage.py", line 122, in _new_shared
   return cls._new_using_fd(size)
RuntimeError: unable to write to file </torch_1509_1969058146>

I read online that this might be solved by setting --ipc=host for docker. Where can I set this? Or can this error be solved in another way?
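For context, the traceback fails in `_new_shared`, which is where the DataLoader worker allocates shared memory for a batch. Inside a container that memory typically comes from /dev/shm, which Docker limits to 64 MB by default, and `--ipc=host` is one way around that limit. A minimal sketch for checking how much room /dev/shm has from inside the notebook (the helper name `shm_report` is made up for illustration):

```python
import os
import shutil


def shm_report(path="/dev/shm"):
    """Return (total_mib, free_mib) for the filesystem backing `path`."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.total / mib, usage.free / mib


# Only report when the path exists (it may not outside Linux containers).
if os.path.isdir("/dev/shm"):
    total, free = shm_report()
    print(f"/dev/shm: {total:.0f} MiB total, {free:.0f} MiB free")
```

If the total is only a few dozen MiB, a DataLoader with several workers and large batches can plausibly exhaust it.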

About this issue

State: closed
Created 5 years ago
Comments: 17 (16 by maintainers)

Most upvoted comments

@hno2 Thanks for figuring this out! I was going through the fast.ai course and hit this issue.

While we wait for #3102 to get merged, here’s a potential workaround.

Spawn a notebook using the notebook manager UI

Get the spec for the notebook

  kubectl get notebooks -o yaml <notebook> > /tmp/mynotebook.yaml

Edit the notebook spec

Add to volumes

- name: dshm
  emptyDir:
    medium: Memory

Add to the notebook container's volumeMounts
```
- mountPath: /dev/shm
  name: dshm
```

Apply your notebook

kubectl apply -f /tmp/mynotebook.yaml
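
The manual edit above can also be scripted. Below is a rough sketch that applies the same two additions to a notebook manifest already loaded into a Python dict (for example with a YAML parser). It assumes the pod spec lives under `spec.template.spec`, which may differ between Kubeflow versions. The function name `add_shm_volume` is hypothetical and not part of any Kubeflow tooling.

```python
import copy

# The same volume and mount as in the manual steps above.
SHM_VOLUME = {"name": "dshm", "emptyDir": {"medium": "Memory"}}
SHM_MOUNT = {"mountPath": "/dev/shm", "name": "dshm"}


def add_shm_volume(notebook):
    """Return a copy of the manifest with the dshm volume and mount added."""
    patched = copy.deepcopy(notebook)
    # Assumption: the Notebook resource embeds its pod spec here.
    pod_spec = patched["spec"]["template"]["spec"]

    # Add the memory-backed volume unless one named dshm already exists.
    volumes = pod_spec.get("volumes") or []
    if not any(v.get("name") == "dshm" for v in volumes):
        volumes.append(copy.deepcopy(SHM_VOLUME))
    pod_spec["volumes"] = volumes

    # Mount it at /dev/shm in every container that lacks such a mount.
    for container in pod_spec.get("containers") or []:
        mounts = container.get("volumeMounts") or []
        if not any(m.get("mountPath") == "/dev/shm" for m in mounts):
            mounts.append(copy.deepcopy(SHM_MOUNT))
        container["volumeMounts"] = mounts
    return patched
```

The function leaves its input untouched and is safe to run twice, so the result can be written back to /tmp/mynotebook.yaml and applied as in the last step.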

jlewi on Apr 28, 2019

fyi, I added this as pull request #3102

I will close this Issue once it is merged.

hno2 on Apr 25, 2019