kubeflow: Error when using pytorch in Notebooks

Hey,

When using PyTorch (via fastai) in my notebook, I get the following error:

RuntimeError: Traceback (most recent call last):
 File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
   samples = collate_fn([dataset[i] for i in batch_indices])
 File "/opt/conda/lib/python3.6/site-packages/fastai/torch_core.py", line 117, in data_collate
   return torch.utils.data.dataloader.default_collate(to_data(batch))
 File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
   return [default_collate(samples) for samples in transposed]
 File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
   return [default_collate(samples) for samples in transposed]
 File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 207, in default_collate
   storage = batch[0].storage()._new_shared(numel)
 File "/opt/conda/lib/python3.6/site-packages/torch/storage.py", line 122, in _new_shared
   return cls._new_using_fd(size)
RuntimeError: unable to write to file </torch_1509_1969058146>

I read online that this might be solved by setting --ipc=host for Docker. Where can I set this? Or can this error be solved in another way?
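
For anyone hitting the same traceback: the failing call (`_new_shared` inside `_worker_loop`) is a DataLoader worker trying to allocate shared memory, so a quick first check is how much space `/dev/shm` has inside the notebook container. The snippet below is a minimal sketch using only the standard library; the helper name `shm_usage_mib` is made up for illustration.

```python
import os
import shutil


def shm_usage_mib(path="/dev/shm"):
    """Return (total, used, free) for the filesystem at `path`, in MiB."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return tuple(round(v / mib, 1) for v in (usage.total, usage.used, usage.free))


if __name__ == "__main__":
    # Run this in a notebook cell; /dev/shm only exists on Linux-like systems.
    if os.path.isdir("/dev/shm"):
        total, used, free = shm_usage_mib()
        print(f"/dev/shm: total={total} MiB, used={used} MiB, free={free} MiB")
    else:
        print("/dev/shm not found on this system")
```

If the total is very small (Docker's default is 64 MB), that is probably the cause. As a stopgap, creating the DataLoader with `num_workers=0` loads data in the main process and should avoid the shared-memory allocation, at the cost of slower loading.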

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

@hno2 Thanks for figuring this out! I was going through the fast ai course and hit this issue.

While we wait for #3102 to get merged, here’s a potential workaround.

  1. Spawn a notebook using the notebook manager UI

  2. Get the spec for the notebook

      kubectl get notebooks <notebook> -o yaml > /tmp/mynotebook.yaml
    
  3. Edit the notebook spec

    • Add to volumes

      - name: dshm
        emptyDir:
          medium: Memory
      
    • Add to volume mounts

      - mountPath: /dev/shm
        name: dshm
      
  4. Apply your notebook

    kubectl apply -f /tmp/mynotebook.yaml
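
If you would rather not edit the YAML by hand, the manual edit in step 3 can be scripted. The following is a rough sketch, not part of the PR: it assumes the Notebook resource keeps its pod spec under `spec.template.spec` (check this against your own spec), works on the JSON form of the resource, and the function name `add_shm_volume` is hypothetical.

```python
import json


def add_shm_volume(notebook, name="dshm"):
    """Add a memory-backed emptyDir and mount it at /dev/shm in every container.

    Assumption: the pod spec lives at notebook["spec"]["template"]["spec"].
    The function is idempotent, so running it twice adds nothing new.
    """
    pod_spec = notebook["spec"]["template"]["spec"]

    # Same entry as "Add to volumes" in the steps above.
    volumes = pod_spec.setdefault("volumes", [])
    if not any(v.get("name") == name for v in volumes):
        volumes.append({"name": name, "emptyDir": {"medium": "Memory"}})

    # Same entry as "Add to volume mounts" in the steps above.
    for container in pod_spec.get("containers", []):
        mounts = container.setdefault("volumeMounts", [])
        if not any(m.get("mountPath") == "/dev/shm" for m in mounts):
            mounts.append({"mountPath": "/dev/shm", "name": name})
    return notebook


if __name__ == "__main__":
    # Illustrative input only; a real spec comes from
    # `kubectl get notebooks <notebook> -o json`.
    sample = {"spec": {"template": {"spec": {"containers": [{"name": "notebook"}]}}}}
    print(json.dumps(add_shm_volume(sample), indent=2))
```

To use it on a real notebook, load the output of `kubectl get notebooks <notebook> -o json` with `json.load`, pass it through the function, write the result to a file, and apply it with `kubectl apply -f` as in step 4.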
    

FYI, I added this as pull request #3102.

I will close this issue once it is merged.