sagemaker-python-sdk: Training a model in `local_code` mode does not work if `source_dir="."`

System Information

  • Python Version: 3.6
  • Python SDK Version: master
  • Are you using a custom image: yes

Describe the problem

I am trying to train a model using the undocumented local_code mode. If I don't specify source_dir, or if I set it to ".", the training procedure fails to mount the volumes correctly.

I get:

Cannot create container for service algo-1-JFP46: create .: volume name is too short, names should be at least two alphanumeric characters

I am reporting this even though local_code is still undocumented, hoping it is useful anyway.

Minimal repro / logs

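from sagemaker.local import LocalSession

# Enable the undocumented local_code mode on a local session.
session = LocalSession()
session.config = {'local': {'local_code': True}}

# MyEstimator is the reporter's own estimator class (custom image);
# role is an IAM role defined elsewhere. source_dir is not set.
est = MyEstimator(
    entry_point='code.py',
    train_instance_type='local',
    train_instance_count=1,
    role=role,
    sagemaker_session=session,
)

est.fit()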

See the full traceback.

Here is the interesting part of the generated docker-compose.yaml:

networks:
  sagemaker-local:
    name: sagemaker-local
services:
    volumes:
    - /tmp/tmp_i5dhjtn/algo-1-JFP46/output/data:/opt/ml/output/data
    - /tmp/tmp_i5dhjtn/algo-1-JFP46/output:/opt/ml/output
    - /tmp/tmp_i5dhjtn/algo-1-JFP46/input:/opt/ml/input
    - /tmp/tmp_i5dhjtn/model:/opt/ml/model
    - :/opt/ml/code
    - /tmp/tmp_i5dhjtn/shared:/opt/ml/shared
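
The empty host side of `:/opt/ml/code` above, and the `create .:` in the error message, are both consistent with the code directory being taken from a relative path. The following is only a sketch under assumptions the issue does not confirm: that the host directory falls back to `os.path.dirname(entry_point)` when source_dir is unset, and that the mount is formatted as `host:container`. `code_volume` is a hypothetical helper, not an SDK function.

```python
import os


def code_volume(entry_point, source_dir=None, container_dir="/opt/ml/code"):
    """Hypothetical helper: build a compose-style 'host:container' mount string."""
    # Assumption: with no source_dir, the directory part of entry_point is used.
    host_dir = source_dir if source_dir is not None else os.path.dirname(entry_point)
    return f"{host_dir}:{container_dir}"


unset = code_volume("code.py")                # source_dir not given
dot = code_volume("code.py", source_dir=".")  # source_dir="."
print(repr(unset))
print(repr(dot))
```

In both cases the host side is not an absolute path, so Docker may treat it as a volume name rather than as a directory to bind-mount.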

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 15 (3 by maintainers)

Most upvoted comments

I solved it by using an absolute path for the entry_point:

entry_point=str(Path.cwd() / 'train.py')

Using PyTorch local mode, I encountered the same issue!

Hello @tyrion,

Thank you for bringing this to our attention.

I’ll speak with the team about how to handle this situation and what fix is needed.

Thanks again!

I solved it by using an absolute path for the entry_point:

entry_point=str(Path.cwd() / 'train.py')

Supplying the absolute path worked for me as well, but does anyone know why? It makes no sense to me.
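
One possible explanation, offered as an assumption and not as a confirmed reading of the SDK: a bare file name such as 'train.py' has an empty directory part, whereas an absolute path always has a non-empty parent directory that can be mounted. The sketch below uses only pathlib; `absolute_entry_point` is a hypothetical helper name.

```python
from pathlib import Path


def absolute_entry_point(entry_point):
    """Hypothetical helper: resolve a script name against the current working directory."""
    return str(Path(entry_point).resolve())


script = absolute_entry_point("train.py")
code_dir = str(Path(script).parent)  # non-empty, absolute directory
print(script)
print(code_dir)
```

The resulting string can be passed as entry_point in the same way as `str(Path.cwd() / 'train.py')`.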

I am getting this same error.

I solved it by using an absolute path for the entry_point:

entry_point=str(Path.cwd() / 'train.py')

This worked for me. It needs the absolute path to the file, not the relative path, for local mode to work.