sagemaker-python-sdk: Training a model in `local_code` mode does not work if `source_dir="."`
System Information
- Python Version: 3.6
- Python SDK Version: master
- Are you using a custom image: yes
Describe the problem
I am trying to train a model using the undocumented `local_code` mode. If I don't specify `source_dir`, or if I set it to ".", the training procedure fails to mount the volumes correctly.
I get:
Cannot create container for service algo-1-JFP46: create .: volume name is too short, names should be at least two alphanumeric characters
I am reporting this even though `local_code` is still not documented, hoping it can be useful anyway.
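The error message suggests that Docker treated "." as a volume name rather than as a host path. The sketch below illustrates that reading. It assumes that any source not starting with "/" is interpreted as a named volume, and that valid names look like the pattern shown; both the pattern and the helper `classify_volume_source` are illustrative assumptions, not code taken from Docker or the SDK.

```python
import re

# Assumption: approximates the rule described by the error message
# ("names should be at least two alphanumeric characters").
_VOLUME_NAME = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]+$")


def classify_volume_source(source: str) -> str:
    """Hypothetical helper: guess how the source part of 'source:target' is treated."""
    if source.startswith("/"):
        return "bind mount"  # absolute host path
    if _VOLUME_NAME.match(source):
        return "named volume"
    return "invalid volume name"


# "." and "" are neither absolute paths nor valid volume names under this rule.
for src in ["/tmp/code", "data", ".", ""]:
    print(repr(src), "->", classify_volume_source(src))
```

Under these assumptions, only an absolute host path is guaranteed to be mounted as a directory.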
Minimal repro / logs
from sagemaker.local import LocalSession

session = LocalSession()
session.config = {'local': {'local_code': True}}
est = MyEstimator(
    entry_point='code.py',
    train_instance_type='local',
    train_instance_count=1,
    role=role,
    sagemaker_session=session,
)
est.fit()
See the full traceback.
Here is the interesting part of the generated docker-compose.yaml:
networks:
  sagemaker-local:
    name: sagemaker-local
services:
  volumes:
  - /tmp/tmp_i5dhjtn/algo-1-JFP46/output/data:/opt/ml/output/data
  - /tmp/tmp_i5dhjtn/algo-1-JFP46/output:/opt/ml/output
  - /tmp/tmp_i5dhjtn/algo-1-JFP46/input:/opt/ml/input
  - /tmp/tmp_i5dhjtn/model:/opt/ml/model
  - :/opt/ml/code
  - /tmp/tmp_i5dhjtn/shared:/opt/ml/shared
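One plausible explanation for the empty host side of the `/opt/ml/code` entry is that the host directory is derived from the directory part of `entry_point`, which is an empty string for a bare file name. The sketch below only illustrates that mechanism; `code_volume` is a hypothetical helper, and whether the SDK builds the string this way is an assumption.

```python
import os

CONTAINER_CODE_DIR = "/opt/ml/code"  # target path seen in the compose excerpt


def code_volume(entry_point, source_dir=None):
    """Hypothetical helper (not the SDK's code): build a 'host:container' volume string."""
    # Assumption: fall back to the directory part of entry_point when
    # source_dir is not given.
    host_dir = source_dir if source_dir is not None else os.path.dirname(entry_point)
    return f"{host_dir}:{CONTAINER_CODE_DIR}"


# os.path.dirname("code.py") is "", so the host side of the mount is empty.
print(code_volume("code.py"))       # :/opt/ml/code
# Passing "." explicitly yields a one-character source instead.
print(code_volume("code.py", "."))  # .:/opt/ml/code
```

The first result matches the entry in the generated file, and the second matches the "create .:" part of the reported error.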
About this issue
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 15 (3 by maintainers)
I solved it by using an absolute path for the entry_point:
entry_point=str(Path.cwd() / 'train.py')
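A slightly fuller sketch of this workaround, for anyone who wants to reuse it. The helper name `absolute_entry_point` and its `base_dir` parameter are made up for illustration, and `train.py` is just the file name from the comment above; the script does not need to exist for the path to be computed.

```python
from pathlib import Path


def absolute_entry_point(script_name, base_dir=None):
    """Hypothetical helper: return an absolute path to the training script."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return str((base / script_name).resolve())


entry_point = absolute_entry_point("train.py")
# The directory part is now a non-empty absolute path, so it can also be
# passed as source_dir if an explicit value is preferred.
source_dir = str(Path(entry_point).parent)

print(entry_point)
print(source_dir)
```

Both values can then be passed to the estimator in place of the bare file name.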
Using PyTorch local mode, I encountered the same issue!
Hello @tyrion,
Thank you for bringing this to our attention.
I'll speak with the team about how to handle this situation and what fix is needed.
Thanks again!
Supplying the absolute path worked for me as well, but does anyone know why? It makes no sense to me.
I am getting this same error.
This worked for me. It needs the relative and not the absolute path to the file for local mode to work.