sagemaker-python-sdk: FrameworkProcessor is broken with SageMaker Pipelines
Describe the bug
Using any processor derived from `FrameworkProcessor` with SageMaker Pipelines is broken. There is a problem with the `command` and `entrypoint` parameters: `command` does not pass `python3`, so the container executes the Python entrypoint script as a shell script, causing the following error:
line 2: import: command not found
To reproduce
- Create a FrameworkProcessor (e.g. PyTorchProcessor, TensorFlowProcessor)
- Create a ProcessingStep and a Pipeline
- Execute it
- See it fail (a minimal repro sketch follows below)
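A minimal repro sketch, assuming an existing `role` and a local `processing.py`; the processor class and versions are illustrative:

```python
from sagemaker.tensorflow.processing import TensorFlowProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

# Any FrameworkProcessor subclass triggers the bug
processor = TensorFlowProcessor(
    framework_version="2.3",
    py_version="py37",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Passing the processor and script directly to ProcessingStep skips the
# FrameworkProcessor packaging logic that sets python3 as the command,
# so the container runs the .py file as a shell script
step = ProcessingStep(
    name="BrokenProcessingStep",
    processor=processor,
    code="processing.py",
)

pipeline = Pipeline(name="ReproPipeline", steps=[step])
pipeline.upsert(role_arn=role)
pipeline.start()  # fails with "import: command not found"
```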
Expected behavior
The pipeline should run to completion.
Screenshots or logs
Screenshot from Pipelines: [screenshot of the failed pipeline execution]
Logs from CloudWatch:
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 2: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 3: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 4: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 5: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 6: from: command not found
System information:
- SageMaker Python SDK version: 2.57.0
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): every framework
- Framework version: every version supported by SageMaker
- Python version: 3.8
- CPU or GPU: CPU and GPU
- Custom Docker image (Y/N): N
Additional context
N/A
About this issue
- State: closed
- Created 3 years ago
- Reactions: 11
- Comments: 23 (6 by maintainers)
Still the case for now.
However, there is now the possibility to use the new `sagemaker.workflow.pipeline_context.PipelineSession` to have `.run()` generate the arguments without actually running the processing job. Tried in a Jupyter Notebook with a custom `FrameworkProcessor`, but it should work with any `FrameworkProcessor`. Just make sure to update the SageMaker Python SDK to the latest version 😃. Your code would look like the sketch below:
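A minimal sketch of that pattern, assuming an existing `role` and a `scripts/` directory containing `processing.py` (names and versions are illustrative):

```python
from sagemaker.tensorflow.processing import TensorFlowProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import ProcessingStep

pipeline_session = PipelineSession()

processor = TensorFlowProcessor(
    framework_version="2.3",
    py_version="py37",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=pipeline_session,
)

# With a PipelineSession, .run() does not start a job; it returns the
# step arguments for the pipeline to execute later
step_args = processor.run(
    code="processing.py",
    source_dir="scripts",  # bundles the directory, incl. requirements.txt
)

step = ProcessingStep(name="MyProcessingStep", step_args=step_args)

pipeline = Pipeline(
    name="MyPipeline",
    steps=[step],
    sagemaker_session=pipeline_session,
)
```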
UPDATE 2:
`ScriptProcessor` does work; however, there is no support for the `source_dir` parameter (as commented above by @athewsey). If you need custom dependencies or a multi-file script, create your own custom container by extending the SM images for TF/PyTorch/HuggingFace/MXNet. For those who need some directions on how to change from `FrameworkProcessor` to `ScriptProcessor`, here is an example for TF 2.3:
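A sketch of the swap, assuming an existing `role`; the TF 2.3 image URI is looked up with `sagemaker.image_uris.retrieve`:

```python
import sagemaker
from sagemaker.processing import ScriptProcessor

region = sagemaker.Session().boto_region_name

# Look up the TF 2.3 training image for the current region
image_uri = sagemaker.image_uris.retrieve(
    framework="tensorflow",
    region=region,
    version="2.3",
    py_version="py37",
    instance_type="ml.m5.xlarge",
    image_scope="training",
)

# command=["python3"] is the key difference: it makes the container run
# the script with Python instead of the default shell
script_processor = ScriptProcessor(
    image_uri=image_uri,
    command=["python3"],
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
```

The resulting `script_processor` can then be passed to a `ProcessingStep` via `processor=` and `code=` as before; `command=["python3"]` is what avoids the `import: command not found` failure.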
A very short example of a `Dockerfile` to extend the default TF container and install dependencies (not tested yet):
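A minimal sketch of such a Dockerfile, assuming a `requirements.txt` next to it; the base image URI is a placeholder for the framework image retrieved above:

```dockerfile
# Base image: the TF 2.3 training image URI retrieved above (placeholder)
FROM <tf-2.3-training-image-uri>

# Install extra pip dependencies into the framework container
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```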
The `FrameworkProcessor` has a method called `get_run_args` (doc here) that is designed to help integrate this processor with the `ProcessingStep`, which can then be put within a SageMaker pipeline. If you want to add pip dependencies, you can add a requirements.txt file under `BASE_DIR`. If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file so it is shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/. Here is simplified code that connects the dots between `FrameworkProcessor`, `get_run_args`, `ProcessingStep`, and `Pipeline`:
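A simplified sketch of that wiring, assuming an existing `role` and a `BASE_DIR` directory that contains `processing.py` (and optionally `requirements.txt`):

```python
from sagemaker.tensorflow.processing import TensorFlowProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

BASE_DIR = "src"  # holds processing.py and (optionally) requirements.txt

processor = TensorFlowProcessor(
    framework_version="2.3",
    py_version="py37",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# get_run_args packages BASE_DIR and generates an entrypoint that
# launches the script with python3, then hands back the pieces that
# ProcessingStep expects
run_args = processor.get_run_args(
    code="processing.py",
    source_dir=BASE_DIR,
)

step = ProcessingStep(
    name="MyProcessingStep",
    processor=processor,
    code=run_args.code,
    inputs=run_args.inputs,
    outputs=run_args.outputs,
    job_arguments=run_args.arguments,
)

pipeline = Pipeline(name="MyPipeline", steps=[step])
```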
Is this issue fixed?
Thanks @dgallitelli
We would encourage users to adopt this new way to construct `TrainingStep`, `ProcessingStep`, `TransformStep`, `TuningStep`, and `ModelStep`. We have a readthedocs page about to be released that introduces all the improvements we made to the SageMaker Python SDK Pipeline module.