sagemaker-python-sdk: Error hosting Sagemaker MXNet Endpoint
Please fill out the form below.
System Information
- Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): MXNet
- Framework Version:
- Python Version:
- CPU or GPU:
- Python SDK Version:
- Are you using a custom image:
Describe the problem
I am using my own pretrained Keras model (.json, .h5), converted to a TensorFlow protobuf and then gzipped into model.tar.gz following this tutorial: https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/
After uploading model.tar.gz to S3, I run the code below to deploy the model:
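For reference, the packaging step from that tutorial boils down to gzipping the exported model directory into model.tar.gz; a minimal sketch using only the standard library (the export directory name and layout here are assumptions, not the tutorial's exact paths):

```python
import os
import tarfile

def package_model(export_dir, archive_path="model.tar.gz"):
    """Gzip an exported model directory into a model.tar.gz for S3 upload."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # store members relative to the export directory root
        tar.add(export_dir, arcname=os.path.basename(export_dir))
    return archive_path
```

Inspecting the resulting archive (e.g. `tar tzf model.tar.gz`) before uploading is a quick way to confirm the layout matches what the hosting container expects.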
from sagemaker.mxnet import MXNetModel
mxnet_model = MXNetModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/modelmxnet/model.tar.gz', role=role, entry_point='trasform_script.py')
predictor = mxnet_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)
When I run this cell, I get the error below:
ValueError: Error hosting endpoint sagemaker-mxnet-2019-02-26-14-40-33-819: Failed Reason: The primary container for production variant AllTraffic did not pass the ping health check.
My entry point script (trasform_script.py) is shown below; it does some preprocessing of the image.
from __future__ import print_function
import argparse
import bisect
from collections import Counter
from itertools import chain, islice
import json
import logging
import time
import random
import os
import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.io import DataIter, DataBatch, DataDesc
import numpy as np
from sagemaker_mxnet_container.training_utils import scheduler_host
import io
import boto3
from keras.preprocessing import image
from keras.applications.densenet import preprocess_input
logging.basicConfig(level=logging.DEBUG)
def transform_fn(net, data, input_content_type, output_content_type):
"""
Transform a request using the Gluon model. Called once per request.
:param net: The model.
:param data: The request payload.
:param input_content_type: The request content type.
:param output_content_type: The (desired) response content type.
:return: response payload and content type.
"""
# we can use content types to vary input/output handling, but
# here we just assume json for both
net, vocab = net
parsed = json.loads(data)
outputs = []
img_path = 'shoe.jpg'
img = image.load_img(img_path, target_size=(256, 256))
img_data = image.img_to_array(img)
img_data = np.expand_dims(img_data, axis=0)
img_data = preprocess_input(img_data)
    # pass an NDArray to the network, not a JSON string
    output = net(mx.nd.array(img_data))
prediction = mx.nd.argmax(output, axis=1)
outputs.append(int(prediction.asscalar()))
response_body = json.dumps(outputs)
return response_body, output_content_type
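When the endpoint is eventually invoked with a JSON body, the deserialization inside transform_fn amounts to turning a JSON-encoded nested list back into an array and serializing the result the same way on the way out. A minimal sketch of that round trip (numpy stands in for mx.nd here; the function names are illustrative):

```python
import json
import numpy as np

def payload_to_array(data):
    """Parse a JSON-encoded nested list payload into a float32 array."""
    return np.asarray(json.loads(data), dtype="float32")

def array_to_payload(arr):
    """Serialize an array back into a JSON response body."""
    return json.dumps(arr.tolist())
```

Testing this round trip locally, before deploying, separates payload-handling bugs from container-startup failures like the ping health check above.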
Minimal repro / logs
Error
ValueError: Error hosting endpoint sagemaker-mxnet-2019-02-26-14-40-33-819: Failed Reason: The primary container for production variant AllTraffic did not pass the ping health check.
- Exact command to reproduce:
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 15 (6 by maintainers)
Commits related to this issue
- Add delete_record API (#664) — committed to mizanfiu/sagemaker-python-sdk by imingtsou 2 years ago
- feature: Added doc update for dataset builder (#3539) * Add list_feature_groups API (#647) * feat: Feature/get record api (#650) Co-authored-by: Eric Zou <zoueric@amazon.com> * Add delete_re... — committed to aws/sagemaker-python-sdk by mizanfiu 2 years ago
Hi @nahidalam
To clarify: the conda environment (conda_mxnet_p36 in your case) is only used on your notebook instance. When you submit a training job, run a batch transform job, or host an endpoint in SageMaker, that environment is not used; the job runs inside our SageMaker MXNet Docker container instead.
To fix this issue, please add a requirements.txt file to your source_dir alongside your Python files.
You can also refer to our docs: https://sagemaker.readthedocs.io/en/stable/using_mxnet.html#using-third-party-libraries
For the requirements.txt file syntax, see the pip docs: https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format
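For example, since the entry point imports keras, a requirements.txt placed next to the script in source_dir might look like the following (the package list is illustrative; pin versions as needed for your model):

```
keras
h5py
```

The container installs these with pip at startup, so a missing dependency here is a common cause of the ping health check failing before any request is served.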
Let me know if this works for you.