sagemaker-python-sdk: Error hosting Sagemaker MXNet Endpoint


System Information

Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): MXNet
Framework Version:
Python Version:
CPU or GPU:
Python SDK Version:
Are you using a custom image:

Describe the problem

I am using my own pretrained Keras model (.json, .h5), converted to a TensorFlow protobuf and then packaged as a gzipped model.tar.gz following this tutorial (https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/).

After uploading model.tar.gz to S3, I run the code below to deploy the model:

from sagemaker.mxnet import MXNetModel
mxnet_model = MXNetModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/modelmxnet/model.tar.gz', role=role, entry_point='trasform_script.py')

predictor = mxnet_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)

When I run this cell, I get the following error: ValueError: Error hosting endpoint sagemaker-mxnet-2019-02-26-14-40-33-819: Failed Reason: The primary container for production variant AllTraffic did not pass the ping health check.

My entry point script, trasform.py, is shown below. It does some preprocessing of the image before calling the model.

from __future__ import print_function

import argparse
import bisect
from collections import Counter
from itertools import chain, islice
import json
import logging
import time
import random
import os

import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.io import DataIter, DataBatch, DataDesc
import numpy as np

from sagemaker_mxnet_container.training_utils import scheduler_host

import io
import boto3
from keras.preprocessing import image
from keras.applications.densenet import preprocess_input

logging.basicConfig(level=logging.DEBUG)



def transform_fn(net, data, input_content_type, output_content_type):
    """
    Transform a request using the Gluon model. Called once per request.
    :param net: The model.
    :param data: The request payload.
    :param input_content_type: The request content type.
    :param output_content_type: The (desired) response content type.
    :return: response payload and content type.
    """
    # we can use content types to vary input/output handling, but
    # here we just assume json for both
    net, vocab = net
    parsed = json.loads(data)
    outputs = []
   
    img_path = 'shoe.jpg'
    img = image.load_img(img_path, target_size=(256, 256))
    img_data = image.img_to_array(img)
    img_data = np.expand_dims(img_data, axis=0)
    img_data = preprocess_input(img_data)
    img_data_list = img_data.tolist()
    img_data_json = json.dumps(img_data_list)
    
    output = net(img_data_json)
    prediction = mx.nd.argmax(output, axis=1)
    outputs.append(int(prediction.asscalar()))
    response_body = json.dumps(outputs)
    return response_body, output_content_type

Minimal repro / logs

Error ValueError: Error hosting endpoint sagemaker-mxnet-2019-02-26-14-40-33-819: Failed Reason: The primary container for production variant AllTraffic did not pass the ping health check.

Exact command to reproduce:

About this issue

State: closed
Created 5 years ago
Comments: 15 (6 by maintainers)

Commits related to this issue

Add delete_record API (#664) — committed to mizanfiu/sagemaker-python-sdk by imingtsou 2 years ago
feature: Added doc update for dataset builder (#3539) * Add list_feature_groups API (#647) * feat: Feature/get record api (#650) Co-authored-by: Eric Zou <zoueric@amazon.com> * Add delete_re... — committed to aws/sagemaker-python-sdk by mizanfiu 2 years ago
feature: Added doc update for dataset builder (#3539) * Add list_feature_groups API (#647) * feat: Feature/get record api (#650) Co-authored-by: Eric Zou <zoueric@amazon.com> * Add delete_re... — committed to claytonparnell/sagemaker-python-sdk by mizanfiu 2 years ago
feature: Added doc update for dataset builder (#3539) * Add list_feature_groups API (#647) * feat: Feature/get record api (#650) Co-authored-by: Eric Zou <zoueric@amazon.com> * Add delete_re... — committed to mufaddal-rohawala/sagemaker-python-sdk by mizanfiu 2 years ago
feature: Added doc update for dataset builder (#3539) * Add list_feature_groups API (#647) * feat: Feature/get record api (#650) Co-authored-by: Eric Zou <zoueric@amazon.com> * Add delete_re... — committed to JoseJuan98/sagemaker-python-sdk by mizanfiu 2 years ago
feature: Added doc update for dataset builder (#3539) * Add list_feature_groups API (#647) * feat: Feature/get record api (#650) Co-authored-by: Eric Zou <zoueric@amazon.com> * Add delete_re... — committed to nmadan/sagemaker-python-sdk by mizanfiu 2 years ago

Most upvoted comments

Hi @nahidalam

To clarify: the conda environment (conda_mxnet_p36 in your case) is only used on your notebook instance. When you submit a training job, run a batch transform job, or host an endpoint in SageMaker, that environment is not used; your code runs in our SageMaker MXNet Docker container instead. Packages installed only in the notebook environment are therefore not available to your entry point script.
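As a rough way to see which imports in an entry point script could be affected, the sketch below lists the top-level modules a script imports that cannot be found by the interpreter running it. This is not part of the SageMaker SDK: `missing_modules` is a hypothetical helper, and its result is only meaningful when it runs in an environment that mirrors the container rather than the notebook's conda environment.

```python
import ast
import importlib.util


def missing_modules(source):
    """Return sorted top-level modules imported by `source` that are not found here.

    Hypothetical helper for illustration; it only inspects import statements
    and does not execute the script.
    """
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b" -> top-level package "a"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> top-level package "a"; relative imports are skipped
            names.add(node.module.split(".")[0])
    return sorted(n for n in names if importlib.util.find_spec(n) is None)


# Example with a made-up module name that is assumed not to be installed.
example = "import json\nimport some_package_that_is_not_installed\n"
print(missing_modules(example))
```

Any module reported this way is a candidate for the requirements.txt file described next.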

To fix this issue, add a requirements.txt file to your source_dir, alongside your Python files.

You can also refer to our docs: https://sagemaker.readthedocs.io/en/stable/using_mxnet.html#using-third-party-libraries

For the requirements.txt file syntax, see the pip documentation: https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format
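A minimal sketch of the layout this implies, assuming the entry point keeps the name used in the deploy call above and that keras is the only package missing from the container (both are assumptions; list whatever your script actually imports). `write_source_dir` is a hypothetical helper, not an SDK function, and the directory name `src` is arbitrary.

```python
import tempfile
from pathlib import Path


def write_source_dir(root, entry_point_name, entry_point_code, requirements):
    """Create a directory holding the entry point script and a requirements.txt."""
    src = Path(root) / "src"
    src.mkdir(parents=True, exist_ok=True)
    (src / entry_point_name).write_text(entry_point_code)
    # One requirement per line, as pip expects.
    (src / "requirements.txt").write_text("\n".join(requirements) + "\n")
    return src


src = write_source_dir(
    tempfile.mkdtemp(),
    "trasform_script.py",              # name taken from the deploy call in the question
    "# entry point code goes here\n",  # placeholder for the real script
    ["keras"],                         # assumption: the package the container lacks
)
print(sorted(p.name for p in src.iterdir()))

# With the SDK installed and AWS credentials configured, the model would then be
# created along these lines (not executed here):
# mxnet_model = MXNetModel(model_data=..., role=role,
#                          entry_point="trasform_script.py", source_dir=str(src))
```

The point of passing source_dir is that the whole directory, including requirements.txt, is shipped with the entry point instead of the single script.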

Let me know if this works for you.

iquintero on Feb 28, 2019