sagemaker-python-sdk: Error hosting SageMaker MXNet endpoint

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): MXNet
  • Framework Version:
  • Python Version:
  • CPU or GPU:
  • Python SDK Version:
  • Are you using a custom image:

Describe the problem

I am using my own pretrained Keras model (.json, .h5). I converted it to a TensorFlow protobuf and then gzipped it into model.tar.gz, following this tutorial (https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/).

After uploading model.tar.gz to S3, I use the code below to deploy the model:

```python
from sagemaker.mxnet import MXNetModel

mxnet_model = MXNetModel(model_data='s3://' + sagemaker_session.default_bucket() + '/modelmxnet/model.tar.gz',
                         role=role,
                         entry_point='trasform_script.py')

predictor = mxnet_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)
```

When I run this cell, I get the following error:

ValueError: Error hosting endpoint sagemaker-mxnet-2019-02-26-14-40-33-819: Failed Reason: The primary container for production variant AllTraffic did not pass the ping health check.

My entry point code, trasform.py, looks like this. It does some preprocessing of the image:

```python
from __future__ import print_function

import argparse
import bisect
from collections import Counter
from itertools import chain, islice
import json
import logging
import time
import random
import os

import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.io import DataIter, DataBatch, DataDesc
import numpy as np

from sagemaker_mxnet_container.training_utils import scheduler_host

import io
import boto3
from keras.preprocessing import image
from keras.applications.densenet import preprocess_input

logging.basicConfig(level=logging.DEBUG)


def transform_fn(net, data, input_content_type, output_content_type):
    """
    Transform a request using the Gluon model. Called once per request.
    :param net: The model.
    :param data: The request payload.
    :param input_content_type: The request content type.
    :param output_content_type: The (desired) response content type.
    :return: response payload and content type.
    """
    # we can use content types to vary input/output handling, but
    # here we just assume json for both
    net, vocab = net
    parsed = json.loads(data)
    outputs = []

    img_path = 'shoe.jpg'
    img = image.load_img(img_path, target_size=(256, 256))
    img_data = image.img_to_array(img)
    img_data = np.expand_dims(img_data, axis=0)
    img_data = preprocess_input(img_data)
    img_data_list = img_data.tolist()
    img_data_json = json.dumps(img_data_list)

    output = net(img_data_json)
    prediction = mx.nd.argmax(output, axis=1)
    outputs.append(int(prediction.asscalar()))
    response_body = json.dumps(outputs)
    return response_body, output_content_type
```
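The transform_fn above ignores the request payload (it always loads a local shoe.jpg), unpacks net into a (net, vocab) pair that nothing in the script produces, and passes a JSON string rather than an array to the network. Below is a minimal sketch of the same idea that decodes the request body instead. It assumes the client sends the preprocessed image as a JSON nested list that is already laid out the way net expects, and that net is an MXNet/Gluon model the container has loaded. A TensorFlow protobuf like the one produced by the Keras tutorial is a different format, so this sketch does not cover that case. encode_payload and decode_payload are hypothetical helper names.

```python
import json

import numpy as np


def encode_payload(image_array):
    """Client side: serialize a preprocessed (H, W, C) image or a batch to JSON."""
    return json.dumps(np.asarray(image_array, dtype=np.float32).tolist())


def decode_payload(data):
    """Server side: turn the JSON request body back into a float32 batch."""
    if isinstance(data, (bytes, bytearray)):
        data = data.decode("utf-8")
    batch = np.asarray(json.loads(data), dtype=np.float32)
    if batch.ndim == 3:  # a single image: add the batch axis
        batch = np.expand_dims(batch, axis=0)
    if batch.ndim != 4:
        raise ValueError("expected an image or a batch of images, got shape %s" % (batch.shape,))
    return batch


def transform_fn(net, data, input_content_type, output_content_type):
    """Predict one class index per image in the request (JSON in, JSON out)."""
    # Imported here so the helpers above also work where mxnet is not installed;
    # the SageMaker MXNet serving container provides it.
    import mxnet as mx

    batch = mx.nd.array(decode_payload(data))
    output = net(batch)
    predictions = mx.nd.argmax(output, axis=1).asnumpy().astype(int).tolist()
    return json.dumps(predictions), output_content_type
```

With this split, the Keras preprocessing (load_img, preprocess_input) can stay on the client, which sends encode_payload(img_data) as the request body. The endpoint then no longer needs keras at all.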

Minimal repro / logs

ValueError: Error hosting endpoint sagemaker-mxnet-2019-02-26-14-40-33-819: Failed Reason: The primary container for production variant AllTraffic did not pass the ping health check.
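A failed ping health check means the serving container never reported itself healthy. deploy() only raises the short failure reason, while the container's own traceback goes to CloudWatch Logs. Below is a minimal sketch for pulling that traceback out. It assumes the log group follows SageMaker's /aws/sagemaker/Endpoints/<endpoint-name> naming and that the notebook's boto3 credentials are allowed to read it. fetch_endpoint_log_messages and find_import_errors are hypothetical helper names.

```python
import re

# Matches lines such as "ModuleNotFoundError: No module named 'keras'".
IMPORT_ERROR = re.compile(r"(?:ModuleNotFoundError|ImportError): .*")


def fetch_endpoint_log_messages(endpoint_name, region_name=None):
    """Yield every log message the endpoint's containers wrote to CloudWatch."""
    import boto3  # imported lazily so find_import_errors works without boto3

    logs = boto3.client("logs", region_name=region_name)
    paginator = logs.get_paginator("filter_log_events")
    pages = paginator.paginate(logGroupName="/aws/sagemaker/Endpoints/" + endpoint_name)
    for page in pages:
        for event in page["events"]:
            yield event["message"]


def find_import_errors(messages):
    """Return the import-error lines found in an iterable of log messages."""
    found = []
    for message in messages:
        match = IMPORT_ERROR.search(message)
        if match:
            found.append(match.group(0))
    return found
```

Passing fetch_endpoint_log_messages("sagemaker-mxnet-2019-02-26-14-40-33-819") to find_import_errors should list any ModuleNotFoundError or ImportError lines. That is typically how a package the entry point imports at module level, but which the container lacks, shows up.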

  • Exact command to reproduce:

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Hi @nahidalam

Let me clarify: the conda environment (conda_mxnet_p36 in your case) is only used in your notebook instance. When you submit a training job or a batch transform job, or host an endpoint in SageMaker, that environment is not used. The job runs in our SageMaker MXNet Docker container instead.
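Because the container's packages are independent of the notebook's conda environment, one way to see what the endpoint can actually import is to log it when the entry point loads. Below is a minimal standard-library sketch, assuming a Python 3 container. The helper name report_missing_modules and the module list are illustrative assumptions.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def report_missing_modules(names):
    """Return (and log) the top-level modules that cannot be imported here."""
    missing = [name for name in names if importlib.util.find_spec(name) is None]
    for name in missing:
        logger.error("module %r is not installed in this environment", name)
    return missing


# For the entry point in this issue, the third-party imports worth checking:
# report_missing_modules(["mxnet", "numpy", "boto3", "keras"])
```

Call it at the top of the entry point, before the third-party imports. Otherwise the first failing import raises before anything is logged. Running the same call in the notebook shows how the two environments differ.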

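To fix this issue, please add a requirements.txt file that lists the third-party packages your script needs (for example keras) to your source_dir, alongside your Python files, and pass that directory as source_dir when you create the model.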

You can also refer to our docs: https://sagemaker.readthedocs.io/en/stable/using_mxnet.html#using-third-party-libraries

For the requirements.txt syntax, see the pip documentation: https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format
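Putting this together, the entry point and a requirements.txt sit in one directory that is passed as source_dir. Below is a minimal sketch with these assumptions:

- A local directory named code (hypothetical) already contains trasform_script.py.
- The package list is whatever your script actually imports. keras itself loads a backend such as tensorflow on import, so that may need to be listed too.
- framework_version="1.4.1" and py_version="py3" are placeholders that must match an available MXNet serving image.

write_requirements and deploy_with_requirements are hypothetical helper names.

```python
from pathlib import Path


def write_requirements(source_dir, packages):
    """Write one requirement specifier per line to <source_dir>/requirements.txt."""
    path = Path(source_dir) / "requirements.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(packages) + "\n")
    return path


def deploy_with_requirements(model_data, role, source_dir="code"):
    """Deploy the entry point together with everything else in source_dir."""
    from sagemaker.mxnet import MXNetModel  # requires the SageMaker Python SDK

    mxnet_model = MXNetModel(
        model_data=model_data,
        role=role,
        entry_point="trasform_script.py",  # path relative to source_dir
        source_dir=source_dir,             # uploaded along with requirements.txt
        framework_version="1.4.1",         # assumption: match your container
        py_version="py3",                  # assumption: match your container
    )
    return mxnet_model.deploy(instance_type="ml.c4.xlarge", initial_instance_count=1)
```

In the notebook, this would be write_requirements("code", [...]) followed by deploy_with_requirements('s3://' + sagemaker_session.default_bucket() + '/modelmxnet/model.tar.gz', role).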

Let me know if this works for you.