BentoML: Service with sklearn model fails on my EKS cluster

I have created a simple service:

import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

model_runner = bentoml.sklearn.load_runner("mymodel:latest")
svc = bentoml.Service("myservice", runners=[model_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    return model_runner.run(input_series)

When I run it on my laptop (MacBook Pro M1), using

bentoml serve ./service.py:svc --reload

everything works fine when I invoke the generated classify API.
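For reference, this is roughly how I invoke it locally. This is only a sketch: it assumes the default port 3000 used by bentoml serve, the default /classify route derived from the function name, and the JSON encoding accepted by NumpyNdarray; the feature values are made up.

import requests

# POST a JSON-encoded 2D array to the generated /classify endpoint
# (port and payload shape are assumptions for illustration)
resp = requests.post(
    "http://127.0.0.1:3000/classify",
    json=[[5.1, 3.5, 1.4, 0.2]],
)
print(resp.status_code, resp.json())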

Now when I push this service to my Yatai server as a bento and deploy it to my K8s cluster (EKS), I get the following error when I invoke the API:

(screenshot of the error omitted)

Looking at the code, the problem lies in https://github.com/bentoml/BentoML/blob/119b103e2417291b18127d64d38f092893c8de4f/bentoml/_internal/frameworks/sklearn.py#L163: in my case, _num_threads returns 0. Digging a bit further, resource_quota.cpu is computed here: https://github.com/bentoml/BentoML/blob/119b103e2417291b18127d64d38f092893c8de4f/bentoml/_internal/runner/utils.py#L208. Here are the values I get on the pod running the API:

source                                      value
file /sys/fs/cgroup/cpu/cpu.cfs_quota_us    -1
file /sys/fs/cgroup/cpu/cpu.cfs_period_us   100000
file /sys/fs/cgroup/cpu/cpu.shares          2
call to os.cpu_count()                      2

Given those values, query_cgroup_cpu_count() returns 0.001953125 (cpu.shares / 1024 = 2 / 1024, since no CFS quota is set), which once rounded ends up as 0. That means n_jobs will always be 0, so the call will always fail on my pods.
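For what it's worth, here is a minimal sketch of that computation using the values observed on the pod. This is my simplified reading of query_cgroup_cpu_count(), not the actual BentoML code: the helper name approx_cgroup_cpu_count and the exact fallback order are assumptions, and the cgroup values are hard-coded rather than read from the files.

import os

def approx_cgroup_cpu_count() -> float:
    # Values read on the pod (cgroup v1 paths), hard-coded here for illustration
    cfs_quota_us = -1        # /sys/fs/cgroup/cpu/cpu.cfs_quota_us
    cfs_period_us = 100_000  # /sys/fs/cgroup/cpu/cpu.cfs_period_us
    shares = 2               # /sys/fs/cgroup/cpu/cpu.shares
    host_cpus = os.cpu_count() or 1  # 2 on this pod

    if cfs_quota_us > 0 and cfs_period_us > 0:
        # A CFS quota is set: derive the CPU limit from it
        return cfs_quota_us / cfs_period_us
    # No quota (-1): fall back to cpu.shares, where 1024 shares == 1 CPU
    return min(shares / 1024, host_cpus)

cpus = approx_cgroup_cpu_count()  # 2 / 1024 == 0.001953125
n_jobs = round(cpus)              # rounds to 0, which is what sklearn receives
print(cpus, n_jobs)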

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 17 (3 by maintainers)

Most upvoted comments

One of our developers thinks we’ve identified the issue. Please stand by for a commit and release; we’ll get back to you with an ETA.

Thanks for the help in identifying this issue!!!