BentoML: Service with sklearn model fails on my EKS cluster

I have created a simple service:

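import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Load the latest saved sklearn model as a runner and attach it to the service
model_runner = bentoml.sklearn.load_runner("mymodel:latest")
svc = bentoml.Service("myservice", runners=[model_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    return model_runner.run(input_series)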

When I run it on my laptop (MacBook Pro M1), using

bentoml serve ./service.py:svc --reload

everything works fine when I invoke the generated classify API.

Now when I push this service to my Yatai server as a bento and deploy it to my K8s cluster (EKS), I get the following error when I invoke the API:

Looking at the code, the problem lies in https://github.com/bentoml/BentoML/blob/119b103e2417291b18127d64d38f092893c8de4f/bentoml/_internal/frameworks/sklearn.py#L163. In my case, _num_threads returns 0. Digging a bit further, resource_quota.cpu is computed here: https://github.com/bentoml/BentoML/blob/119b103e2417291b18127d64d38f092893c8de4f/bentoml/_internal/runner/utils.py#L208. Here are the values I get on the pod running the API:

| Source | Value |
| --- | --- |
| file `/sys/fs/cgroup/cpu/cpu.cfs_quota_us` | -1 |
| file `/sys/fs/cgroup/cpu/cpu.cfs_period_us` | 100000 |
| file `/sys/fs/cgroup/cpu/cpu.shares` | 2 |
| call to `os.cpu_count()` | 2 |

Given those values, query_cgroup_cpu_count() will return 0.001953125 (that is, cpu.shares / 1024 = 2 / 1024), which once rounded will end up as 0, meaning n_jobs will always be 0. So the call will always fail on my pods.
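
To make the arithmetic concrete, here is a minimal sketch of the calculation as I understand it from the values above. It is not BentoML's actual code: the function name `estimate_cpu_count` and the clamping step are my own assumptions, shown only to illustrate why the result rounds to 0 and how a lower bound of 1 would avoid it.

```python
import math


def estimate_cpu_count(quota_us: int, period_us: int, shares: int, os_cpus: int) -> float:
    """Hypothetical re-creation of a cgroup v1 based CPU estimate."""
    # A quota of -1 means no CFS limit is set, so fall back to the host count.
    if quota_us > 0 and period_us > 0:
        quota_limit = quota_us / period_us
    else:
        quota_limit = float(os_cpus)
    # cpu.shares is expressed relative to 1024 shares per CPU.
    shares_limit = shares / 1024
    return min(quota_limit, shares_limit, float(os_cpus))


# Values observed on the pod
raw = estimate_cpu_count(quota_us=-1, period_us=100000, shares=2, os_cpus=2)

n_jobs_rounded = round(raw)              # what the failing path effectively ends up with
n_jobs_clamped = max(1, math.ceil(raw))  # assumed workaround: never go below 1

print(raw, n_jobs_rounded, n_jobs_clamped)
```

With the pod's values, the shares-based limit dominates because the pod has no CPU quota and only a minimal share, so any rounding to the nearest integer yields 0 unless a floor of 1 is applied.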

About this issue

State: closed
Created 2 years ago
Reactions: 2
Comments: 17 (3 by maintainers)

Most upvoted comments

One of our developers thinks we’ve identified the issue. Please stand by for a commit and release. Will get back to you with an ETA.

Thanks for the help in identifying this issue!!!

timliubentoml on Apr 1, 2022