kubeflow: Cannot launch TensorFlow Serving because AVX is not available on the VM
Hi all,
I followed the user guide to serve-a-model-using-tensorflow-serving, but the tf-serving Pod can't be created. I set up Kubeflow on my own Kubernetes cluster, not on GKE or minikube. How can I fix this error? Because there are no logs in the Pod, I cannot provide any other useful information. Thanks.
root@vagrant:~# kubectl get pod
NAME                              READY   STATUS             RESTARTS   AGE
ambassador-6ccb864c46-268xt       2/2     Running            0          10m
ambassador-6ccb864c46-9hrsh       2/2     Running            0          10m
ambassador-6ccb864c46-m7gc7       2/2     Running            0          10m
inception-858476d4c4-cr49s        0/1     CrashLoopBackOff   6          10m
tf-hub-0                          1/1     Running            0          10m
tf-job-operator-78757955b-nk457   1/1     Running            0          10m
root@vagrant:~# kubectl describe pod inception-858476d4c4-cr49s
Name:           inception-858476d4c4-cr49s
Namespace:      default
Node:           192.168.2.21/192.168.2.21
Start Time:     Wed, 14 Mar 2018 18:33:26 +0000
Labels:         app=inception
                pod-template-hash=4140328070
Annotations:    <none>
Status:         Running
IP:             172.20.0.152
Controlled By:  ReplicaSet/inception-858476d4c4
Containers:
  inception:
    Container ID:  docker://f59a9db5a3cf6f2e33671ba30ff71d3e34e32e1835ab908c4250f3c7f93f8c75
    Image:         gcr.io/kubeflow-images-staging/tf-model-server:v20180227-master
    Image ID:      docker-pullable://gcr.io/kubeflow-images-staging/tf-model-server@sha256:07ded66bc3a8e5ca582c3dc1871c2636a4ebc06b082e3afd76a68c45f1953385
    Port:          9000/TCP
    Args:
      /usr/bin/tensorflow_model_server
      --port=9000
      --model_name=inception
      --model_base_path=gs://kubeflow-models/inception
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    132
      Started:      Wed, 14 Mar 2018 18:39:24 +0000
      Finished:     Wed, 14 Mar 2018 18:39:24 +0000
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     4
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pb8qz (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-pb8qz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pb8qz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age               From                   Message
  ----     ------                 ----              ----                   -------
  Normal   Scheduled              9m                default-scheduler      Successfully assigned inception-858476d4c4-cr49s to 192.168.2.21
  Normal   SuccessfulMountVolume  9m                kubelet, 192.168.2.21  MountVolume.SetUp succeeded for volume "default-token-pb8qz"
  Normal   Pulled                 7m (x5 over 9m)   kubelet, 192.168.2.21  Container image "gcr.io/kubeflow-images-staging/tf-model-server:v20180227-master" already present on machine
  Normal   Created                7m (x5 over 9m)   kubelet, 192.168.2.21  Created container
  Normal   Started                7m (x5 over 9m)   kubelet, 192.168.2.21  Started container
  Warning  BackOff                4m (x24 over 9m)  kubelet, 192.168.2.21  Back-off restarting failed container
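The key clue above is `Exit Code: 132`, i.e. 128 + 4 = SIGILL: the process died on an illegal instruction, which is the typical symptom of an AVX-optimized binary running on a CPU (or VM) that does not expose AVX. A quick sketch of how to verify this on the node (assumes a Linux host with `/proc/cpuinfo`):

```shell
# Exit code 132 = 128 + signal 4 (SIGILL): the binary executed an
# instruction the CPU does not support. Check whether the CPU
# advertises AVX in its feature flags:
if grep -q avx /proc/cpuinfo; then
    echo "AVX available"
else
    echo "AVX NOT available"
fi
```

If this prints "AVX NOT available", the stock tensorflow_model_server binary cannot run on that machine.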
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 16 (14 by maintainers)
Thanks for your useful suggestion.
After reading the tf-serving Dockerfile I noticed that it installs tensorflow-model-server. I also read the TF Serving install guide, which notes that the standard tensorflow-model-server binary is compiled with platform-specific optimizations such as AVX, and that tensorflow-model-server-universal should be used on CPUs that lack them. I found that my VM doesn't have the AVX instruction set, so I replaced tensorflow-model-server with tensorflow-model-server-universal in the Dockerfile and rebuilt the image. After that, the tf-serving Pod was created successfully.
Check the "Available binaries" section of the install guide.
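For anyone hitting the same problem, the fix described above amounts to a one-line change in the image build. A hypothetical sketch of the relevant Dockerfile fragment, assuming the image installs the server from the TensorFlow Serving apt repository (the actual kubeflow Dockerfile may differ):

```dockerfile
# Install the "universal" model server, which is built without
# platform-specific instruction sets such as AVX, instead of the
# default AVX-optimized tensorflow-model-server package.
RUN apt-get update && apt-get install -y \
    tensorflow-model-server-universal \
    && rm -rf /var/lib/apt/lists/*
```

The universal binary is slower than the optimized one, so only use it when the host CPU genuinely lacks the required instruction sets.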