kserve: kfserving 0.5.0-rc2 standalone - segmentation violation

/kind bug

What steps did you take and what happened:

  • Install KFServing 0.5.0-rc2 standalone and check the controller-manager pod status: it is healthy
NAME                             READY   STATUS
kfserving-controller-manager-0   2/2     Running
  • Create a namespace for sklearn
kubectl create namespace myns
  • Submit the sklearn-iris sample InferenceService into that namespace (a reconstructed manifest is shown after these steps)
  • Check the controller-manager pod status again: it is now in CrashLoopBackOff and Error
NAME                             READY   STATUS
kfserving-controller-manager-0   1/2   CrashLoopBackOff
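
The exact sample YAML is not included in this report; based on the PredictorSpec in the controller logs below (sklearn predictor, protocolVersion v2, storageUri gs://seldon-models/sklearn/iris), it was along these lines — treat this as a reconstruction:

# Reconstructed from the logged PredictorSpec; the real sample file may differ slightly.
kubectl apply -n myns -f - <<EOF
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    sklearn:
      protocolVersion: v2
      storageUri: gs://seldon-models/sklearn/iris
EOF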

Logs from the controller-manager pod
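
These are from the manager container of the controller-manager pod; a command along the lines of the following retrieves them:

# manager is the kfserving-controller container inside the StatefulSet pod
# (add --previous to see the crash after the container has restarted)
kubectl logs kfserving-controller-manager-0 -n kfserving-system -c manager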

2021/02/02 15:20:54 http: panic serving 172.17.0.1:21148: runtime error: invalid memory address or nil pointer dereference
goroutine 346 [running]:
net/http.(*conn).serve.func1(0xc00079e320)
	/usr/local/go/src/net/http/server.go:1772 +0x139
panic(0x1759f60, 0x28a7210)
	/usr/local/go/src/runtime/panic.go:975 +0x3e3
github.com/kubeflow/kfserving/pkg/apis/serving/v1alpha2.(*InferenceService).ConvertFrom(0xc000610000, 0x1bccf60, 0xc00042fc00, 0x1bccf60, 0xc00042fc00)
	/go/src/github.com/kubeflow/kfserving/pkg/apis/serving/v1alpha2/inferenceservice_conversion.go:219 +0x1183
sigs.k8s.io/controller-runtime/pkg/webhook/conversion.(*Webhook).convertObject(0xc0009557f0, 0x1baa0e0, 0xc00042fc00, 0x1baa020, 0xc000610000, 0x1baa020, 0xc000610000)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/webhook/conversion/conversion.go:142 +0x7bc
sigs.k8s.io/controller-runtime/pkg/webhook/conversion.(*Webhook).handleConvertRequest(0xc0009557f0, 0xc0003bc200, 0xc0009e83c0, 0x0, 0x0)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/webhook/conversion/conversion.go:107 +0x1f8
sigs.k8s.io/controller-runtime/pkg/webhook/conversion.(*Webhook).ServeHTTP(0xc0009557f0, 0x7f315b7e3530, 0xc0000a0050, 0xc000892200)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/webhook/conversion/conversion.go:74 +0x10b
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1(0x7f315b7e3530, 0xc0000a0050, 0xc000892200)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.6.0/prometheus/promhttp/instrument_server.go:40 +0xab
net/http.HandlerFunc.ServeHTTP(0xc000513320, 0x7f315b7e3530, 0xc0000a0050, 0xc000892200)
	/usr/local/go/src/net/http/server.go:2012 +0x44
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1(0x1bdb560, 0xc000130000, 0xc000892200)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.6.0/prometheus/promhttp/instrument_server.go:100 +0xda
net/http.HandlerFunc.ServeHTTP(0xc0005134d0, 0x1bdb560, 0xc000130000, 0xc000892200)
	/usr/local/go/src/net/http/server.go:2012 +0x44
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2(0x1bdb560, 0xc000130000, 0xc000892200)
	/go/pkg/mod/github.com/prometheus/client_golang@v1.6.0/prometheus/promhttp/instrument_server.go:76 +0xb2
net/http.HandlerFunc.ServeHTTP(0xc0005135c0, 0x1bdb560, 0xc000130000, 0xc000892200)
	/usr/local/go/src/net/http/server.go:2012 +0x44
net/http.(*ServeMux).ServeHTTP(0xc0007bfd80, 0x1bdb560, 0xc000130000, 0xc000892200)
	/usr/local/go/src/net/http/server.go:2387 +0x1a5
net/http.serverHandler.ServeHTTP(0xc0004ec0e0, 0x1bdb560, 0xc000130000, 0xc000892200)
	/usr/local/go/src/net/http/server.go:2807 +0xa3
net/http.(*conn).serve(0xc00079e320, 0x1bdf560, 0xc0003bc040)
	/usr/local/go/src/net/http/server.go:1895 +0x86c
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:2933 +0x35c
{"level":"info","ts":1612279254.1017942,"logger":"v1beta1Controllers.InferenceService","msg":"Reconciling inference service","apiVersion":"serving.kubeflow.org/v1beta1","isvc":"sklearn-iris"}
{"level":"info","ts":1612279254.202461,"logger":"PredictorReconciler","msg":"Reconciling Predictor","PredictorSpec":{"sklearn":{"storageUri":"gs://seldon-models/sklearn/iris","protocolVersion":"v2","name":"","resources":{}}}}
E0202 15:20:54.202734       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 251 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1759f60, 0x28a7210)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1759f60, 0x28a7210)
	/usr/local/go/src/runtime/panic.go:969 +0x166
github.com/kubeflow/kfserving/pkg/apis/serving/v1beta1.(*SKLearnSpec).getContainerV2(0xc00021a9a0, 0xc000a0c080, 0xc, 0x0, 0x0, 0xc000a0c09a, 0x6, 0xc0005668a0, 0x53, 0xc000a28450, ...)
	/go/src/github.com/kubeflow/kfserving/pkg/apis/serving/v1beta1/predictor_sklearn.go:118 +0x44c
github.com/kubeflow/kfserving/pkg/apis/serving/v1beta1.(*SKLearnSpec).GetContainer(0xc00021a9a0, 0xc000a0c080, 0xc, 0x0, 0x0, 0xc000a0c09a, 0x6, 0xc0005668a0, 0x53, 0xc000a28450, ...)
	/go/src/github.com/kubeflow/kfserving/pkg/apis/serving/v1beta1/predictor_sklearn.go:71 +0x120
github.com/kubeflow/kfserving/pkg/controller/v1beta1/inferenceservice/components.(*Predictor).Reconcile(0xc0003bcb80, 0xc0004e3800, 0x13, 0x1beab80)
	/go/src/github.com/kubeflow/kfserving/pkg/controller/v1beta1/inferenceservice/components/predictor.go:84 +0x4fe
github.com/kubeflow/kfserving/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile(0xc0007bfa00, 0xc0003a95ea, 0x6, 0xc0003a95c0, 0xc, 0xc0006ce2d0, 0xc0004b61b8, 0xc0004b61b0, 0xc0004b61b0)
	/go/src/github.com/kubeflow/kfserving/pkg/controller/v1beta1/inferenceservice/controller.go:132 +0x45a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00095f950, 0x17bc5c0, 0xc0009fe4e0, 0xc000583e00)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/internal/controller/controller.go:233 +0x161
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00095f950, 0x203000)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/internal/controller/controller.go:209 +0xae
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00095f950)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/internal/controller/controller.go:188 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00061c0b0)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00061c0b0, 0x1ba0880, 0xc0006721e0, 0x1, 0xc000044f60)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:156 +0xa3
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00061c0b0, 0x3b9aca00, 0x0, 0x1a11201, 0xc000044f60)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc00061c0b0, 0x3b9aca00, 0xc000044f60)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:90 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/internal/controller/controller.go:170 +0x411
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x13f094c]

goroutine 251 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:55 +0x105
panic(0x1759f60, 0x28a7210)
	/usr/local/go/src/runtime/panic.go:969 +0x166
github.com/kubeflow/kfserving/pkg/apis/serving/v1beta1.(*SKLearnSpec).getContainerV2(0xc00021a9a0, 0xc000a0c080, 0xc, 0x0, 0x0, 0xc000a0c09a, 0x6, 0xc0005668a0, 0x53, 0xc000a28450, ...)
	/go/src/github.com/kubeflow/kfserving/pkg/apis/serving/v1beta1/predictor_sklearn.go:118 +0x44c
github.com/kubeflow/kfserving/pkg/apis/serving/v1beta1.(*SKLearnSpec).GetContainer(0xc00021a9a0, 0xc000a0c080, 0xc, 0x0, 0x0, 0xc000a0c09a, 0x6, 0xc0005668a0, 0x53, 0xc000a28450, ...)
	/go/src/github.com/kubeflow/kfserving/pkg/apis/serving/v1beta1/predictor_sklearn.go:71 +0x120
github.com/kubeflow/kfserving/pkg/controller/v1beta1/inferenceservice/components.(*Predictor).Reconcile(0xc0003bcb80, 0xc0004e3800, 0x13, 0x1beab80)
	/go/src/github.com/kubeflow/kfserving/pkg/controller/v1beta1/inferenceservice/components/predictor.go:84 +0x4fe
github.com/kubeflow/kfserving/pkg/controller/v1beta1/inferenceservice.(*InferenceServiceReconciler).Reconcile(0xc0007bfa00, 0xc0003a95ea, 0x6, 0xc0003a95c0, 0xc, 0xc0006ce2d0, 0xc0004b61b8, 0xc0004b61b0, 0xc0004b61b0)
	/go/src/github.com/kubeflow/kfserving/pkg/controller/v1beta1/inferenceservice/controller.go:132 +0x45a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00095f950, 0x17bc5c0, 0xc0009fe4e0, 0xc000583e00)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/internal/controller/controller.go:233 +0x161
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00095f950, 0x203000)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/internal/controller/controller.go:209 +0xae
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00095f950)
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/internal/controller/controller.go:188 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00061c0b0)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00061c0b0, 0x1ba0880, 0xc0006721e0, 0x1, 0xc000044f60)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:156 +0xa3
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00061c0b0, 0x3b9aca00, 0x0, 0x1a11201, 0xc000044f60)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc00061c0b0, 0x3b9aca00, 0xc000044f60)
	/go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:90 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/go/pkg/mod/github.com/zchee/sigs.k8s-controller-runtime@v0.6.1-0.20200623114430-46812d3a0a50/pkg/internal/controller/controller.go:170 +0x411
  • Deleting the InferenceService/sklearn-iris brings the controller-manager back to Running
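
For example, assuming the sample was applied to the myns namespace created above:

# removing the offending InferenceService stops the reconcile panic
kubectl delete inferenceservice sklearn-iris -n myns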

What did you expect to happen: Submitting an sklearn sample with 0.5.0-rc2 should succeed without crashing the controller-manager pod.

Anything else you would like to add:

Describe output for the controller-manager pod
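
Gathered with the usual describe command, e.g.:

kubectl describe pod kfserving-controller-manager-0 -n kfserving-system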

Name:         kfserving-controller-manager-0
Namespace:    kfserving-system
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Tue, 02 Feb 2021 09:57:28 -0500
Labels:       control-plane=kfserving-controller-manager
              controller-revision-hash=kfserving-controller-manager-855f55cffb
              controller-tools.k8s.io=1.0
              statefulset.kubernetes.io/pod-name=kfserving-controller-manager-0
Annotations:  <none>
Status:       Running
IP:           172.17.0.4
IPs:
  IP:           172.17.0.4
Controlled By:  StatefulSet/kfserving-controller-manager
Containers:
  kube-rbac-proxy:
    Container ID:  docker://2f9c84cd1de7c419dc064aeb5372752ec5b905d52355b926cac9755f0116bc64
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0
    Image ID:      docker-pullable://gcr.io/kubebuilder/kube-rbac-proxy@sha256:297896d96b827bbcb1abd696da1b2d81cab88359ac34cce0e8281f266b4e08de
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Running
      Started:      Tue, 02 Feb 2021 09:57:29 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tlwwp (ro)
  manager:
    Container ID:  docker://1741ffc3299df8eea03f949f73f4b9095727bb408ba64e12f41d57cc415ed8c7
    Image:         gcr.io/kfserving/kfserving-controller:v0.5.0-rc2
    Image ID:      docker-pullable://gcr.io/kfserving/kfserving-controller@sha256:7fecc482c6d2bf7edb68d7ef56eac6f2b9bb359ce2667d6454e47024c340afcf
    Port:          9443/TCP
    Host Port:     0/TCP
    Command:
      /manager
    Args:
      --metrics-addr=127.0.0.1:8080
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 02 Feb 2021 10:19:42 -0500
      Finished:     Tue, 02 Feb 2021 10:20:54 -0500
    Ready:          False
    Restart Count:  9
    Limits:
      cpu:     100m
      memory:  300Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      POD_NAMESPACE:  kfserving-system (v1:metadata.namespace)
      SECRET_NAME:    kfserving-webhook-server-cert
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tlwwp (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kfserving-webhook-server-cert
    Optional:    false
  default-token-tlwwp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tlwwp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  25m                  default-scheduler  Successfully assigned kfserving-system/kfserving-controller-manager-0 to minikube
  Normal   Pulled     25m                  kubelet, minikube  Container image "gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0" already present on machine
  Normal   Created    25m                  kubelet, minikube  Created container kube-rbac-proxy
  Normal   Started    25m                  kubelet, minikube  Started container kube-rbac-proxy
  Normal   Pulling    23m (x4 over 25m)    kubelet, minikube  Pulling image "gcr.io/kfserving/kfserving-controller:v0.5.0-rc2"
  Normal   Pulled     23m (x4 over 25m)    kubelet, minikube  Successfully pulled image "gcr.io/kfserving/kfserving-controller:v0.5.0-rc2"
  Normal   Created    23m (x4 over 25m)    kubelet, minikube  Created container manager
  Normal   Started    23m (x4 over 25m)    kubelet, minikube  Started container manager
  Warning  BackOff    5m7s (x88 over 24m)  kubelet, minikube  Back-off restarting failed container

Environment:

  • Istio Version: 1.7.6
  • Knative Version: 0.20.0
  • KFServing Version: 0.5.0-rc2
  • Kubeflow version: N/A
  • Minikube version: 1.16.0
  • Kubernetes version: 1.17.4
  • OS (e.g. from /etc/os-release):

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 22 (12 by maintainers)

Most upvoted comments

@adriangonz Can you help take a look?