kubeflow: Error in Pipeline section on dashboard: `upstream connect error or disconnect/reset before headers`

/kind bug

What steps did you take and what happened: After successfully deploying Kubeflow on Azure (I had to change the YAML file for installation; more details in #5246), I ran the following command to check the Kubeflow dashboard:

 kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

I’m able to visit the dashboard page, and I can visit the Home, Notebook Servers, Katib, and Artifact Store sections. But when I try to visit the Pipelines section I get the following error:

upstream connect error or disconnect/reset before headers. reset reason: connection failure

What did you expect to happen: I expected to be able to visit the Pipelines section and run some example pipelines.

Anything else you would like to add: YAML file used in the deployment: kfctl_k8s_istio.v1.1.0.yaml.txt

$ kubectl get all -n anonymous
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/ml-pipeline-ui-artifact-bd978746-bcw82             2/2     Running   0          29m
pod/ml-pipeline-visualizationserver-865c7865bc-nw2vb   2/2     Running   0          29m

NAME                                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/ml-pipeline-ui-artifact           ClusterIP   10.0.208.227   <none>        80/TCP     29m
service/ml-pipeline-visualizationserver   ClusterIP   10.0.13.172    <none>        8888/TCP   29m

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ml-pipeline-ui-artifact           1/1     1            1           29m
deployment.apps/ml-pipeline-visualizationserver   1/1     1            1           29m

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/ml-pipeline-ui-artifact-bd978746             1         1         1       29m
replicaset.apps/ml-pipeline-visualizationserver-865c7865bc   1         1         1       29m

$ kubectl logs -n anonymous ml-pipeline-ui-artifact-bd978746-bcw82
Error from server (BadRequest): a container name must be specified for pod ml-pipeline-ui-artifact-bd978746-bcw82, choose one of: [ml-pipeline-ui-artifact istio-proxy] or one of the init containers: [istio-init]
$ kubectl logs -n anonymous pod/ml-pipeline-visualizationserver-865c7865bc-nw2vb
Error from server (BadRequest): a container name must be specified for pod ml-pipeline-visualizationserver-865c7865bc-nw2vb, choose one of: [ml-pipeline-visualizationserver istio-proxy] or one of the init containers: [istio-init]
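
(Side note: the BadRequest above just means a container has to be chosen with -c; for example, using the container names listed in the error:)

# fetch logs from the app container and the sidecar explicitly
kubectl logs -n anonymous ml-pipeline-ui-artifact-bd978746-bcw82 -c ml-pipeline-ui-artifact
kubectl logs -n anonymous ml-pipeline-ui-artifact-bd978746-bcw82 -c istio-proxy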

Environment:

  • Kubeflow version: build version v1beta1
  • kfctl version: kfctl v1.1.0-0-g9a3621e
  • Kubernetes platform: Azure Kubernetes Service
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.13", GitCommit:"39a145ca3413079bcb9c80846488786fed5fe1cb", GitTreeState:"clean", BuildDate:"2020-07-15T16:18:19Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.13", GitCommit:"1da71a35d52fa82847fd61c3db20c4f95d283977", GitTreeState:"clean", BuildDate:"2020-07-15T21:59:26Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release): 18.04.5 LTS (Bionic Beaver)

Most upvoted comments

Suspecting this to be an mTLS issue, I disabled it on both the ml-pipeline and ml-pipeline-ui destination rules as follows:

$ kubectl edit destinationrule -n kubeflow ml-pipeline

Modify the tls.mode (the last line) from ISTIO_MUTUAL to DISABLE

Do this for the ml-pipeline-ui destination rule as well.
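
If you prefer to script this instead of using the interactive editor, a kubectl patch equivalent should work (untested sketch; it assumes the DestinationRules live in the kubeflow namespace as above):

# non-interactive equivalent of the two edits
kubectl patch destinationrule ml-pipeline -n kubeflow --type merge \
  -p '{"spec":{"trafficPolicy":{"tls":{"mode":"DISABLE"}}}}'
kubectl patch destinationrule ml-pipeline-ui -n kubeflow --type merge \
  -p '{"spec":{"trafficPolicy":{"tls":{"mode":"DISABLE"}}}}'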

Run

$ ./istioctl authn tls-check <istio-ingressgateway-pod> -n istio-system | grep pipeline

to verify the client is HTTP for both the ml-pipeline and ml-pipeline-ui authentication policies (it was mTLS before the changes).

Now accessing the Pipelines UI works.

I assume this is a configuration issue during install that needs to be fixed.

Trying the solution suggested above first might save you a lot of time. It worked for me. Thanks @danishsamad

This also resolved the issue on Kubeflow v1.2 (kfctl_k8s_istio.v1.2.0.yaml). The deployment is on Azure AKS.

kubectl edit destinationrule -n kubeflow ml-pipeline

Modify the tls.mode (the last line) from ISTIO_MUTUAL to DISABLE

kubectl edit destinationrule -n kubeflow ml-pipeline-ui

Modify the tls.mode (the last line) from ISTIO_MUTUAL to DISABLE
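
For reference, the edited DestinationRule should look roughly like this (a sketch; the host and apiVersion may differ in your install):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: ml-pipeline
  namespace: kubeflow
spec:
  host: ml-pipeline.kubeflow.svc.cluster.local  # assumed default service host
  trafficPolicy:
    tls:
      mode: DISABLE  # was ISTIO_MUTUAL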

@danishsamad It could be some kind of race condition where your istio admission webhook wasn’t ready when you deployed the kubeflow components.

You can confirm this by checking whether the istio-proxy container is missing from the ml-pipeline-ui pods:

kubectl get pods -l app=ml-pipeline-ui -n kubeflow | grep ml-pipeline-ui

# output should look like this, note the `2/2`
# ml-pipeline-ui-8695cc6b46-cvr4m                          2/2     Running   0          83m  
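
If you’d rather list the container names directly than read the READY column, something like this should work (same label selector as above):

# print the container names for each ml-pipeline-ui pod; you should see
# the app container plus istio-proxy if sidecar injection worked
kubectl get pods -l app=ml-pipeline-ui -n kubeflow \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'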

But if your Istio is up and running, you almost certainly just need to restart all the Kubeflow Pipelines deployments:

kubectl get deploy -n kubeflow -l app.kubernetes.io/name=kubeflow-pipelines -o name | \
xargs kubectl rollout restart -n kubeflow 
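
To wait for the restart to complete before retrying the UI, a small follow-up using the same label selector:

# wait for each pipelines deployment to finish rolling out
kubectl get deploy -n kubeflow -l app.kubernetes.io/name=kubeflow-pipelines -o name | \
  xargs -n1 kubectl rollout status -n kubeflow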

Hi all, the solution to this is simple: just make sure your Namespace/kubeflow has the right labels.

PLEASE NOTE: you will need to recreate all resources in Namespace/kubeflow, after making this change.

apiVersion: v1
kind: Namespace
metadata:
  name: kubeflow
  labels:
    control-plane: kubeflow
    istio-injection: enabled
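
If you’d rather not re-apply the Namespace manifest, labeling it in place and restarting the pods should be equivalent (a sketch, reusing the pipelines label selector from earlier in this thread):

# add the injection label to the existing namespace
kubectl label namespace kubeflow istio-injection=enabled --overwrite

# then recreate the pipelines pods so the sidecar gets injected
kubectl get deploy -n kubeflow -l app.kubernetes.io/name=kubeflow-pipelines -o name | \
  xargs kubectl rollout restart -n kubeflow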

I have a similar setup and am facing the exact same issue. To debug, I enabled access logs on my Istio ingress gateway pod, and I get this TLS error when I access the Pipelines page:

[2020-09-03T14:34:03.746Z] "GET /pipeline/ HTTP/1.1" 503 UF,URX "-" "TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER" 0 91 76 - "10.244.1.34" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36" "e913d047-57e2-40a8-b1b0-b5a74b523aa9" "localhost:8080" "10.244.1.29:3000" outbound|80||ml-pipeline-ui.kubeflow.svc.cluster.local - 127.0.0.1:80 127.0.0.1:53600
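
(That WRONG_VERSION_NUMBER usually means the proxy attempted a TLS handshake against an upstream speaking plain HTTP, which fits the missing-sidecar explanation elsewhere in this thread.) For anyone who also wants access logs, one way to turn them on in the Istio versions Kubeflow ships is via the mesh config (a sketch; the ConfigMap layout varies by Istio version):

# enable Envoy access logging mesh-wide, then tail the gateway
kubectl -n istio-system edit configmap istio
#   under the mesh config, set: accessLogFile: "/dev/stdout"
kubectl -n istio-system logs -l istio=ingressgateway -f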

Also, when trying to upload a pipeline from a Kale notebook I got the same “upstream connect error…”.

@thesuperzapper:

It could be some kind of race condition where your istio admission webhook wasn’t ready when you deployed the kubeflow components. You can confirm this by checking if the istio-proxy container is missing from the ml-pipeline-ui:

You’re right, thank you for the explanation. I spent a lot of time on the other fix (changing ISTIO_MUTUAL to DISABLE), but it only works for a while before an “RBAC: access denied” error appears.

So, if you get the “upstream connect error or disconnect/reset before headers” error, check that the ml-pipeline and ml-pipeline-ui pods each have two running containers (ml-pipeline and istio-proxy).
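
A quick way to check both (assuming the standard app labels on those deployments):

# both pods should show READY 2/2
kubectl get pods -n kubeflow -l app=ml-pipeline
kubectl get pods -n kubeflow -l app=ml-pipeline-ui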

Just want to mention that I deployed the istio-dex Kubeflow version (using kfctl_istio_dex.v1.1.0.yaml); the same problem appeared, and the same hack fixed it.

Hello there,

Well, running:

  • kubectl logs -n istio-system -l istio=ingressgateway I get:
[2020-09-04 21:19:59.381][86][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 13, 
[2020-09-04 21:22:20.092][86][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 13, 
[2020-09-04 21:28:15.435][86][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 13,
  • kubectl logs -n kubeflow ml-pipeline-6bc56cd86d-l8866 I got:
I0904 20:20:44.250057       6 client_manager.go:134] Initializing client manager
I0904 20:20:44.250238       6 config.go:51] Config DBConfig.ExtraParams not specified, skipping
I0904 20:22:22.131397       6 client_manager.go:364] Successfully created bucket mlpipeline
I0904 20:22:22.133149       6 client_manager.go:164] Client manager initialized successfully
I0904 20:22:27.381888       6 main.go:210] All samples are loaded.
I0904 20:22:27.394416       6 resource_manager.go:889] Default experiment is set. ID is: 202251f4-52dd-4451-be05-63a82209cbac
I0904 20:22:27.394587       6 main.go:113] Starting Http Proxy
I0904 20:22:27.394676       6 main.go:84] Starting RPC server
  • kubectl logs -n kubeflow ml-pipeline-ui-8695cc6b46-v4t6h I got:
{
  argo: {
    archiveArtifactory: 'minio',
    archiveBucketName: 'mlpipeline',
    archiveLogs: false,
    archivePrefix: 'logs'
  },
  artifacts: 'Artifacts config contains credentials, so it is omitted',
  metadata: { envoyService: { host: '10.0.87.103', port: '9090' } },
  pipeline: { host: '10.0.58.15', port: '8888' },
  server: {
    apiVersionPrefix: 'apis/v1beta1',
    basePath: '/pipeline',
    deployment: 'KUBEFLOW',
    port: 3000,
    staticDir: '/client'
  },
  viewer: {
    tensorboard: { podTemplateSpec: [Object], tfImageName: 'tensorflow/tensorflow' }
  },
  visualizations: { allowCustomVisualizations: true },
  gkeMetadata: { disabled: false },
  auth: {
    enabled: true,
    kubeflowUserIdHeader: 'kubeflow-userid',
    kubeflowUserIdPrefix: ''
  }
}
[HPM] Proxy created: [Function]  ->  /artifacts/get
[HPM] Proxy created: /  ->  http://10.0.87.103:9090
[HPM] Proxy created: /  ->  http://127.0.0.1
[HPM] Subscribed to http-proxy events:  [ 'error', 'close' ]
[HPM] Proxy created: /  ->  http://127.0.0.1
[HPM] Subscribed to http-proxy events:  [ 'error', 'close' ]
[HPM] Proxy created: /  ->  http://10.0.58.15:8888
[HPM] Subscribed to http-proxy events:  [ 'proxyReq', 'error', 'close' ]
[HPM] Proxy created: /  ->  http://10.0.58.15:8888
[HPM] Subscribed to http-proxy events:  [ 'proxyReq', 'error', 'close' ]
Server listening at http://localhost:3000
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz

Anyway, I followed @danishsamad’s advice and ran:

  • kubectl edit destinationrule -n kubeflow ml-pipeline
  • kubectl edit destinationrule -n kubeflow ml-pipeline-ui

And edited the spec.trafficPolicy.tls.mode field, changing its value from ISTIO_MUTUAL to DISABLE. Then I could visit the Pipelines section.
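
To confirm from the command line, something like the following should do (reusing the port-forward from the top of this issue; the exact status code will depend on your auth setup):

# re-establish the port-forward in the background, then probe the UI
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80 &
curl -sI http://localhost:8080/pipeline/ | head -n1
# anything other than a 503 suggests the upstream connection now works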