pipelines: [Multi User] failed to call `kfp.Client().create_run_from_pipeline_func` in an in-cluster Jupyter notebook

What steps did you take:

In a multi-user enabled environment, I created a notebook server in a user's namespace, launched a notebook, and tried to call the Python SDK from there. When I execute the code below:

pipeline = kfp.Client().create_run_from_pipeline_func(mnist_pipeline, arguments={}, namespace='mynamespace')
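
For context, mnist_pipeline is the user's own pipeline function; any trivial pipeline reproduces the failure. A minimal hypothetical stand-in (kfp v1 SDK, names illustrative):

import kfp
from kfp import dsl

def echo_op():
    # A trivial container step; the pipeline content is irrelevant to the
    # RBAC failure reported below.
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "hello world"'])

@dsl.pipeline(name='mnist-pipeline', description='Minimal stand-in pipeline.')
def mnist_pipeline():
    echo_op()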

What happened:

The API call was rejected with the following errors:

~/.local/lib/python3.6/site-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    236 
    237         if not 200 <= r.status <= 299:
--> 238             raise ApiException(http_resp=r)
    239 
    240         return r

ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '19', 'content-type': 'text/plain', 'date': 'Tue, 01 Sep 2020 00:58:39 GMT', 'server': 'envoy', 'x-envoy-upstream-service-time': '8'})
HTTP response body: RBAC: access denied

What did you expect to happen:

A pipeline run should be created and executed.

Environment:

How did you deploy Kubeflow Pipelines (KFP)?

I installed KFP on IKS with multi-user support.

KFP version: v1.1.0
KFP SDK version: v1.0.0


/kind bug

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 32
  • Comments: 128 (71 by maintainers)


Most upvoted comments

@Bobgy

I studied the envoy filter more and here is a better version:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: mynamespace
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: user@example.com
  workloadSelector:
    labels:
      notebook-name: mynotebook

It directly uses the custom request-header feature that http_connection_manager provides. Because the header name and value are fixed, there is no need for a Lua filter.
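
A quick way to verify the filter from inside the notebook is to call a cheap read endpoint and confirm the 403 is gone. A sketch, assuming the profile namespace is mynamespace:

import kfp

# With the EnvoyFilter applied, the sidecar adds kubeflow-userid on outbound
# traffic to ml-pipeline, so this should list experiments instead of
# failing with "RBAC: access denied".
client = kfp.Client()
print(client.list_experiments(namespace='mynamespace'))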

@Bobgy sure!

  • The RBAC binding to allow the notebook server in the user's namespace ("mynamespace") to access the ml-pipeline service:
apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
  name: bind-ml-pipeline-nb-mynamespace
  namespace: kubeflow
spec:
  roleRef:
    kind: ServiceRole
    name: ml-pipeline-services
  subjects:
  - properties:
      source.principal: cluster.local/ns/mynamespace/sa/default-editor
  • The EnvoyFilter to inject the kubeflow-userid header on traffic from the notebook to the ml-pipeline service. In the example below, the notebook server's name is mynotebook and the userid for namespace mynamespace is user@example.com:
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: mynamespace
spec:
  workloadSelector:
    labels:
      notebook-name: mynotebook
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        portNumber: 8888
        filterChain:
          filter:
            name: "envoy.http_connection_manager"
            subFilter:
              name: "envoy.router"
    patch:
      operation: INSERT_BEFORE
      value: # Lua filter specification
        name: envoy.lua
        config:
          inlineCode: |
            function envoy_on_request(request_handle)
              request_handle:headers():add("kubeflow-userid", "user@example.com")
            end

The EnvoyFilter above only injects the kubeflow-userid HTTP header for traffic going to the ml-pipeline service.

@ErikEngerd The AuthorizationPolicy and EnvoyFilter below work for me with k8s 1.18.19, Kubeflow 1.3.0, and Istio 1.9.5.

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: bind-ml-pipeline-nb-kubeflow-user-example-com
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: ml-pipeline
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/kubeflow-user-example-com/sa/default-editor"]
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: kubeflow-user-example-com
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: user@example.com
  workloadSelector:
    labels:
      notebook-name: test-jupyterlab

Function call to create and run the pipeline:

kfp.Client().create_run_from_pipeline_func(calc_pipeline, arguments=arguments, namespace='kubeflow-user-example-com')
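
For completeness, calc_pipeline here is the commenter's own pipeline; a minimal hypothetical definition that would work with the call above (kfp v1 lightweight component, all names illustrative):

import kfp
from kfp.components import create_component_from_func

def add(a: float, b: float) -> float:
    # A trivial component; the pipeline content is irrelevant to the auth setup.
    return a + b

add_op = create_component_from_func(add)

@kfp.dsl.pipeline(name='calc-pipeline')
def calc_pipeline(a: float = 1, b: float = 7):
    add_op(a, b)

arguments = {'a': 1, 'b': 7}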

@Bobgy I thought about this issue again, and I think this is where an Istio EnvoyFilter can be used without any application change. I added an EnvoyFilter to add the kubeflow-userid header to HTTP traffic going to ml-pipeline.kubeflow.svc.cluster.local:8888, and then it works. So you actually don't need to do the authentication trick for the in-cluster use case; it seems odd to me to perform authentication for the in-cluster scenario. The kubeflow-userid I inject in the HTTP header is the namespace owner's userid, which I think makes total sense.

In conclusion, I added two config objects to make it work:

  • add a ServiceRoleBinding to allow the notebook server to access the ml-pipeline service
  • add an EnvoyFilter to inject the kubeflow-userid header so that ml-pipeline-api-server can validate the incoming request.

If these two configs could be created along with the notebook server, that would be perfect!

After the lib update it works fine; the exact code that works for me is the plain variant without anything special:

client = kfp.Client()
print(client.list_experiments())

So, all in all, that was it for me.

Many thanks to @grapefruitL!

@arshashi The EnvoyFilter should be added to the user's namespace. Check your notebook server pod and see whether it has the label notebook-name: mynotebook; that ensures the EnvoyFilter applies to the notebook server in the user's namespace. Also check the user's namespace, i.e.

kubectl get ns brainyapps -o yaml

and make sure the owner is brainyapps@example.com, for example:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    owner: brainyapps@example.com
    ......
    .....

In your case, no kubeflow-userid is injected; my guess is that the notebook-name: mynotebook label is wrong.

Edit: another possibility is that in your Kubeflow deployment the identity header name is not kubeflow-userid. You may double-check your Kubeflow config.
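
The same checks can be scripted from inside the notebook with the kubernetes Python client. A sketch, assuming the notebook's service account is allowed to read pods and namespaces, and using the example names from above:

from kubernetes import client, config

config.load_incluster_config()
v1 = client.CoreV1Api()

# 1. Does the notebook pod carry the label the EnvoyFilter selects on?
pods = v1.list_namespaced_pod('mynamespace',
                              label_selector='notebook-name=mynotebook')
print([p.metadata.name for p in pods.items])

# 2. Does the namespace owner annotation match the injected kubeflow-userid?
ns = v1.read_namespace('mynamespace')
print(ns.metadata.annotations.get('owner'))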

import os
with open(os.environ['KF_PIPELINES_SA_TOKEN_PATH'], "r") as f:
    TOKEN = f.read()

import kfp
client = kfp.Client(
    host='http://ml-pipeline.kubeflow.svc.cluster.local:8888',
    # host='http://ml-pipeline-ui.kubeflow.svc.cluster.local:80',  # <-- does not work; causes HTTP response body: RBAC: access denied
    # existing_token=TOKEN,  # not required
)

print(client.list_pipelines())

Result:

{'next_page_token': None,
 'pipelines': [{'created_at': datetime.datetime(2022, 5, 22, 2, 5, 33, tzinfo=tzlocal()),
                'default_version': {'code_source_url': None,
                                    'created_at': datetime.datetime(2022, 5, 22, 2, 5, 33, tzinfo=tzlocal()),
                                    'id': 'b693a0d3-b11c-4c5b-b3f9-6158382948d6',
                                    'name': '[Demo] XGBoost - Iterative model '
                                            'training',
                                    'package_url': None,
                                    'parameters': None,
                                    'resource_references': [{'key': {'id': 'b693a0d3-b11c-4c5b-b3f9-6158382948d6',
                                                                     'type': 'PIPELINE'},
                                                             'name': None,
                                                             'relationship': 'OWNER'}]},
                'description': '[source '
                               'code](https://github.com/kubeflow/pipelines/blob/c8a18bde299f2fdf5f72144f15887915b8d11520/samples/core/train_until_good/train_until_good.py) '
                               'This sample demonstrates iterative training '
                               'using a train-eval-check recursive loop. The '
                               'main pipeline trains the initial model and '
                               'then gradually trains the model some more '
                               'until the model evaluation metrics are good '
                               'enough.',
                'error': None,
                'id': 'b693a0d3-b11c-4c5b-b3f9-6158382948d6',
                'name': '[Demo] XGBoost - Iterative model training',
                'parameters': None,
                'resource_references': None,
                'url': None},
               {'created_at': datetime.datetime(2022, 5, 22, 2, 5, 34, tzinfo=tzlocal()),
                'default_version': {'code_source_url': None,
                                    'created_at': datetime.datetime(2022, 5, 22, 2, 5, 34, tzinfo=tzlocal()),
                                    'id': 'c65b4f2e-362d-41a8-8f5c-9b944830029e',
                                    'name': '[Demo] TFX - Taxi tip prediction '
                                            'model trainer',
                                    'package_url': None,
                                    'parameters': [{'name': 'pipeline-root',
                                                    'value': 'gs://{{kfp-default-bucket}}/tfx_taxi_simple/{{workflow.uid}}'},
                                                   {'name': 'module-file',
                                                    'value': '/opt/conda/lib/python3.7/site-packages/tfx/examples/chicago_taxi_pipeline/taxi_utils_native_keras.py'},
                                                   {'name': 'push_destination',
                                                    'value': '{"filesystem": '
                                                             '{"base_directory": '
                                                             '"gs://your-bucket/serving_model/tfx_taxi_simple"}}'}],
                                    'resource_references': [{'key': {'id': 'c65b4f2e-362d-41a8-8f5c-9b944830029e',
                                                                     'type': 'PIPELINE'},
                                                             'name': None,
                                                             'relationship': 'OWNER'}]},
                'description': '[source '
                               'code](https://github.com/kubeflow/pipelines/tree/c8a18bde299f2fdf5f72144f15887915b8d11520/samples/core/parameterized_tfx_oss) '
                               '[GCP Permission '
                               'requirements](https://github.com/kubeflow/pipelines/blob/c8a18bde299f2fdf5f72144f15887915b8d11520/samples/core/parameterized_tfx_oss#permission). '
                               'Example pipeline that does classification with '
                               'model analysis based on a public tax cab '
                               'dataset.',
                'error': None,
                'id': 'c65b4f2e-362d-41a8-8f5c-9b944830029e',
                'name': '[Demo] TFX - Taxi tip prediction model trainer',
                'parameters': [{'name': 'pipeline-root',
                                'value': 'gs://{{kfp-default-bucket}}/tfx_taxi_simple/{{workflow.uid}}'},
                               {'name': 'module-file',
                                'value': '/opt/conda/lib/python3.7/site-packages/tfx/examples/chicago_taxi_pipeline/taxi_utils_native_keras.py'},
                               {'name': 'push_destination',
                                'value': '{"filesystem": {"base_directory": '
                                         '"gs://your-bucket/serving_model/tfx_taxi_simple"}}'}],
                'resource_references': None,
                'url': None},
               {'created_at': datetime.datetime(2022, 5, 22, 2, 5, 35, tzinfo=tzlocal()),
                'default_version': {'code_source_url': None,
                                    'created_at': datetime.datetime(2022, 5, 22, 2, 5, 35, tzinfo=tzlocal()),
                                    'id': '56bb7063-ade0-4074-9721-b063f42c46fd',
                                    'name': '[Tutorial] Data passing in python '
                                            'components',
                                    'package_url': None,
                                    'parameters': None,
                                    'resource_references': [{'key': {'id': '56bb7063-ade0-4074-9721-b063f42c46fd',
                                                                     'type': 'PIPELINE'},
                                                             'name': None,
                                                             'relationship': 'OWNER'}]},
                'description': '[source '
                               'code](https://github.com/kubeflow/pipelines/tree/c8a18bde299f2fdf5f72144f15887915b8d11520/samples/tutorials/Data%20passing%20in%20python%20components) '
                               'Shows how to pass data between python '
                               'components.',
                'error': None,
                'id': '56bb7063-ade0-4074-9721-b063f42c46fd',
                'name': '[Tutorial] Data passing in python components',
                'parameters': None,
                'resource_references': None,
                'url': None},
               {'created_at': datetime.datetime(2022, 5, 22, 2, 5, 36, tzinfo=tzlocal()),
                'default_version': {'code_source_url': None,
                                    'created_at': datetime.datetime(2022, 5, 22, 2, 5, 36, tzinfo=tzlocal()),
                                    'id': '36b09aa0-a317-4ad4-a0ed-ddf55a485eb0',
                                    'name': '[Tutorial] DSL - Control '
                                            'structures',
                                    'package_url': None,
                                    'parameters': None,
                                    'resource_references': [{'key': {'id': '36b09aa0-a317-4ad4-a0ed-ddf55a485eb0',
                                                                     'type': 'PIPELINE'},
                                                             'name': None,
                                                             'relationship': 'OWNER'}]},
                'description': '[source '
                               'code](https://github.com/kubeflow/pipelines/tree/c8a18bde299f2fdf5f72144f15887915b8d11520/samples/tutorials/DSL%20-%20Control%20structures) '
                               'Shows how to use conditional execution and '
                               'exit handlers. This pipeline will randomly '
                               'fail to demonstrate that the exit handler gets '
                               'executed even in case of failure.',
                'error': None,
                'id': '36b09aa0-a317-4ad4-a0ed-ddf55a485eb0',
                'name': '[Tutorial] DSL - Control structures',
                'parameters': None,
                'resource_references': None,
                'url': None},
               {'created_at': datetime.datetime(2022, 5, 24, 6, 46, 45, tzinfo=tzlocal()),
                'default_version': {'code_source_url': None,
                                    'created_at': datetime.datetime(2022, 5, 24, 6, 46, 45, tzinfo=tzlocal()),
                                    'id': 'da2bc8b4-27f2-4aa3-befb-c53487d9db49',
                                    'name': 'test',
                                    'package_url': None,
                                    'parameters': [{'name': 'a', 'value': '1'},
                                                   {'name': 'b', 'value': '7'}],
                                    'resource_references': [{'key': {'id': 'da2bc8b4-27f2-4aa3-befb-c53487d9db49',
                                                                     'type': 'PIPELINE'},
                                                             'name': None,
                                                             'relationship': 'OWNER'}]},
                'description': 'test',
                'error': None,
                'id': 'da2bc8b4-27f2-4aa3-befb-c53487d9db49',
                'name': 'test',
                'parameters': [{'name': 'a', 'value': '1'},
                               {'name': 'b', 'value': '7'}],
                'resource_references': None,
                'url': None}],
 'total_size': 5}

Following the instructions in "Access Kubeflow Pipelines from inside your cluster" solved my access issue.

  1. Create a PodDefault in the namespace from which you want to access pipelines.
  2. Create a new notebook server.
    • Select "Allow access to Kubeflow Pipelines" under Configurations.

The PodDefault from the docs:
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
  namespace: "<YOUR_USER_PROFILE_NAMESPACE>"
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org      
  volumeMounts:
    - mountPath: /var/run/secrets/kubeflow/pipelines
      name: volume-kf-pipeline-token
      readOnly: true
  env:
    - name: KF_PIPELINES_SA_TOKEN_PATH
      value: /var/run/secrets/kubeflow/pipelines/token
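
With this PodDefault in place, the token can be passed to the SDK explicitly; recent kfp SDK versions also pick up KF_PIPELINES_SA_TOKEN_PATH automatically, as in the earlier comment in this thread. A minimal sketch:

import os
import kfp

# Read the projected, audience-bound token mounted by the PodDefault.
with open(os.environ['KF_PIPELINES_SA_TOKEN_PATH']) as f:
    token = f.read()

client = kfp.Client(
    host='http://ml-pipeline.kubeflow.svc.cluster.local:8888',
    existing_token=token,  # optional on SDK versions that read the env var themselves
)
print(client.list_pipelines())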

Our official feature to support this use case is https://github.com/kubeflow/pipelines/issues/5138. The PR has been merged and released: https://github.com/kubeflow/pipelines/pull/5676. Let's keep tracking the remaining documentation task in #5138 and close this issue.

@kosehy my guess is you need to specify the namespace since it complains about empty namespace

@yhwang You are right. I fixed the above error based on this comment: https://github.com/kubeflow-kale/kale/issues/210#issuecomment-727018461. Thank you for your reply!

Well, I think now it’s working correctly:

jovyan@kale-0:~$ kfp pipeline list
+--------------------------------------+-------------------------------------------------+---------------------------+
| Pipeline ID                          | Name                                            | Uploaded at               |
+======================================+=================================================+===========================+
| 271f4189-1bd3-425a-8b59-213f4a6502b2 | [Tutorial] DSL - Control structures             | 2020-10-26T11:58:27+00:00 |
+--------------------------------------+-------------------------------------------------+---------------------------+
| e8989196-9105-41b2-b302-fe7b2a1f92cc | [Tutorial] Data passing in python components    | 2020-10-26T11:58:26+00:00 |
+--------------------------------------+-------------------------------------------------+---------------------------+
| 3fb1b41c-dcca-4c12-88cf-cdb602c5c665 | [Demo] TFX - Iris classification pipeline       | 2020-10-26T11:58:25+00:00 |
+--------------------------------------+-------------------------------------------------+---------------------------+
| ff55dd05-deb3-40c9-87c3-a8a06871b801 | [Demo] TFX - Taxi tip prediction model trainer  | 2020-10-26T11:58:24+00:00 |
+--------------------------------------+-------------------------------------------------+---------------------------+
| 99666886-380b-488b-bd84-d3be3d12b2d8 | [Demo] XGBoost - Training with confusion matrix | 2020-10-26T11:58:23+00:00 |
+--------------------------------------+-------------------------------------------------+---------------------------+

@yhwang thank you. I think the error above is related to Kale, so I'll try to fix it on the Kale side.

@mr-yaky in your ServiceRoleBinding you should change source.principal: cluster.local/ns/anonymous/default-editor to source.principal: cluster.local/ns/anonymous/sa/default-editor

Can you try it?

thanks @Bobgy! Got it working now.

Also note the header value has to be formatted like accounts.google.com:<email>

@jonasdebeukelaer are you on GCP? The header should be ‘x-goog-authenticated-user-email’

With the above design, the KFP API server doesn't need to know the user; it uses the service account as the requester identity. So we won't need an SDK method to add the Kubeflow userid.

So the following scenario would fail, and we should document it as a limitation.

User settings:

  • User A owns namespace A
  • User B owns namespace B
  • User A invites User B as a collaborator, so User B can also access namespace A
  • User B can access runs/experiments in namespaces A and B, but User A can only access namespace A

Scenario: when User B runs kfp.Client.create_run_from_pipeline_func() with namespace=B on a notebook server that User A created in namespace A, I guess User B would expect to be able to access resources in his own namespace. But based on the design he can't, because the XFCC is cluster.local/ns/A/sa/default-editor and the KFP API server only allows this identity to access resources in namespace A.

Another scenario (I guess this is more of what you want to support): same users A and B, namespaces A and B, and namespace A is shared with User B. User B uses a notebook server in namespace B and tries kfp.Client.create_run_from_pipeline_func() with namespace=A. The notebook server in namespace B (despite User B being the owner, and namespace A being shared with User B) won't have access to pipelines in namespace A.

To fix the permission issue, User A should also invite namespace B's default-editor service account as a collaborator to namespace A. So conceptually, in this security model, a service account in the cluster has its own identity, distinct from the user's, and access is managed by the identity sending the request. Arguably, Kubernetes has the same auth model: if you run a notebook in-cluster, you cannot use your own account's permissions; the notebook has its service account's permissions.

Coming a little late, but let me explain my current thoughts.

First, we don’t want to ask users to configure istio for RBAC access, istio configs are really brittle and requires a lot of knowledge to use and debug. Therefore, my ideal setup is like:

  1. KFP API server accepts all traffic.
  2. KFP API server reads the header X-Forwarded-Client-Cert injected by the Istio sidecar: https://stackoverflow.com/a/58099997. X-Forwarded-Client-Cert should contain auth information like spiffe://cluster.local/ns/<namespace>/sa/<service account>.

3.1 If a request comes from the Istio gateway (probably we can configure this), then the KFP API server interprets it as coming from a user, reading a special header like kubeflow-userid.

3.2 If a request comes from another Istio mTLS-enabled source, we know which service account initiated the request.

3.3 If a request doesn't have X-Forwarded-Client-Cert, it's not authed by Istio; we may develop other ways of auth, like providing a service account token with a pipelines audience.

  4. When the API server knows the requester identity, it can use SubjectAccessReview to test whether the corresponding user/service account can access a certain resource representing KFP, using Kubernetes RBAC (see the sketch below).

With this setup, access to all KFP resources is backed by Kubernetes RBAC. If users have in-cluster notebook servers with Istio sidecars (mTLS enabled), they only need to grant K8s RBAC permissions to those servers' service accounts.
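
The KFP API server is written in Go, but the SubjectAccessReview check described in step 4 is easy to sketch with the kubernetes Python client. A minimal sketch; the API group/resource ("pipelines.kubeflow.org", "runs") and the service account name are illustrative, not the exact values KFP uses:

from kubernetes import client, config

# Assumes in-cluster credentials with permission to create SubjectAccessReviews.
config.load_incluster_config()
authz = client.AuthorizationV1Api()

# Ask the Kubernetes API: may this identity create "runs" in the target namespace?
sar = client.V1SubjectAccessReview(
    spec=client.V1SubjectAccessReviewSpec(
        user='system:serviceaccount:mynamespace:default-editor',
        resource_attributes=client.V1ResourceAttributes(
            group='pipelines.kubeflow.org',  # illustrative API group
            resource='runs',                 # illustrative resource
            verb='create',
            namespace='mynamespace',
        ),
    )
)
result = authz.create_subject_access_review(sar)
print(result.status.allowed)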

And we should provide an option to disable all of the authz checks, so if it’s not useful for an org, they can just disable it.

What are your thoughts? @yanniszark @yhwang

The following config is (finally) working for me. Note that my use case isn't notebooks but rather an in-cluster kfp client, so switching the listenerType from GATEWAY to SIDECAR_INBOUND was necessary so that the header is added to in-cluster traffic as well.

# Create anonymous namespace (profile) in Kubeflow without having to click
# a button on a web page
curl -XPOST http://localhost:31380/api/workgroup/create

# disable Istio RBAC to workaround
# https://github.com/kubeflow/pipelines/issues/4440#issuecomment-697920377
kubectl apply -f - <<EOF
apiVersion: rbac.istio.io/v1alpha1
kind: ClusterRbacConfig
metadata:
  name: default
spec:
  mode: "OFF"
EOF

# tell kfp that user=anonymous@kubeflow.org even for in-cluster clients
# like pachyderm (listenerType=SIDECAR_INBOUND, not GATEWAY)
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-user-everywhere
  namespace: istio-system
spec:
  filters:
    - filterConfig:
        inlineCode: |
          function envoy_on_request(request_handle)
              request_handle:headers():replace("kubeflow-userid","anonymous@kubeflow.org")
          end
      filterName: envoy.lua
      filterType: HTTP
      insertPosition:
        index: FIRST
      listenerMatch:
        listenerType: SIDECAR_INBOUND
EOF

# Stop header being added multiple times
kubectl delete envoyfilter -n istio-system add-user-filter

Hi all! I will try to answer some of the questions that came up in this issue:

I guess it should be k8s RBAC (because of SubjectAccessReview). How about using istio RBAC instead? Because the goal is to protect the pipeline server API endpoints and actually istio can do that by setting up proper istio RBAC for those endpoints. KFAM just needs to maintain correct istio RBAC objects. It ties to istio though.

@IronPan @gaoning777 @yanniszark do you have context on this?

Istio RBAC is deprecated, so I'm going to talk about Istio AuthorizationPolicy. Istio AuthorizationPolicy is a useful tool, but besides the obvious disadvantages (tied to Istio, harder to use, etc.), it doesn't have the flexibility that the Pipelines model requires right now. Consider that in the current code:

  • In experiments, the namespace is found from a protobuf-encoded filter. How will we decode this filter in Istio AuthorizationPolicy? We can’t.
  • In runs, the namespace is found from the owning experiment (stateful authorization). How will Istio AuthorizationPolicy get the owning experiment’s namespace? It can’t.

On the contrary, we use Kubernetes RBAC as an authorization database and perform whatever complex logic we want in the API server. And as an authorization database, Kubernetes RBAC makes much more sense. Does this answer your question, @yhwang? Please tell me if something is not clear! cc @Bobgy

Please also take a look at the Kubeflow-wide guideline for authorization, which prescribes RBAC and SubjectAccessReview: https://github.com/kubeflow/community/blob/master/guidelines/auth.md

I added an envoy filter to add kubeflow-userid header for those HTTP traffics going to ml-pipeline.kubeflow.svc.cluster.local:8888. Envoy filter to inject the kubeflow-userid header from notebook to ml-pipeline service. In the example below, the notebook server’s name is mynotebook and userid for namespace: mynamespace is user@example.com

Thanks @yhwang! The solution sounds secure and reasonable! I guess the only concern I have is that other users sharing the namespace can act with the namespace owner's permissions.

I want to make it clear that the config I see outlined here is NOT secure. The sidecar can impersonate ANY identity. The correct way to enable programmatic access is to:

  • Use audience-bound ServiceAccountTokens for calling the KFP API.
    • This needs changes in the Pipelines API-Server to do TokenReview (see the sketch after this list). We have implemented this for our enterprise installations and will be pushing it upstream.
  • Use Istio mTLS.
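
For reference, the TokenReview call mentioned above is straightforward to sketch with the kubernetes Python client. A minimal sketch, assuming the token is the projected, audience-bound token from the PodDefault shown earlier, the audience string is illustrative, and the caller has permission to create TokenReviews:

import os
from kubernetes import client, config

config.load_incluster_config()
authn = client.AuthenticationV1Api()

# Read the projected service account token (path as set by the PodDefault above).
with open(os.environ['KF_PIPELINES_SA_TOKEN_PATH']) as f:
    token = f.read()

# Ask the Kubernetes API to validate the token against the expected audience.
review = authn.create_token_review(
    client.V1TokenReview(
        spec=client.V1TokenReviewSpec(
            token=token,
            audiences=['pipelines.kubeflow.org'],  # illustrative audience
        )
    )
)
print(review.status.authenticated, review.status.user.username)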

So if we can make up a service account that represents this namespace and grant it permission to access only the same namespace, that would be ideal, and I believe that is totally possible, at least on GCP.

@Bobgy but we don’t need to have a notion of a ServiceAccount that “represents” a namespace. All ServiceAccount identities will be able to prove themselves to the Pipelines API Server with the design outlined above.

@yhwang My notebook server name was incorrect; now it works for me with the above two changes. Thanks a lot for your time and suggestions. I was stuck on this issue for a long time.

Since the notebook server uses serviceaccount default-editor in my user's namespace, I fixed the RBAC issue by adding a ServiceRoleBinding to allow the service account to access the ml-pipeline service. However, the request is still rejected by the ml-pipeline-api-server:

~/.local/lib/python3.6/site-packages/kfp_server_api/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    236 
    237         if not 200 <= r.status <= 299:
--> 238             raise ApiException(http_resp=r)
    239 
    240         return r

ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Tue, 01 Sep 2020 01:07:38 GMT', 'x-envoy-upstream-service-time': '2', 'server': 'envoy', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","message":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","code":10,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Request header error: there is no user identity header.","error_details":"Failed to authorize the request.: Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header."}]}

@rustam-ashurov-mcx try updating kfp to 1.8.1: pip install kfp --upgrade

ServiceRoleBinding

Trying this out now on Kubernetes 1.21.0 with Kubeflow 1.3.0 and Istio 1.9.0 (docker.io/istio/pilot:1.9.0).

Unfortunately, it seems that Istio no longer supports rbac.istio.io/v1alpha1 and has replaced the ServiceRoleBinding object with something else (https://istio.io/latest/blog/2019/v1beta1-authorization-policy/). The instructions for migrating from v1alpha1 to v1beta1 are a bit complex. Do you have an example of an equivalent YAML file for an AuthorizationPolicy that replaces the ServiceRoleBinding?

For people tracking this issue, the correct solution will come from issue: https://github.com/kubeflow/pipelines/issues/5138


@pedrocwb I have a similar question about 1.1.6 vs 1.3.1. For me the more relevant case is being able to authenticate from outside the cluster. Currently this requires passing the Cognito cookies. I have managed to get this to work with 1.1.6, but it looks like it currently doesn't work with 1.3.1.

Even though I pass the correct cookies, I still get the Request header error: there is no user identity header error. I am going to spend some more time on this today and will give you feedback once I know more.

For our clients the two most important KF components are KFP and KFServing. At the moment we can use KFP with 1.1.6 but not 1.3.1, and only a very old version of KFServing seems to be compatible with 1.1.6.

I got the same problem as @karlschriek. On further investigation, I discovered that KF v1.1 and above uses a very outdated Istio version (1.1.6), so the EnvoyFilter @yhwang provided is not compatible with it.

I tried to port the filter to be compatible with version 1.1.6, but it still doesn't work.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: __namespace__
spec:
  filters:
  - listenerMatch:
      listenerType: SIDECAR_OUTBOUND
      listenerProtocol: HTTP
      address:
        - ml-pipeline.kubeflow.svc.cluster.local
      portNumber: 8888
    filterName: envoy.lua
    filterType: HTTP
    filterConfig:
      inlineCode: |
        function envoy_on_request(request_handle)
          request_handle:headers():add("kubeflow-userid", "anonymous@kubeflow.org")
        end
  workloadLabels:
    notebook-name: __notebook__

Error:

Reason: Conflict
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Wed, 02 Dec 2020 00:26:05 GMT', 'x-envoy-upstream-service-time': '2', 'server': 'envoy', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","message":"Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header.","code":10,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Request header error: there is no user identity header.","error_details":"Failed to authorize with API resource references: Bad request.: BadRequestError: Request header error: there is no user identity header.: Request header error: there is no user identity header."}]}
  • Does anyone know why KF is using such an outdated Istio version?
  • How can I debug this filter to verify it is actually intercepting the request?

EDIT: Very important to mention I’m on AWS.

This is what I used:

EDIT: fixed after @DavidSpek's comment below.

export NAMESPACE=mynamespace
export NOTEBOOK=mynotebook
export USER=me@myemail.com

cat >  ./envoy_filter.yaml << EOM
apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
  name: bind-ml-pipeline-nb-${NAMESPACE}
  namespace: kubeflow
spec:
  roleRef:
    kind: ServiceRole
    name: ml-pipeline-services
  subjects:
  - properties:
      source.principal: cluster.local/ns/${NAMESPACE}/sa/default-editor
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-header
  namespace: ${NAMESPACE}
spec:
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      context: SIDECAR_OUTBOUND
      routeConfiguration:
        vhost:
          name: ml-pipeline.kubeflow.svc.cluster.local:8888
          route:
            name: default
    patch:
      operation: MERGE
      value:
        request_headers_to_add:
        - append: true
          header:
            key: kubeflow-userid
            value: ${USER}
  workloadSelector:
    labels:
      notebook-name: ${NOTEBOOK}
EOM
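
Then apply it with kubectl apply -f ./envoy_filter.yaml.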

I would ping the appropriate WG that owns the config. I currently don't have the bandwidth to work on this.

That’s right, and I’d rather consider that as expected behavior. The scenario is based on the assumption, user A/B didn’t authenticate as themselves when using the notebook server, therefore, they should only be able to access what the notebook server’s service account can access.

User B will still have the choice to use KFP SDK to connect to cluster public endpoint and use his user credentials for authentication. In that case, User B will have access to namespace B, but the notebook server are shared between user A and B, so user B needs to be aware that his credentials may be used by anyone else having access to namespace A (which should be avoided).

@Bobgy yes, what you describe is pretty much how I planned to use Istio mTLS for authentication (via the XFCC header). As for SubjectAccessReview, we plan on delivering it after Kubecon is over. We have also refactored the authentication code a bit in order to support multiple auth methods (we call them authenticators). We’ll be pushing that upstream as well, after SubjectAccessReview. Does this sound good?

@Bobgy

So if we can make up a service account that represents this namespace and only grant permission to access the same namespace

In the RBAC config I posted above, the notebook would be cluster.local/ns/mynamespace/sa/default-editor. So ml-pipeline-apiserver could use that and allow only that service account to access specific resources, for example pipelines created by mynamespace's owner.

Thanks @yhwang! The solution sounds secure and reasonable!

I guess the only concern I have is that other users sharing the namespace can act with the namespace owner's permissions.

So if we can make up a service account that represents this namespace and grant it permission to access only the same namespace, that would be ideal, and I believe that is totally possible, at least on GCP.

/cc @yanniszark @IronPan what do you think about this approach to granting cluster workloads access to the KFP API?

current suggested workaround is to always authenticate from the public endpoint using user credentials

How does this work for non-GCP clusters? I saw this issue where it was stated that auth is only possible using GCP IAP, and that someone using AWS should use the kfp client from within the cluster.
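
For non-GCP deployments fronted by Dex or Cognito, one common approach is to pass the auth session cookie to the SDK. A sketch; the cookie name, how you obtain it, and the host URL all depend on your deployment:

import kfp

# Session cookie copied from an authenticated browser session. Dex-based
# deployments typically use "authservice_session"; Cognito/ALB setups use
# AWSELBAuthSessionCookie-* cookies instead.
cookies = 'authservice_session=<your-session-cookie>'

client = kfp.Client(
    host='https://<your-kubeflow-endpoint>/pipeline',
    cookies=cookies,
)
print(client.list_experiments(namespace='mynamespace'))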