linkerd2: Canary A/B in nginx is not routing to the canary pods

Bug Report

What is the issue?

The canary A/B testing suite is set up using nginx/flagger and Linkerd, but no logs appear in the canary pods, and when testing with curl -s -H 'X-Canary: always' -H 'Host: app.example.com' http://<ip address of load balancer>, the results always come from the primary pod.

How can it be reproduced?

Set up Flagger as described at https://docs.flagger.app/usage/linkerd-progressive-delivery#a-b-testing, then progress a new image. This uses the Helm install of nginx 0.26.1 and the Helm install of Linkerd 2.6-stable, with the following additional settings for nginx:

controller:
  podAnnotations:
    linkerd.io/inject: enabled
  config:
    ssl-redirect: "false"
    enable-opentracing: "true"
    enable-vts-status: "false"
    zipkin-collector-host: oc-collector.tracing
    zipkin-sample-rate: "0.5"
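
For context, the guide linked above drives the A/B test with a Flagger Canary resource that pins requests carrying the X-Canary header to the canary. A condensed sketch of that resource follows; the names (podinfo, test, port 9898) are taken from the logs below, and the API version and field names (flagger.app/v1alpha3, canaryAnalysis) are assumptions based on the Flagger releases of that period:

apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: nginx
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  ingressRef:
    apiVersion: extensions/v1beta1
    kind: Ingress
    name: podinfo
  service:
    port: 9898
  canaryAnalysis:
    interval: 1m
    threshold: 10
    iterations: 10
    # A/B testing: route requests carrying this header to the canary for the
    # duration of the analysis, instead of shifting traffic by weight
    match:
      - headers:
          x-canary:
            exact: "always"
    metrics:
      - name: request-success-rate
        threshold: 99
        interval: 1m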

Logs, error output, etc

Nginx appears to be attempting the canary route; in the access log below, [test-podinfo-canary-9898] is the alternative (canary) upstream:

127.0.0.1 - [127.0.0.1] - - [02/Dec/2019:08:26:20 +0000] "GET / HTTP/1.1" 200 397 "-" "curl/7.29.0" 128 0.003 [test-podinfo-9898] [test-podinfo-canary-9898] 10.60.1.151:9898 397 0.004 200 951a40952d962a04d9d193c3f6b1c432

The l5d headers appear in the nginx configuration:

proxy_set_header l5d-dst-override $service_name.$namespace.svc.cluster.local:9898;
proxy_hide_header l5d-remote-ip;
proxy_hide_header l5d-server-id;
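
For reference, these directives are typically injected through the nginx ingress configuration-snippet annotation, following the Linkerd ingress documentation; a sketch of the Ingress metadata that would produce them (the service name and port are assumptions matching this setup):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: podinfo
  namespace: test
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header l5d-dst-override $service_name.$namespace.svc.cluster.local:9898;
      proxy_hide_header l5d-remote-ip;
      proxy_hide_header l5d-server-id;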

With the Linkerd proxy in nginx put into debug mode, the routing appears to go to the primary pod; note that l5d-dst-canonical in the request below is the apex service podinfo.test.svc.cluster.local:9898 rather than the canary service:

DBUG [   504.958360s] podinfo.test.svc.cluster.local:9898 linkerd2_proxy::proxy::http::client client request: method=GET uri=http://app.example.com/ version=HTTP/2.0 headers={"x-b3-traceid": "35bca5baac8664ea", "x-b3-spanid": "03df8b068107c1cb", "x-b3-sampled": "1", "x-b3-parentspanid": "afb36db7fa7ddb4e", "x-b3-flags": "0", "host": "app.example.com", "x-request-id": "e7ec34ab7bb6dc2f36bb65031a70be62", "x-real-ip": "127.0.0.1", "x-forwarded-for": "127.0.0.1", "x-forwarded-host": "app.example.com", "x-forwarded-port": "80", "x-forwarded-proto": "http", "x-scheme": "http", "cache-control": "max-age=259200", "user-agent": "curl/7.29.0", "accept": "*/*", "x-canary": "always", "l5d-dst-canonical": "podinfo.test.svc.cluster.local:9898", "l5d-orig-proto": "HTTP/1.1"}
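
This would explain the behavior: Flagger generates an apex service that selects the primary pods, so anything addressed to podinfo.test.svc.cluster.local (as the l5d-dst-canonical header above shows) lands on the primary regardless of which upstream nginx picked. A sketch of the generated service topology, assumed from Flagger's documented naming conventions:

apiVersion: v1
kind: Service
metadata:
  name: podinfo              # apex service; l5d-dst-override points here
  namespace: test
spec:
  selector:
    app: podinfo-primary     # apex traffic is routed to the primary deployment
  ports:
    - port: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo-canary       # the alternative upstream nginx selected
  namespace: test
spec:
  selector:
    app: podinfo             # canary deployment pods
  ports:
    - port: 9898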

linkerd check output

centos7:[root@localhost kubernetes-cluster]$ linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: v1.14.8-gke.17
  • Cluster Environment: GKE
  • Host OS: Linux
  • Linkerd version: 2.6-stable


About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 18 (8 by maintainers)

Most upvoted comments

Removing the l5d-dst-override header takes the traffic from the nginx ingress controller to the pods out of the service mesh (no mTLS).

This is not the case any longer in the latest edge.

@dhananjaya-senanayake that is correct. Check out the latest edge if you’d like to see it work (note you’ll want to remove the l5d-dst-override header).
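
For anyone landing here: with the header removed per the advice above, the configuration-snippet would keep only the hide directives, e.g. (a sketch assuming the same Ingress as above):

nginx.ingress.kubernetes.io/configuration-snippet: |
  # l5d-dst-override removed so the canary upstream chosen by nginx is honored
  proxy_hide_header l5d-remote-ip;
  proxy_hide_header l5d-server-id;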