linkerd2: Canary A/B in nginx is not routing to the canary pods
Bug Report
What is the issue?
The canary A/B testing setup uses nginx, Flagger, and Linkerd. However, no logs appear in the canary pods, and when testing with curl as
curl -s -H 'X-Canary: always' -H 'Host: app.example.com' http://<ip address of load balancer>
the responses always come from the primary pod.
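To quantify the symptom, the curl check can be repeated and tallied per pod. This is only a sketch under some assumptions: `LB_ADDRESS` is a placeholder for the load balancer address, the helper names are made up for illustration, and the backend is assumed to be podinfo returning a JSON body with a `hostname` field that names the serving pod.

```python
import json
import urllib.request
from collections import Counter

LB_ADDRESS = "203.0.113.10"  # placeholder: replace with the load balancer IP


def build_request(address, host="app.example.com", canary=True):
    """Build the same request as the curl command (Host and X-Canary headers)."""
    headers = {"Host": host}
    if canary:
        headers["X-Canary"] = "always"
    return urllib.request.Request(f"http://{address}/", headers=headers)


def serving_pod(body):
    """Return the pod name from a response body.

    Assumes a podinfo-style JSON body with a 'hostname' field.
    """
    return json.loads(body).get("hostname", "unknown")


def tally(address, attempts=20, fetch=None):
    """Send `attempts` canary requests and count responses per pod.

    `fetch` takes a Request and returns the response body; it defaults to a
    real HTTP call and can be swapped out for offline testing.
    """
    if fetch is None:
        def fetch(req):
            with urllib.request.urlopen(req, timeout=5) as resp:
                return resp.read()
    counts = Counter()
    for _ in range(attempts):
        counts[serving_pod(fetch(build_request(address)))] += 1
    return counts


# Example (needs network access to the cluster):
#   print(tally(LB_ADDRESS))
```

If canary routing works, every request sent with `X-Canary: always` should be counted against a canary pod name; here they would all be counted against the primary.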
How can it be reproduced?
Set up Flagger as described at https://docs.flagger.app/usage/linkerd-progressive-delivery#a-b-testing, then deploy a new image. This uses the helm install of nginx 0.26.1 and the helm install of linkerd 2.6-stable, with these additional settings for nginx:
controller:
  podAnnotations:
    linkerd.io/inject: enabled
  config:
    ssl-redirect: "false"
    enable-opentracing: "true"
    enable-vts-status: "false"
    zipkin-collector-host: oc-collector.tracing
    zipkin-sample-rate: "0.5"
Logs, error output, etc
The nginx access log shows the request being handled, with the canary listed as the alternative upstream:
127.0.0.1 - [127.0.0.1] - - [02/Dec/2019:08:26:20 +0000] "GET / HTTP/1.1" 200 397 "-" "curl/7.29.0" 128 0.003 [test-podinfo-9898] [test-podinfo-canary-9898] 10.60.1.151:9898 397 0.004 200 951a40952d962a04d9d193c3f6b1c432
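For reference, the upstream fields can be pulled out of that access log line programmatically. This is a rough sketch that assumes the default ingress-nginx log format, in which the two bracketed fields after the request time are the upstream name and the alternative upstream name, followed by the upstream address. The function name is illustrative only.

```python
import re

# The access log line from this report.
LINE = (
    '127.0.0.1 - [127.0.0.1] - - [02/Dec/2019:08:26:20 +0000] '
    '"GET / HTTP/1.1" 200 397 "-" "curl/7.29.0" 128 0.003 '
    '[test-podinfo-9898] [test-podinfo-canary-9898] '
    '10.60.1.151:9898 397 0.004 200 951a40952d962a04d9d193c3f6b1c432'
)


def upstreams(line):
    """Extract upstream name, alternative upstream name and upstream address.

    Only the text after the last double quote is searched, so the bracketed
    client address and timestamp earlier in the line are ignored.
    """
    tail = line.rsplit('"', 1)[-1]
    match = re.search(r'\[([^\]]*)\]\s+\[([^\]]*)\]\s+(\S+)', tail)
    if match is None:
        raise ValueError("line does not match the assumed log format")
    return {
        "upstream": match.group(1),
        "alternative_upstream": match.group(2),
        "upstream_addr": match.group(3),
    }


print(upstreams(LINE))
```

The `upstream_addr` value can then be compared against the pod and service IPs in the `test` namespace to see where nginx actually sent the request.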
The l5d headers are present in the generated nginx configuration:
proxy_set_header l5d-dst-override $service_name.$namespace.svc.cluster.local:9898;
proxy_hide_header l5d-remote-ip;
proxy_hide_header l5d-server-id;
With the linkerd proxy in the nginx pod set to debug logging, the request appears to be routed to the primary pod:
DBUG [ 504.958360s] podinfo.test.svc.cluster.local:9898 linkerd2_proxy::proxy::http::client client request: method=GET uri=http://app.example.com/ version=HTTP/2.0 headers={"x-b3-traceid": "35bca5baac8664ea", "x-b3-spanid": "03df8b068107c1cb", "x-b3-sampled": "1", "x-b3-parentspanid": "afb36db7fa7ddb4e", "x-b3-flags": "0", "host": "app.example.com", "x-request-id": "e7ec34ab7bb6dc2f36bb65031a70be62", "x-real-ip": "127.0.0.1", "x-forwarded-for": "127.0.0.1", "x-forwarded-host": "app.example.com", "x-forwarded-port": "80", "x-forwarded-proto": "http", "x-scheme": "http", "cache-control": "max-age=259200", "user-agent": "curl/7.29.0", "accept": "*/*", "x-canary": "always", "l5d-dst-canonical": "podinfo.test.svc.cluster.local:9898", "l5d-orig-proto": "HTTP/1.1"}
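The relevant part of that debug line is the set of `l5d-*` headers, which is easy to miss among the others. The sketch below filters them out; it assumes the `headers={...}` portion of the line is valid JSON (it is in the line above, but that is not guaranteed for every proxy version), and it uses an abridged copy of the line with most headers omitted.

```python
import json
import re

# Abridged copy of the debug line from this report (most headers omitted).
LINE = (
    'DBUG [ 504.958360s] podinfo.test.svc.cluster.local:9898 '
    'linkerd2_proxy::proxy::http::client client request: method=GET '
    'uri=http://app.example.com/ version=HTTP/2.0 '
    'headers={"host": "app.example.com", "x-canary": "always", '
    '"l5d-dst-canonical": "podinfo.test.svc.cluster.local:9898", '
    '"l5d-orig-proto": "HTTP/1.1"}'
)


def l5d_headers(line):
    """Return only the l5d-* headers from a proxy 'client request' debug line."""
    match = re.search(r'headers=(\{.*\})', line)
    if match is None:
        raise ValueError("no headers={...} section found")
    headers = json.loads(match.group(1))
    return {k: v for k, v in headers.items() if k.startswith("l5d-")}


for name, value in l5d_headers(LINE).items():
    print(f"{name}: {value}")
```

In this request the canonical destination is `podinfo.test.svc.cluster.local:9898`, not a canary service, even though the `x-canary: always` header is present.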
linkerd check output
centos7:[root@localhost kubernetes-cluster]$ linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
Status check results are √
Environment
- Kubernetes Version: v1.14.8-gke.17
- Cluster Environment: GKE
- Host OS: Linux
- Linkerd version: 2.6-stable
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 18 (8 by maintainers)
This is not the case any longer in the latest edge.
@dhananjaya-senanayake that is correct. Check out the latest edge if you’d like to see it work (note you’ll want to remove the l5d-dst-override header).