linkerd2: proxy error: client requires absolute-form URIs
Bug Report
What is the issue?
~15 hours after upgrading our staging cluster from edge-20.7.5 to stable-2.9.0, we observed connectivity errors between multiple pods in the same namespace that normally communicate over k8s services. From the perspective of the client, all requests to http://server/foo were returned instantly with a 502 error, while the linkerd-proxy log on both the client and server reported, e.g.:
[ 8109.623316s] WARN ThreadId(01) inbound:accept{peer.addr=10.4.2.144:50078 target.addr=10.4.2.205:3000}: linkerd2_app_core::errors: Failed to proxy request: client requires absolute-form URIs
Note that the pods were not restarted after the linkerd update, so while the control plane was running 2.9.0, the pods were still running the older version of the proxy.
Interestingly, if from the client pod I manually curl http://server.namespace/foo rather than http://server/foo, the request succeeds; it’s only the unqualified version of the URL that fails.
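To make the failure mode concrete, here is a minimal sketch of the two requests as run from a shell inside the client pod; the service name server, the namespace segment, and the /foo path are taken from the description above, and the curl flags are purely illustrative:

```sh
# Origin-form request via the short service name: returns a 502 immediately,
# while both proxies log "client requires absolute-form URIs".
curl -sv http://server/foo

# The same request with the namespace-qualified service name succeeds.
curl -sv http://server.namespace/foo
```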
How can it be reproduced?
At least for us, the repro path is to update from the old edge release to the new stable release and then wait 12-24 hours; it has happened twice so far. That said, I do not know what level of traffic (if any) might be necessary to trigger it, and there is no obvious reason why the issue starts when it does.
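For reference, a hedged sketch of the upgrade path described here (not the exact command history from our cluster; it assumes the CLI has already been switched to the stable-2.9.0 binary):

```sh
# Upgrade the control plane in place; the data-plane proxies keep running
# whatever version they were injected with until their pods restart.
linkerd upgrade | kubectl apply -f -

# Confirm the control plane is healthy after the upgrade.
linkerd check
```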
### `linkerd check` output

Pre-update:

```
$ linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust roots are using supported crypto algorithm
√ trust roots are within their validity period
√ trust roots are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust root
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running
linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
is running version 2.7.1 but the latest stable version is 2.9.0
see https://linkerd.io/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
is running version 20.7.5 but the latest edge version is 20.11.5
see https://linkerd.io/checks/#l5d-version-control for hints
‼ control plane and cli versions match
control plane running edge-20.7.5 but cli running stable-2.7.1
see https://linkerd.io/checks/#l5d-version-control for hints
Status check results are √
```
### Environment
- Kubernetes Version: 1.16.15-gke.4300
- Cluster Environment: GKE
- Host OS: Google Container-Optimized OS (cos)
- Linkerd version: edge-20.7.5; stable-2.9.0
About this issue
- State: closed
- Created: 4 years ago
- Comments: 24 (11 by maintainers)
Commits related to this issue
- add test reproducing linkerd/linkerd2#5298 Signed-off-by: Eliza Weisman <eliza@buoyant.io> — committed to linkerd/linkerd2-proxy by hawkw 3 years ago
- add test reproducing linkerd/linkerd2#5298 Signed-off-by: Eliza Weisman <eliza@buoyant.io> — committed to linkerd/linkerd2-proxy by hawkw 3 years ago
- inbound: normalize URIs after downgrading to HTTP/1 (#881) Somewhere between stable-2.8.1 and stable-2.9.0, we introduced a bug when downgrading HTTP/1 traffic that originated in origin form. See l... — committed to linkerd/linkerd2-proxy by hawkw 3 years ago
- proxy: v2.124.1 This release addresses #5298 by backporting fixes to origin-form uri handling (from linkerd/linkerd2-proxy#2a645b7) to the release/v2.124.0 tag. This fix will be released as part of a... — committed to linkerd/linkerd2 by olix0r 3 years ago
- proxy: v2.124.1 (#5631) This release addresses #5298 by backporting fixes to origin-form uri handling (from linkerd/linkerd2-proxy#2a645b7) to the release/v2.124.0 tag. This fix will be released as... — committed to linkerd/linkerd2 by olix0r 3 years ago
- proxy: v2.124.2 This release addresses #5298 by backporting fixes to origin-form uri handling (from linkerd/linkerd2-proxy#2a645b7) to the release/v2.124.0 tag. This fix will be released as part of a... — committed to linkerd/linkerd2 by olix0r 3 years ago
- proxy: v2.124.2 (#5784) This release addresses #5298 by backporting fixes to origin-form uri handling (from linkerd/linkerd2-proxy@2a645b7) to the release/v2.124.0 tag. This fix will be released as... — committed to linkerd/linkerd2 by olix0r 3 years ago
So, it turns out that, while we did fix this in the proxy and backport the fix onto the 2.9.x version of the proxy, the fixed version isn’t what was actually tagged for release:
The proxy tag release/v2.124.0 is what was used for 2.9.0, and the release/v2.124.1 tag points to the same commit 😞
We’ll create a v2.124.2 tag on the proper commit (d1766c00) and confirm the fix in a stable-2.9.4 release.
Again, my apologies for the confusion.
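For anyone who wants to confirm which proxy build their meshed pods are actually running, here is a small sketch; the namespace my-namespace is a placeholder, and the jsonpath simply prints each pod's linkerd-proxy container image:

```sh
# List every pod in the namespace together with its linkerd-proxy image tag,
# which identifies the proxy release the pod was injected with.
kubectl get pods -n my-namespace \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="linkerd-proxy")].image}{"\n"}{end}'

# The CLI can also summarize the data-plane proxy versions.
linkerd version --proxy
```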
stable-2.9.4 has been released and is available via the usual upgrade path.
I’ve confirmed that the fix is included as follows:
Please let us know if you see anything unexpected with stable-2.9.4!
Hello, once again thank you for the release of linkerd 2.9.3. Just recently we tried to update our cluster again, from version 2.8.1 to 2.9.3. However, same as before, we noticed the same problem: services still running the 2.8.1 proxy failed to establish HTTP connections to services running the 2.9.3 proxy. This happened in our UAT EKS cluster running k8s 1.18.
To further test the issue and ensure it was not due to the underlying cluster setup, I tried to replicate it on my local machine using k8s (1.16.6) on Docker Desktop, and sadly saw the same behaviour.
Below are the steps to replicate:
1. linkerd install | kubectl apply -f -
2. Deploy the three apps: app1.default.svc.cluster.local, app2.default.svc.cluster.local, app3.default.svc.cluster.local
3. linkerd upgrade | kubectl apply -f -
4. kubectl rollout restart deployment/app{2,3}

I understand the flow is similar to what @olix0r described, but I am not sure what I am missing here; is there another step that needs to be considered when doing the upgrade from 2.8.1 -> 2.9.3?
Linkerd proxy check

I have attached the linkerd-proxy logs of the 3 apps for reference: app1-trace.log, app2-trace.log, app3-trace.log
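In case it helps others gather similar data, here is a hedged sketch of one way such proxy logs can be collected; the config.linkerd.io/proxy-log-level annotation is standard Linkerd, but the workload name, namespace, and log-level directive are assumptions rather than the exact commands used for the attached files:

```sh
# Raise the proxy log level by annotating the pod template (this triggers a
# rollout, and the injector applies the new level to the re-created pods).
kubectl -n default patch deployment app1 --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/proxy-log-level":"warn,linkerd=trace"}}}}}'

# Capture the proxy's logs from the running pod.
kubectl -n default logs deployment/app1 -c linkerd-proxy > app1-trace.log
```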
This is fixed in the latest edge release and we’ll be releasing a backported fix as stable-2.9.3 next week.
What’s the approach to avoid downtime/5xx errors while upgrading from 2.8.1 to 2.9.1/2? Just ran into the same issue in our staging environment. Should data planes not be backward compatible by at least one version? Did anyone find a workaround?
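One mitigation consistent with the rest of this thread is to keep the mixed-version window short by restarting the meshed workloads right after the control-plane upgrade, so every proxy is re-injected at the matching version. A sketch, with my-namespace as a placeholder and not official upgrade guidance:

```sh
# Restart all meshed deployments so their proxies match the new control plane.
kubectl -n my-namespace rollout restart deploy

# Verify the data plane afterwards.
linkerd check --proxy
```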
By the way I can confirm that this is now working. Tried it locally and it worked (using linkerd 2.9.4). Thanks again
@adinhodovic Appreciate the offer. Given @olix0r’s repro (woot!) we’re probably in good shape but I’ll let him chime in if he wants more logs.