kubernetes: AWS ELB LoadBalancer not being updated in v1.14.7
What happened: When upgrading from Kubernetes 1.14.6 to 1.14.7 we started seeing the following error:
kube-controller-manager error E0920 07:38:34.944393 1 service_controller.go:663] External error while updating load balancer: error listing AWS instances: "InvalidParameterValue: The filter 'null' is invalid\n\tstatus code: 400, request id: redacted".
And saw that the load balancer did not have any new nodes added to it.
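For anyone trying to make sense of the error message itself, here is a minimal, hedged sketch (this is not the in-tree cloud-provider code; the region and the exact malformed-filter shape are assumptions on my part) of the kind of DescribeInstances call that EC2 rejects with an InvalidParameterValue error like the one logged above:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// Region is a placeholder; credentials come from the usual SDK chain.
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-east-1"),
	}))
	svc := ec2.New(sess)

	// A Filters entry that carries no name/values is rejected by EC2 with a
	// 400 InvalidParameterValue error, the same class of failure
	// kube-controller-manager logs above when listing instances for the ELB.
	_, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{
		Filters: []*ec2.Filter{{}}, // filter with no Name and no Values
	})
	if err != nil {
		fmt.Println("DescribeInstances failed:", err)
	}
}
```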
What you expected to happen:
kube-controller-manager should have been able to add nodes to the load balancer pool.
How to reproduce it (as minimally and precisely as possible): Upgrade from 1.14.6 to 1.14.7
Anything else we need to know?:
Environment: Kops in AWS, classic load balancer
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-04T04:48:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"}
  Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.7", GitCommit:"8fca2ec50a6133511b771a11559e24191b1aa2b4", GitTreeState:"clean", BuildDate:"2019-09-18T14:39:02Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: AWS, Kops
- OS (e.g. cat /etc/os-release): Debian 9.9
- Kernel (e.g. uname -a): Linux ip-172-20-67-122 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug): Flannel, kube-dns
- Others:
About this issue
- State: closed
- Created 5 years ago
- Reactions: 10
- Comments: 44 (30 by maintainers)
Commits related to this issue
- Rollback channels 1.14.7 https://github.com/kubernetes/kubernetes/issues/82923 — committed to mikesplain/kops by mikesplain 5 years ago
- Add warning about v1.14.7 on AWS issue #82923 Signed-off-by: Cryptophobia <aouzounov@gmail.com> — committed to Cryptophobia/kubernetes by Cryptophobia 5 years ago
Any ETA for this? For 1.14.8?
This is a fairly colossal screw-up, taking 2 weeks to patch it makes us as a community and Kubernetes as a piece of software look bad in my opinion.
At the very least a nice fat warning in the release notes, better yet just fixing it ASAP would make sense.
I agree that "a subset of users" is fair and cannot always justify a Kubernetes release; however, this is critical basic functionality of Kubernetes on AWS, which basically makes 1.14.7 unusable for AWS users who expect ELB functionality. That said, I'd hate to suggest cutting an expedited release, which always has risks (and is rarely feasible).
I'm most concerned about users unaware of this bug and ways we can prevent them from experiencing the pain (besides the obvious long-term plan to move to the external cloud provider, which I'm so pumped for); based on these issues and a number of comments in Slack, it appears many others already have.
Would it be acceptable to add a note to the 1.14 release doc for 1.14.7 stating this as a known issue? It's my understanding that 1.14.6 is still fine for users who expect this ELB functionality until 1.14.8 is cut?
Currently the Kops alpha channel is set to 1.14.7; I'll discuss with the team whether we should roll that back to 1.14.6.
It's 17 days later and there still isn't an update in the CHANGELOG to warn against upgrading to 1.14.7. This is really poor. In my opinion 1.14.7 should have been pulled until that warning was in place. My clusters are affected because I'm already using this feature, and it has wasted debugging time for me. There are going to be users for whom this release makes it into production because they're not using this feature, and then they'll run into this issue when they do.
Hmmm, “small subset of users”? Doesn’t this bug impact all users on AWS running k8s and provisioning loadbalancers?
https://github.com/kubernetes/kubernetes/pull/82954#issuecomment-534093933
Folks, we should only be prioritizing patch release for bugs or security issues that impact most (if not all) Kubernetes users. I understand this is an awful bug to run into, but we can’t fast track patch release for cloud provider specific bugs that only impact a subset of users.
All good in 1.14.8. Upgraded today from 1.14.6
Hi folks, ran into the same issue upgrading from 1.14.6 to 1.14.7. Our e2e tests failed, so we did not roll it out today.
@dims sounds like we want to have e2e tests with AWS; do we actually test in environments other than GKE?
@zhan849 I don’t think so because the subsequent bug fix (https://github.com/kubernetes/kubernetes/pull/78498) to https://github.com/kubernetes/kubernetes/pull/76749 went into v1.15, but not v1.14.
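Purely as an illustration of the kind of guard these fixes are about (this is my own sketch, not the code from any of the linked PRs, and the helper name is hypothetical): the provider has to avoid handing EC2 empty or nameless filter entries, since those are what the API rejects as "the filter 'null'".

```go
package aws

import "github.com/aws/aws-sdk-go/service/ec2"

// sanitizeFilters is a hypothetical helper, not the actual patch: it drops
// empty or nameless filter entries so DescribeInstances either gets
// well-formed filters or a nil slice (which omits the Filter parameter from
// the request entirely, and which EC2 accepts).
func sanitizeFilters(filters []*ec2.Filter) []*ec2.Filter {
	var out []*ec2.Filter
	for _, f := range filters {
		if f == nil || f.Name == nil || len(f.Values) == 0 {
			continue // skip entries EC2 would reject with InvalidParameterValue
		}
		out = append(out, f)
	}
	return out
}
```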
@Cryptophobia there was another PR opened on this two days ago, sorry for not linking it back here for visibility: https://github.com/kubernetes/kubernetes/pull/83414
My biggest problem with just a warning label is that 1.14.7 release will forever be broken on AWS. Ideal scenario is that 1.14.7 is recreated with the same git tag and https://github.com/kubernetes/kubernetes/pull/78100 cherry-picked on top so that 1.14.7 is a legitimate version.
I think a warning in the CHANGELOG/release notes is reasonable. Can someone open a PR for this please? Ideally someone involved in the initial cherry-pick that caused this bug (https://github.com/kubernetes/kubernetes/pull/78100)? cc @mcrute @micahhausler @nckturner @jaypipes
created cherry-pick for 1.14: https://github.com/kubernetes/kubernetes/pull/82954