kubernetes: AWS ELB LoadBalancer not being updated in v1.14.7
What happened: When upgrading from Kubernetes 1.14.6 to 1.14.7 we started seeing the following error:
kube-controller-manager error E0920 07:38:34.944393 1 service_controller.go:663] External error while updating load balancer: error listing AWS instances: "InvalidParameterValue: The filter 'null' is invalid\n\tstatus code: 400, request id: redacted".
And saw that the load balancer did not have any new nodes added to it.
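For anyone trying to make sense of the error message itself, here is a minimal, hedged sketch (this is not the in-tree cloud-provider code; the region and the exact malformed-filter shape are assumptions on my part) of the kind of DescribeInstances call that EC2 rejects with an InvalidParameterValue error like the one logged above:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// Region is a placeholder; credentials come from the usual SDK chain.
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-east-1"),
	}))
	svc := ec2.New(sess)

	// A Filters entry that carries no name/values is rejected by EC2 with a
	// 400 InvalidParameterValue error, the same class of failure
	// kube-controller-manager logs above when listing instances for the ELB.
	_, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{
		Filters: []*ec2.Filter{{}}, // filter with no Name and no Values
	})
	if err != nil {
		fmt.Println("DescribeInstances failed:", err)
	}
}
```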
What you expected to happen:
kube-controller-manager should have been able to add nodes to the load balancer pool.
How to reproduce it (as minimally and precisely as possible): Upgrade from 1.14.6 to 1.14.7
Anything else we need to know?:
Environment: Kops in AWS, classic load balancer
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-04T04:48:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"}
  Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.7", GitCommit:"8fca2ec50a6133511b771a11559e24191b1aa2b4", GitTreeState:"clean", BuildDate:"2019-09-18T14:39:02Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: AWS, Kops
- OS (e.g. cat /etc/os-release): Debian 9.9
- Kernel (e.g. uname -a): Linux ip-172-20-67-122 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug): Flannel, kube-dns
- Others:
About this issue
- State: closed
- Created 5 years ago
- Reactions: 10
- Comments: 44 (30 by maintainers)
Commits related to this issue
- Rollback channels 1.14.7 https://github.com/kubernetes/kubernetes/issues/82923 — committed to mikesplain/kops by mikesplain 5 years ago
- Add warning about v1.14.7 on AWS issue #82923 Signed-off-by: Cryptophobia <aouzounov@gmail.com> — committed to Cryptophobia/kubernetes by Cryptophobia 5 years ago
Any ETA for this? For 1.14.8?
This is a fairly colossal screw-up, taking 2 weeks to patch it makes us as a community and Kubernetes as a piece of software look bad in my opinion.
At the very least a nice fat warning in the release notes, better yet just fixing it ASAP would make sense.
I agree that "a subset of users" is fair and cannot always justify a Kubernetes release; however, this is critical basic functionality of Kubernetes on AWS, which basically makes 1.14.7 unusable for AWS users who expect ELB functionality. That said, I'd hate to suggest cutting an expedited release, which always has risks (and is rarely feasible).
I'm most concerned about users unaware of this bug and ways we can prevent them from experiencing the pain (besides the obvious long-term plan to move to the external cloud provider, which I'm so pumped for); based on these issues and a number of comments in Slack, it appears many others already have.
Would it be acceptable to add a note to the 1.14 release doc for 1.14.7 stating this as a known issue? It's my understanding that 1.14.6 is still fine for users who expect this ELB functionality until 1.14.8 is cut?
Currently the Kops alpha channel is set to 1.14.7; I'll discuss with the team whether we should roll that back to 1.14.6.
It's 17 days later and there still isn't an update in the CHANGELOG to warn against upgrading to 1.14.7. This is really poor. In my opinion 1.14.7 should have been pulled until that warning was in place. My clusters are affected because I'm already using this feature, and it has wasted debugging time for me. There are going to be users for whom this release makes it into production because they're not using this feature, and then they'll run into this issue when they do.
Hmmm, “small subset of users”? Doesn’t this bug impact all users on AWS running k8s and provisioning loadbalancers?
https://github.com/kubernetes/kubernetes/pull/82954#issuecomment-534093933
Folks, we should only be prioritizing patch release for bugs or security issues that impact most (if not all) Kubernetes users. I understand this is an awful bug to run into, but we can’t fast track patch release for cloud provider specific bugs that only impact a subset of users.
All good in 1.14.8. Upgraded today from 1.14.6
Hi folks, ran into the same issue upgrading from 1.14.6 to 1.14.7. Our e2e tests failed, so we did not roll it out today.
@dims sounds like we want to have e2e tests with AWS; do we actually test in environments other than GKE?
@zhan849 I don’t think so because the subsequent bug fix (https://github.com/kubernetes/kubernetes/pull/78498) to https://github.com/kubernetes/kubernetes/pull/76749 went into v1.15, but not v1.14.
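Purely as an illustration of the kind of guard these fixes are about (this is my own sketch, not the code from any of the linked PRs, and the helper name is hypothetical): the provider has to avoid handing EC2 empty or nameless filter entries, since those are what the API rejects as "the filter 'null'".

```go
package aws

import "github.com/aws/aws-sdk-go/service/ec2"

// sanitizeFilters is a hypothetical helper, not the actual patch: it drops
// empty or nameless filter entries so DescribeInstances either gets
// well-formed filters or a nil slice (which omits the Filter parameter from
// the request entirely, and which EC2 accepts).
func sanitizeFilters(filters []*ec2.Filter) []*ec2.Filter {
	var out []*ec2.Filter
	for _, f := range filters {
		if f == nil || f.Name == nil || len(f.Values) == 0 {
			continue // skip entries EC2 would reject with InvalidParameterValue
		}
		out = append(out, f)
	}
	return out
}
```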
@Cryptophobia there was another PR opened on this two days ago, sorry for not linking it back here for visibility: https://github.com/kubernetes/kubernetes/pull/83414
My biggest problem with just a warning label is that 1.14.7 release will forever be broken on AWS. Ideal scenario is that 1.14.7 is recreated with the same git tag and https://github.com/kubernetes/kubernetes/pull/78100 cherry-picked on top so that 1.14.7 is a legitimate version.
I think a warning in the CHANGELOG/release notes is reasonable. Can someone open a PR for this please? Ideally someone involved in the initial cherry-pick that caused this bug (https://github.com/kubernetes/kubernetes/pull/78100)? cc @mcrute @micahhausler @nckturner @jaypipes
created cherry-pick for 1.14: https://github.com/kubernetes/kubernetes/pull/82954