agones: Moving cluster to a new node pool doesn't recreate all fleets

I noticed something odd today. I needed to swap node pools in GKE, so I created a new node pool and deleted the old one. I expected all instances from the old node pool to recover on the new one after some time. However, in my particular case I could only see 1 of the 3 servers on the Workloads page in GCloud. I checked the fleets to see whether they had the minimum availability, which was 1 of each kind = 3. `kubectl describe fleets` indicated that 3 servers were online and available. However, when I tried to connect to a server that was listed but not shown in Workloads, the connection failed; I was able to connect to the one appearing in Workloads, but not the others. I had to delete the fleets and recreate them for them to appear and work correctly again.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 18 (8 by maintainers)

Commits related to this issue

Most upvoted comments

Now that #1008 is written, I think we can close this, as we give advice on how to perform upgrades that mitigates this issue (which seems to mostly be a race condition).

Also, the advice to set up separate node pools in production also seems to resolve it.
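For anyone hitting this while swapping node pools: the mitigation the maintainers point to amounts to migrating gradually (cordon and drain the old pool so Agones reschedules game servers onto the new pool) rather than deleting the old pool outright. A rough sketch, assuming placeholder names `my-cluster`, `old-pool`, and `new-pool`, and a recent `kubectl` (older versions used `--delete-local-data` instead of `--delete-emptydir-data`):

```shell
# Create the replacement pool first so there is capacity to move to.
gcloud container node-pools create new-pool --cluster my-cluster

# Cordon every node in the old pool so no new game servers land there.
for node in $(kubectl get nodes \
    -l cloud.google.com/gke-nodepool=old-pool -o name); do
  kubectl cordon "$node"
done

# Drain the old nodes one by one; the fleet controller recreates the
# evicted game servers on the new pool instead of racing a pool deletion.
for node in $(kubectl get nodes \
    -l cloud.google.com/gke-nodepool=old-pool -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done

# Only delete the old pool once the fleets report Ready on the new nodes.
gcloud container node-pools delete old-pool --cluster my-cluster
```

This is an illustration of the general approach, not the exact procedure from the Agones docs; check the upgrade guide added in #1008 for the supported steps, since game server pods may carry eviction-related annotations that affect how `drain` behaves.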