agones: Moving cluster to a new node pool doesn't recreate all fleets

I noticed something odd today. I needed to swap node pools in GKE, so I created a new node pool and deleted the old one. I expected all instances from the old node pool to recover on the new one after some time. However, in my particular case I could only see 1 of the 3 servers on the Workloads page in GCloud. I checked the fleets to see whether they had the minimum availability, which was 1 of each kind = 3. `kubectl describe fleets` indicated that 3 servers were online and available. However, when I tried to connect to a server that was listed but not shown in Workloads, the connection failed; I was able to connect to the one appearing in Workloads, but not the others. I had to delete the fleets and recreate them for them to appear and work correctly again.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 18 (8 by maintainers)

Commits related to this issue

Most upvoted comments

Now that #1008 is written, I think we can close this, as we give advice on how to perform upgrades that mitigates this issue (which seems to mostly be a race condition).

Also, the advice to set up separate node pools in production also seems to resolve it.
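For anyone hitting this while swapping node pools: the mitigation the maintainers point to amounts to migrating gradually (cordon and drain the old pool so Agones reschedules game servers onto the new pool) rather than deleting the old pool outright. A rough sketch, assuming placeholder names `my-cluster`, `old-pool`, and `new-pool`, and a recent `kubectl` (older versions used `--delete-local-data` instead of `--delete-emptydir-data`):

```shell
# Create the replacement pool first so there is capacity to move to.
gcloud container node-pools create new-pool --cluster my-cluster

# Cordon every node in the old pool so no new game servers land there.
for node in $(kubectl get nodes \
    -l cloud.google.com/gke-nodepool=old-pool -o name); do
  kubectl cordon "$node"
done

# Drain the old nodes one by one; the fleet controller recreates the
# evicted game servers on the new pool instead of racing a pool deletion.
for node in $(kubectl get nodes \
    -l cloud.google.com/gke-nodepool=old-pool -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done

# Only delete the old pool once the fleets report Ready on the new nodes.
gcloud container node-pools delete old-pool --cluster my-cluster
```

This is an illustration of the general approach, not the exact procedure from the Agones docs; check the upgrade guide added in #1008 for the supported steps, since game server pods may carry eviction-related annotations that affect how `drain` behaves.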