agones: FleetAutoscaler bug
What happened:
We found a bug when using a FleetAutoscaler configured as follows:
apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: gameserver-autoscaler
spec:
  fleetName: test-gameserver
  policy:
    type: Buffer
    buffer:
      bufferSize: 2
      minReplicas: 3
      maxReplicas: 100
With this configuration we get the following. So far so good:
NAME SCHEDULING DESIRED CURRENT ALLOCATED READY AGE
gameserver Packed 3 3 0 0 43d
NAME READY STATUS RESTARTS AGE
gameserver-xkhxz-77mbr 2/2 Running 0 13s
gameserver-xkhxz-t44r9 2/2 Running 0 14s
gameserver-xkhxz-xn4bz 2/2 Running 0 22s
Now we allocate 2 gameservers, and we get 2 Ready and 2 Allocated servers:
NAME SCHEDULING DESIRED CURRENT ALLOCATED READY AGE
gameserver Packed 4 4 2 2 43d
NAME READY STATUS RESTARTS AGE
gameserver-xkhxz-77mbr 2/2 Running 0 4m41s
gameserver-xkhxz-g47f7 2/2 Running 0 42s
gameserver-xkhxz-t44r9 2/2 Running 0 4m42s
gameserver-xkhxz-xn4bz 2/2 Running 0 4m50s
NAME STATE ADDRESS PORT NODE AGE
gameserver-xkhxz-77mbr Allocated 18.181.197.173 7190 ip-10-188-11-12.ap-northeast-1.compute.internal 4m42s
gameserver-xkhxz-g47f7 Ready 3.112.57.82 7233 ip-10-188-36-130.ap-northeast-1.compute.internal 43s
gameserver-xkhxz-t44r9 Allocated 3.112.57.82 7842 ip-10-188-36-130.ap-northeast-1.compute.internal 4m43s
gameserver-xkhxz-xn4bz Ready 18.181.197.173 7518 ip-10-188-11-12.ap-northeast-1.compute.internal 4m51s
Now, when we shut down one Allocated gameserver, the fleet creates a new server and then deletes a gameserver that was Ready, as shown below:
NAME SCHEDULING DESIRED CURRENT ALLOCATED READY AGE
gameserver Packed 4 4 1 2 43d
NAME READY STATUS RESTARTS AGE
gameserver-xkhxz-77mbr 2/2 Running 0 4m58s
gameserver-xkhxz-g47f7 2/2 Running 0 59s
gameserver-xkhxz-gm9ll 2/2 Running 0 4s
gameserver-xkhxz-t44r9 2/2 Terminating 0 4m59s
gameserver-xkhxz-xn4bz 2/2 Running 0 5m7s
NAME STATE ADDRESS PORT NODE AGE
gameserver-xkhxz-77mbr Allocated 18.181.197.173 7190 ip-10-188-11-12.ap-northeast-1.compute.internal 4m59s
gameserver-xkhxz-g47f7 Ready 3.112.57.82 7233 ip-10-188-36-130.ap-northeast-1.compute.internal 60s
gameserver-xkhxz-gm9ll Scheduled 3.112.57.82 7838 ip-10-188-36-130.ap-northeast-1.compute.internal 5s <--- new server
gameserver-xkhxz-t44r9 Shutdown 3.112.57.82 7842 ip-10-188-36-130.ap-northeast-1.compute.internal 5m <--- shutdown server
gameserver-xkhxz-xn4bz Ready 18.181.197.173 7518 ip-10-188-11-12.ap-northeast-1.compute.internal 5m8s <--- this server disappears
gameserver-xkhxz-xn4bz, which was Ready, is deleted:
NAME SCHEDULING DESIRED CURRENT ALLOCATED READY AGE
gameserver Packed 3 3 1 1 43d
NAME READY STATUS RESTARTS AGE
gameserver-xkhxz-77mbr 2/2 Running 0 5m15s
gameserver-xkhxz-g47f7 2/2 Running 0 76s
gameserver-xkhxz-gm9ll 2/2 Running 0 21s
NAME STATE ADDRESS PORT NODE AGE
gameserver-xkhxz-77mbr Allocated 18.181.197.173 7190 ip-10-188-11-12.ap-northeast-1.compute.internal 5m16s
gameserver-xkhxz-g47f7 Ready 3.112.57.82 7233 ip-10-188-36-130.ap-northeast-1.compute.internal 77s
gameserver-xkhxz-gm9ll Scheduled 3.112.57.82 7838 ip-10-188-36-130.ap-northeast-1.compute.internal 22s
NAME SCHEDULING DESIRED CURRENT ALLOCATED READY AGE
gameserver Packed 3 3 1 2 43d
NAME READY STATUS RESTARTS AGE
gameserver-xkhxz-77mbr 2/2 Running 0 17m
gameserver-xkhxz-g47f7 2/2 Running 0 13m
gameserver-xkhxz-gm9ll 2/2 Running 0 12m
NAME STATE ADDRESS PORT NODE AGE
gameserver-xkhxz-77mbr Allocated 18.181.197.173 7190 ip-10-188-11-12.ap-northeast-1.compute.internal 17m
gameserver-xkhxz-g47f7 Ready 3.112.57.82 7233 ip-10-188-36-130.ap-northeast-1.compute.internal 13m
gameserver-xkhxz-gm9ll Ready 3.112.57.82 7838 ip-10-188-36-130.ap-northeast-1.compute.internal 12m
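The DESIRED column in the outputs above (3, then 4, then 3) is consistent with a simple buffer calculation. The following is a minimal sketch, assuming that an absolute bufferSize means "allocated + bufferSize, clamped to minReplicas and maxReplicas". The function name buffer_desired_replicas is hypothetical and is not taken from the Agones code base:

```python
def buffer_desired_replicas(allocated: int, buffer_size: int,
                            min_replicas: int, max_replicas: int) -> int:
    """Desired fleet size under an absolute Buffer policy (assumed formula)."""
    desired = allocated + buffer_size
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(desired, max_replicas))


# Values from the FleetAutoscaler spec in this report:
# bufferSize=2, minReplicas=3, maxReplicas=100.
for allocated in (0, 2, 1):
    print(allocated, buffer_desired_replicas(allocated, 2, 3, 100))
```

Under this assumption, shutting down one of the two Allocated gameservers lowers the desired size from 4 to 3. If the fleet has already created a replacement for the shut-down server by then, one surplus server has to go, which would explain why a Ready gameserver is removed.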
What you expected to happen:
When we shut down an Allocated gameserver, the existing Ready servers should be kept.
How to reproduce it (as minimally and precisely as possible):
See above.
Anything else we need to know?:
Environment:
- Agones version: agones-1.13.0
- Kubernetes version (use kubectl version): v1.19.6-eks-49a6c0
- Cloud provider or hardware configuration:
- Install method (yaml/helm): helm
- Troubleshooting guide log(s):
- Others:
About this issue
- State: closed
- Created 3 years ago
- Comments: 19 (17 by maintainers)
I think the above feature can mitigate the impact and is worth doing, but it still can’t solve the root problem. The temporarily created gameservers will cause issues in number-sensitive systems such as billing.
Personally, I’d like to implement a “lazy reconciling” feature. In this mode, reconciliation would wait until deleting gameservers are fully removed before creating new gameservers. What do you think?
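To make the proposal concrete, here is a rough sketch of what "lazy reconciling" could look like, written as a small Python model rather than actual Agones controller code. The GameServer class, the deleting flag, and servers_to_create are all hypothetical names introduced for illustration:

```python
from dataclasses import dataclass


@dataclass
class GameServer:
    name: str
    state: str              # e.g. "Ready", "Allocated", "Scheduled", "Shutdown"
    deleting: bool = False  # True while the server is still being torn down


def servers_to_create(servers, desired, lazy):
    """Number of gameservers one reconcile pass would create (illustrative)."""
    live = [gs for gs in servers if not gs.deleting]
    still_deleting = [gs for gs in servers if gs.deleting]
    if lazy and still_deleting:
        # Lazy mode: create nothing until deletions have fully completed.
        return 0
    return max(0, desired - len(live))


# State right after the Allocated server shuts down; desired is still 4.
fleet = [
    GameServer("gameserver-xkhxz-77mbr", "Allocated"),
    GameServer("gameserver-xkhxz-g47f7", "Ready"),
    GameServer("gameserver-xkhxz-t44r9", "Shutdown", deleting=True),
    GameServer("gameserver-xkhxz-xn4bz", "Ready"),
]
print(servers_to_create(fleet, desired=4, lazy=False))
print(servers_to_create(fleet, desired=4, lazy=True))
```

In this model the eager pass creates a temporary replacement while the lazy pass does not. If the autoscaler has lowered the desired size to 3 by the time the shut-down server is gone, no extra server was ever created and no Ready server needs to be deleted. Whether that timing holds in practice is an assumption of the sketch.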