spinnaker: GKE 1.7.8 - Initial deployment/redeployment - `hal deploy apply` - crashes node (reproducible)
Title
Initial and subsequent `hal deploy apply` runs cause various pods to spin out of control and crash a node. Fresh cluster (multiple times), always GKE 1.7.8 (so far).
Cloud Provider: google
Environment:
- GKE 1.7.8
- Spinnaker 1.4.2 (and 1.3.1)
- New installation following GKE codelab
Feature Area: deploy
Description
Quickstart Public Spinnaker on GKE fails to deploy. spin-clouddriver-bootstrap
spins out of control (34 restarts when I checked).
Found in the kubernetes console:
- spin-clouddriver-bootstrap-v000 (replicaset), 0/1 ready, 10 minutes old
- Labels: load-balancer-spin-clouddriver-bootstrap: true, replication-controller: spin-clouddriver-bootstrap-v000
- Image: gcr.io/spinnaker-marketplace/clouddriver:0.8.1-20171002182452
- Events:
  - The node was low on resource: [MemoryPressure].
  - Readiness probe failed: Get http://10.48.1.26:7002/health: dial tcp 10.48.1.26:7002: getsockopt: connection refused
  - No nodes are available that match all of the following predicates:: NodeUnderMemoryPressure (1).
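The restart count, pod events, and node memory pressure above can be double-checked from the CLI. A minimal sketch, assuming the quickstart's distributed install landed in the `spinnaker` namespace (adjust if yours differs) and using a hypothetical pod-name suffix:

```
# Restart counts for all Spinnaker services
kubectl get pods --all-namespaces | grep spin-

# Events for the failing bootstrap pod (replace <suffix> with the real pod name)
kubectl describe pod spin-clouddriver-bootstrap-v000-<suffix> -n spinnaker

# Node conditions; look for MemoryPressure=True
kubectl describe nodes | grep -E "Name:|MemoryPressure"
```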
Steps to Reproduce
There is nothing special to reproduce:
- Create a GKE cluster with 3 nodes:
gcloud alpha container clusters create \
${GKE_CLUSTER} \
--enable-autoscaling \
--max-nodes=10 \
--enable-autorepair \
--enable-autoupgrade \
--machine-type=n1-standard-2 \
--scopes="default,cloud-platform,storage-full,sql-admin,userinfo-email" \
--zone=${GKE_CLUSTER_ZONE} \
--num-nodes=3
- Go to the GKE console, choose the cluster, click "Upgrade available", and choose 1.7.8 (CLI equivalents for these console steps are sketched after this list).
- Once the above is complete, upgrade the node pool to 1.7.8 in the console.
- Install/deploy Spinnaker: https://www.spinnaker.io/setup/quickstart/halyard-gke/
- Wait for it…crash. 2 nodes remaining and Spinnaker is running.
- Resize the cluster in the console back to 3 nodes.
- Run `hal deploy apply`.
- Wait for it…crash. 2 nodes remaining and Spinnaker is running.
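For reference, the console steps above have gcloud equivalents; this is a hedged sketch reusing the ${GKE_CLUSTER} and ${GKE_CLUSTER_ZONE} variables from the create command (flag spellings can differ between Cloud SDK releases, e.g. newer versions use --num-nodes instead of --size for resize):

```
# Upgrade the master to 1.7.8 (the "Upgrade available" console step)
gcloud container clusters upgrade ${GKE_CLUSTER} \
  --master --cluster-version=1.7.8 --zone=${GKE_CLUSTER_ZONE}

# Upgrade the default node pool to the same version
gcloud container clusters upgrade ${GKE_CLUSTER} \
  --cluster-version=1.7.8 --zone=${GKE_CLUSTER_ZONE}

# Resize back to 3 nodes after a node crashes
gcloud container clusters resize ${GKE_CLUSTER} \
  --size=3 --zone=${GKE_CLUSTER_ZONE}
```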
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 30 (9 by maintainers)
We see the issue with versions as low as 1.7.5.
Thanks, that makes a lot more sense - I’ll look into it, but it’s unlikely we’ll be able to change much on our end - it might be a bug in the master for 1.7.8 where it tries to restart pods before the old ones are fully cleaned up.
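One rough way to check that theory, assuming the services run in the `spinnaker` namespace, is to watch pod churn while `hal deploy apply` runs; if the old spin-*-v00N pods are still Terminating while their replacements start, the combined footprint can push an n1-standard-2 node into MemoryPressure:

```
# Watch Spinnaker pods during `hal deploy apply`; overlapping old/new pods
# roughly double the memory footprint on the node scheduling them.
kubectl get pods -n spinnaker -w
```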