spinnaker: GKE 1.7.8 - Initial deployment/redeployment - `hal deploy apply` - crashes node (reproducible)
Title
Initial and subsequent `hal deploy apply` runs cause various pods to spin out of control and crash a node. Fresh cluster (multiple times), always GKE 1.7.8 (so far).
Cloud Provider: google
Environment:
- GKE 1.7.8
- Spinnaker 1.4.2 (and 1.3.1)
- New installation following GKE codelab
Feature Area: deploy
Description
Quickstart Public Spinnaker on GKE fails to deploy. spin-clouddriver-bootstrap
spins out of control (34 restarts when I checked).
Found in the kubernetes console:
- spin-clouddriver-bootstrap-v000 (replicaset), 0/1 ready, 10 minutes old
- Labels: load-balancer-spin-clouddriver-bootstrap: true, replication-controller: spin-clouddriver-bootstrap-v000
- Image: gcr.io/spinnaker-marketplace/clouddriver:0.8.1-20171002182452
- Events:
  - The node was low on resource: [MemoryPressure].
  - Readiness probe failed: Get http://10.48.1.26:7002/health: dial tcp 10.48.1.26:7002: getsockopt: connection refused
  - No nodes are available that match all of the following predicates:: NodeUnderMemoryPressure (1).
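The restart count, pod events, and node memory pressure above can be double-checked from the CLI. A minimal sketch, assuming the quickstart's distributed install landed in the `spinnaker` namespace (adjust if yours differs) and using a hypothetical pod-name suffix:

```
# Restart counts for all Spinnaker services
kubectl get pods --all-namespaces | grep spin-

# Events for the failing bootstrap pod (replace <suffix> with the real pod name)
kubectl describe pod spin-clouddriver-bootstrap-v000-<suffix> -n spinnaker

# Node conditions; look for MemoryPressure=True
kubectl describe nodes | grep -E "Name:|MemoryPressure"
```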
Steps to Reproduce
There is nothing special to reproduce:
- Create a GKE cluster with 3 nodes:
gcloud alpha container clusters create \
${GKE_CLUSTER} \
--enable-autoscaling \
--max-nodes=10 \
--enable-autorepair \
--enable-autoupgrade \
--machine-type=n1-standard-2 \
--scopes="default,cloud-platform,storage-full,sql-admin,userinfo-email" \
--zone=${GKE_CLUSTER_ZONE} \
--num-nodes=3
- Go to the GKE console, choose the cluster, click "Upgrade available", and choose 1.7.8 (CLI equivalents for these console steps are sketched after this list).
- Once the above is complete, upgrade the node pool to 1.7.8 in the console.
- Install/deploy Spinnaker: https://www.spinnaker.io/setup/quickstart/halyard-gke/
- Wait for it…crash. 2 nodes remaining and Spinnaker is running.
- Resize the cluster in the console back to 3 nodes.
- Run `hal deploy apply`.
- Wait for it…crash. 2 nodes remaining and Spinnaker is running.
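For reference, the console steps above have gcloud equivalents; this is a hedged sketch reusing the ${GKE_CLUSTER} and ${GKE_CLUSTER_ZONE} variables from the create command (flag spellings can differ between Cloud SDK releases, e.g. newer versions use --num-nodes instead of --size for resize):

```
# Upgrade the master to 1.7.8 (the "Upgrade available" console step)
gcloud container clusters upgrade ${GKE_CLUSTER} \
  --master --cluster-version=1.7.8 --zone=${GKE_CLUSTER_ZONE}

# Upgrade the default node pool to the same version
gcloud container clusters upgrade ${GKE_CLUSTER} \
  --cluster-version=1.7.8 --zone=${GKE_CLUSTER_ZONE}

# Resize back to 3 nodes after a node crashes
gcloud container clusters resize ${GKE_CLUSTER} \
  --size=3 --zone=${GKE_CLUSTER_ZONE}
```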
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 30 (9 by maintainers)
We see the issue with versions as low as 1.7.5.
Thanks, that makes a lot more sense - I’ll look into it, but it’s unlikely we’ll be able to change much on our end - it might be a bug in the master for 1.7.8 where it tries to restart pods before the old ones are fully cleaned up.
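One rough way to check that theory, assuming the services run in the `spinnaker` namespace, is to watch pod churn while `hal deploy apply` runs; if the old spin-*-v00N pods are still Terminating while their replacements start, the combined footprint can push an n1-standard-2 node into MemoryPressure:

```
# Watch Spinnaker pods during `hal deploy apply`; overlapping old/new pods
# roughly double the memory footprint on the node scheduling them.
kubectl get pods -n spinnaker -w
```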