spinnaker: hal deploy apply causes java.lang.NullPointerException on private cluster
Issue Summary:
Switched deployment clusters today; this issue was not present beforehand. When running hal deploy apply against a clean cluster, the deployment almost completes. When running hal deploy apply again, I receive a NullPointerException and only bootstraps get deployed.
Cloud Provider(s):
GKE/k8s
Environment:
- Private GKE cluster
- hal deploy to GKE k8s cluster v1.11
- external redis
- hal --> 1.12.0-20181024113436
- spinnaker --> 1.9.5, 1.10.1, 1.10.2
Feature Area:
hal
Description:
Clean slate:
> + Deploy spin-clouddriver
> Success
> + Deploy spin-front50
> Success
> + Deploy spin-orca
> Success
> + Deploy spin-deck
> Success
> + Deploy spin-echo
> Success
> + Deploy spin-gate
> Success
> + Deploy spin-igor
> Success
> + Deploy spin-rosco
> Success
> Problems in Global:
> ! ERROR Unexpected exception: java.lang.NullPointerException
>
> - Failed to deploy Spinnaker.
> -
Without DEBUG:
> on http://localhost:34839/ui/s cluster in account "myk8scluster": View the kube ui - Deploy spin-front50
> Connecting to the Kubernetes cluster in account "myk8scluster": View the kube ui on http://localhost:36107/ui/s cluster in account "myk8scluster": View the kube ui - Apply deployment
> Failure
> - Deploy spin-clouddriver
> Failure
> - Deploy spin-front50
> Failure
> - Deploy spin-orca
> Failure
> - Deploy spin-deck
> Failure
> - Deploy spin-echo
> Connecting to the Kubernetes cluster in account "myk8scluster": View the kube ui on http://localhost:42467/ui/
> - Deploy spin-gate
> Connecting to the Kubernetes cluster in account "myk8scluster": View the kube ui on http://localhost:46815/ui/
> - Deploy spin-igor
> Connecting to the Kubernetes cluster in account "myk8scluster": View the kube ui on http://localhost:44355/ui/
> - Deploy spin-rosco
> Connecting to the Kubernetes cluster in account "myk8scluster": View the kube ui on http://localhost:44115/ui/
> Problems in Global:
> ! ERROR Unexpected exception: java.lang.NullPointerException
With DEBUG enabled:
> Problems in Global:
> ! ERROR Unexpected exception: java.lang.NullPointerException
>
> - Failed to deploy Spinnaker.
> com.netflix.spinnaker.halyard.cli.services.v1.ExpectedDaemonFailureException: Failed to deploy Spinnaker.
> at com.netflix.spinnaker.halyard.cli.services.v1.OperationHandler.get(OperationHandler.java:45)
> at com.netflix.spinnaker.halyard.cli.command.v1.AbstractRemoteActionCommand.runRemoteAction(AbstractRemoteActionCommand.java:50)
> at com.netflix.spinnaker.halyard.cli.command.v1.AbstractRemoteActionCommand.executeThis(AbstractRemoteActionCommand.java:103)
> at com.netflix.spinnaker.halyard.cli.command.v1.NestableCommand.safeExecuteThis(NestableCommand.java:201)
> at com.netflix.spinnaker.halyard.cli.command.v1.NestableCommand.execute(NestableCommand.java:149)
> at com.netflix.spinnaker.halyard.cli.command.v1.NestableCommand.execute(NestableCommand.java:152)
> at com.netflix.spinnaker.halyard.cli.command.v1.NestableCommand.execute(NestableCommand.java:152)
> at com.netflix.spinnaker.halyard.cli.Main.main(Main.java:46)
> Caused by: java.lang.Exception: Unexpected exception: java.lang.NullPointerException
> at com.netflix.spinnaker.halyard.core.tasks.v1.DaemonTaskHandler.lambda$reduceChildren$0(DaemonTaskHandler.java:59)
> at java.util.stream.ReduceOps$1ReducingSink.accept(ReduceOps.java:80)
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:484)
> at com.netflix.spinnaker.halyard.core.tasks.v1.DaemonTaskHandler.reduceChildren(DaemonTaskHandler.java:41)
> at com.netflix.spinnaker.halyard.deploy.deployment.v1.DistributedDeployer.deploy(DistributedDeployer.java:231)
> at com.netflix.spinnaker.halyard.deploy.deployment.v1.DistributedDeployer.deploy(DistributedDeployer.java:56)
> at com.netflix.spinnaker.halyard.deploy.services.v1.DeployService.deploy(DeployService.java:287)
> at com.netflix.spinnaker.halyard.controllers.v1.DeploymentController.lambda$deploy$20(DeploymentController.java:262)
> at com.netflix.spinnaker.halyard.core.DaemonResponse$StaticRequestBuilder.build(DaemonResponse.java:127)
> at com.netflix.spinnaker.halyard.core.tasks.v1.TaskRepository.lambda$submitTask$1(TaskRepository.java:48)
> at java.lang.Thread.run(Thread.java:748)
Steps to Reproduce:
- Create a private cluster (GKE), with nodes using a NAT gateway for outgoing traffic.
- Authorize connections from the halyard VM to the master node.
- Ensure kubectl get namespaces works.
- Add the k8s account as the deployment target.
- Run hal deploy apply.
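For anyone reproducing this, the kubectl get namespaces check in the steps above could be scripted roughly as follows. This is only a sketch: it assumes kubectl is on the PATH of the halyard VM and already points at the private cluster, and the function name namespaces_reachable and its timeout are made up for illustration.

```python
import subprocess


def namespaces_reachable(run=subprocess.run, timeout=15):
    """Return True if `kubectl get namespaces` exits 0 within `timeout` seconds.

    `run` is injectable so the check can be exercised without a real cluster.
    """
    try:
        result = run(
            ["kubectl", "get", "namespaces"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # kubectl is not installed, or the master did not answer in time
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("master reachable:", namespaces_reachable())
```

A False result before hal deploy apply would point at the master authorization or NAT setup rather than at Halyard itself.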
Additional Details:
For the account used to deploy, I switched from v1 to v2 and then back to v1. Uncertain if this may have caused an issue.
About this issue
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 60 (21 by maintainers)
Genuinely having a tough time figuring out what is happening here. In order to debug this properly I think you’d need to run the Halyard code manually with logging/tracing in place to get the information required to solve this and also likely look at the k8s master node logs as well. It’s not clear that private gke cluster support for Halyard is something we should prioritize over other efforts in the k8s (and larger spinnaker) space. I’ve added this to the k8s v2 backlog to track so we eventually solve this.
Ah this is very helpful! Looks like it’s not the kubectl port-forward that’s failing, but the kubectl proxy. If you invoke kubectl proxy on the halyard pod (should serve on localhost:8001 by default), can you query http://localhost:8001/api/v1/namespaces/spinnaker/services/spin-rosco:8087/proxy/status/all
If that fails, can you confirm that you’ve opened ports on the kubernetes master (not the nodes, it’s a little confusing) as suggested here? https://github.com/spinnaker/spinnaker/issues/3577#issuecomment-446310362
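The proxy check above can also be scripted, which makes it easier to repeat for each service. A minimal sketch, assuming kubectl proxy is already running on its default localhost:8001; the helper names service_proxy_url and check are hypothetical, and only the URL layout comes from the comment above.

```python
import urllib.request
from urllib.error import URLError


def service_proxy_url(namespace, service, port, path, proxy="http://localhost:8001"):
    """Build the API-server proxy URL for a service reached through `kubectl proxy`."""
    return (
        f"{proxy}/api/v1/namespaces/{namespace}"
        f"/services/{service}:{port}/proxy/{path.lstrip('/')}"
    )


def check(url, timeout=5.0):
    """GET `url`; network and HTTP errors are reported as (False, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except (URLError, OSError) as exc:
        # HTTPError (4xx/5xx) is a URLError; refused connections and timeouts are OSError
        return False, str(exc)


if __name__ == "__main__":
    url = service_proxy_url("spinnaker", "spin-rosco", 8087, "status/all")
    ok, detail = check(url)
    print(url, "->", "OK" if ok else "FAILED", f"({detail})")
```

If the request fails here while kubectl get namespaces still works, that would be consistent with the master being unable to reach the service ports, which is what the linked comment is about.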
@lwander to answer your questions:
Maybe there’s some issue running Halyard in a private mode cluster and deploying Spinnaker (to the Spinnaker namespace) in that same cluster? We don’t use a separate bastion host to run Halyard.
Thanks! -R