gocd: NPE in Agent Launcher

Issue Type
  • Bug Report
Summary

Agent cannot start due to NullPointerException

Environment
Basic environment details
  • Go 16.10.0 (4131-730ff1867576754414cc632957f344d0263bc06d)
  • Ubuntu 15.10 (4.2.0-41-generic)
  • JDK 1.8.0_101-b13

Standard configuration (no special changes) with single agent running.

Steps to Reproduce
  1. Start any build. It fails because the agent does not launch.
  2. The started build hangs indefinitely at the first job.
Expected Results

Agent should start. Build should work.

Actual Results

NullPointerException occurs in Agent Launcher

Log snippets
2016-10-17 15:02:27,661 [main     ] ERROR cruise.agent.launcher.AgentLauncherImpl:98 - Launch encountered an unknown exception
java.lang.NullPointerException
        at com.thoughtworks.cruise.agent.launcher.AgentLauncherImpl.getPort(AgentLauncherImpl.java:136)
        at com.thoughtworks.cruise.agent.launcher.AgentLauncherImpl.getUrlGenerator(AgentLauncherImpl.java:127)
        at com.thoughtworks.cruise.agent.launcher.AgentLauncherImpl.launch(AgentLauncherImpl.java:75)
        at com.thoughtworks.go.agent.bootstrapper.AgentBootstrapper.go(AgentBootstrapper.java:72)
        at com.thoughtworks.go.agent.bootstrapper.AgentBootstrapper.main(AgentBootstrapper.java:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.simontuffs.onejar.Boot.run(Boot.java:306)
        at com.simontuffs.onejar.Boot.main(Boot.java:159)

Any other info

I was not aware that Go.CD auto-upgrades when running apt-get on Ubuntu. I had a working installation before the upgrade 😦 It also seems I’ll have to upgrade plugins individually by downloading JAR files (I did that for the Gradle plugin).

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

@lenucksi Well, the issue occurred because of a mix of a few things. Let me try to explain.

In general, an agent-bootstrapper is capable of upgrading an agent-launcher, which in turn upgrades and starts an agent. So normally you would just upgrade your server and let the agent-bootstrapper take care of upgrading your agents automatically. Alternatively, you can upgrade both the server and the agents yourself.

Until version 16.6, the bootstrapper expected some specific arguments to be passed to it, i.e. server-host-name and server-port. As part of 16.7, the bootstrapper was updated to accept a different set of arguments, one being the serverUrl, i.e. the SSL URL (instead of the hostname and port number as separate args), along with a bunch of other arguments for certificate verification. @arvindsv’s comment here provides a little more detail about this. The same set of arguments is passed along to the launcher, which in turn is used to upgrade an agent.

Coming to the issue: it was a case of a new bootstrapper running with an old version of the launcher, which should never happen. The upgrade process should have cleared out the older launcher, but it did not [bug]. As mentioned earlier, the older launcher expected the hostname and port to be provided, but the new bootstrapper passed along the serverUrl instead, and hence this code from the older launcher would throw an exception. The fix was to clean up the older launcher during the upgrade process.
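To illustrate the mismatch described above, here is a minimal Java sketch. It is not the actual GoCD source: the class name, the key names ("hostname", "port", "serverUrl"), the example URL, and the method body are all assumptions made for illustration. It only shows how a launcher that assumes a port argument is always present can end up with a NullPointerException when it is handed a serverUrl instead.

```java
import java.util.HashMap;
import java.util.Map;

public class LauncherArgsSketch {
    // Hypothetical key names; the real bootstrapper/launcher keys may differ.
    static final String HOSTNAME = "hostname";
    static final String PORT = "port";
    static final String SERVER_URL = "serverUrl";

    // Mimics an old-style launcher that assumes the port is always supplied.
    static int getPortOldStyle(Map<String, String> context) {
        String port = context.get(PORT);      // null when the key is absent
        return Integer.parseInt(port.trim()); // dereferencing null throws NullPointerException
    }

    public static void main(String[] args) {
        // Arguments in the style an old bootstrapper would pass.
        Map<String, String> oldStyle = new HashMap<>();
        oldStyle.put(HOSTNAME, "localhost");
        oldStyle.put(PORT, "8153");
        System.out.println("old-style port: " + getPortOldStyle(oldStyle));

        // Arguments in the style a new bootstrapper would pass (placeholder URL).
        Map<String, String> newStyle = new HashMap<>();
        newStyle.put(SERVER_URL, "https://go-server.example:8154/go");
        try {
            getPortOldStyle(newStyle);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: no port in the launcher context");
        }
    }
}
```

In this sketch the failure comes purely from the stale launcher reading a key that the newer bootstrapper no longer supplies, which is why removing the old launcher jar makes the problem go away.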

So, in terms of who the fix applies to:

  • If you upgrade your server from 16.6 (or an earlier version) to 16.7 (or a later version) but do not upgrade your agents, then things will continue to work. Of course, you will miss out on end-to-end security of agent-server communication, but your agents will remain active and run your builds.
  • If you upgrade your server from 16.6 (or an earlier version) to 16.12 and upgrade your agents too, then things will work and you will be able to enable end-to-end transport security for agent-server communication.
  • If you upgrade your server from 16.6 (or an earlier version) to 16.7 (or a later version) and upgrade your agent to any version from 16.7 through 16.11, then you will be hit by this bug. The workaround is to delete the locally available launcher jar and restart the bootstrapper.
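The three cases above can be condensed into a small check. The following Java sketch is only an illustration of that reasoning: the class, the method names, and the integer encoding of versions are hypothetical and are not part of GoCD.

```java
public class AffectedBySketch {
    // Encodes a release such as 16.7 as 1607 so versions compare as integers (illustrative only).
    static int version(int major, int minor) {
        return major * 100 + minor;
    }

    // True when the upgrade path matches the third case in the list above:
    // server moved from <= 16.6 to >= 16.7, and the agent was upgraded to 16.7 through 16.11.
    static boolean hitByBug(int serverBefore, int serverAfter, int agentAfter) {
        boolean serverCrossedArgChange =
                serverBefore <= version(16, 6) && serverAfter >= version(16, 7);
        boolean agentInBuggyRange =
                agentAfter >= version(16, 7) && agentAfter <= version(16, 11);
        return serverCrossedArgChange && agentInBuggyRange;
    }

    public static void main(String[] args) {
        // Server and agent both upgraded to 16.10: prints true
        System.out.println(hitByBug(version(16, 6), version(16, 10), version(16, 10)));
        // Server upgraded to 16.10, agent left at 16.6: prints false
        System.out.println(hitByBug(version(16, 6), version(16, 10), version(16, 6)));
        // Server and agent both upgraded to 16.12: prints false
        System.out.println(hitByBug(version(16, 6), version(16, 12), version(16, 12)));
    }
}
```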

I hope this makes things clear now.