azure-pipelines-agent: Self-hosted agents intermittently do not pick up new jobs
Having issue with YAML?
No
Having issue with Tasks?
No
Having issue with software on Hosted Agent?
No
Having generic issue with Azure-Pipelines/VSTS/TFS?
No
Have you tried troubleshooting?
Yes
Agent Version and Platform
Version of your agent? 2.175.2
OS of the machine running the agent? CentOS 7
Azure DevOps Type and Version
dev.azure.com
If dev.azure.com, what is your organization name? https://dev.azure.com/ (will provide this privately if necessary)
What’s not working?
We have a series of pipelines that all behave the same way:
First Stage
- Run on an Ubuntu 18.x Azure hosted agent
- Create a VM in GCP that runs an Azure Self-Hosted Agent in a Docker container as described on this web page: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops
- Make sure the VM and the agent have a unique name per job
Second Stage
- Wait for the self-hosted agent to pick up the work, based on a custom “demand” that looks for the unique agent name
- Run some custom code on the self-hosted agent
- Finish running custom code
- Delete GCP VM
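The naming and demand steps above can be sketched as follows. This is only an illustration, not the code used in these pipelines: the helper names, the `azp-agent` prefix, and the use of the `Agent.Name` capability in the demand are assumptions (the issue only says a custom demand matches the unique agent name). The name is normalized to lowercase letters, digits, and hyphens and capped at 63 characters so that it can also serve as a GCP VM name, which additionally assumes the prefix starts with a letter.

```python
import re


def unique_agent_name(build_id, job_id, prefix="azp-agent"):
    """Build a per-job name usable for both the VM and the agent (hypothetical helper)."""
    raw = f"{prefix}-{build_id}-{job_id}".lower()
    # Replace anything other than lowercase letters, digits, and hyphens.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw)
    # Collapse runs of hyphens left over from the replacement.
    cleaned = re.sub(r"-{2,}", "-", cleaned)
    # Cap the length at 63 characters and drop leading or trailing hyphens.
    return cleaned[:63].strip("-")


def demand_for(agent_name, capability="Agent.Name"):
    """Format a demand string for the second stage (the capability name is an assumption)."""
    return f"{capability} -equals {agent_name}"


if __name__ == "__main__":
    name = unique_agent_name("2037", "fc308004-fcdd-5de5-2151-99c66bc3b9d8")
    print(name)
    print(demand_for(name))
```

Passing the same string to the VM creation step, the agent registration, and the demand keeps the three in sync, so a mismatch between them can be ruled out when a job is never picked up.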
Agent and Worker’s Diagnostic Logs
See the following log files for an example of a successful run, and an unsuccessful run.
self-hosted-agent-log-failure.log self-hosted-agent-log-successful.log
Key differences that I’ve noticed:
The Linux kernel version printed at the top of each log is different. I'm not sure why it differs, or whether it matters for this particular issue:
successful
[2020-10-28 05:30:47Z INFO AgentProcess] RuntimeInformation: Linux 4.19.150+ #1 SMP Sat Oct 24 07:57:26 PDT 2020.
failure
[2020-10-21 23:01:22Z INFO AgentProcess] RuntimeInformation: Linux 5.4.49+ #1 SMP Sun Oct 18 19:43:35 PDT 2020.
Note that in the failure log the agent starts listening for jobs but reports that no message was retrieved after 30 minutes, whereas in the success log the agent receives the job about 40 seconds after it starts listening:
successful
[2020-10-28 05:30:49Z INFO MessageListener] Session created.
[2020-10-28 05:30:49Z INFO Terminal] WRITE LINE: 2020-10-28 05:30:49Z: Listening for Jobs
[2020-10-28 05:30:49Z INFO JobDispatcher] Set agent/worker IPC timeout to 30 seconds.
[2020-10-28 05:31:28Z INFO RSAFileKeyManager] Loading RSA key parameters from file /azp/agent/.credentials_rsaparams
[2020-10-28 05:31:28Z INFO MessageListener] Message '1' received from session 'b2dbac0f-1ab5-45ec-ae40-c811b5d35d0d'.
[2020-10-28 05:31:28Z INFO JobDispatcher] Job request 2037 for plan e2905f74-12be-4282-8fb2-215cd5c5d3f3 job fc308004-fcdd-5de5-2151-99c66bc3b9d8 received.
[2020-10-28 05:31:28Z INFO Terminal] WRITE LINE: 2020-10-28 05:31:28Z: Running job: Build container
failure
[2020-10-21 23:01:23Z INFO RSAFileKeyManager] Loading RSA key parameters from file /azp/agent/.credentials_rsaparams
[2020-10-21 23:01:23Z INFO VisualStudioServices] AAD Correlation ID for this token request: Unknown
[2020-10-21 23:01:23Z INFO MessageListener] Session created.
[2020-10-21 23:01:23Z INFO Terminal] WRITE LINE: 2020-10-21 23:01:23Z: Listening for Jobs
[2020-10-21 23:01:23Z INFO JobDispatcher] Set agent/worker IPC timeout to 30 seconds.
[2020-10-21 23:31:24Z INFO MessageListener] No message retrieved from session 'dc2f77ad-6fdb-4a0f-b539-f0eefaef1c8d' within last 30 minutes.
[2020-10-21 23:56:26Z WARN VisualStudioServices] Authentication failed with status code 401.
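To compare the two runs more systematically, the gap between "Listening for Jobs" and the first job request (or the first "No message retrieved" notice) can be computed from the diagnostic log. The following is a minimal sketch that assumes every log line follows the `[timestamp level component] message` layout seen in the excerpts above; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2020-10-28 05:30:49Z INFO MessageListener] Session created.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z (\w+) (\w+)\] (.*)$"
)


def parse_line(line):
    """Return (timestamp, level, component, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return timestamp, match.group(2), match.group(3), match.group(4)


def seconds_until_first_event(lines):
    """Return (outcome, seconds) for the first job request or timeout after listening starts."""
    listening_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, _level, component, message = parsed
        if listening_at is None:
            if component == "Terminal" and "Listening for Jobs" in message:
                listening_at = timestamp
            continue
        if component == "JobDispatcher" and message.startswith("Job request"):
            return "job_received", (timestamp - listening_at).total_seconds()
        if component == "MessageListener" and message.startswith("No message retrieved"):
            return "no_message", (timestamp - listening_at).total_seconds()
    return "unknown", None


if __name__ == "__main__":
    import sys

    # Usage: python agent_log_gap.py self-hosted-agent-log-failure.log
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(seconds_until_first_event(handle))
```

Running it over the logs of many runs would show whether the failing runs always end in the 30-minute notice or whether some jobs simply arrive late.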
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (7 by maintainers)
@dvmorris @KrylixZA @mk-AVA I'm closing this for now due to inactivity. Please let us know if it's still relevant for you and provide more details so that we can investigate further.
Relative to the logs I added above, it is between these two logged outputs:
More specifically, it is at exactly 13:11:23Z, when the send job request message is made.