azure-pipelines-agent: Agent (version higher than 2.164.8) fails to connect to Azure DevOps

Agent Version and Platform

2.173.0 on centos-release-7-6.1810.2.el7.centos.x86_64

Edit: It’s a release agent for a deployment pool.

Azure DevOps Type and Version

dev.azure.com (cloud)

What’s not working?

After the auto update to 2.173.0 the agent can no longer connect to Azure DevOps.

# Running run once with agent version 2.160.1
./run.sh --once
Scanning for tool capabilities.
Connecting to the server.
2020-08-25 21:31:02Z: Listening for Jobs
Agent update in progress, do not shutdown agent.
Downloading 2.173.0 agent
Waiting for current job finish running.
Generate and execute update script.
Agent will exit shortly for update, should back online within 10 seconds.
‘/root/azagent/_diag/SelfUpdate-20200825-213148.log’ -> ‘/root/azagent/_diag/SelfUpdate-20200825-213148.log.succeed’
Scanning for tool capabilities.
Connecting to the server. 
# this now runs indefinitely

Similar to issue #2824

Could someone please tell me how to stop the auto update? Multiple agents on production machines are offline and I have, as of now, no idea how to fix that.

Edit: After trying different approaches and versions for hours, I found a workaround: Currently using version 2.164.8 (with the agent.disableupdate variable to disable auto updates). However, this is not a long term solution and break any moment, as soon as version 2.164.8 won’t be accepted by the server anymore.

Agent and Worker’s Diagnostic Logs

agent.log

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 35 (17 by maintainers)

Most upvoted comments

@mjroghelia The same is happening on an Ubuntu 18.04 installation. I am also getting an Authentication failed with status code 401. message (version 2.174.1 this time). The process hangs, does not crash, and no further log appears. I’ve also enabled http tracing - those are the last few lines of the log:

[2020-09-01 14:44:12Z INFO RSAFileKeyManager] Loading RSA key parameters from file /root/azagent/.credentials_rsaparams
[2020-09-01 14:44:13Z INFO RSAFileKeyManager] Loading RSA key parameters from file /root/azagent/.credentials_rsaparams
[2020-09-01 14:44:13Z WARN VisualStudioServices] Authentication failed with status code 401.
Cache-Control: no-cache
Pragma: no-cache
P3P: CP="CAO DSP COR ADMa DEV CONo TELo CUR PSA PSD TAI IVDo OUR SAMi BUS DEM NAV STA UNI COM INT PHY ONL FIN PUR LOC CNT"
WWW-Authenticate: Bearer authorization_uri=https://login.microsoftonline.com/0ba2168f-2571-49f4-851f-a59710df852f, Basic realm="https://tfsprodweu5.visualstudio.com/", TFS-Federated
X-TFS-ProcessId: 127fcabc-8462-4afc-99d4-3ec597a296ef
Strict-Transport-Security: max-age=31536000; includeSubDomains
ActivityId: f167f102-f872-46d0-8403-d33f4379ae4e
X-TFS-Session: dfcf33c5-ab0b-4cd0-b7df-cd4a43f2953a
X-VSS-E2EID: 1cd18c4e-8456-4aa1-9c74-fb644f248d38
X-Frame-Options: SAMEORIGIN
X-TFS-FedAuthRealm: https://tfsprodweu5.visualstudio.com/
X-TFS-FedAuthIssuer: https://dev.azure.com/orgname/
X-VSS-AuthorizationEndpoint: https://vssps.dev.azure.com/orgname/
X-VSS-ResourceTenant: 0ba2168f-2571-49f4-851f-a59710df852f
X-VSS-S2STargetService: 00000002-0000-8888-8000-000000000000/visualstudio.com
X-TFS-FedAuthRedirect: https://spsprodweu4.vssps.visualstudio.com/_signin?realm=dev.azure.com&reply_to=https%3A%2F%2Fdev.azure.com%2Forgname%2F_apis%2FconnectionData%3FconnectOptions%3D0%26lastChangeId%3D61616171%26lastChangeId64%3D61616171&redirect=1&hid=e1801778-422d-4abd-aac1-504d3ffbd27d&context=eyJodCI6MiwiaGlkIjoiMGYzMjg1NWYtZjc3OS00YjE5LTgwMDYtYTlhYjRlMzVhNDBkIiwicXMiOnt9LCJyciI6IiIsInZoIjoiIiwiY3YiOiIiLCJjcyI6IiJ90#ctx=eyJTaWduSW5Db29raWVEb21haW5zIjpbImh0dHBzOi8vbG9naW4ubWljcm9zb2Z0b25saW5lLmNvbSIsImh0dHBzOi8vbG9naW4ubWljcm9zb2Z0b25saW5lLmNvbSJdfQ2
Request-Context: appId=cid-v1:1d01aa00-5a22-493d-8bc0-acd0af8f839d
Access-Control-Expose-Headers: Request-Context
X-Content-Type-Options: nosniff
X-MSEdge-Ref: Ref A: 197D666C864645F28E6CA616D548D70A Ref B: ZRHEDGE0813 Ref C: 2020-09-01T14:44:12Z
Date: Tue, 01 Sep 2020 14:44:12 GMT

[2020-09-01 14:44:13Z WARN VisualStudioServices] Authentication failed with status code 401.
Cache-Control: no-cache
Pragma: no-cache
P3P: CP="CAO DSP COR ADMa DEV CONo TELo CUR PSA PSD TAI IVDo OUR SAMi BUS DEM NAV STA UNI COM INT PHY ONL FIN PUR LOC CNT"
WWW-Authenticate: Bearer authorization_uri=https://login.microsoftonline.com/0ba2168f-2571-49f4-851f-a59710df852f, Basic realm="https://tfsprodweu5.visualstudio.com/", TFS-Federated
X-TFS-ProcessId: ab178765-4ec5-4710-b2b2-dde539ce7601
Strict-Transport-Security: max-age=31536000; includeSubDomains
ActivityId: 70a5fbe5-a57a-48dc-b0a5-519475ac42f3
X-TFS-Session: dfcf33c5-ab0b-4cd0-b7df-cd4a43f2953a
X-VSS-E2EID: 12a82770-7360-4f2e-8d20-ea480814827e
X-Frame-Options: SAMEORIGIN
X-TFS-FedAuthRealm: https://tfsprodweu5.visualstudio.com/
X-TFS-FedAuthIssuer: https://dev.azure.com/orgname/
X-VSS-AuthorizationEndpoint: https://vssps.dev.azure.com/energyschweiz/
X-VSS-ResourceTenant: 0ba2168f-2571-49f4-851f-a59710df852f
X-VSS-S2STargetService: 00000002-0000-8888-8000-000000000000/visualstudio.com
X-TFS-FedAuthRedirect: https://spsprodweu4.vssps.visualstudio.com/_signin?realm=dev.azure.com&reply_to=https%3A%2F%2Fdev.azure.com%2Forgname%2F_apis%2FconnectionData%3FconnectOptions%3D0%26lastChangeId%3D61616171%26lastChangeId64%3D61616171&redirect=1&hid=e1801778-422d-4abd-aac1-504d3ffbd27d&context=eyJodCI6MiwiaGlkIjoiMGYzMjg1NWYtZjc3OS00YjE5LTgwMDYtYTlhYjRlMzVhNDBkIiwicXMiOnt9LCJyciI6IiIsInZoIjoiIiwiY3YiOiIiLCJjcyI6IiJ90#ctx=eyJTaWduSW5Db29raWVEb21haW5zIjpbImh0dHBzOi8vbG9naW4ubWljcm9zb2Z0b25saW5lLmNvbSIsImh0dHBzOi8vbG9naW4ubWljcm9zb2Z0b25saW5lLmNvbSJdfQ2
Request-Context: appId=cid-v1:1d01aa00-5a22-493d-8bc0-acd0af8f839d
Access-Control-Expose-Headers: Request-Context
X-Content-Type-Options: nosniff
X-MSEdge-Ref: Ref A: B4F2F49A95B04D3B89C1584555AAF96E Ref B: ZRHEDGE0811 Ref C: 2020-09-01T14:44:12Z
Date: Tue, 01 Sep 2020 14:44:13 GMT

[2020-09-01 14:44:13Z INFO VisualStudioServices] AAD Correlation ID for this token request: Unknown
[2020-09-01 14:44:13Z INFO VisualStudioServices] Finished operation Location.GetConnectionData
[2020-09-01 14:44:13Z INFO VisualStudioServices] Finished operation Location.GetConnectionData
[2020-09-01 14:44:13Z INFO VisualStudioServices] Finished operation Location.GetConnectionData

@anatolybolshakov

I’m not sure if you’re able to access all the comments in this VS Developer Community Thread, but it holds many logs and things I have already tried. I will try and share as many things publicly but the HTTP traces should stay secret afaik.

(In my next comment I will try to summarize some other approaches me and my hoster have been trying over the last months.)

@michaelschufi could you try also to remove agent in DevOps and reconfigure updated agent - by running ./config remove and ./config sequentially?

Ofc, I can and will get back to you. But if this is just to “clean up” the server side, I don’t see how it helps. Because if I newly provision an Ubuntu 18.04 VM (with a different hostname, etc.), the problem persists. It is not just while updating the agent. Please quickly tell me how it is relevant.

could you please check if there are any other errors in agent logs - except 401 issues (this seems to be not related to the current issue)?

Not many errors are in there. I think this is why most people report the 401 errors.

The most interesting one so far is:

[2020-09-23 08:43:13Z INFO DeploymentGroupAgentConfigProvider] Connect complete for deployment group
[2020-09-23 08:43:13Z INFO ConfigurationManager] Test Connection complete.
[2020-09-23 08:43:13Z INFO HostContext] Well known directory 'Bin': '/home/azure/azagent/bin'
[2020-09-23 08:43:13Z INFO HostContext] Well known directory 'Root': '/home/azure/azagent'
[2020-09-23 08:43:13Z INFO HostContext] Well known config file 'RSACredentials': '/home/azure/azagent/.credentials_rsaparams'
[2020-09-23 08:43:13Z INFO RSAFileKeyManager] Creating new RSA key using 2048-bit key length
[2020-09-23 08:43:13Z INFO RSAFileKeyManager] Successfully saved RSA key parameters to file /home/azure/azagent/.credentials_rsaparams
[2020-09-23 08:43:13Z INFO RSAFileKeyManager] Which: 'chmod'
[2020-09-23 08:43:13Z INFO RSAFileKeyManager] Location: '{matches.First()}'
[2020-09-23 08:43:13Z INFO HostContext] Well known directory 'Bin': '/home/azure/azagent/bin'
[2020-09-23 08:43:13Z INFO HostContext] Well known directory 'Root': '/home/azure/azagent'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper] Starting process:
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   File name: '/bin/chmod'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   Arguments: '600 /home/azure/azagent/.credentials_rsaparams'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   Working directory: '/home/azure/azagent'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   Require exit code zero: 'False'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: ''
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'False'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   Persist current code page: 'False'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper] Failed to update oom_score_adj for PID: 5183.
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper] System.IO.IOException: Invalid argument
   at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
   at System.IO.FileStream.FlushWriteBuffer()
   at System.IO.FileStream.Dispose(Boolean disposing)
   at System.IO.Stream.Close()
   at System.IO.StreamWriter.CloseStreamFromDispose(Boolean disposing)
   at System.IO.StreamWriter.Dispose(Boolean disposing)
   at System.IO.TextWriter.Dispose()
   at System.IO.File.WriteAllText(String path, String contents)
   at Microsoft.VisualStudio.Services.Agent.Util.ProcessInvoker.WriteProcessOomScoreAdj(Int32 processId, Int32 oomScoreAdj)
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper] Process started with process id 5183, waiting for process exit.
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper] Exited process 5183 with exit code 0
[2020-09-23 08:43:13Z INFO ProcessInvokerWrapper] Finished process 5183 with exit code 0, and elapsed time 00:00:00.0422561.
[2020-09-23 08:43:13Z INFO RSAFileKeyManager] Successfully set permissions for RSA key parameters file /home/azure/azagent/.credentials_rsaparams
[2020-09-23 08:43:13Z INFO Terminal] WRITE LINE: 
[2020-09-23 08:43:13Z INFO Terminal] WRITE LINE: >> Register Agent:
[2020-09-23 08:43:13Z INFO Terminal] WRITE LINE: 
[2020-09-23 08:43:13Z INFO CommandSettings] Arg 'projectname': 'Test'

So it completed afterwards, even though there was en error. (As far as I can remember. This was more than a month ago) Running the agent afterwards does not work.

Hi @anatolybolshakov

I finally have some more information about this issue. Our hoster’s partner has investigated the issue and this was their response:

Initially, the file does not exist in the ./azagent directory and was created by the process (pid 444) which was one of the children of the main process pid 428 [1]. The process which forked pid 450 was killed before pid 450 exit and lead to a zombie state for unfinished chmod. The above scenario is considered to be out of <partner>'s scope as it is related to the way how script works. In other words, an issue you have reported can also be seen on a local systems (not all of them). This is not a flaw in <partner> product but rather a race condition occuring during the script execution. I appreciate your patience and cooperation while we were processing the case.

[1]
233 pts/1    Ss+    0:00  \_ /bin/sh
  391 pts/1    S      0:00 /bin/bash ./config.sh --deploymentgroup --deploymentgroupname All --acceptteeeula --agent node45241.example.ch
  428 pts/1    Sl     0:08  \_ ./bin/Agent.Listener configure --deploymentgroup --deploymentgroupname All --acceptteeeula --agent node45241-example.ch
  450 pts/1    Z      0:00      \_ [chmod] <defunct>
strace results:
450   fchmodat(AT_FDCWD, "/azagent/.credentials_rsaparams", 0600 <unfinished ...>
450   <... fchmodat resumed> )          = 0
450   fchmodat(AT_FDCWD, "/azagent/.credentials_rsaparams", 0600 <unfinished ...>
450   <... fchmodat resumed> )          = 0
450   close(1<pipe:[104053172]> <unfinished ...>
450   <... close resumed> )             = 0
450   close(2<pipe:[104053175]> <unfinished ...>
450   <... close resumed> )             = 0
450   exit_group(0)                     = ?
450   +++ exited with 0 +++

I would really appreciate if you could look into this issue. It’s been an issue for a rather long time now. I was also writing in this VS Developer Community thread).

Please tell me if I can test something or help in some other way.

Just as a quick update - looks like net core bump to 3.1.15 resolved the issue, created PR for this update.

It seems that there could be some issue on net core side - created ticket.

@anatolybolshakov Please have someone take a look at reproducing this one. The summary is that for some users versions of the agent since we upgraded to .Net Core 3.1 have been reported to hang or crash when starting up. One detail that may be relevant is that this user is running as root. Also note that 401s appear in the logs when the agent first starts normally, so that is not likely related (see #3149). The real issue sounds like it is that the process hangs and never starts polling for messages.