azure-pipelines-agent: Agent (version higher than 2.164.8) fails to connect to Azure DevOps
Agent Version and Platform
2.173.0 on centos-release-7-6.1810.2.el7.centos.x86_64
Edit: It’s a release agent for a deployment pool.
Azure DevOps Type and Version
dev.azure.com (cloud)
What’s not working?
After the auto-update to 2.173.0, the agent can no longer connect to Azure DevOps.
# Running run once with agent version 2.160.1
./run.sh --once
Scanning for tool capabilities.
Connecting to the server.
2020-08-25 21:31:02Z: Listening for Jobs
Agent update in progress, do not shutdown agent.
Downloading 2.173.0 agent
Waiting for current job finish running.
Generate and execute update script.
Agent will exit shortly for update, should back online within 10 seconds.
‘/root/azagent/_diag/SelfUpdate-20200825-213148.log’ -> ‘/root/azagent/_diag/SelfUpdate-20200825-213148.log.succeed’
Scanning for tool capabilities.
Connecting to the server.
# this now runs indefinitely
Similar to issue #2824
Could someone please tell me how to stop the auto update? Multiple agents on production machines are offline and I have, as of now, no idea how to fix that.
Edit: After trying different approaches and versions for hours, I found a workaround: I am currently using version 2.164.8 (with the agent.disableupdate variable to disable auto-updates). However, this is not a long-term solution and may break at any moment, as soon as version 2.164.8 is no longer accepted by the server.
Agent and Worker’s Diagnostic Logs
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 35 (17 by maintainers)
@mjroghelia The same is happening on an Ubuntu 18.04 installation. I am also getting an “Authentication failed with status code 401.” message (version 2.174.1 this time). The process hangs, does not crash, and no further log output appears. I’ve also enabled HTTP tracing; those are the last few lines of the log:
@anatolybolshakov
I’m not sure if you’re able to access all the comments in this VS Developer Community thread, but it holds many logs and things I have already tried. I will try to share as much as I can publicly, but the HTTP traces should stay private afaik.
(In my next comment I will try to summarize some other approaches my hoster and I have tried over the last months.)
Ofc, I can and will get back to you. But if this is just to “clean up” the server side, I don’t see how it helps. Because if I newly provision an Ubuntu 18.04 VM (with a different hostname, etc.), the problem persists. It is not just while updating the agent. Please quickly tell me how it is relevant.
Not many errors are in there. I think this is why most people report the 401 errors.
The most interesting one so far is:
So it completed afterwards, even though there was an error (as far as I can remember; this was more than a month ago). Running the agent afterwards does not work.
Hi @anatolybolshakov
I finally have some more information about this issue. Our hoster’s partner has investigated the issue and this was their response:
I would really appreciate it if you could look into this issue. It’s been a problem for a rather long time now. I have also been posting in this VS Developer Community thread.
Please tell me if I can test something or help in some other way.
Just as a quick update - it looks like the .NET Core bump to 3.1.15 resolved the issue; I created a PR for this update.
It seems that there could be some issue on the .NET Core side - I created a ticket.
@anatolybolshakov Please have someone take a look at reproducing this one. The summary is that, for some users, versions of the agent released since we upgraded to .NET Core 3.1 have been reported to hang or crash when starting up. One detail that may be relevant is that this user is running as root. Also note that 401s appear in the logs when the agent first starts normally, so they are not likely related (see #3149). The real issue sounds like the process hangs and never starts polling for messages.
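For anyone trying to tell the hang apart from the harmless 401s, here is a minimal sketch that checks a saved copy of the agent's console output for the symptom in the report: output that stops at "Connecting to the server." and never reaches "Listening for Jobs". The function name `agent_reached_listening` and the idea of capturing the console output to a file are assumptions for illustration; nothing here is part of the agent itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the agent): returns 0 only if a
# "Listening for Jobs" line appears after the LAST "Connecting to the server."
# line in the captured console output, i.e. the agent started polling again.
agent_reached_listening() {
  local log_file="$1"
  awk '
    /Connecting to the server\./ { listening = 0 }  # a new connection attempt resets the state
    /Listening for Jobs/         { listening = 1 }  # the agent started polling for messages
    END { exit (listening ? 0 : 1) }
  ' "$log_file"
}

# Example use, assuming the console output was saved to agent.out beforehand:
#   agent_reached_listening agent.out || echo "agent appears stuck while connecting"
```

Applied to the console output in the original report, this would flag a hang, because the last "Connecting to the server." line (after the self-update) is never followed by "Listening for Jobs".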