arcade: Problems with Azure Devops Reporter
- This issue is blocking
- This issue is causing unreasonable pain
As I’ve written up in https://github.com/dotnet/core-eng/issues/13026, we introduced a threading bug in https://github.com/dotnet/arcade/pull/7310/files that can crash the reporter. It may be difficult to reproduce this problem intentionally, since it depends on actually performing the reporting. Instead, we can inspect the code, or simply revert the locking part of the change and keep only the “don’t let it pass if it doesn’t finish” part.
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (15 by maintainers)
Commits related to this issue
- https://github.com/dotnet/arcade/issues/7371 - retry in the case of HTTP 503 (10x, 3 seconds between attempts) , and stop trying in the case of "run is deleted" — committed to MattGal/arcade by MattGal 3 years ago
- Add retries to Azure Devops Reporter scripts (#7399) * https://github.com/dotnet/arcade/issues/7371 - retry in the case of HTTP 503 (10x, 3 seconds between attempts) , and stop trying in the case of ... — committed to dotnet/arcade by MattGal 3 years ago
- Workaround for https://github.com/dotnet/arcade/issues/7371. Have azure-pipelines reporter parse XML for failures if ADO fails for any reason, return 0 if we actually passed. — committed to MattGal/arcade by MattGal 3 years ago
- Intentionally always fail to exercise https://github.com/dotnet/arcade/issues/7371 — committed to MattGal/arcade by MattGal 3 years ago
- Workaround for https://github.com/dotnet/arcade/issues/7371 (#7421) * Workaround for https://github.com/dotnet/arcade/issues/7371. Have azure-pipelines reporter parse XML for failures if ADO fails f... — committed to dotnet/arcade by MattGal 3 years ago
- Work around https://github.com/dotnet/arcade/issues/7371 - Don't have a threading contention for stdout by only using stdout in the exception case; "starting..." adds minimal value, one can assume if... — committed to MattGal/arcade by MattGal 3 years ago
- Work around https://github.com/dotnet/arcade/issues/7371 (#7457) - Don't have a threading contention for stdout by only using stdout in the exception case; "starting..." adds minimal value, one can a... — committed to dotnet/arcade by MattGal 3 years ago
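The retry commits above describe a simple policy: retry HTTP 503 responses up to 10 times with 3 seconds between attempts, and give up immediately when the run has been deleted. A minimal Python sketch of that policy follows. It is not the arcade code; the exception classes `TransientHttpError` and `RunDeletedError` and the helper `call_with_retries` are hypothetical stand-ins for however the real scripts detect those two conditions.

```python
import time


class TransientHttpError(Exception):
    """Hypothetical: stands in for an HTTP 503 from Azure DevOps."""


class RunDeletedError(Exception):
    """Hypothetical: stands in for a "run is deleted" response."""


def call_with_retries(action, attempts=10, delay_seconds=3.0, sleep=time.sleep):
    """Call action(), retrying only on 503-style failures.

    Returns the action's result, or None if the run was deleted
    (there is nothing left to report to). Re-raises the last
    TransientHttpError once every attempt has been used.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except RunDeletedError:
            # Retrying cannot help once the run is gone; stop trying.
            return None
        except TransientHttpError:
            if attempt == attempts:
                raise
            sleep(delay_seconds)  # wait before the next attempt
```

The `sleep` parameter exists only so the delay can be skipped in tests; in normal use the default `time.sleep` gives the 3-second spacing.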
@ericstj I may take some stabs at this one tomorrow, but it’s much harder than simply adding retries to AzDO calls.
I just had a great conversation with @safern, and he had a very keen insight: if we update the arcade reporter to return the real (not lying) exit code when the Azure Devops reporter fails, it’s easy to implement and gives us the best of both worlds. When reporting fails, the work item can still pass, and when both the work item and reporting fail, we still fail the check.
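The idea above, which the #7421 workaround describes as parsing the results XML when ADO fails, can be sketched in Python. This sketch assumes JUnit-style XML, where a failing `<testcase>` contains a `<failure>` or `<error>` child. Other result formats would need different parsing. The names `count_failures`, `reporter_exit_code`, and `report_to_azdo` are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_failures(results_xml: str) -> int:
    """Count failed tests, assuming JUnit-style XML (an assumption)."""
    root = ET.fromstring(results_xml)
    return sum(
        1
        for case in root.iter("testcase")
        if case.find("failure") is not None or case.find("error") is not None
    )


def reporter_exit_code(report_to_azdo, results_xml: str) -> int:
    """Hypothetical wrapper around the reporting step.

    A reporting failure is logged but never decides the outcome on
    its own. In this sketch, the exit code always reflects the parsed
    test results: 0 if every test passed, 1 otherwise.
    """
    try:
        report_to_azdo(results_xml)
    except Exception as exc:  # any reporting failure triggers the fallback
        print(f"Reporting to Azure DevOps failed: {exc!r}")
    return 1 if count_failures(results_xml) else 0
```

With this design, a flaky AzDO call cannot turn a passing work item into a failure, and it cannot hide a real test failure either.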
The error stack trace in https://github.com/dotnet/core-eng/issues/13026 has nothing to do with the lock I added. The error happens because the interpreter can’t acquire the lock for “stdout”, and none of my changes affect locking of stdout. It is a very rare race condition: one of the worker threads (Thread 0x000070000f354000 in that stack trace) happens to be inside a _print call at the exact moment the process is exiting.
Looking at what it was trying to print, one of the worker threads never got a chance to start before all the work was completed and the process exited.
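The race described above, where a worker thread is still printing while the interpreter shuts down, can be avoided by structuring the worker pool so that no thread outlives the main flow. The Python sketch below is illustrative, not the arcade reporter. It uses non-daemon threads that drain a `queue.Queue`, prints only when an item fails (in line with the #7457 workaround), and joins every worker before returning. This includes workers that start late and find nothing left to do. The names `run_workers`, `handle`, and `worker_count` are hypothetical.

```python
import queue
import threading


def run_workers(items, handle, worker_count=4):
    """Process items on worker threads; return how many items failed.

    Workers print only on failure, and every worker is joined before
    this function returns, so no thread can still be inside print()
    when the process exits.
    """
    work = queue.Queue()
    for item in items:
        work.put(item)

    failures = []
    failures_lock = threading.Lock()

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # no work left; a late-starting worker simply exits
            try:
                handle(item)
            except Exception as exc:
                with failures_lock:
                    failures.append(item)
                print(f"Failed to process {item!r}: {exc!r}")

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker, including ones that found no work
    return len(failures)
```

A non-zero return value can also serve as the “don’t let it pass if it doesn’t finish” signal, because an item that was not handled successfully is counted as a failure.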