codecov-action: Retry if upload fails

Hi, from time to time we get 503 errors while uploading the data. The log looks like this:

...
[2023-02-24T17:38:21.359Z] ['verbose'] tag
[2023-02-24T17:38:21.359Z] ['verbose'] flags
[2023-02-24T17:38:21.359Z] ['verbose'] parent
[2023-02-24T17:38:21.360Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-2.1.0-uploader-0.3.5&token=*******....
[2023-02-24T17:38:21.360Z] ['verbose'] Passed token was 36 characters long
[2023-02-24T17:38:21.360Z] ['verbose'] https://codecov.io/upload/v4?package=github-action-2.1.0-uploader-0.3.5&...
        Content-Type: 'text/plain'
        Content-Encoding: 'gzip'
        X-Reduced-Redundancy: 'false'
[2023-02-24T17:38:23.332Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure
[2023-02-24T17:38:23.332Z] ['verbose'] The error stack is: Error: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure
    at main (/snapshot/repo/dist/src/index.js)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
[2023-02-24T17:38:23.332Z] ['verbose'] End of uploader: 3001 milliseconds

It would be great to have a retry mechanism with some defined timeout.

About this issue

  • State: open
  • Created a year ago
  • Reactions: 51
  • Comments: 23 (2 by maintainers)

Most upvoted comments

Just had a similar issue, this time with error code 502. https://github.com/home-assistant/core/actions/runs/4618964416/jobs/8167147703

[2023-04-05T13:27:00.542Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 502 - 
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>

What we ended up doing is using a retry mechanism like https://github.com/Wandalen/wretry.action to retry the upload. Setting fail_ci_if_error to false is not really an option if you care about the coverage reports.

Using this retry action in my project significantly reduces the failure count. It serves as a workaround for the time being.
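For reference, a minimal sketch of that workaround, wrapping codecov-action with wretry.action (the `action`, `with`, `attempt_limit` and `attempt_delay` input names are taken from wretry.action's README and may differ between versions):

```yaml
- name: Upload coverage to Codecov (with retries)
  uses: Wandalen/wretry.action@v1
  with:
    # The action to wrap and retry
    action: codecov/codecov-action@v3
    # Inputs forwarded to the wrapped action
    with: |
      fail_ci_if_error: true
    # Retry up to 3 times, waiting roughly 30 seconds between attempts
    attempt_limit: 3
    attempt_delay: 30000   # delay between attempts, in milliseconds
```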

But basically, tokenless uploads are failing more often due to GitHub API limits.

Given that the latest version requires a token, this is not the issue that most people are reporting here, and possibly not worth the extra work to extract the retry time from the message. The primary issue is with Codecov’s servers themselves, which occasionally fail to accept an upload. As shown above (https://github.com/codecov/codecov-action/issues/926#issuecomment-1964679977) this usually suggests retrying after around 30 seconds. This issue is just asking Codecov to follow the advice from their own server.

It seems that the need for an automatic retry with exponential backoff is more urgent these days. I have seen:

The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.

This is as clear as it sounds, and implementing a retry in Python is not really hard.

We’ve had the following (same problem as @LucasXu0 reported above) which prevented the upload from working.

[2023-06-05T13:59:02.657Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

This looks to have been caused by a temporary GitHub API outage, but because we don’t have fail_ci_if_error enabled, the coverage on our main branch became incorrect, as only a portion of the required coverage data was uploaded.

I would suggest a new optional argument for codecov-action allowing a given number of retries and an inter-retry cooldown to be specified.
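For illustration only, such an interface might look like the following; the `retries` and `retry_delay` inputs are hypothetical and are not currently supported by codecov-action:

```yaml
- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v3
  with:
    fail_ci_if_error: true
    # Hypothetical inputs proposed in this issue (not currently supported):
    retries: 3        # number of upload attempts before failing
    retry_delay: 30   # cooldown between attempts, in seconds
```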

As a workaround, instead of performing the coverage upload as part of the same job as the build & test, this can be split out into a separate job. The upload-artifact action could be used to store the raw coverage data as an artifact, which a later codecov job would retrieve and upload. If the codecov upload failed, then all that would need to be rerun is the failed codecov job. This job would be just the upload, so it would avoid rerunning any build/test, saving many GitHub runner minutes.

We have been experiencing a lot of similar issues to those described above. The number of jobs that fail is really getting annoying, to the point that reviewers aren’t even bothering to restart the CI.

We’ve limited runtime for the codecov jobs to prevent them from running for hours and exhausting our CI runners. On non-open source projects, this can be quite costly when GitHub bills the org.

Anything we can provide to resolve this issue?

…/Frenck

I have attempted to add a 30 second sleep and retry and it simply isn’t enough. If a retry is to be added, it needs to be more than that to work consistently.

@imnasnainaec At the time of suggesting that workaround here, I hadn’t implemented it yet. When I did, I also found that a repo checkout is required by codecov; I believe this is because it needs the git history. My solution can be seen here - it is very similar to what you’ve posted above. With this approach, if the codecov upload fails, then only a single step (which takes under 1 minute) needs to be rerun to retry the upload - no need to rerun any tests, which saves us many minutes.
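A rough sketch of that split-job layout, assuming the coverage report is written to coverage.xml (job names, artifact names and paths are illustrative):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # ... build and run the tests, producing coverage.xml ...
      - name: Store raw coverage data
        uses: actions/upload-artifact@v3
        with:
          name: coverage
          path: coverage.xml

  codecov:
    needs: test
    runs-on: ubuntu-latest
    steps:
      # A repo checkout is needed so codecov has the git context, as noted above
      - uses: actions/checkout@v3
      - name: Fetch raw coverage data
        uses: actions/download-artifact@v3
        with:
          name: coverage
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          files: coverage.xml
          fail_ci_if_error: true
```

If the upload fails, only the short codecov job needs to be rerun, not the test job.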

We’re also seeing the above mentioned 502s.

> What we ended up doing is using a retry mechanism like https://github.com/Wandalen/wretry.action to retry the upload. Setting fail_ci_if_error to false is not really an option if you care about the coverage reports.

This would be very helpful.

We fixed the initial problem (“Unable to locate build via Github Actions API”) using some of the suggestions from several different discussions.

It had been running OK for a few weeks, but now we have started to see different errors, such as:

[2023-03-13T18:04:08.821Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-3.1.1-uploader-0.3.5&token=*******&branch=fix%2F10518&build=4407915657&build_url=https%3A%2F%2Fgithub.com%2Fdecidim%2Fdecidim%2Factions%2Fruns%2F4407915657&commit=538d19c980fa26abebbdb736c28488a81c69ac8a&job=%5BCI%5D+Meetings+%28unit+tests%29&pr=10519&service=github-actions&slug=decidim%2Fdecidim&name=decidim-meetings&tag=&flags=decidim-meetings&parent=
[2023-03-13T18:04:27.515Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 500 - {"error": "Server Error (500)"}

And

[2023-03-13T18:16:44.977Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-3.1.1-uploader-0.3.5&token=*******&branch=fix%2F10518&build=4407915631&build_url=https%3A%2F%2Fgithub.com%2Fdecidim%2Fdecidim%2Factions%2Fruns%2F4407915631&commit=538d19c980fa26abebbdb736c28488a81c69ac8a&job=%5BCI%5D+Meetings+%28system+public%29&pr=10519&service=github-actions&slug=decidim%2Fdecidim&name=decidim-meetings-system-public&tag=&flags=decidim-meetings-system-public&parent=
[2023-03-13T18:17:15.139Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: HeadersTimeoutError: Headers Timeout Error

It would be really helpful if the codecov action waited a few seconds and retried, so that we don’t have to rerun the whole job, which can take up to 30 minutes (depending on the workflow).