codecov-action: Retry if upload fails
Hi, from time to time we get 503 errors while uploading the data. The log looks like this:
...
[2023-02-24T17:38:21.359Z] ['verbose'] tag
[2023-02-24T17:38:21.359Z] ['verbose'] flags
[2023-02-24T17:38:21.359Z] ['verbose'] parent
[2023-02-24T17:38:21.360Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-2.1.0-uploader-0.3.5&token=*******....
[2023-02-24T17:38:21.360Z] ['verbose'] Passed token was 36 characters long
[2023-02-24T17:38:21.360Z] ['verbose'] https://codecov.io/upload/v4?package=github-action-2.1.0-uploader-0.3.5&...
Content-Type: 'text/plain'
Content-Encoding: 'gzip'
X-Reduced-Redundancy: 'false'
[2023-02-24T17:38:23.332Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure
[2023-02-24T17:38:23.332Z] ['verbose'] The error stack is: Error: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure
at main (/snapshot/repo/dist/src/index.js)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
[2023-02-24T17:38:23.332Z] ['verbose'] End of uploader: 3001 milliseconds
It would be great to have a retry mechanism with some defined timeout.
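To make the request concrete, something along these lines is what is being asked for. The `upload_retries` and `retry_delay_seconds` inputs are purely hypothetical and do not exist in the action today:

```yaml
- uses: codecov/codecov-action@v3
  with:
    token: ${{ secrets.CODECOV_TOKEN }}
    # hypothetical inputs illustrating the request; not currently supported
    upload_retries: 3
    retry_delay_seconds: 30
```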
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 51
- Comments: 23 (2 by maintainers)
Commits related to this issue
- Don't fail CI on codecov upload errors Seeing a ton of HTTP 502 and other errors on codecov uploads. this should be breaking CI. see also: https://github.com/codecov/codecov-action/issues/926 — committed to asherf/python-jose by asherf a year ago
- Don't fail CI on codecov upload errors (#318) Seeing a ton of HTTP 502 and other errors on codecov uploads. this should be breaking CI. see also: https://github.com/codecov/codecov-action/issues/92... — committed to mpdavis/python-jose by asherf a year ago
- Retry requests to codecov to avoid failure of the whole pipeline on intermittent codecov backend issues. Uses a workaround provided by @LucasXu0 in https://github.com/codecov/codecov-action/issues/92... — committed to gitlabform/gitlabform by gdubicki a year ago
- Use the retry-action action to retry the codecov uploader action (#57) <!-- Please add a title in the form of a great git commit message in the imperative mood (https://cbea.ms/git-commit/) --> *... — committed to tagatac/bagoup by tagatac 7 months ago
Just had a similar issue, this time with error code 502. https://github.com/home-assistant/core/actions/runs/4618964416/jobs/8167147703
Using this retry action in my project significantly reduces the failure count. It serves as a workaround for the time being.
Given that the latest version requires a token, this is not the issue that most people are reporting here, and possibly not worth the extra work to extract the retry time from the message. The primary issue is with Codecov’s servers themselves, which occasionally fail to accept an upload. As shown above (https://github.com/codecov/codecov-action/issues/926#issuecomment-1964679977) this usually suggests retrying after around 30 seconds. This issue is just asking Codecov to follow the advice from their own server.
It seems that the need for an automatic retry with exponential backoff is more urgent these days.
This is as simple as it sounds, and implementing a retry in Python is not really hard.
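Whether in Python or plain shell, the idea is the same. Here is a rough shell-level sketch with doubling delays; `./upload_coverage.sh` is a hypothetical stand-in for whatever upload command is actually used:

```yaml
      - name: Upload coverage with backoff
        shell: bash
        run: |
          # retry with increasing delays before giving up
          for delay in 15 30 60; do
            if ./upload_coverage.sh; then
              exit 0
            fi
            echo "Upload failed, retrying in ${delay}s..."
            sleep "${delay}"
          done
          # final attempt after the last wait
          ./upload_coverage.sh
```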
We’ve had the following (same problem as @LucasXu0 reported above) which prevented the upload from working.
This looks to have been caused by a temporary GitHub API outage, but because we don’t have `fail_ci_if_error` enabled, the coverage on our main branch became incorrect, as only a portion of the required coverage data was uploaded.
I would suggest a new optional argument for `codecov-action` allowing a given number of retries and an inter-retry cooldown to be specified.
As a workaround, instead of performing the coverage upload as part of the same job as the build & test, it can be split out into a separate job. The upload-artifact action could be used to store the raw coverage data as an artifact, which a later codecov job would retrieve and upload. If the codecov upload failed, then all that would need to be rerun is the failed codecov job. That job would be just the upload, so it would avoid rerunning any build/test steps, saving many GitHub runner minutes.
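A minimal sketch of the first half of that split-job workaround, with the test job publishing the raw coverage data as an artifact (job, artifact, and file names are illustrative):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run tests with coverage
        run: make coverage            # assumed to produce coverage.xml
      - name: Store raw coverage data
        uses: actions/upload-artifact@v3
        with:
          name: coverage-report
          path: coverage.xml
```

The upload itself then lives in a separate job (see the sketch further down) that can be rerun on its own if Codecov happens to be unavailable.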
We have been experiencing a lot of issues similar to those described above. The number of jobs that fail is really getting annoying, to the point that reviewers aren’t even bothering to restart the CI.
We’ve limited the runtime of the codecov jobs to prevent them from running for hours and exhausting our CI runners. On non-open-source projects this can be quite costly, since GitHub bills the org.
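For reference, GitHub Actions supports per-step and per-job `timeout-minutes`, which is enough to keep a stuck upload from burning runner minutes (the value below is only an example):

```yaml
      - name: Upload to Codecov
        uses: codecov/codecov-action@v3
        timeout-minutes: 10   # kill the step instead of letting it hang for hours
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
```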
Anything we can provide to resolve this issue?
…/Frenck
I have attempted to add a 30-second sleep and retry, and it simply isn’t enough. If a retry is to be added, it needs to be more than that to work consistently.
@imnasnainaec At the time I suggested that workaround here, I hadn’t implemented it yet. When I did, I also found that a repo checkout is required by codecov; I believe this is because it needs the git history. My solution can be seen here - it is very similar to what you’ve posted above. With this approach, if the codecov upload fails, only a single step (which takes under 1 minute) needs to be rerun to retry the upload - no need to rerun any tests, which saves us many minutes.
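Continuing the sketch from further up, the second half is a small follow-up job that only checks out the repo, pulls the coverage artifact, and uploads it; if Codecov returns a 5xx, this is the only job that has to be rerun:

```yaml
  codecov:
    needs: test
    runs-on: ubuntu-latest
    steps:
      # the checkout is still needed because codecov uses the git history
      - uses: actions/checkout@v3
      - name: Fetch raw coverage data
        uses: actions/download-artifact@v3
        with:
          name: coverage-report
      - name: Upload to Codecov
        uses: codecov/codecov-action@v3
        with:
          files: coverage.xml
          fail_ci_if_error: true
```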
We’re also seeing the above-mentioned 502s.
What we ended up doing is using a retry mechanism like https://github.com/Wandalen/wretry.action to retry the upload.
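For completeness, this is the kind of usage meant here. The `action`, `with`, `attempt_limit`, and `attempt_delay` inputs (and the `@v1` ref) are taken from wretry.action's documentation as I recall it, so verify the exact interface against its README:

```yaml
      - name: Upload to Codecov (with retries)
        uses: Wandalen/wretry.action@v1
        with:
          action: codecov/codecov-action@v3
          with: |
            token: ${{ secrets.CODECOV_TOKEN }}
            fail_ci_if_error: true
          attempt_limit: 5
          attempt_delay: 30000   # milliseconds between attempts
```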
`fail_ci_if_error` set to `false` is not really an option if someone cares about the coverage reports. This would be very helpful.
We fixed the initial problem (“Unable to locate build via Github Actions API.”) using some of the suggestions from several different discussions.
It had been running OK for a few weeks, but now we have started to see different errors.
It would be really helpful if the codecov action waited a few seconds and retried, so that we don’t have to rerun the whole job, which can take up to 30 minutes (depending on the workflow).