rustup: Flaky conflict installing `rust-src` on GitHub Actions runners

On the PyO3 CI we’re hitting a flaky failure when installing rust-src during the setup steps on a GitHub Actions runner (via the dtolnay/rust-toolchain action). For example: https://github.com/PyO3/pyo3/actions/runs/6829368772/job/18575387359#step:5:111

error: failed to install component: 'rust-src', detected conflict: 'lib/rustlib/src/rust/Cargo.lock'

I think we’ve been encountering this for a little while at a very low probability, but since yesterday it has been failing at a probability of maybe 2-3%. Still low, but because we want the whole build matrix to succeed, a single failing job blocks the merge. Restarting the CI doesn’t help us much, because a different job then fails with the same error.
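To make the arithmetic behind this concrete, here is a rough sketch. It reads the 2-3% figure as a per-job failure rate and treats jobs as independent; both are assumptions, and the matrix sizes are hypothetical rather than PyO3's actual job count.

```python
def matrix_failure_probability(per_job_failure: float, jobs: int) -> float:
    """Probability that at least one of `jobs` independent jobs fails."""
    return 1.0 - (1.0 - per_job_failure) ** jobs


# Hypothetical matrix sizes; 2.5% is the midpoint of the 2-3% estimate.
for jobs in (10, 30, 60):
    p = matrix_failure_probability(0.025, jobs)
    print(f"{jobs} jobs: {p:.0%} chance that the run fails")
```

Under these assumptions a matrix of a few dozen jobs fails more often than it passes, and a full restart repeats the same odds, which matches the experience of a different job failing each time.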

See https://github.com/PyO3/pyo3/pull/3570 for a repeated chain of failed merges hitting this.

Any insight you can offer to help resolve this would be greatly appreciated. Given the flakiness, it feels like a cache issue, but at the point of failing install I don’t think we’ve restored anything from cache.

About this issue

  • State: closed
  • Created 8 months ago
  • Reactions: 1
  • Comments: 24 (9 by maintainers)

Most upvoted comments

Another point to note is that this error only occurs during workflows that call maturin build, and never in other workflows.

That’s not the case for us; we had this problem on a job which just installed and ran rustfmt.

  • rustup appears to have some concurrency issues.

Sounds like #988

Oh, it has been on my watch list for so long, but I tend to avoid declaring everything as a duplicate of #988 (^_^') before the evidence is found.

Now, however, I’m fully convinced that this is indeed another duplicate. Thanks for all your comments!

PS: #988 is actually the next target for me after #3483, however as you can see I’m a bit occupied by the graduation stuff right now. I’ll definitely have a deeper look into it after finishing all that 😃

After PR https://github.com/apache/incubator-opendal/pull/3633, we haven’t hit this issue again. Thanks to @messense for the idea!

So the conclusion is:

  • rustup appears to have some concurrency issues. Perhaps we need a file lock for when users invoke cargo on a rust-toolchain that has not been set up yet? cc @rami3l, please try calling cargo concurrently in this case.
  • Calling a simple cargo version before we start our workflow avoids the problem. cc @davidhewitt, please consider adding such a step as a pre-action.
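A minimal sketch of how the two conclusions above could be combined on the user side: run the warm-up command while holding an exclusive advisory file lock, so that concurrent callers wait instead of racing. This is not how rustup itself works; `run_serialized` and the lock path are hypothetical names for illustration, and `fcntl` limits the sketch to Unix-like runners.

```python
import fcntl
import subprocess
from pathlib import Path


def run_serialized(cmd, lock_path):
    """Run `cmd` while holding an exclusive advisory lock on `lock_path`."""
    lock_path = Path(lock_path)
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    with open(lock_path, "w") as lock_file:
        # Blocks until every other holder of the lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd, capture_output=True, text=True)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Hypothetical warm-up before the real workflow starts:
# run_serialized(["cargo", "version"], "/tmp/rustup-warmup.lock")
```

The lock only helps if every process that might trigger a toolchain install goes through the same wrapper; a single warm-up step that runs before anything else is the simpler option in CI.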

The "Setup Rust toolchain" step in OpenDAL CI only does things like setting RUSTFLAGS, RUST_BACKTRACE and CARGO_REGISTRIES_CRATES_IO_PROTOCOL; no Rust toolchain is actually set up.

This is off topic here, but it means the step name is confusing.

Anyway, you should definitely try actually setting up Rust before invoking pip/maturin.

It also seems that our rust-toolchain.toml was part of the problem. It seemed to make rustup check multiple times over whether rust-src was installed.

Removing it helped us get to a green CI run again: https://github.com/PyO3/pyo3/pull/3575

@davidhewitt Would you mind helping me record the output of rustup component list --installed and the status of lib/rustlib/src before setting up Rust in your failed CI? I suspect there is an incoherence problem. Thanks!
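One possible way to capture that information in a CI step is sketched below. It assumes the default rustup layout, with toolchains under `$RUSTUP_HOME/toolchains` and `RUSTUP_HOME` falling back to `~/.rustup`; the function names are hypothetical. The directory scan reads the filesystem directly, so it does not invoke `cargo` or `rustc`.

```python
import os
import shutil
import subprocess
from pathlib import Path


def installed_components():
    """Output of `rustup component list --installed`, or None if rustup is absent."""
    if shutil.which("rustup") is None:
        return None
    result = subprocess.run(
        ["rustup", "component", "list", "--installed"],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


def rust_src_status(rustup_home=None):
    """Map each toolchain name to the entries under its lib/rustlib/src (None if missing)."""
    home = Path(rustup_home or os.environ.get("RUSTUP_HOME") or Path.home() / ".rustup")
    toolchains = home / "toolchains"
    status = {}
    if not toolchains.is_dir():
        return status
    for toolchain in sorted(toolchains.iterdir()):
        src = toolchain / "lib" / "rustlib" / "src"
        status[toolchain.name] = sorted(p.name for p in src.iterdir()) if src.is_dir() else None
    return status
```

Printing both results right before the toolchain setup step would show whether files exist under `lib/rustlib/src` while rust-src is not listed as installed.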