rustup: Invalid cross-device link (os error 18) when upgrading on a docker OverlayFS

$ rustup update nightly
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
info: latest update on 2017-08-21, rust version 1.21.0-nightly (8c303ed87 2017-08-20)
info: downloading component 'rustc'
info: downloading component 'rust-std'
info: downloading component 'cargo'
info: downloading component 'rust-docs'
info: removing component 'rustc'
info: rolling back changes
error: could not rename component directory from '/root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/etc' to '/root/.rustup/tmp/x5u5mnp0hhtywco8_dir/bk'
info: caused by: Invalid cross-device link (os error 18)

As far as I can tell, std::fs::rename() does not work reliably on OverlayFS. Similar reports from other languages and projects that hit cross-device link errors on OverlayFS all boil down to use of the rename syscall.

I’d like to propose wrapping the std::fs::rename() calls so that, on Linux, the wrapper detects os error 18 and falls back to a copy and delete. Errors like this are reported periodically on other platforms too; the wrapper could handle those cases as well if they have a similar error code (or maybe even the same one if this is standard, I’m not sure).
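A minimal sketch of what such a wrapper might look like, assuming Linux (where EXDEV is errno 18). The names rename_or_copy and copy_recursively are hypothetical and are not part of rustup or std. The fallback is not atomic, and this sketch copies symlink targets rather than preserving the links.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Linux errno for EXDEV ("Invalid cross-device link"). Assumption: Linux only.
const EXDEV: i32 = 18;

/// Hypothetical helper: recursively copy `src` to `dst`.
/// Symlinks are followed, so their targets are copied (not the links).
fn copy_recursively(src: &Path, dst: &Path) -> io::Result<()> {
    if fs::metadata(src)?.is_dir() {
        fs::create_dir_all(dst)?;
        for entry in fs::read_dir(src)? {
            let entry = entry?;
            copy_recursively(&entry.path(), &dst.join(entry.file_name()))?;
        }
    } else {
        fs::copy(src, dst)?;
    }
    Ok(())
}

/// Hypothetical wrapper: try a plain rename first, and fall back to
/// copy + delete only when the OS reports os error 18.
pub fn rename_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::rename(src, dst) {
        Err(e) if e.raw_os_error() == Some(EXDEV) => {
            copy_recursively(src, dst)?;
            // Remove the source only after the copy has fully succeeded.
            if src.is_dir() {
                fs::remove_dir_all(src)
            } else {
                fs::remove_file(src)
            }
        }
        // Success, or any other error, is passed through unchanged.
        other => other,
    }
}
```

A caller would replace fs::rename(from, to) with rename_or_copy(from, to). If the copy fails partway, a partial destination is left behind, so a real implementation would also need cleanup.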

Interestingly, there is a bootstrap problem: folks who are experiencing this may be unable to update their rustup install, and so may not be able to get the update that fixes the problem once there is a solution. Those folks will need to be advised to reinstall rustup.

If the proposed solution to the problem works for the dev team, I’ll attempt to provide a PR within a week of getting the go ahead.

This is relevant because some people use a common Docker image for their CI environments that may not be updated frequently enough for beta/nightly, and run rustup update $desired_env in their script. That is how I found this problem.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 17
  • Comments: 18 (9 by maintainers)

Most upvoted comments

For those affected by this bug, see the renaming section of the kernel documentation.

@wraithan fs::rename inside std implies atomicity. For a renaming operation that doesn’t fail, we should put it in a separate crate, as copying will likely involve locking.

The rust-lang/rust issue tracker covers the standard library.

I’m experiencing this issue in Fedora Linux. The same logic works nicely in Debian-based systems like Ubuntu, but the error Invalid cross-device link (os error 18) happens in Fedora. The steps to reproduce it are:

  1. Download Edge from https://packages.microsoft.com/repos/edge/pool/main/m/microsoft-edge-stable/microsoft-edge-stable_123.0.2420.53-1_amd64.deb
  2. Extract the content of the DEB file
  3. Try to move the resulting parent folder to a different path using fs::rename()

As of rust 1.63.0 I seem to be encountering this issue again during the clippy stage. Posting the relevant log:

$ CARGO_HOME=/usr/local/cargo rustup update stable
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: latest update on 2022-08-11, rust version 1.63.0 (4b91a6ea7 2022-08-08)
info: downloading component 'clippy'
info: downloading component 'cargo'
info: downloading component 'rust-std'
info: downloading component 'rustc'
info: removing previous version of component 'clippy'
info: rolling back changes
error: could not rename component file from '/usr/local/rustup/toolchains/stable-x86_64-unknown-linux-gnu/share/doc/clippy' to '/usr/local/rustup/tmp/1vsy16kvdse0rwk9_dir/bk': Invalid cross-device link (os error 18)
Cleaning up file based variables 00:00
ERROR: Job failed: command terminated with exit code 1

Could this have crept back in somewhere?

Copy+Delete would be exceedingly slow because rename is used throughout our transactional filesystem access code. If we had to open+open+{read,write,loop}+close+close rather than rename, our toolchain update process would become much slower. Perhaps we can detect that particular OS error by attempting a rename on something innocuous first and, if that fails, refuse to update a toolchain on such a filesystem, though that would also prevent the installation of new components/targets. More thought is needed, but in the short term the workaround is either to not include a toolchain in your underlying docker image, or to remove and then reinstall the toolchain in your CI.
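The probing idea could be sketched roughly as follows. This is an illustration only: rename_works is a hypothetical name, not rustup code, and errno 18 (EXDEV) is assumed as on Linux. As I understand the kernel's overlayfs documentation, the failure affects directories that live on a lower layer, so the probe has to rename an existing toolchain path; a freshly created scratch file would likely rename without error and prove nothing.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Linux errno for EXDEV ("Invalid cross-device link"). Assumption: Linux only.
const EXDEV: i32 = 18;

/// Hypothetical probe: rename `candidate` into `scratch_dir` and straight back.
/// Returns Ok(true) if both renames succeed, Ok(false) if the first rename
/// fails with os error 18, and Err for any other failure.
pub fn rename_works(candidate: &Path, scratch_dir: &Path) -> io::Result<bool> {
    let probe = scratch_dir.join("rename_probe");
    match fs::rename(candidate, &probe) {
        Ok(()) => {
            // Put the candidate back where it was before reporting success.
            fs::rename(&probe, candidate)?;
            Ok(true)
        }
        Err(e) if e.raw_os_error() == Some(EXDEV) => Ok(false),
        Err(e) => Err(e),
    }
}
```

An updater could call this once before starting a transaction and stop with a clear message when it returns Ok(false). The probe moves the candidate away briefly, so it is only safe while nothing else is using the toolchain.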

@CatarinaPedreira If you need to work around the issue, just remove the toolchain and install it again. I think that avoids renaming across the overlayfs boundary.