gitoxide: gix-lock locked access forever and could not recover

Current behavior 😯

I’ve upgraded rustsec crate to a version using gix, and after a while (probably after an unexpected restart) every fetch of rustsec’s git database has been failing due to being locked forever.

unable to acquire filesystem lock: directory “{resource_path:?}” still locked after {attempts} attempts

BTW: the error was literally with the placeholders like that.

I’ve been restarting and retrying the process many times, and it was locked up every time.

I’ve noticed that in the parent dir of the path given to rustsec’s Repository::fetch there was a 0-byte file rustsec.rustsec.lock. Deleting the file unblocked rustsec’s fetch.

Backtrace of the process while it was waiting:

  frame #2: 0x0000aaaae1d97d2c crates-server`std::thread::sleep::h55cdaca4ca4be380 [inlined] std::sys::unix::thread::Thread::sleep::hfafd94621608adaa at thread.rs:237:20
    frame #3: 0x0000aaaae1d97d00 crates-server`std::thread::sleep::h55cdaca4ca4be380 at mod.rs:872:5
    frame #4: 0x0000aaaae1030418 crates-server`gix_lock::acquire::lock_with_mode::h11c2d5ec4a114104 at acquire.rs:123:25
    frame #5: 0x0000aaaae1098aac crates-server`rustsec::repository::git::repository::Repository::fetch::h099db295696cce7e [inlined] gix_lock::acquire::_$LT$impl$u20$gix_lock..Marker$GT$::acquire_to_hold_resource::hd64be3ad5ec63f0d at acquire.rs:78:35
    frame #6: 0x0000aaaae1098a8c crates-server`rustsec::repository::git::repository::Repository::fetch::h099db295696cce7e at repository.rs:113:21

Expected behavior 🤔

I think a lockfile has to include and check process pid, because a crash of the process can leave a lockfile behind, and it has to have a way to recover.

Also parent dir of the git checkout is a permanent location, so such lock will survive reboots. On Linux I’d expect lockfiles in /var/run.

Steps to reproduce 🕹

touch example_dir/rustsec.rustsec.lock

use rustsec::*;
Repository::fetch(DEFAULT_URL, "example_dir/rustsec", true, Duration::from_secs(60))?;

About this issue

Original URL
State: closed
Created 9 months ago
Comments: 17 (10 by maintainers)

Commits related to this issue

doc: Improve documentation to make possibility of leaks more clear (#1026) — committed to Byron/gitoxide by Byron 9 months ago
doc: make clear that locks can leak and what that means on acquiring a lock (#1026) — committed to Byron/gitoxide by Byron 9 months ago

Most upvoted comments

Thanks for sharing, I am happy this finally gets resolved. Of course I hope that tame-index will spin off its cargo-inspired lock implementation into their own crate, I definitely want more of these, @Jake-Shadle 😃 🙏.

Byron on Oct 1, 2023

I don’t think this behavior should be controlled by a Cargo feature. Features are additive and global, so any dependency anywhere in the tree enabling this feature will affect all the other dependencies. If any configuration options are provided, they should be done as an argument to a function, or as separate functions.

Acquiring two locks does not really solve the problem of the stale lock being left behind, so I don’t think that is workable. Within the constraints of git compatibility, I believe the best we can do is write the process ID into the lockfile and check that the process with the ID from the file is still alive when trying to acquire the lock. And maybe also something like boot time, to catch instances of the PID being reused after a reboot.

I’ve looked at the Cargo implementation and it just uses OS locks everywhere, forgoing git locks and all the issues associated with them. It does seem that for crates like rustsec that do not need to coordinate with git CLI it is best to do what Cargo does and just use OS locks instead of git lockfiles.

Shnatsel on Sep 21, 2023

Unfortunately the current gix locks are problematic by design - there is no way to recover from a SIGKILL delivered to the process. Once that happens, the lock remains stale forever.

Requiring manual intervention, like git CLI does, would be very problematic on servers - it would be a constanty source of outages.

There are two ways to go about this:

Switch to fslock crate that avoids these issues by using OS-provided locks instead of lock files.
Keep piling on more features on top of manually implemented lock files. Specifically write the PID to the file and have other processes check if that PID is still alive; and write the locks in a directory that is cleared on reboot, as suggested by @kornelski

I was trying to think of downsides of (1), but I could not find any that don’t also apply to (2). Both won’t work on truly obscure platforms such as Fuchsia, and both won’t work over NFS (but there is no sane way to lock over NFS anyway).

So I suggest using fslock crate that translates to flock() on Unix and handles all the edge cases automatically - the lock is released once the process terminates even with SIGKILL, and is cleared on reboot.

Shnatsel on Sep 21, 2023

I’ve just published rustsec v0.28.3 that switches to OS locking and resolves the issue.

Shnatsel on Oct 14, 2023

tame-index has resolved this by using OS locking: https://github.com/EmbarkStudios/tame-index/pull/33

I’ve opened a PR to switch from gix locks to tame-index’s OS locking implementation in rustsec too: https://github.com/rustsec/rustsec/pull/1032

Shnatsel on Sep 30, 2023