dvc: repro: rwlock file corrupted
Bug Report
Issue name
repro: rwlock file corrupted
Description
When running multiple pipelines that read the same file as input, the rwlock file gets corrupted in the middle of the process.
Exception
Reproduce
We execute a huge number of pipelines that share the same file as input. All processes run on a computer cluster with GPU support, but I don't think the GPU has any impact on this bug. It seems like a concurrency bug that is hard to reproduce.
rwlock file: see the last line in this file; it has been reformatted and renamed to JSON for easy navigation.
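To illustrate the kind of race suspected here, the following is a minimal sketch of a read-modify-write on a shared JSON file that tolerates concurrent writers. It is not DVC's actual rwlock implementation: the helper name update_json_atomically, the ".lock" sidecar file, and the "read" key are all hypothetical, and it assumes a Linux filesystem where flock works properly (which may not hold on some network filesystems).

```python
import fcntl
import json
import os
import tempfile


def update_json_atomically(path, mutate):
    """Read-modify-write a JSON file under an exclusive lock.

    Hypothetical helper for illustration only; not part of DVC.
    """
    lock_path = path + ".lock"  # assumed sidecar lock file
    with open(lock_path, "w") as lock_file:
        # Block until no other process holds the exclusive lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            try:
                with open(path) as f:
                    data = json.load(f)
            except FileNotFoundError:
                data = {}  # first writer starts from an empty document
            mutate(data)
            # Write to a temp file in the same directory, then rename it over
            # the target so readers never observe a half-written file.
            directory = os.path.dirname(os.path.abspath(path))
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "w") as tmp:
                json.dump(data, tmp)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Without the lock, two processes can both read the old contents and each overwrite the other's update; without the temp-file rename, a reader can see a truncated file, which would look like the corruption described above.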
Environment information
DVC version: 2.0.18 (pip)
---------------------------------
Platform: Python 3.8.8 on Linux-4.18.0-240.15.1.el8_3.x86_64-x86_64-with-glibc2.10
Supports: http, https, s3
About this issue
- State: closed
- Created 3 years ago
- Comments: 17 (7 by maintainers)
Hi,
dvc config core.hardlink_lock true
did a great job, and we succeeded in finishing the bulk of the running jobs without a crash. Now I am going to check whether the rwlock issue is also solved by this change; will update.
Sure, will try. Thanks.