tarpaulin: tarpaulin stopped working for veloren
Hi xd009642,
Veloren stopped working with tarpaulin a few weeks ago, and I am investigating what caused the problem. I am currently debugging tarpaulin and wonder whether I have found a bug.
Situation: when you compile Veloren (steps to reproduce below), the compilation never finishes; it just hangs. I attached gdb to my local cargo-tarpaulin, waited for it to get stuck, and saw:
(gdb) bt
#0 0x00007ffff7a2feaa in wait4 () from /usr/lib/libc.so.6
#1 0x000055555639b270 in std::sys::unix::process::process_inner::Process::wait::{{closure}} () at src/libstd/sys/unix/process/process_unix.rs:438
#2 std::sys::unix::cvt_r () at src/libstd/sys/unix/mod.rs:152
#3 std::sys::unix::process::process_inner::Process::wait () at src/libstd/sys/unix/process/process_unix.rs:438
#4 std::process::Child::wait () at src/libstd/process.rs:1394
#5 0x00005555559a80ff in cargo_tarpaulin::cargo::run_cargo (metadata=0x7fffffff4e28, manifest=..., config=0x55555681c9f0, ty=..., result=0x7fffffff4df0) at src/cargo.rs:160
#6 0x00005555559a6eb9 in cargo_tarpaulin::cargo::get_tests (config=0x55555681c9f0) at src/cargo.rs:116
#7 0x00005555558b83ab in cargo_tarpaulin::launch_tarpaulin (config=0x55555681c9f0, logger=0x7fffffff6cd8) at src/lib.rs:144
#8 0x00005555558b50e9 in cargo_tarpaulin::trace (configs=...) at src/lib.rs:65
#9 0x00005555558b6593 in cargo_tarpaulin::run (configs=...) at src/lib.rs:100
#10 0x00005555556a5125 in cargo_tarpaulin::main () at src/main.rs:194
Frame #5 suggests it's stuck at src/cargo.rs:160.
Looking at the code: https://github.com/xd009642/tarpaulin/blob/develop/src/cargo.rs#L160
My guess is that tarpaulin just invokes the system cargo here and then simply waits for it to exit. So maybe the problem is not with tarpaulin but with cargo?
Okay, I want to investigate further and analyse what cargo is doing, so I grab the PID of the child process and attach gdb to it:
Sep 29 14:30:05.716 TRACE cargo_tarpaulin::cargo: Running command "cargo" "test" "--no-run" "--message-format" "json" "--manifest-path" "/mnt/nfs/marcel/Entw/Rust/veloren/Cargo.toml" "--tests" "-vvv" "--target-dir" "/mnt/games/cargo-build/"
[Detaching after vfork from child process 167525]
gdb session on the second process (the cargo process):
^C
Thread 1 "cargo" received signal SIGINT, Interrupt.
0x00007ffff7e99f9f in write () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff7e99f9f in write () from /usr/lib/libc.so.6
#1 0x0000555555fc9e6d in std::sys::unix::fd::FileDesc::write () at library/std/src/sys/unix/fd.rs:139
#2 <std::sys::unix::stdio::Stdout as std::io::Write>::write () at library/std/src/sys/unix/stdio.rs:38
#3 <std::io::stdio::StdoutRaw as std::io::Write>::write () at library/std/src/io/stdio.rs:117
#4 <std::io::buffered::BufWriter<W> as std::io::Write>::write () at library/std/src/io/buffered.rs:758
#5 <std::io::buffered::LineWriterShim<W> as std::io::Write>::write () at library/std/src/io/buffered.rs:1003
#6 <std::io::buffered::LineWriter<W> as std::io::Write>::write () at library/std/src/io/buffered.rs:1396
#7 <std::io::stdio::StdoutLock as std::io::Write>::write () at library/std/src/io/stdio.rs:642
#8 0x0000555555fc97c7 in <&std::io::stdio::Stdout as std::io::Write>::write () at library/std/src/io/stdio.rs:616
#9 <std::io::stdio::Stdout as std::io::Write>::write () at library/std/src/io/stdio.rs:590
#10 0x000055555584137a in <termcolor::LossyStandardStream<W> as std::io::Write>::write ()
#11 0x0000555555811dc5 in std::io::Write::write_all ()
#12 0x0000555555714b81 in <std::io::Write::write_fmt::Adaptor<T> as core::fmt::Write>::write_str ()
#13 0x0000555555fff13c in core::fmt::write () at library/core/src/fmt/mod.rs:1080
#14 0x0000555555812105 in std::io::Write::write_fmt ()
#15 0x000055555594a466 in cargo::core::compiler::job_queue::DrainState::drain_the_queue ()
#16 0x00005555558510b3 in std::panic::catch_unwind ()
#17 0x00005555557c9197 in crossbeam_utils::thread::scope ()
#18 0x0000555555948195 in cargo::core::compiler::job_queue::JobQueue::execute ()
#19 0x0000555555794c7f in cargo::core::compiler::context::Context::compile ()
#20 0x0000555555726bdc in cargo::ops::cargo_compile::compile_ws ()
#21 0x000055555572693e in cargo::ops::cargo_compile::compile ()
#22 0x0000555555addd22 in cargo::ops::cargo_test::compile_tests ()
#23 0x0000555555adaadf in cargo::ops::cargo_test::run_tests ()
#24 0x00005555556be3fd in cargo::commands::test::exec ()
#25 0x00005555556da8e2 in cargo::cli::main ()
#26 0x0000555555677008 in cargo::main ()
#27 0x00005555556ca223 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#28 0x00005555556ca489 in std::rt::lang_start::{{closure}} ()
#29 0x0000555555fd93e0 in core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once () at /rustc/623fb90b5a1f324e0ec44085116bf858cef19a00/library/core/src/ops/function.rs:259
#30 std::panicking::try::do_call () at library/std/src/panicking.rs:381
#31 std::panicking::try () at library/std/src/panicking.rs:345
#32 std::panic::catch_unwind () at library/std/src/panic.rs:382
#33 std::rt::lang_start_internal () at library/std/src/rt.rs:51
#34 0x0000555555679b32 in main ()
So the cargo process started by tarpaulin is blocked in a write() call.
That sounds really strange to me. I checked my hard drive, but nothing is being written to disk. Also, frame #8 shows std::io::stdio, i.e. cargo is writing to stdout,
which leads me to my hypothesis. Running the cargo command manually also finishes, but running it from tarpaulin doesn't, which seems super weird to me.
- tarpaulin starts cargo and waits for its result.
- cargo wants to write something into the pipe back to tarpaulin, but the pipe is full, so cargo blocks until the pipe is drained.
- tarpaulin doesn't read from the pipe; it just waits until the process has finished…

Voilà, we have a deadlock.
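The hypothesized deadlock can be reproduced in a toy model. This is a sketch, not tarpaulin's or cargo's code: a bounded std::sync::mpsc::sync_channel stands in for the OS pipe buffer, a spawned thread stands in for the cargo child writing its output, and the hypothetical function child_finishes plays the tarpaulin side, either only waiting for the child to "exit" or draining the output first.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;
use std::time::Duration;

/// Toy model of the suspected deadlock (hypothetical, not tarpaulin code).
/// `capacity` plays the role of the pipe buffer, `messages` is how much the
/// "child" writes before exiting, and `parent_drains` selects whether the
/// "parent" reads the output or only waits for the child to finish.
fn child_finishes(capacity: usize, messages: usize, parent_drains: bool, timeout: Duration) -> bool {
    // Bounded channel: `send` blocks once `capacity` items are buffered,
    // like write(2) on a full pipe.
    let (out_tx, out_rx) = sync_channel::<usize>(capacity);
    // Signals that the child has "exited".
    let (exit_tx, exit_rx) = channel::<()>();

    thread::spawn(move || {
        for i in 0..messages {
            if out_tx.send(i).is_err() {
                return; // the parent closed its end; stop writing
            }
        }
        let _ = exit_tx.send(());
    });

    if parent_drains {
        // Read until the child drops its end (EOF), then wait for exit.
        for _ in out_rx.iter() {}
        exit_rx.recv_timeout(timeout).is_ok()
    } else {
        // Only wait for exit without reading: the child blocks once the buffer is full.
        let finished = exit_rx.recv_timeout(timeout).is_ok();
        drop(out_rx); // unblock the stuck child thread so it can terminate
        finished
    }
}

fn main() {
    let timeout = Duration::from_millis(200);
    println!("small output, parent only waits: {}", child_finishes(4, 3, false, timeout));
    println!("large output, parent only waits: {}", child_finishes(4, 100, false, timeout));
    println!("large output, parent drains:     {}", child_finishes(4, 100, true, timeout));
}
```

With output smaller than the buffer, the child finishes even if the parent only waits, which fits the observation that the hang shows up on a big project but not on small ones.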
This would only occur on big projects (like Veloren), because only they produce enough output to fill the pipe buffer, and it could be solved by tarpaulin not just waiting but also reading the child's stdout and storing it somewhere.
Do you think that could be a realistic root cause?
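A minimal sketch of the suggested fix, assuming cargo's JSON messages arrive on stdout as in the logged command: the hypothetical helper run_and_collect_lines keeps reading the child's stdout until EOF and only then calls wait(). It is not tarpaulin's actual code; main uses a sh loop as a stand-in for cargo that prints well over the default Linux pipe capacity (64 KiB).

```rust
use std::io::{self, BufRead, BufReader};
use std::process::{Command, ExitStatus, Stdio};

/// Hypothetical helper (not tarpaulin's actual code): run `cmd` with stdout
/// piped and keep reading it while the child runs, so the pipe never fills up.
fn run_and_collect_lines(cmd: &mut Command) -> io::Result<(ExitStatus, Vec<String>)> {
    // stderr is inherited; if it were piped as well, it would need its own
    // reader (e.g. a thread), or the child could block on that pipe instead.
    let mut child = cmd.stdout(Stdio::piped()).stderr(Stdio::inherit()).spawn()?;
    let stdout = child.stdout.take().expect("stdout is piped");

    // Drain stdout line by line (cargo's --message-format json emits one JSON
    // message per line) until EOF, i.e. until the child closes its end.
    let mut lines = Vec::new();
    for line in BufReader::new(stdout).lines() {
        lines.push(line?);
    }

    // Only wait once stdout is drained: the child can no longer be stuck in write().
    let status = child.wait()?;
    Ok((status, lines))
}

fn main() -> io::Result<()> {
    // Stand-in for `cargo test --no-run --message-format json ...`: a child that
    // prints roughly 200 KB, well over the default 64 KiB Linux pipe capacity.
    let mut cmd = Command::new("sh");
    cmd.args(["-c", "i=0; while [ $i -lt 20000 ]; do echo line-$i; i=$((i+1)); done"]);
    let (status, lines) = run_and_collect_lines(&mut cmd)?;
    println!("exit status: {status}, captured {} lines", lines.len());
    Ok(())
}
```

If the output does not need to be processed as it streams in, std's Child::wait_with_output (or Command::output) captures the piped stdout and stderr before returning, which avoids the same deadlock.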
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 18 (14 by maintainers)
@dlaehnemann IIRC tarpaulin was fixed so that it no longer deadlocks on this error (for us it was a deadlock, not a timeout). We now have a similar error in our pipeline, and I think the rust-lang error posted above is its cause.
I don't know how to solve that feature issue; a quick Google search pointed me towards an RFC, so it might not be possible. Also, the fix I'm working on changes the bit of code your PR touches and might make it unnecessary. I was just going to run some tests first 👀
@xd009642 I created a minimal reproduction scenario (just include vek with platform_intrinsics), which shows additional logs with the PR I just created: https://github.com/xMAC94x/testtarpaulin