bazel: Bazel does not notice compilation action finished and waits for it forever

Description of the problem / feature request:

Occasionally (for me, about once every day or two) a Bazel build gets into a state where it runs forever and never finishes. When this happens, the console shows something like the following:

[17,773 / 17,796] Compiling .../tabui/main/calculationeditor/qtdialogs/CalculationDialogWidget.cpp; 4150s local
[17,773 / 17,796] Compiling .../tabui/main/calculationeditor/qtdialogs/CalculationDialogWidget.cpp; 4153s local
[17,773 / 17,796] Compiling .../tabui/main/calculationeditor/qtdialogs/CalculationDialogWidget.cpp; 4159s local
[17,773 / 17,796] Compiling .../tabui/main/calculationeditor/qtdialogs/CalculationDialogWidget.cpp; 4160s local

Note the ridiculously high number of seconds the compilation has supposedly been running. Looking at the produced object files, I can tell that compilation finished long ago (the object file was produced and is correct), but somehow Bazel does not notice that it is done and waits forever. Another observation: when this happens, one must hit Ctrl+C three times to terminate the Bazel server. After the first and second Ctrl+C it just continues incrementing the seconds.
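For anyone who wants to flag this state automatically, here is a minimal Python sketch that parses progress lines in the shape shown in the console output above and reports an action whose elapsed time exceeds a limit. The function names and the 1800-second default are hypothetical, and the regular expression is derived only from the four sample lines, so other progress formats may not match.

```python
import re

# Matches progress lines in the shape shown in this report, e.g.
# "[17,773 / 17,796] Compiling foo.cpp; 4150s local"
PROGRESS_RE = re.compile(
    r"^\[(?P<done>[\d,]+) / (?P<total>[\d,]+)\] "
    r"(?P<action>.+); (?P<secs>\d+)s (?P<strategy>\S+)$"
)


def parse_progress(line):
    """Return the fields of one progress line, or None if it does not match."""
    m = PROGRESS_RE.match(line.strip())
    if m is None:
        return None
    return {
        "done": int(m.group("done").replace(",", "")),
        "total": int(m.group("total").replace(",", "")),
        "action": m.group("action"),
        "seconds": int(m.group("secs")),
        "strategy": m.group("strategy"),
    }


def is_stuck(line, limit_seconds=1800):
    """True if the line reports an action running for at least limit_seconds.

    The 1800-second default is an arbitrary illustration, not a Bazel setting.
    """
    info = parse_progress(line)
    return info is not None and info["seconds"] >= limit_seconds
```

A CI job could feed captured console lines to `is_stuck` and abort the job when it returns True; what limit is reasonable depends on the slowest legitimate action in the build.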

As you can imagine, this is a major bug because it causes CI machines to occasionally hang on such runaway builds. I wonder if there is a way to set a timeout on the compilation action to at least mitigate it and avoid a complete hang.
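As a stopgap outside of Bazel itself, a CI job can wrap the build command in an external watchdog. The sketch below is a minimal example using Python's standard `subprocess` module; `run_with_timeout` is a hypothetical helper, not a Bazel feature. On timeout, `subprocess.run` kills only the process it started, so a Bazel server or compiler child processes running separately may survive and need their own cleanup.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or None if the timeout expired.

    subprocess.run kills the direct child when the timeout expires;
    processes that child started are NOT killed by this call.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        print(f"timed out after {timeout_seconds}s: {cmd}", file=sys.stderr)
        return None
```

For example, `run_with_timeout(["bazel", "build", "//..."], 7200)` would return None if the client were still running after two hours; the target pattern and the two-hour limit are placeholders.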

Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

We could not reproduce this problem on demand.

What operating system are you running Bazel on?

We run this build on Mac, Linux and Windows, but the bug seems to be specific to Windows.

What’s the output of bazel info release?

5.0.0

Have you found anything relevant by searching the web?

I found the issue https://github.com/bazelbuild/bazel/issues/4216, which manifested itself in exactly the same way, but that bug was specific to Linux and was fixed in 2017 after a Linux kernel update. I don’t know how relevant it is.

Any other information, logs, or outputs that you want to share?

I preserved java.log when it happened today and shared it on my OneDrive here. You can see plenty of errors at the end of the log, but I believe most of them were caused by me hitting Ctrl+C three times to stop the build.

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 20 (16 by maintainers)

Most upvoted comments

Ok, so this might be relevant for you in case it comes up again. I was running into a problem where vctip.exe (aka “Microsoft® VC compiler and tools experience improvement data uploader”) was becoming an unstoppable zombie process. It gets launched by cl.exe for some reason, and sometimes would just get into this bad state.

Other people have encountered this problem in the past and found that it worked to simply delete vctip.exe. So I tried doing that about three months ago, and haven’t seen the issue since.
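To check whether a hung build coincides with lingering vctip.exe processes, something like the following Python sketch could be run on the Windows machine. It shells out to the standard Windows `tasklist` command with an image-name filter and CSV output; the helper names are hypothetical, and the parser assumes the usual `"Image Name","PID",...` column order.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text):
    """Return (image_name, pid) pairs from `tasklist /FO CSV /NH` output."""
    found = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and the "INFO: No tasks are running..." message,
        # which do not have a numeric PID in the second column.
        if len(row) >= 2 and row[1].isdigit():
            found.append((row[0], int(row[1])))
    return found


def find_vctip_processes():
    """Windows only: list running processes whose image name is vctip.exe."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq vctip.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

An empty list from `find_vctip_processes()` means no vctip.exe was running at that moment; it does not by itself rule vctip.exe out as the cause of an earlier hang.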