libelektra: Jenkins: Retry Failed Builds

Description

Currently, the Jenkins build fails quite often for various reasons. This issue collects some of the current problems:

Failures

Branch    Failure Reason                Failed Build Job/Stage
PR #2932  Maven build                   debian-unstable-clang-asan
master    Homepage build                Deploy Website
master    Homepage build                Deploy Website
PR #2945  Internal compiler error       build-elektra-web-base
master    CMake install failure         debian-stretch-full
master    Workspace removal failure     Main builds
master    Workspace removal failure     Main builds
master    Workspace removal failure     Main builds
master    Workspace removal failure     Main builds
master    Workspace removal failure     Main builds
master    Workspace removal failure     Main builds
PR #2945  Haskell build failure         debian-stretch-full-optimizations-off
PR #2945  APT install failure           build-elektra-website
PR #2932  Maven build                   debian-unstable-clang-asan
master    Timeout                       debian-stretch-full-mmap-asan
PR #2975  Timeout                       debian-buster-mingw-w64
master    Homepage build                Deploy Website
master    Homepage build                Deploy Website
master    Timeout                       debian-buster-full
master    Haskell build failure         debian-stretch-full-ini
master    Timeout                       debian-unstable-full
master    Failing tests                 debian-buster-full
master    Internal compiler error       build-elektra-web-base
master    Homepage build                Deploy Website
master    Homepage build                Deploy Website
master    Homepage build                Deploy Website
master    Homepage build                Deploy Website
PR #2998  Timeout, connection problems  build-elektra-web-base, debian-buster-full-i386
master    Maven build                   debian-unstable-clang-asan
PR #2998  Timeout                       build-elektra-website-backend
master    Connection problems           build-elektra-web-base
master    Homepage build                Deploy Website
master    Maven build                   debian-unstable-full-clang
master    Git commit failure            buildPackage/debian/buster
master    Git commit failure            buildPackage/debian/buster
master    Git commit failure            buildPackage/debian/buster, buildPackage/debian/stretch
master    Git commit failure            buildPackage/debian/buster
master    Git commit failure            buildPackage/debian/buster

Failing Tests

Test                                  Location            Times Failed
check_external_example_codegen_econf  debian-buster-full  1
check_external_example_codegen_menu   debian-buster-full  1
check_external_example_codegen_tree   debian-buster-full  1
check_external_example_highlevel      debian-buster-full  1
check_spec                            debian-buster-full  1
testkdb_ensure                        debian-buster-full  1

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 36 (36 by maintainers)

Most upvoted comments

It was just “Disk quota exceeded”; I did not want to overkill it with memory. I cleaned it up now, and it's up again.

I updated the node again; there shouldn't be any permission issues anymore.

Node updated.

I updated the node. It should work now. If something goes wrong you can update me here again.

What do you think about #3224?

Trying to keep the first post up to date, or to fix all these issues, is hopeless. We need automatic retrying. I hope @Mistreated will implement this soon on our new server.

Yet another error in https://build.libelektra.org/jenkins/blue/organizations/jenkins/libelektra/detail/master/12/pipeline/:

Step 12/31 : RUN curl -o cppcms-${CPPCMS_VERSION}.tar.bz -L         "https://sourceforge.net/projects/cppcms/files/cppcms/${CPPCMS_VERSION}/cppcms-${CPPCMS_VERSION}.tar.bz2/download"     && tar -xjvf cppcms-${CPPCMS_VERSION}.tar.bz     && mkdir cppcms-${CPPCMS_VERSION}/build     && cd cppcms-${CPPCMS_VERSION}/build     && cmake ..     && make -j ${PARALLEL}     && make install     && cd /app/deps     && rm -Rf cppcms-${CPPCMS_VERSION}

 ---> Running in f5ed5e42a480

curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)

The command '/bin/sh -c curl -o cppcms-${CPPCMS_VERSION}.tar.bz -L         "https://sourceforge.net/projects/cppcms/files/cppcms/${CPPCMS_VERSION}/cppcms-${CPPCMS_VERSION}.tar.bz2/download"     && tar -xjvf cppcms-${CPPCMS_VERSION}.tar.bz     && mkdir cppcms-${CPPCMS_VERSION}/build     && cd cppcms-${CPPCMS_VERSION}/build     && cmake ..     && make -j ${PARALLEL}     && make install     && cd /app/deps     && rm -Rf cppcms-${CPPCMS_VERSION}' returned a non-zero code: 92

script returned exit code 92

I am afraid the Naginator Plugin (https://wiki.jenkins.io/display/JENKINS/Naginator+Plugin) is the only significant step forward.

Unfortunately, it will not fix the problems for Travis or Cirrus.

Two more tests that sometimes fail (#3168):

 27/134 MemCheck  #23: testcpp_contextual_thread ........***Exception: Other  2.59 sec
Running main() from /opt/gtest/googletest/src/gtest_main.cc
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from test_contextual_thread
[ RUN      ] test_contextual_thread.instanciation

/home/jenkins/workspace/libelektra_PR-3168-L5JHIPUUQR3TWFGKHQIDK6HHW6QAMSQXWJC5ZUZMBLDMLTYA2ENA@2/src/bindings/cpp/tests/testcpp_contextual_thread.cpp:70: Failure

Expected equality of these values:
  ks.lookup ("user/hello").getString ()
    Which is: "8"
  "5"
terminate called without an active exception
60/254 Test  #57: testio_glib .................................***Failed    5.08 sec

BINDING TEST-SUITE

==================

test basics
test idle
test timer
testTimerShouldCallbackOnce (warning): measured 316ms, expected 250ms - deviation 66ms.
testTimerShouldCallbackAtIntervals (warning): measured 343ms, expected 250ms - deviation 93ms.
testTimerShouldCallbackAtIntervals (warning): measured 322ms, expected 250ms - deviation 72ms.
testTimerShouldCallbackAtIntervals (warning): measured 338ms, expected 250ms - deviation 88ms.
../src/bindings/io/test/test_timer.c:273: error in testTimerShouldChangeInterval: timer was not called the required amount of times
test file descriptor
test mix

Looks like building Docker images does not work on hetzner-jenkins1:

stderr: error: could not lock config file .git/config: Disk quota exceeded

I disabled the node.

Build jobs on hetzner-jenkins1 seem to fail because of permission-related problems:

Resource: Could not create directory ‘/.config’. Reason: Permission denied. Identity: uid: 47000, euid: 47000, gid: 47000, egid: 47000

Looks like docker pull fails on hetzner-jenkins1, since the node does not have enough free space:

Cannot contact hetzner-jenkins1: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel failed to register layer: ApplyLayer exit status 1 stdout: stderr: write /usr/lib/git-core/git-credential-store: disk quota exceeded

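Since these builds die halfway through `docker pull`, a pre-flight check could make them fail early with a clearer message. The following is a minimal sketch, assuming a POSIX `sh` and `df -Pk` (POSIX output format, 1024-byte blocks); the function names `free_kib` and `require_free_kib` and any threshold you pass are hypothetical, not part of the libelektra build scripts. One caveat: the errors above are *quota* errors, and `df` reports free space on the filesystem, not the remaining per-user quota, so on a node with quotas this check may pass even though the quota is already exhausted.

```shell
# Hypothetical helpers, not part of the libelektra build scripts.

# Print the available space (in KiB) of the filesystem holding directory $1.
# -P forces one line per filesystem, -k reports 1024-byte blocks;
# column 4 of the second line is the "Available" value.
free_kib() {
	df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if directory $1 has at least $2 KiB free, fail otherwise.
# Note: this checks filesystem free space only, not per-user quotas.
require_free_kib() {
	dir=$1
	needed=$2
	available=$(free_kib "$dir")
	if [ -z "$available" ]; then
		# df failed (e.g. the directory does not exist)
		echo "could not determine free space for ${dir}" >&2
		return 1
	fi
	if [ "$available" -lt "$needed" ]; then
		echo "only ${available} KiB free in ${dir}, need ${needed} KiB" >&2
		return 1
	fi
	return 0
}
```

For example, a build step could run `require_free_kib /home/_docker 10485760` (10 GiB, an arbitrary threshold; the path is the Docker directory that appears in the overlay error later in this thread) before `docker pull`, and skip or fail the step if the check fails.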

After a bit of struggling I managed to add a new Jenkins Node.

Thank you for adding the new Jenkins node. I disabled the node for now, since it seems to break the build.

I think our best bet to make our lives much easier is to “fix” these problems using the Naginator Plugin (https://wiki.jenkins.io/display/JENKINS/Naginator+Plugin).

Jenkins would then restart failed jobs several times. Should we try 5 restarts before giving up?
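The Naginator Plugin reschedules whole failed builds. A finer-grained complement would be to retry only a known-flaky step inside a build script, such as the cppcms download that failed above with curl exit code 92. curl's own `--retry` option only retries errors it treats as transient (timeouts and certain HTTP status codes), so it likely would not cover that HTTP/2 stream error. Below is a minimal sketch of a generic retry helper in POSIX `sh`. The name `retry`, its argument order, and the fixed delay between attempts are assumptions for illustration; this is not the Naginator Plugin's behaviour or API.

```shell
# Hypothetical helper, not part of Jenkins or the Naginator Plugin.
# Run a command up to MAX_ATTEMPTS times, sleeping DELAY seconds between
# attempts. Returns 0 on the first success, otherwise the exit status of
# the last attempt.
# Usage: retry MAX_ATTEMPTS DELAY command [args...]
retry() {
	max_attempts=$1
	delay=$2
	shift 2
	attempt=1
	while :; do
		# "|| status=$?" keeps a failing command from aborting scripts
		# that run under "set -e" (Jenkins sh steps use -e by default).
		status=0
		"$@" || status=$?
		if [ "$status" -eq 0 ]; then
			return 0
		fi
		if [ "$attempt" -ge "$max_attempts" ]; then
			echo "retry: giving up after $attempt attempts (exit code $status)" >&2
			return "$status"
		fi
		echo "retry: attempt $attempt failed (exit code $status), retrying in ${delay}s" >&2
		attempt=$((attempt + 1))
		sleep "$delay"
	done
}
```

With the proposed 5 attempts and an arbitrary 10-second delay, the download step could become `retry 5 10 curl -fL -o cppcms-${CPPCMS_VERSION}.tar.bz "https://sourceforge.net/projects/cppcms/files/cppcms/${CPPCMS_VERSION}/cppcms-${CPPCMS_VERSION}.tar.bz2/download"`. A fixed delay keeps the sketch simple; exponential backoff would be gentler on a server that is already struggling.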

@Mistreated Can you implement this also on the old server? Or is this too risky?

Before we implement this, however, we need the new Jenkins node, as otherwise the queue will get too long.

Can you please report that separately?

Done, see #3086

For issues related to source code, I agree. For issues related to Docker/Jenkins instability, it is enough to collect them here: there is very little we can do besides the migration we are already doing, which unfortunately is taking longer than expected. It would be nice if @Mistreated could give more information about the status, maybe in #160.

@sanssecours, is there a procedure for adding new tests to the list above?

Nope. I already gave up on modifying the list, since the Jenkins build fails too often. I would recommend we just open an issue for each specific problem.

Yes, I agree: test_service_convertengine is not reported here yet. Actually, we can disable the test, since the service is not modified anyway.

Failures of docker pull in the website stage occur quite often now.

Is this all the retrying and waiting after “Pulling from build-elektra-web-base” (log)?

Additionally, I think this error is new: test_service_convertengine fails during “Starting build/hub.libelektra.org/build-elektra-website-backend” (log 2).

I just got connection problems for build-elektra-web-base, too.

3d070e3209ce: Retrying in 1 second

error creating overlay mount to /home/_docker/overlay2/e9563564b9365114c47d90b7e8d307565225097a525e6b1b866a2da2877b2aa8/merged: device or resource busy

script returned exit code 1

This is a full log.

For the Haskell problems, we can remove the Haskell bindings/plugins. They are not maintained anyway.

For the Maven builds, we already have an issue: #2855

I know 😊. I already added a link in the issue description.