tiflow: DM worker can't exit in CI

Which jobs are flaking?

IT-ha_cases3_1

Which test(s) are flaking?

DM IT

Jenkins logs or GitHub Actions link

[2022-03-31T09:47:44.177Z] kill dm-worker2
[2022-03-31T09:47:44.177Z] kill: sending signal to 1344 failed: No such process
[2022-03-31T09:47:44.177Z] [Thu Mar 31 17:47:44 CST 2022] <<<<<< START DM-WORKER on port 8263, config: /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/tests/ha_cases3_1/conf/dm-worker2.toml >>>>>>
[2022-03-31T09:47:44.438Z] dmctl test cmd: "pause-task test"
[2022-03-31T09:47:47.719Z] got=3 expected=3
[2022-03-31T09:47:47.974Z] dmctl test cmd: "resume-task test"
[2022-03-31T09:47:47.974Z] got=3 expected=3
[2022-03-31T09:49:39.369Z] use sync_diff_inspector to check increment data
[2022-03-31T09:49:39.369Z] check diff successfully
[2022-03-31T09:49:39.369Z] [Thu Mar 31 17:49:29 CST 2022] <<<<<< finish test_isolate_master_and_worker >>>>>>
[2022-03-31T09:49:39.369Z] 5 dm-master alive
[2022-03-31T09:49:39.369Z] 6 dm-worker alive
[2022-03-31T09:49:39.369Z] 0 dm-syncer alive
...
[2022-03-31T09:50:11.859Z] wait process dm-worker.test exit...
[2022-03-31T09:50:12.794Z] wait process dm-worker.test exit...
[2022-03-31T09:50:13.728Z] wait process dm-worker.test exit...
[2022-03-31T09:50:14.661Z] wait process dm-worker.test exit...
[2022-03-31T09:50:14.661Z] process dm-worker.test didn't exit after 30 seconds

DM worker log: worker.log

Anything else we need to know

  • Does this test exist for other branches as well?

  • Has there been a high frequency of failure lately?

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

current processlist: jenkins 9647 0 10 12:58 ? 00:00:15 /home/jenkins/agent/w

This looks like a zombie process, but its PPID is 0, whereas it would normally be 1. That is rare but possible; see https://stackoverflow.com/questions/37577931/how-can-i-kill-a-zombie-process-whose-parent-is-pid-0
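
One way to confirm a suspicion like this on a Linux CI agent is to read the process state and parent PID from /proc/<pid>/stat rather than inferring them from the process list. The following is a minimal sketch, not part of the DM test scripts: the names ProcStat, parse_proc_stat, is_zombie and read_proc_stat are hypothetical helpers, and it assumes the standard Linux stat layout (pid, command name in parentheses, state, ppid, ...).

```python
from typing import NamedTuple


class ProcStat(NamedTuple):
    """The first four fields of /proc/<pid>/stat (hypothetical helper type)."""
    pid: int
    comm: str
    state: str
    ppid: int


def parse_proc_stat(line: str) -> ProcStat:
    """Parse one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the name ends at the LAST ')' on the line.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 1:].split()
    # rest[0] is the state letter, rest[1] is the parent PID.
    return ProcStat(pid=pid, comm=comm, state=rest[0], ppid=int(rest[1]))


def is_zombie(stat: ProcStat) -> bool:
    """A state of 'Z' means the process has exited but was not reaped."""
    return stat.state == "Z"


def read_proc_stat(pid: int) -> ProcStat:
    """Read and parse /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())
```

Calling read_proc_stat on the stuck PID and checking is_zombie and the ppid field would show directly whether the process is in state Z and which parent it is attached to. A zombie cannot be removed with kill; it disappears only once its parent reaps it.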