tiflow: DM worker can't exit in CI
Which jobs are flaking?
IT-ha_cases3_1
Which test(s) are flaking?
DM IT
Jenkins logs or GitHub Actions link
[2022-03-31T09:47:44.177Z] kill dm-worker2
[2022-03-31T09:47:44.177Z] kill: sending signal to 1344 failed: No such process
[2022-03-31T09:47:44.177Z] [Thu Mar 31 17:47:44 CST 2022] <<<<<< START DM-WORKER on port 8263, config: /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/tests/ha_cases3_1/conf/dm-worker2.toml >>>>>>
[2022-03-31T09:47:44.438Z] dmctl test cmd: "pause-task test"
[2022-03-31T09:47:47.719Z] got=3 expected=3
[2022-03-31T09:47:47.974Z] dmctl test cmd: "resume-task test"
[2022-03-31T09:47:47.974Z] got=3 expected=3
[2022-03-31T09:49:39.369Z] use sync_diff_inspector to check increment data
[2022-03-31T09:49:39.369Z] check diff successfully
[2022-03-31T09:49:39.369Z] [Thu Mar 31 17:49:29 CST 2022] <<<<<< finish test_isolate_master_and_worker >>>>>>
[2022-03-31T09:49:39.369Z] 5 dm-master alive
[2022-03-31T09:49:39.369Z] 6 dm-worker alive
[2022-03-31T09:49:39.369Z] 0 dm-syncer alive
...
[2022-03-31T09:50:11.859Z] wait process dm-worker.test exit...
[2022-03-31T09:50:12.794Z] wait process dm-worker.test exit...
[2022-03-31T09:50:13.728Z] wait process dm-worker.test exit...
[2022-03-31T09:50:14.661Z] wait process dm-worker.test exit...
[2022-03-31T09:50:14.661Z] process dm-worker.test didn't exit after 30 seconds
DM worker log: worker.log
Anything else we need to know
-
Does this test exist for other branches as well?
-
Has there been a high frequency of failure lately?
About this issue
- State: closed
- Created 2 years ago
- Comments: 19 (19 by maintainers)
Commits related to this issue
- worker(dm): fix deadlock (#5427) close pingcap/tiflow#5089 — committed to pingcap/tiflow by lance6716 2 years ago
- test(dm): show full result of ps on kill fail (#5667) ref pingcap/tiflow#5089 — committed to pingcap/tiflow by D3Hunter 2 years ago
- test(dm): fix unstable tests (#5865) ref pingcap/tiflow#5089, close pingcap/tiflow#5746, close pingcap/tiflow#5793 — committed to pingcap/tiflow by lance6716 2 years ago
- test(dm): fix unstable tests (#5865) (#5872) ref pingcap/tiflow#5089, close pingcap/tiflow#5746, close pingcap/tiflow#5793 — committed to pingcap/tiflow by ti-chi-bot 2 years ago
- test(dm): fix unstable tests (#5865) (#5870) ref pingcap/tiflow#5089, close pingcap/tiflow#5746, close pingcap/tiflow#5793 — committed to pingcap/tiflow by ti-chi-bot 2 years ago
- worker(dm): fix Server Start/Close race (#6213) close pingcap/tiflow#5089, close pingcap/tiflow#5836 — committed to pingcap/tiflow by lance6716 2 years ago
- worker(dm): fix Server Start/Close race (#6213) (#6292) close pingcap/tiflow#5089, close pingcap/tiflow#5836 — committed to pingcap/tiflow by ti-chi-bot 2 years ago
- test(dm): fix unstable tests (#5865) (#5871) ref pingcap/tiflow#5089, close pingcap/tiflow#5746, close pingcap/tiflow#5793 — committed to pingcap/tiflow by ti-chi-bot 2 years ago
This seems to be a zombie process, but its PPID is 0; normally it should be 1. That is rare but possible, see https://stackoverflow.com/questions/37577931/how-can-i-kill-a-zombie-process-whose-parent-is-pid-0
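To check the zombie theory on a CI agent, one option is to read the state and parent PID from `/proc/<pid>/stat` on Linux. The snippet below is only a rough sketch of that check and is not part of the tiflow test scripts. The helper names (`parse_proc_stat`, `is_zombie`, `read_proc_stat`) and the sample line are made up for illustration; only the field layout (`pid (comm) state ppid ...`) and the `Z` state letter come from the Linux proc filesystem.

```python
from typing import NamedTuple


class ProcStat(NamedTuple):
    """First four fields of /proc/<pid>/stat (hypothetical helper type)."""
    pid: int
    comm: str
    state: str
    ppid: int


def parse_proc_stat(line: str) -> ProcStat:
    """Parse pid, comm, state and ppid from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 1:].split()
    return ProcStat(pid=pid, comm=comm, state=rest[0], ppid=int(rest[1]))


def is_zombie(stat: ProcStat) -> bool:
    """A process in state 'Z' has exited but has not been reaped."""
    return stat.state == "Z"


def read_proc_stat(pid: int) -> ProcStat:
    """Read and parse /proc/<pid>/stat for a live PID (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())


if __name__ == "__main__":
    # Made-up sample line, not taken from the CI logs.
    sample = "4242 (dm-worker.test) Z 0 4242 4242 0 -1 4227084 0 0 0 0"
    stat = parse_proc_stat(sample)
    print(stat, "zombie" if is_zombie(stat) else "not zombie")
```

If the stuck `dm-worker.test` shows state `Z`, sending it further signals will not help, because a zombie has already exited and is only waiting to be reaped by its parent.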