milvus: [Bug]: [nightly] Test pipeline hangs for cluster mode

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 7cc995d
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): latest
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The nightly test pipeline hangs when Milvus is deployed in cluster mode.

Expected Behavior

The tests run to completion successfully.

Steps To Reproduce

Nightly:
https://ci.milvus.io:18080/jenkins/blue/rest/organizations/jenkins/pipelines/milvus-nightly-ci/branches/master/runs/512/nodes/190/steps/211/log/?start=0

Milvus Log

logs:
https://ci.milvus.io:18080/jenkins/blue/organizations/jenkins/milvus-nightly-ci/detail/master/512/artifacts/

artifacts-milvus-distributed-pulsar-master-512-pymilvus-e2e-logs.tar.gz

Some clues:

time="2022-05-05T21:01:19Z" level=warning msg="[Failed to connect to broker.]" error="dial tcp: lookup mdp-512-n-pulsar-proxy on 10.201.0.10:53: no such host" remote_addr="pulsar://mdp-512-n-pulsar-proxy:6650"
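The warning above is a DNS failure: the resolver at 10.201.0.10 returned no record for `mdp-512-n-pulsar-proxy`. A minimal sketch for checking, from a pod in the same namespace, whether that name resolves. It assumes Python is available in the pod; `can_resolve` is a hypothetical helper name, not part of Milvus or pymilvus.

```python
import socket


def can_resolve(host: str, port: int = 6650) -> bool:
    """Return True if the system resolver yields at least one address for host.

    The default port 6650 is the Pulsar proxy port seen in remote_addr above.
    """
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Raised for "no such host" and other name-resolution failures.
        return False


if __name__ == "__main__":
    # Hostname taken from the warning; only meaningful inside the CI cluster.
    print(can_resolve("mdp-512-n-pulsar-proxy"))
```

If this returns False while the Milvus pods are still running, the Pulsar proxy service was probably removed or never created, which would match the "connection closed" errors that follow.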

[2022/05/05 21:01:19.761 +00:00] [ERROR] [pulsar_consumer.go:122] ["failed to unsubscribe"] [subscription=by-dev-dataCoord] [error="All attempts results:\nattempt #1:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #2:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #3:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #4:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #5:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #6:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\n"] [stack="github.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/pulsar.(*Consumer).Close.func1\n\t/go/src/github.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/pulsar/pulsar_consumer.go:122\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:59\ngithub.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/pulsar.(*Consumer).Close\n\t/go/src/github.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/pulsar/pulsar_consumer.go:114\ngithub.com/milvus-io/milvus/internal/mq/msgstream.(*mqMsgStream).Close\n\t/go/src/github.com/milvus-io/milvus/internal/mq/msgstream/mq_msgstream.go:199\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).handleDataNodeTimetickMsgstream.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:478\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).handleDataNodeTimetickMsgstream\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:484"]

time="2022-05-05T21:01:19Z" level=info msg="Closing consumer=1" consumerID=1 name=zsplm subscription=by-dev-dataCoord topic="persistent://public/default/by-dev-datacoord-timetick-channel"

time="2022-05-05T21:01:19Z" level=warning msg="[Failed to close consumer]" consumerID=1 error="connection closed" name=zsplm subscription=by-dev-dataCoord topic="persistent://public/default/by-dev-datacoord-timetick-channel"

[2022/05/05 21:01:19.762 +00:00] [ERROR] [server.go:475] ["Failed to close ttMessage"] [recovered="All attempts results:\nattempt #1:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #2:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #3:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #4:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #5:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\nattempt #6:topic persistent://public/default/by-dev-datacoord-timetick-channel, subscription by-dev-dataCoord: connection closed\n"] [stack="github.com/milvus-io/milvus/internal/datacoord.(*Server).handleDataNodeTimetickMsgstream.func1.1\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:475\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:965\ngithub.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/pulsar.(*Consumer).Close.func1\n\t/go/src/github.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/pulsar/pulsar_consumer.go:123\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:59\ngithub.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/pulsar.(*Consumer).Close\n\t/go/src/github.com/milvus-io/milvus/internal/mq/msgstream/mqwrapper/pulsar/pulsar_consumer.go:114\ngithub.com/milvus-io/milvus/internal/mq/msgstream.(*mqMsgStream).Close\n\t/go/src/github.com/milvus-io/milvus/internal/mq/msgstream/mq_msgstream.go:199\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).handleDataNodeTimetickMsgstream.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:478\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).handleDataNodeTimetickMsgstream\n\t/go/src/github.com/milvus-io/milvus/internal/datacoord/server.go:484"]
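
Both ERROR entries above carry the same retry summary: six unsubscribe attempts, each ending in "connection closed". When scanning the full log archive, a small helper can tally the failure reasons in such an `error=` or `recovered=` field. This is only a sketch: it assumes the `attempt #N:<detail>: <reason>` layout seen in these two entries, with entries separated by a literal `\n` (or a real newline), and `summarize_attempts` is a hypothetical name.

```python
import re
from collections import Counter

# One "attempt #N:<detail>" entry. Entries end at a literal backslash-n
# (as printed in the log), at a real newline, or at the end of the string.
_ATTEMPT = re.compile(r"attempt #(\d+):(.*?)(?=\\n|\n|$)")


def summarize_attempts(field: str) -> Counter:
    """Count how often each failure reason appears in a retry summary."""
    reasons = Counter()
    for _number, detail in _ATTEMPT.findall(field):
        # The reason is the text after the last ": " in the entry.
        reason = detail.rsplit(": ", 1)[-1].strip()
        reasons[reason] += 1
    return reasons
```

Run against the `error=` value of the first ERROR entry, this should report a single reason, "connection closed", counted six times, which suggests the retries never reached the broker at all.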


Anything else?

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

Working on it.