milvus: [Bug]: [cluster] Nightly test case hangs for panic in querynode and datacoord

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 3eb5cc4
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): 2.0.0rc10.dev11
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The test case hangs in cluster mode.

Expected Behavior

All test cases execute successfully.

Steps To Reproduce

Running nightly: [cluster mode]
https://ci.milvus.io:18080/jenkins/blue/organizations/jenkins/milvus-nightly-ci/detail/master/305/pipeline

Anything else?

The tests hang and no logs are generated at this point, but the environment still exists, so you can log in to the machine to investigate. Thanks.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (15 by maintainers)

Most upvoted comments

@DragonDriver rootcoord has crashed many times. Please help check: https://ci.milvus.io:18080/jenkins/blue/organizations/jenkins/milvus-nightly-ci/detail/master/311/pipeline

[2022/01/21 15:23:38.364 +00:00] [ERROR] [task.go:1701] ["loadBalanceTask: show collection's partitionIDs failed"] [collectionID=430644467046023169] [error="ShowPartitions failed: can't find collection id : 430644467046023169"] [stack="github.com/milvus-io/milvus/internal/querycoord.(*loadBalanceTask).execute\n\t/go/src/github.com/milvus-io/milvus/internal/querycoord/task.go:1701\ngithub.com/milvus-io/milvus/internal/querycoord.(*TaskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/querycoord/task_scheduler.go:550\ngithub.com/milvus-io/milvus/internal/querycoord.(*TaskScheduler).scheduleLoop\n\t/go/src/github.com/milvus-io/milvus/internal/querycoord/task_scheduler.go:627"]

[2022/01/21 15:23:38.365 +00:00] [DEBUG] [task.go:1977] ["loadBalanceTask postExecute done"] ["trigger type"=4] [sourceNodeIDs="[7]"] [balanceReason=NodeDown] [taskID=430644161585087469]

panic: ShowPartitions failed: can't find collection id : 430644467046023169

goroutine 200 [running]:
github.com/milvus-io/milvus/internal/querycoord.(*loadBalanceTask).execute(0xc0008765a0, 0x3595b60, 0xc001170960, 0x0, 0x0)
	/go/src/github.com/milvus-io/milvus/internal/querycoord/task.go:1703 +0x5cbe
github.com/milvus-io/milvus/internal/querycoord.(*TaskScheduler).processTask(0xc00026cdc0, 0x35bcc80, 0xc0008765a0, 0x0, 0x0)
	/go/src/github.com/milvus-io/milvus/internal/querycoord/task_scheduler.go:550 +0x6d4
github.com/milvus-io/milvus/internal/querycoord.(*TaskScheduler).scheduleLoop(0xc00026cdc0)
	/go/src/github.com/milvus-io/milvus/internal/querycoord/task_scheduler.go:627 +0x12bf
created by github.com/milvus-io/milvus/internal/querycoord.(*TaskScheduler).Start
	/go/src/github.com/milvus-io/milvus/internal/querycoord/task_scheduler.go:868 +0x65

querycoord panicked while rootcoord was dropping the collection, and rootcoord's ReleaseCollection call failed during that drop. So after querycoord restarted, it called rootcoord's ShowPartitions and got an error, and querycoord now panics after every restart.

[2022/01/21 15:24:31.278 +00:00] [ERROR] [client.go:254] ["querycoord ClientBase ReCall grpc second call get error "] [error="err: number of querycoord is incorrect, 0\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:253 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:180 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).ReleaseCollection\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/root_coord.go:823 github.com/milvus-io/milvus/internal/rootcoord.(*Core).SetQueryCoord.func2\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:364 github.com/milvus-io/milvus/internal/rootcoord.(*DropCollectionReqTask).Execute\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:60 github.com/milvus-io/milvus/internal/rootcoord.executeTask.func1\n/usr/local/go/src/runtime/asm_amd64.s:1374 runtime.goexit\n"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:254\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).ReleaseCollection\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:180\ngithub.com/milvus-io/milvus/internal/rootcoord.(*Core).SetQueryCoord.func2\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/root_coord.go:823\ngithub.com/milvus-io/milvus/internal/rootcoord.(*DropCollectionReqTask).Execute\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:364\ngithub.com/milvus-io/milvus/internal/rootcoord.executeTask.func1\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:60"]

[2022/01/21 15:24:31.278 +00:00] [ERROR] [task.go:365] ["Failed to CallReleaseCollectionService"] [error="err: number of querycoord is incorrect, 0\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:253 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:180 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).ReleaseCollection\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/root_coord.go:823 github.com/milvus-io/milvus/internal/rootcoord.(*Core).SetQueryCoord.func2\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:364 github.com/milvus-io/milvus/internal/rootcoord.(*DropCollectionReqTask).Execute\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:60 github.com/milvus-io/milvus/internal/rootcoord.executeTask.func1\n/usr/local/go/src/runtime/asm_amd64.s:1374 runtime.goexit\n"] [stack="github.com/milvus-io/milvus/internal/rootcoord.(*DropCollectionReqTask).Execute\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:365\ngithub.com/milvus-io/milvus/internal/rootcoord.executeTask.func1\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:60"]

[2022/01/21 15:24:31.278 +00:00] [ERROR] [root_coord.go:1318] ["DropCollection failed"] [role=rootcoord] ["collection name"=search_collection_hR8hZgTI] [msgID=430644455249584592] [error="err: number of querycoord is incorrect, 0\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:253 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:180 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).ReleaseCollection\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/root_coord.go:823 github.com/milvus-io/milvus/internal/rootcoord.(*Core).SetQueryCoord.func2\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:364 github.com/milvus-io/milvus/internal/rootcoord.(*DropCollectionReqTask).Execute\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/task.go:60 github.com/milvus-io/milvus/internal/rootcoord.executeTask.func1\n/usr/local/go/src/runtime/asm_amd64.s:1374 runtime.goexit\n"] [stack="github.com/milvus-io/milvus/internal/rootcoord.(*Core).DropCollection\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/root_coord.go:1318\ngithub.com/milvus-io/milvus/internal/distributed/rootcoord.(*Server).DropCollection\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/rootcoord/service.go:344\ngithub.com/milvus-io/milvus/internal/proto/rootcoordpb._RootCoord_DropCollection_Handler.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proto/rootcoordpb/root_coord.pb.go:879\ngithub.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tracing/opentracing/server_interceptors.go:38\ngithub.com/milvus-io/milvus/internal/proto/rootcoordpb._RootCoord_DropCollection_Handler\n\t/go/src/github.com/milvus-io/milvus/internal/proto/rootcoordpb/root_coord.pb.go:881\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:934"]