milvus: [Bug]: Deploy test all failed due to timeout of second test after reinstall or upgrade

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20220607-af108b5b
- Deployment mode(standalone or cluster): both
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.1.0.dev69
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

All tasks time out when running the second test after a reinstall or upgrade.

After repeatedly emitting the logs below, the proxy stops producing any output. Stopping the client program manually then surfaces the reported error.

[2022/06/08 03:47:28.965 +00:00] [DEBUG] [id.go:140] ["IDAllocator pickCanDoFunc"] [need=1] [total=199911] [remainReqCnt=0]
[2022/06/08 03:47:28.965 +00:00] [DEBUG] [impl.go:760] ["GetCollectionStatistics enqueued"] [traceID=35cd4e2502ddc161] [role=proxy] [MsgID=433759203819257946] [BeginTS=433759203866443796] [EndTS=433759203866443796] [db=] [collection=task_1_BIN_IVF_FLAT]
[2022/06/08 03:47:28.966 +00:00] [DEBUG] [impl.go:793] ["GetCollectionStatistics done"] [traceID=35cd4e2502ddc161] [role=proxy] [MsgID=433759203819257946] [BeginTS=433759203866443796] [EndTS=433759203866443796] [db=] [collection=task_1_BIN_IVF_FLAT]
[2022/06/08 03:47:28.967 +00:00] [DEBUG] [impl.go:361] ["HasCollection received"] [traceID=69e62636d8fad8a2] [role=proxy] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 03:47:28.967 +00:00] [DEBUG] [id.go:140] ["IDAllocator pickCanDoFunc"] [need=1] [total=199910] [remainReqCnt=0]
[2022/06/08 03:47:28.967 +00:00] [DEBUG] [impl.go:392] ["HasCollection enqueued"] [traceID=69e62636d8fad8a2] [role=proxy] [MsgID=433759203819257947] [BeginTS=433759203866443797] [EndTS=433759203866443797] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 03:47:28.967 +00:00] [DEBUG] [impl.go:422] ["HasCollection done"] [traceID=69e62636d8fad8a2] [role=proxy] [MsgID=433759203819257947] [BeginTS=433759203866443797] [EndTS=433759203866443797] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 03:47:28.968 +00:00] [DEBUG] [impl.go:640] ["DescribeCollection received"] [traceID=5e62470ab19e2eb6] [role=proxy] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 03:47:28.969 +00:00] [DEBUG] [id.go:140] ["IDAllocator pickCanDoFunc"] [need=1] [total=199909] [remainReqCnt=0]
[2022/06/08 03:47:28.969 +00:00] [DEBUG] [impl.go:664] ["DescribeCollection enqueued"] [traceID=5e62470ab19e2eb6] [role=proxy] [MsgID=433759203819257948] [BeginTS=433759203866443798] [EndTS=433759203866443798] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 03:47:28.969 +00:00] [DEBUG] [impl.go:697] ["DescribeCollection done"] [traceID=5e62470ab19e2eb6] [role=proxy] [MsgID=433759203819257948] [BeginTS=433759203866443798] [EndTS=433759203866443798] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 03:47:28.971 +00:00] [DEBUG] [impl.go:545] ["ReleaseCollection received"] [traceID=5a7b07d9878ea3f6] [role=proxy] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 03:47:28.971 +00:00] [DEBUG] [id.go:140] ["IDAllocator pickCanDoFunc"] [need=1] [total=199908] [remainReqCnt=0]
[2022/06/08 03:47:28.971 +00:00] [DEBUG] [impl.go:569] ["ReleaseCollection enqueued"] [traceID=5a7b07d9878ea3f6] [role=proxy] [MsgID=433759203819257949] [BeginTS=433759203877191681] [EndTS=433759203877191681] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 04:11:37.056 +00:00] [WARN] [impl.go:580] ["ReleaseCollection failed to WaitToFinish"] [error="proxy TaskCondition context Done"] [traceID=5a7b07d9878ea3f6] [role=proxy] [MsgID=433759203819257949] [BeginTS=433759203877191681] [EndTS=433759203877191681] [db=] [collection=task_1_BIN_FLAT]
[2022/06/08 04:11:37.056 +00:00] [WARN] [client.go:245] ["querycoord ClientBase ReCall grpc first call get error "] [error="err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:244 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:183 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).ReleaseCollection\n/go/src/github.com/milvus-io/milvus/internal/proxy/task.go:2847 github.com/milvus-io/milvus/internal/proxy.(*releaseCollectionTask).Execute\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:465 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:494 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).definitionLoop\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/06/08 04:11:37.057 +00:00] [ERROR] [task_scheduler.go:468] ["Failed to execute task: context canceled"] [traceID=5a7b07d9878ea3f6] [stack="github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:468\ngithub.com/milvus-io/milvus/internal/proxy.(*taskScheduler).definitionLoop\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:494"]
[2022/06/08 04:12:28.458 +00:00] [WARN] [channels_time_ticker.go:95] ["Proxy channelsTimeTickerImpl failed to get ts from tso"] [error="syncTimeStamp Failed:StateCode=Abnormal"]
[2022/06/08 04:12:28.458 +00:00] [WARN] [channels_time_ticker.go:162] [channelsTimeTickerImpl.tickLoop] [error="syncTimeStamp Failed:StateCode=Abnormal"]
[2022/06/08 04:12:28.458 +00:00] [WARN] [proxy.go:291] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="StateCode=Abnormal"]

Expected Behavior

All test cases pass.

Steps To Reproduce

see http://10.100.32.144:8080/blue/organizations/jenkins/deploy_test/detail/deploy_test/761/pipeline

Milvus Log

failed job: http://10.100.32.144:8080/blue/organizations/jenkins/deploy_test/detail/deploy_test/761/pipeline

log: artifacts-cluster-reinstall-761-logs.tar.gz

Anything else?

All runs failed, in both GitHub Actions and Jenkins:

https://github.com/milvus-io/milvus/actions/runs/2457295448/attempts/1


About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

It may be that indexcoord did not write the index file correctly.

The root cause is that metaTable.segID2IndexMeta is restored incorrectly after a rootcoord restart: the indexInfo entry for each segmentID ends up containing the index info of all segments in the collection.
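The defect described above can be illustrated with a minimal, hypothetical sketch (the `IndexMeta` type, field names, and restore functions below are simplified stand-ins, not the actual Milvus rootcoord code): a buggy restore that rebuilds the segment-to-index map without filtering by segment ID, contrasted with a restore that groups each entry under its own segment.

```go
package main

import "fmt"

// IndexMeta is a hypothetical, simplified stand-in for the per-segment
// index metadata that rootcoord keeps in metaTable.segID2IndexMeta.
type IndexMeta struct {
	SegmentID int64
	IndexID   int64
}

// buggyRestore sketches the reported defect: after a restart, every
// segment's entry ends up holding the index info of ALL segments
// in the collection, because the inner loop ignores n.SegmentID.
func buggyRestore(all []IndexMeta) map[int64][]IndexMeta {
	restored := make(map[int64][]IndexMeta)
	for _, m := range all {
		for _, n := range all { // wrong: does not filter on n.SegmentID
			restored[m.SegmentID] = append(restored[m.SegmentID], n)
		}
	}
	return restored
}

// correctRestore groups each index metadata entry strictly under
// its own segment ID.
func correctRestore(all []IndexMeta) map[int64][]IndexMeta {
	restored := make(map[int64][]IndexMeta)
	for _, m := range all {
		restored[m.SegmentID] = append(restored[m.SegmentID], m)
	}
	return restored
}

func main() {
	all := []IndexMeta{{SegmentID: 1, IndexID: 10}, {SegmentID: 2, IndexID: 20}}
	// Segment 1 wrongly carries segment 2's index info as well.
	fmt.Println(len(buggyRestore(all)[1]))
	// Segment 1 carries only its own index info.
	fmt.Println(len(correctRestore(all)[1]))
}
```

With an inflated map like the buggy one, any component that iterates a segment's index infos would do redundant work for every segment, which is consistent with downstream tasks stalling until the client context is canceled.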