milvus: [Bug]: Milvus crashed with panic when enabling compaction and GC

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: latest
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2): latest
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Milvus crashed with a panic.

d-dml_234_429327338027614209v0 is not watched on node 7\nattempt #2:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\nattempt #3:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\nattempt #4:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\nattempt #5:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\nattempt #6:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\nattempt #7:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\nattempt #8:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\nattempt #9:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\nattempt #10:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7\n"]
2021-11-24T11:37:32.426529677Z stderr F panic: All attempts results:
2021-11-24T11:37:32.426567598Z stderr F attempt #1:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426572509Z stderr F attempt #2:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426574623Z stderr F attempt #3:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426594442Z stderr F attempt #4:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.42659899Z stderr F attempt #5:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426602122Z stderr F attempt #6:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426604116Z stderr F attempt #7:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426606019Z stderr F attempt #8:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426624374Z stderr F attempt #9:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426628487Z stderr F attempt #10:data service save bin log path failed, reason = channel by-dev-rootcoord-dml_234_429327338027614209v0 is not watched on node 7
2021-11-24T11:37:32.426630482Z stderr F
2021-11-24T11:37:32.426633033Z stderr F
2021-11-24T11:37:32.426634914Z stderr F goroutine 247170 [running]:
2021-11-24T11:37:32.426688175Z stderr F github.com/milvus-io/milvus/internal/datanode.flushNotifyFunc.func1(0xc001cf6690)
2021-11-24T11:37:32.426695234Z stderr F         /go/src/github.com/milvus-io/milvus/internal/datanode/flush_manager.go:509 +0x1439
2021-11-24T11:37:32.426697399Z stderr F github.com/milvus-io/milvus/internal/datanode.(*flushTaskRunner).waitFinish(0xc006768600, 0xc00a7c9020, 0xc00c779c90)
2021-11-24T11:37:32.426719453Z stderr F         /go/src/github.com/milvus-io/milvus/internal/datanode/flush_task.go:186 +0xbd
2021-11-24T11:37:32.426759007Z stderr F created by github.com/milvus-io/milvus/internal/datanode.(*flushTaskRunner).init.func1
2021-11-24T11:37:32.426763191Z stderr F         /go/src/github.com/milvus-io/milvus/internal/datanode/flush_task.go:118 +0xb0
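For readers unfamiliar with the failure mode, the trace above suggests a retry loop that collects each attempt's error and panics once every attempt has failed. Below is a minimal sketch of that pattern, not the actual Milvus code: `retryDo`, `notifyOrPanic`, and `saveBinlogPaths` are hypothetical names made up for illustration, and the error text is shortened from the log.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// retryDo is a hypothetical helper (not the Milvus retry package).
// It calls fn up to `attempts` times and, if every call fails,
// returns a single error listing each attempt's failure.
func retryDo(attempts int, fn func() error) error {
	var failures []string
	for i := 1; i <= attempts; i++ {
		err := fn()
		if err == nil {
			return nil
		}
		failures = append(failures, fmt.Sprintf("attempt #%d:%s", i, err.Error()))
	}
	return errors.New("All attempts results:\n" + strings.Join(failures, "\n"))
}

// saveBinlogPaths is a stand-in for the call that kept failing in the log.
// It always fails here so the sketch reproduces the "all attempts failed" path.
func saveBinlogPaths() error {
	return errors.New("data service save bin log path failed, reason = channel is not watched on node 7")
}

// notifyOrPanic mirrors the behavior visible in the trace:
// when no attempt succeeds, it panics with the combined error.
func notifyOrPanic(attempts int, fn func() error) {
	if err := retryDo(attempts, fn); err != nil {
		panic(err)
	}
}

func main() {
	// The recover below exists only so the sketch exits cleanly.
	// A panic in a goroutine with no recover terminates the whole process,
	// which matches the crash reported above.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	notifyOrPanic(3, saveBinlogPaths)
}
```

If the real code follows this shape, a channel that is no longer watched on the node will never succeed no matter how many times it is retried, so the panic is guaranteed once the retries run out.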

Expected Behavior

No panic

Steps To Reproduce

1. Start Milvus with compaction and GC enabled:

--set dataCoordinator.enableCompaction="true" \
--set dataCoordinator.enableGarbageCollection="true" \
--set dataCoordinator.gc.interval=60 \
--set dataCoordinator.gc.missingTolerance=60 \
--set dataCoordinator.gc.dropTolerance=60 \

2. Run the CI test cases

Anything else?

No response

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

@congqixia hi, congqi, how is the fix going?

Thanks.

@binbinlv Waiting for the final PR