milvus: [Bug]: querynode crash

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus v2.0.1
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 32Cores/192GB
- GPU: 
- Others:

Current Behavior

one querynode crash

Expected Behavior

querynode running healthily

Steps To Reproduce

No response

Milvus Log

querynode.log

Anything else?

No response

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 24 (24 by maintainers)

Most upvoted comments

I took a look at the log and here’s what happened:

  1. DataNode-1 starts processing insert messages for segment-A.
  2. DataNode-1 flushes segment-A, making it a flushed segment.
  3. DataNode-1 crashes (#17851) and is restarted as DataNode-2.
  4. DataNode-2 recovers segment-A as a flushed segment.
  5. DataNode-2 continues to process insert messages (which were previously targeted at segment-A).
  6. DataNode-2 fails to update segment-A’s rowNum, because it is a flushed segment (i.e., neither growing nor normal): https://github.com/milvus-io/milvus/blob/master/internal/datanode/flow_graph_insert_buffer_node.go#L439 However, this case only prints a log; the insert still continues: https://github.com/milvus-io/milvus/blob/master/internal/datanode/segment_replica.go#L638
  7. DataNode-2 keeps processing insert messages, but segment-A’s rowNum never gets updated.
  8. QueryCoord finds that one field’s row count differs from the other columns’ row counts and fails to load segment-A into memory.
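The silent skip in step 6 is the crux. Here is a minimal Go sketch of that behavior; the types and names are illustrative only (the real logic lives in segment_replica.go), but it shows how a flushed segment's rowNum update can be skipped with only a log while the insert itself still proceeds:

```go
package main

import "fmt"

// Illustrative segment states, mirroring growing / normal / flushed.
type segmentType int

const (
	segmentGrowing segmentType = iota
	segmentNormal
	segmentFlushed
)

type segment struct {
	id      int64
	segType segmentType
	rowNum  int64
}

// updateRowNum sketches step 6: for a flushed segment it only logs and
// returns without updating, yet the caller keeps inserting rows anyway.
func updateRowNum(s *segment, delta int64) bool {
	if s.segType != segmentGrowing && s.segType != segmentNormal {
		fmt.Printf("segment %d is flushed; rowNum not updated\n", s.id)
		return false // silently skipped; the insert itself still continues
	}
	s.rowNum += delta
	return true
}

func main() {
	segA := &segment{id: 1, segType: segmentFlushed, rowNum: 100}
	updateRowNum(segA, 50) // no effect: rowNum stays 100
	fmt.Println("rowNum after insert:", segA.rowNum)
}
```

The row count recorded for segment-A therefore diverges from the rows actually written, which is exactly the mismatch QueryCoord later rejects.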

More details:

Here’s where insertMsg are supposed to get filtered to avoid double consuming: https://github.com/milvus-io/milvus/blob/6954a5ba3e91bd39d42da4ff5e0483c091eb283f/internal/datanode/flow_graph_dd_node.go#L145-L153

where FilterThreshold is the start time of DataNode and is set during DataNode Start(): https://github.com/milvus-io/milvus/blob/f0b036a35ae17ada3101eebc6a200b5f621b7e7e/internal/datanode/data_node.go#L488
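In sketch form, the filtering rule amounts to the comparison below (names are illustrative; the real check is in flow_graph_dd_node.go at the link above):

```go
package main

import "fmt"

// Illustrative stand-in for an insert message with an end timestamp.
type insertMsg struct {
	endTs uint64
}

// shouldFilter sketches the dd_node rule: drop only messages whose EndTs
// is older than FilterThreshold (here, the DataNode's start time).
func shouldFilter(msg insertMsg, filterThreshold uint64) bool {
	return msg.endTs < filterThreshold
}

func main() {
	nodeStart := uint64(1000) // FilterThreshold = DataNode start time
	fmt.Println(shouldFilter(insertMsg{endTs: 900}, nodeStart))  // old message: filtered
	fmt.Println(shouldFilter(insertMsg{endTs: 1100}, nodeStart)) // new message: passes
}
```

This works when the node has just started, but as described next, it breaks down when an already long-running node takes over another node's channel.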

In the case of two or more living DataNodes:

  1. When one of them (DN-1) crashes, DN-2 will rewatch DN-1’s DML channel from the checkpoint to recover the segments.
  2. Since DN-2 started long ago, its FilterThreshold could be a very old/small timestamp (for example, 3 days ago). If DN-1 crashed quite recently, the insert messages are new, so msg.EndTs() < FilterThreshold will always be false and nothing will get filtered.
  3. All insert messages after the checkpoint will then get double consumed.

To make things right, we need to give FilterThreshold a proper value. Shall it be the timestamp of each individual channel watch operation instead of DataNode’s start time?
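One way that proposal could look, sketched in Go under the assumption that the node records a separate threshold per DML channel at the moment it (re)watches the channel (all names here are hypothetical, not the actual Milvus implementation):

```go
package main

import "fmt"

// channelFilter keeps one filter threshold per DML channel instead of a
// single global FilterThreshold set at DataNode start.
type channelFilter struct {
	watchTs map[string]uint64 // channel name -> timestamp of its watch operation
}

func newChannelFilter() *channelFilter {
	return &channelFilter{watchTs: make(map[string]uint64)}
}

// onWatch records the watch timestamp when a channel is picked up,
// e.g. when DN-2 rewatches DN-1's channel after a crash.
func (f *channelFilter) onWatch(channel string, ts uint64) {
	f.watchTs[channel] = ts
}

// shouldFilter drops messages older than the channel's own watch time,
// so a long-running node no longer compares against its stale start time.
func (f *channelFilter) shouldFilter(channel string, msgEndTs uint64) bool {
	return msgEndTs < f.watchTs[channel]
}

func main() {
	f := newChannelFilter()
	f.onWatch("dml-ch-0", 5000) // DN-2 rewatches DN-1's channel "now"
	fmt.Println(f.shouldFilter("dml-ch-0", 4900)) // replayed pre-watch message: filtered
	fmt.Println(f.shouldFilter("dml-ch-0", 5100)) // fresh message: passes
}
```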

  • new path should be ignored (Meta shouldn’t be wrong)

Yep, the SaveBinlog path should be idempotent. And we shouldn’t use FilterThreshold; we should filter by each segment instead.
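A rough sketch of what per-segment filtering could mean, assuming each flushed segment records the timestamp at which it was flushed (all names hypothetical; this is one possible reading of the comment, not the real API):

```go
package main

import "fmt"

// segmentState records whether a segment was flushed and when, so that
// replayed insert messages older than the flush can be dropped per segment.
type segmentState struct {
	flushed bool
	flushTs uint64 // timestamp at which the segment was flushed
}

// shouldDrop returns true when a message targets a segment that was already
// flushed at or after the message's timestamp, i.e. a replayed message that
// would otherwise be double consumed into a flushed segment.
func shouldDrop(states map[int64]segmentState, segID int64, msgTs uint64) bool {
	st, ok := states[segID]
	if !ok {
		return false // unknown segment: let normal allocation handle it
	}
	return st.flushed && msgTs <= st.flushTs
}

func main() {
	states := map[int64]segmentState{
		1: {flushed: true, flushTs: 2000}, // segment-A, flushed before the crash
	}
	fmt.Println(shouldDrop(states, 1, 1500)) // replayed pre-flush insert: dropped
	fmt.Println(shouldDrop(states, 2, 1500)) // unknown/new segment: kept
}
```

Unlike a single node-wide threshold, this decision stays correct no matter how long the surviving DataNode has been running, because it is anchored to each segment's own flush point.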