milvus: [Bug]: Search failed with error `Timestamp lag too large lag` after reinstallation

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20230813-a3c17604
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka):kafka    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior


[2023-08-13T11:10:29.965Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=attempt #0: fail to Search, QueryNode ID=2, reason=Timestamp lag too large lag(469979h3m42.108s) max(24h0m0s): channel=by-dev-rootcoord-dml_9_443527017521818345v0: fail to access shard delegator: attempt #1: no available shard delegator found: service unavailable: fail to search on all shard leaders)>

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [Connections.connect] args: ['default', '', '', ''], kwargs: {'host': '10.255.24.200', 'port': 19530} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['int64', <DataType.INT64: 5>, ''], kwargs: {'is_primary': False} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : {'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>}  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['float', <DataType.FLOAT: 10>, ''], kwargs: {'is_primary': False} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['varchar', <DataType.VARCHAR: 21>, ''], kwargs: {'max_length': 65535, 'is_primary': False} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['json_field', <DataType.JSON: 23>, ''], kwargs: {'is_primary': False} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : {'name': 'json_field', 'description': '', 'type': <DataType.JSON: 23>}  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['float_vector', <DataType.FLOAT_VECTOR: 101>, ''], kwargs: {'dim': 128, 'is_primary': False} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['json_field', <DataType.JSON: 23>, ''], kwargs: {'is_primary': False} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : {'name': 'json_field', 'description': '', 'type': <DataType.JSON: 23>}  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [CollectionSchema] args: [[{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'float_vector', 'description': '', 'type': <DataT......, kwargs: {'primary_field': 'int64', 'auto_id': False, 'enable_dynamic_field': False} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params......  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - INFO - ci_test]: insert_data_general: collection creation (client_base.py:213)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [Connections.has_connection] args: ['default'], kwargs: {} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : True  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [Collection] args: ['deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000', {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True......, kwargs: {'is_index': True, 'consistency_level': 'Strong'} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_response) : <Collection>:

[2023-08-13T11:10:29.964Z] -------------

[2023-08-13T11:10:29.964Z] <name>: deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000

[2023-08-13T11:10:29.964Z] <description>: 

[2023-08-13T11:10:29.964Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'descrip......  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:04 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 120} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:07 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:07 - INFO - ci_test]: inserted 3000 data into collection deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000 (common_func.py:1194)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:07 - DEBUG - ci_test]: Dynamic field is enabled: False (common_func.py:1196)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:07 - DEBUG - ci_test]: (api_request)  : [Collection.insert] args: [      int64   float varchar                                       float_vector

[2023-08-13T11:10:29.964Z] 0         0     0.0       0  [0.06233680018690522, 0.050136280237176235, 0....

[2023-08-13T11:10:29.964Z] 1         1     1.0       1  [0.11821487922064525, 0.049054303431628324, 0....

[2023-08-13T11:10:29.964Z] 2         2     2.0       2  [0.0549420438716113, 0.1344124626......, kwargs: {'timeout': 120} (api_request.py:62)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:08 - DEBUG - ci_test]: (api_response) : (insert count: 3000, delete count: 0, upsert count: 0, timestamp: 443527800555569154, success count: 3000, err count: 0)  (api_request.py:37)

[2023-08-13T11:10:29.964Z] [2023-08-13 10:58:14 - INFO - ci_test]: index info: [] (test_action_second_deployment.py:58)

[2023-08-13T11:10:29.965Z] [2023-08-13 11:03:42 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to Search, QueryNode ID=2, reason=Timestamp lag too large lag(469979h3m42.108s) max(24h0m0s): channel=by-dev-rootcoord-dml_9_443527017521818345v0: fail to access shard delegator: attempt #1: no available shard delegator found: service unavailable: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-08-13 11:03:42.119346', 'RPC error': '2023-08-13 11:03:42.820356'}> (decorators.py:126)[get_env_variable] failed to get environment variables : 'CI_LOG_PATH', use default path : /tmp/ci_logs

Expected Behavior

all test cases pass

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_cron/detail/deploy_test_kafka_cron/1134/pipeline

log:

artifacts-kafka-standalone-reinstall-1134-server-second-deployment-logs.tar.gz

artifacts-kafka-standalone-reinstall-1134-server-first-deployment-logs.tar.gz

artifacts-kafka-standalone-reinstall-1134-pytest-logs.tar.gz

Anything else?

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

I am still getting this error with 2.3.1

ERROR: SearchRequest failed:failed to search: attempt #0: failed to search/query delegator 1086 for channel by-dev-_0_444847057342106707v0: fail to Search, QueryNode ID=1086, reason=Timestamp lag too large lag(121h19m39.391s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable Search time cost: 4461ms Exception in thread "main" java.lang.RuntimeException: failed to search: attempt #0: failed to search/query delegator 1086 for channel by-dev-_0_444847057342106707v0: fail to Search, QueryNode ID=1086, reason=Timestamp lag too large lag(121h19m39.391s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable at MilvusJavaClient.handleResponseStatus(MilvusJavaClient.java:122) at MilvusJavaClient.searchFace(MilvusJavaClient.java:430) at MilvusJavaClient.main(MilvusJavaClient.java:579)

with image master-20230817-3c62acde, there is no error about Timestamp lag too large lag.

For the other error, I opened another issue https://github.com/milvus-io/milvus/issues/26437