VictoriaMetrics: Error in vmstorage regarding big packetsize

Describe the bug: vmstorage logs errors complaining about the packet size.

app/vmstorage/transport/server.go:160 cannot process vminsert conn from 10.20.11.94:40018: cannot read packet with size 104176512: unexpected EOF

I assume this also causes data to be dropped.

Expected behavior: Such errors should not happen in vmstorage, and vminsert should write all the data to vmstorage without errors.

Version: docker image tag 1.26.0-cluster

Additional context

  • We are running clustered VictoriaMetrics in GKE.
  • For vmstorage, the data is stored on an HDD persistent disk.
  • There are two pods running for each component.
  • There are two Prometheus pods (an HA pair) writing data to this VictoriaMetrics cluster.
  • RAM & CPU are as below:
    • vmstorage (3 cpu & 15 Gi memory) * 2
    • vmselect (1 cpu & 10 Gi memory) * 2
    • vminsert (4 cpu & 6 Gi memory) * 2
  • sum(rate(vm_rows_inserted_total[5m])) by (instance):
    • {instance="10.24.7.59:8480"}: 111152.98245614035
    • {instance="10.24.8.48:8480"}: 111287.36842105263

Please let me know if additional information is required.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 17

Most upvoted comments

Thanks for the update! I'm leaving the issue open until possible solutions for this issue are published in the README.md for the cluster version.

The solutions:

  • Increasing the number of vminsert or vmstorage nodes, so that more CPU cores can be loaded with serving the increased number of connections between vminsert and vmstorage nodes.
  • Disabling compression for such connections with -rpc.disableCompression on vminsert.

These workarounds are needed because currently vminsert establishes a single connection to each configured vmstorage node, and this connection is served by a single CPU core, which can be saturated by high volumes of data. In the future we can add support for automatically adjusting the number of connections between vminsert and vmstorage nodes depending on the ingestion rate.
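For illustration, here is a minimal Go sketch of a length-prefixed packet reader. It is not VictoriaMetrics' actual wire protocol (the 8-byte header, the port, and the function names are assumptions); it only shows how a connection that breaks mid-packet surfaces on the receiving side as an error like the one reported above.

// Minimal sketch, assuming a plain 8-byte big-endian size header followed by
// the payload. A sender that disconnects (or times out) mid-packet causes
// io.ReadFull to return io.ErrUnexpectedEOF, which surfaces as
// "cannot read packet with size N: unexpected EOF".
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"log"
	"net"
)

// readPacket reads one size-prefixed packet from the connection.
func readPacket(conn net.Conn) ([]byte, error) {
	var sizeBuf [8]byte
	if _, err := io.ReadFull(conn, sizeBuf[:]); err != nil {
		return nil, fmt.Errorf("cannot read packet size: %w", err)
	}
	size := binary.BigEndian.Uint64(sizeBuf[:])
	buf := make([]byte, size)
	// The connection dropped after the size header but before the full payload.
	if _, err := io.ReadFull(conn, buf); err != nil {
		return nil, fmt.Errorf("cannot read packet with size %d: %w", size, err)
	}
	return buf, nil
}

func main() {
	ln, err := net.Listen("tcp", ":8400") // hypothetical vmstorage-style listener
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			for {
				if _, err := readPacket(c); err != nil {
					log.Printf("cannot process conn from %s: %s", c.RemoteAddr(), err)
					return
				}
			}
		}(conn)
	}
}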

Also, it seems that even when the logs say vminsert will try to connect to the next storage node, the logs show that it is still sending traffic to the same vmstorage node.

vminsert keeps trying to send data to the unhealthy vmstorage node in order to determine when the node becomes healthy again.
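For illustration, here is a hypothetical Go sketch of that probing behavior; storageNode, route, and the addresses are made-up names, not VictoriaMetrics identifiers. Sends to a node that is currently down keep failing and get re-routed, and a later successful send is what reveals that the node has recovered.

// Hypothetical sketch: the sender keeps directing traffic at a node it
// considers unhealthy, because a successful send is how it notices recovery.
package main

import (
	"fmt"
	"time"
)

type storageNode struct {
	addr      string
	downUntil time.Time // simulated outage window
}

// send simulates pushing a block of rows to the node.
func (sn *storageNode) send(block []byte) error {
	if time.Now().Before(sn.downUntil) {
		return fmt.Errorf("%s: connection refused", sn.addr)
	}
	return nil
}

// route prefers the first node that accepts the block, but each attempt
// against a down node also doubles as a health probe.
func route(nodes []*storageNode, block []byte) {
	for _, sn := range nodes {
		if err := sn.send(block); err != nil {
			fmt.Printf("re-routing away from %s: %s\n", sn.addr, err)
			continue
		}
		fmt.Printf("sent %d bytes to %s\n", len(block), sn.addr)
		return
	}
	fmt.Println("all storage nodes are unavailable")
}

func main() {
	nodes := []*storageNode{
		{addr: "vmstorage-0:8400", downUntil: time.Now().Add(250 * time.Millisecond)},
		{addr: "vmstorage-1:8400"},
	}
	for i := 0; i < 4; i++ {
		route(nodes, []byte("some rows")) // later iterations hit the recovered node
		time.Sleep(100 * time.Millisecond)
	}
}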