kubernetes: Volume plugin streaming e2e test failing on large cluster

The “Volume plugin streaming [Slow] NFS should write files of various sizes, verify size, validate content” e2e test is failing on our 2k-node GCE clusters (https://k8s-testgrid.appspot.com/google-gce-scale#gce-large-correctness) with the following error:

    unable to test file content via `grep /opt/nfs/e2e-tests-volume-io-fwm9d/nfs_io_test-1048576`: error running &{/workspace/kubernetes/platforms/linux/amd64/kubectl [kubectl --server=https://104.196.132.168 --kubeconfig=/workspace/.kube/config exec --namespace=e2e-tests-volume-io-fwm9d nfs-io-client -- /bin/sh -c grep -c -m1 -f /opt/nfs/e2e-tests-volume-io-fwm9d/dd_if /opt/nfs/e2e-tests-volume-io-fwm9d/nfs_io_test-1048576] []  <nil>  Killed
    Command stdout:
    
    stderr:
    Killed
    command terminated with exit code 137

cc @kubernetes/sig-storage-bugs @kubernetes/sig-scalability-misc

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

@verult Thanks a lot for fixing this!

It seems like even if the generated file is broken up by newlines, grep could still use on the order of hundreds of MB of memory for a 100MiB file (even with the -m1 and -F options).

We could check the file line by line, but it would take a long time.

A better solution would be to verify the file’s MD5 hash instead: it’s fast and uses very little memory. The expected hashes would be hard-coded into the test, and if new file sizes are added in the future, their hashes would need to be computed and added as well.
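
A minimal sketch of what that check could look like, assuming the test shells out to kubectl the same way the grep step does; the `expectedHashes` values and the `verifyFileHash` helper are illustrative placeholders, not the actual e2e code:

    package volumeio

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    // expectedHashes maps generated-file size (bytes) to its MD5. The values here
    // are placeholders; the real test would hard-code hashes computed once per size.
    var expectedHashes = map[int64]string{
        1048576: "<md5 of the 1MiB test file>",
    }

    // verifyFileHash runs md5sum inside the client pod via kubectl exec and
    // compares the result against the hard-coded expectation for the given size.
    func verifyFileHash(ns, pod, path string, size int64) error {
        out, err := exec.Command("kubectl", "exec", "--namespace="+ns, pod, "--",
            "/bin/sh", "-c", "md5sum "+path).Output()
        if err != nil {
            return fmt.Errorf("md5sum of %s failed: %v", path, err)
        }
        fields := strings.Fields(string(out)) // md5sum prints "<hash>  <path>"
        if len(fields) == 0 {
            return fmt.Errorf("unexpected md5sum output: %q", out)
        }
        if got, want := fields[0], expectedHashes[size]; got != want {
            return fmt.Errorf("hash mismatch for %s: got %s, want %s", path, got, want)
        }
        return nil
    }

Hashing happens inside the client pod, so only a 32-character digest crosses the kubectl exec boundary and memory use stays flat regardless of file size.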

The other strange thing is that it’s only failing in this suite and not the others.