longhorn: [IMPROVEMENT] Improve environment_check script for NFS protocol bug and the host system self diagnosis

Is your improvement request related to a feature? Please describe (👍 if you like this request)

The objective of the improvement is to

  • Update the KB doc for the NFS bug in kernel 5.15.0-94 (ref)
  • Update the environment_check script to check kernel versions (ref)
  • Add condition(s) in node.status.conditions to check the environment, like the kernel version
    • After users update their system, for example the host distro or kernel, they usually skip the environment check. To make known environment problems easier to spot, we can run the checks in the node controller and record the result in node.status.conditions.
  • Add OS distro and kernel version to logging when mount fails.
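
To make the last three items concrete, here is a rough sketch of the logic in Python. It is only an illustration, not the actual environment_check script or node controller code. The condition type `KernelVersion`, the reason strings, the `Condition` shape, and the helper names are all hypothetical, and the known-bad list contains only the 5.15.0-94 release mentioned in this issue.

```python
import platform
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: only the release named in this issue is listed here.
# A real list would be maintained alongside the KB doc.
KNOWN_BAD_KERNELS = {(5, 15, 0, 94)}


def parse_kernel_release(release: str) -> Optional[Tuple[int, ...]]:
    """Turn '5.15.0-94-generic' into (5, 15, 0, 94); None if it does not match."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    return tuple(int(g) for g in m.groups()) if m else None


@dataclass
class Condition:
    # Hypothetical shape, loosely modelled on a Kubernetes-style condition.
    type: str
    status: str  # "True", "False", or "Unknown"
    reason: str
    message: str


def kernel_condition(release: str) -> Condition:
    """Build the condition a node controller could record for this kernel."""
    version = parse_kernel_release(release)
    if version is None:
        return Condition("KernelVersion", "Unknown", "UnparsableRelease",
                         f"cannot parse kernel release {release!r}")
    if version in KNOWN_BAD_KERNELS:
        return Condition("KernelVersion", "False", "KnownNFSBug",
                         f"kernel {release} has a known NFS bug")
    return Condition("KernelVersion", "True", "",
                     f"kernel {release} is not on the known-bad list")


def mount_failure_message(err: str, distro: str, release: str) -> str:
    """Attach the OS distro and kernel version to a mount failure log line."""
    return f"mount failed: {err} (os={distro}, kernel={release})"


if __name__ == "__main__":
    # platform.release() returns the running kernel release on Linux.
    print(kernel_condition(platform.release()))
```

An exact-match list is used instead of a version range because, per the comments on this issue, both the .92 and the .100 releases are unaffected.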

Describe the solution you’d like

Describe alternatives you’ve considered

Additional context

About this issue

  • Original URL
  • State: open
  • Created 5 months ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

I installed a .100 release tonight for Ubuntu. Seems to have resolved the issue on my end.

Same here, the .100 kernel solved it. The current/new HWE kernel (6.5) also solves it.

Is there currently a workaround?

I think it might just be either downgrading to the .92 kernel release or waiting for the .100 release.

@derekbit Is that accurate?

Yes, totally right!

Besides the environment_check script, I’m considering adding a condition in node.status.conditions to check the environment, like the kernel version. WDYT? @innobead @james-munson

@james-munson remember to update the zenhub status.

Sounds good to me, @derekbit. Let’s update the requirements of this request.

Updated. Thank you.