prometheus: Freeze on graph page when working with a large amount of metrics due to no upper limit on insertable metric dropdown.
Bug Report
What did you do? ~After upgrading from 2.6.1 -> 2.8.0, we started seeing long page freezes just after page load. It’s like… Page load, click around for a moment, interact with the expression bar, then there’s a heavy loading pause (10 seconds?), then it goes back to normal with some brief pauses afterwards. We have seen this across all of our 2.8.0 instances, but be aware that we do have pretty beefy deployments.~
~Once it finishes loading, it’s often OK, but that intro chug is painful.~
Read below: the timing of the upgrade turned out to be a coincidence. The actual problem was a large increase in metric labels, which made populating the DOM nodes for the insertable metric dropdown slow enough to freeze the page. We should have an upper limit on the number of DOM nodes we create in https://github.com/prometheus/prometheus/blob/master/web/ui/static/js/graph/index.js#L276
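As a rough illustration of what such an upper limit could look like, here is a minimal sketch. It is not the code from index.js: the function names, the shape of the returned entries, and the cap of 1000 are all assumptions made for the example. The idea is to decide which entries to render before touching the DOM, and to append one disabled entry that tells the user the list was truncated.

```javascript
// Hypothetical cap; a real value would need tuning against actual browsers.
const MAX_DROPDOWN_METRICS = 1000;

// Splits the metric list into the names to render and a count of the rest.
function limitDropdownMetrics(metricNames, limit = MAX_DROPDOWN_METRICS) {
  const shown = metricNames.slice(0, limit);
  return { shown, omitted: metricNames.length - shown.length };
}

// Builds plain descriptors for the dropdown entries. Rendering code would
// create one DOM node per descriptor, so the node count is bounded by
// limit + 1 regardless of how many metrics exist.
function buildDropdownOptions(metricNames, limit = MAX_DROPDOWN_METRICS) {
  const { shown, omitted } = limitDropdownMetrics(metricNames, limit);
  const options = shown.map((name) => ({ label: name, value: name, disabled: false }));
  if (omitted > 0) {
    // A non-selectable trailing entry communicates that the list is partial.
    options.push({
      label: `${omitted} more metrics not shown`,
      value: "",
      disabled: true,
    });
  }
  return options;
}
```

With this shape the full list is still available to other features, and only the number of rendered entries is bounded.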
What did you expect to see? Not the spinning loading wheel.
What did you see instead? Under which circumstances? An inability to interact with the GUI
Environment
- System information:
  Linux 4.4.161-1.el7.elrepo.x86_64 x86_64
- Prometheus version:
  prometheus --version
  prometheus, version 2.8.0 (branch: HEAD, revision: 59369491cfdfe8dcb325723d6d28a837887a07b9)
  build user: root@4c4d5c29b71f
  build date: 20190312-07:46:58
  go version: go1.11.5
- Logs: No JS errors are logged, but here is a gif of the behavior. What you are looking at is the lengthy freeze where the text cursor stops blinking and the expression bar stays highlighted.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 15 (7 by maintainers)
My understanding is that the 2.8 limit was for the lookahead stuff, correct? These heavy pauses lined up with the number of DOM nodes inserted in the code block at https://github.com/prometheus/prometheus/blob/master/web/ui/static/js/graph/index.js#L274 that populates the "insert metric at cursor" dropdown.
When I used the Chrome debugger to artificially limit the amount of data allowed in that loop, the pause decreased dramatically. The hardest part is communicating that the number of metrics populated into that dropdown has been limited. I was looking at something like…
But that totally disables that functionality when this occurs. Is that OK?
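The snippet referred to above is not shown here. To make the trade-off concrete, here is a hypothetical sketch of a "disable above a threshold" approach. The threshold, the function name, and the returned shape are assumptions for illustration, not the actual change that was proposed.

```javascript
// Hypothetical threshold, not taken from the Prometheus source.
const MAX_INSERTABLE_METRICS = 10000;

// Decides what the dropdown should contain. Above the threshold the metric
// list is skipped entirely and a single explanatory entry is returned,
// which is what makes the feature unavailable in that case.
function dropdownState(metricNames, limit = MAX_INSERTABLE_METRICS) {
  if (metricNames.length > limit) {
    return {
      enabled: false,
      entries: [`Too many metrics to list (${metricNames.length} > ${limit})`],
    };
  }
  // Copy the array so callers cannot mutate the original list by accident.
  return { enabled: true, entries: metricNames.slice() };
}
```

This creates only one DOM node in the oversized case, at the cost of losing the dropdown entirely.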
It ends up looking like so:
