prometheus: Freeze on graph page when working with a large amount of metrics due to no upper limit on insertable metric dropdown.

Bug Report

What did you do? ~After upgrading from 2.6.1 -> 2.8.0, we started seeing long page freezes just after page load. It goes like this: the page loads, you click around for a moment and interact with the expression bar, then there’s a heavy loading pause (10 seconds?), after which it returns to normal with some brief pauses. We have seen this across all of our 2.8.0 instances, but be aware that we do have pretty beefy deployments.~

~Once it finishes loading, it’s often OK, but that intro chug is painful.~

Read below, but this turned out to be a coincidence. The actual problem was a large increase in metric labels: populating the DOM nodes of the insertable metric dropdown caused heavy slowdowns. We should have an upper limit on the number of DOM nodes we create in https://github.com/prometheus/prometheus/blob/master/web/ui/static/js/graph/index.js#L276

What did you expect to see? Not the spinning loading wheel.

What did you see instead? Under which circumstances? An inability to interact with the GUI.

Environment

  • System information: Linux 4.4.161-1.el7.elrepo.x86_64 x86_64

  • Prometheus version:

prometheus --version
prometheus, version 2.8.0 (branch: HEAD, revision: 59369491cfdfe8dcb325723d6d28a837887a07b9)
  build user:       root@4c4d5c29b71f
  build date:       20190312-07:46:58
  go version:       go1.11.5
  • Logs: There are no JS errors logged, but here is a gif of the behavior. What you are looking at is the lengthy freeze where the text cursor stops blinking and the expression bar stays highlighted.

Screen Recording 2019-03-29 at 03 48 PM

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 15 (7 by maintainers)

Most upvoted comments

My understanding is that the 2.8 limit was for the lookahead stuff, correct? These heavy pauses lined up with the number of DOM nodes inserted in the code block…

        pageConfig.allMetrics = json.data; // todo: do we need self.allMetrics? Or can it just live on the page
        for (var i = 0; i < pageConfig.allMetrics.length; i++) {
          self.insertMetric[0].options.add(new Option(pageConfig.allMetrics[i], pageConfig.allMetrics[i]));
        }

at https://github.com/prometheus/prometheus/blob/master/web/ui/static/js/graph/index.js#L274, which populates the "insert metric at cursor" dropdown.

When I used the Chrome debugger to artificially limit the amount of data allowed through that loop, the pause decreased dramatically. The hardest part is communicating that the number of metrics populated into that dropdown has been limited. I was looking at something like…

        pageConfig.allMetrics = json.data; // todo: do we need self.allMetrics? Or can it just live on the page
        var insertMetric = self.insertMetric[0];
        var length = json.data.length;
        // Above this many metric names, skip populating the dropdown entirely.
        if (length > 50000) {
          length = 0;
          insertMetric.options[0].text = "- disabled due to volume (too many metric names)";
        }

        // Build all options off-DOM, then attach them in a single append.
        var fragment = document.createDocumentFragment();

        for (var i = 0; i < length; i++) {
          var el = document.createElement('option');
          el.value = json.data[i];
          el.text = json.data[i];
          fragment.appendChild(el);
        }

        insertMetric.appendChild(fragment);

But that totally disables that functionality when this occurs. Is that OK?

It ends up looking like so: (two screenshots, 2019-04-10 at 8:37:37 PM and 8:37:40 PM)
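
For comparison, a capped variant could keep the dropdown partially usable instead of disabling it outright. This is only a sketch under assumptions, not the fix that was merged: `capMetricNames` and `populateInsertMetric` are hypothetical helpers that do not exist in the Prometheus code base, the wording of the notice is made up, and the limit is left to the caller. The capping logic is kept separate from the DOM work so it can be checked on its own.

```javascript
// Hypothetical helper (not part of Prometheus): keeps at most `limit`
// metric names and reports how many were left out.
function capMetricNames(names, limit) {
  if (!Number.isInteger(limit) || limit < 0) {
    throw new RangeError("limit must be a non-negative integer");
  }
  var shown = names.slice(0, limit);
  var hidden = names.length - shown.length;
  return {
    shown: shown,
    hidden: hidden,
    // Only produce a notice when something was actually dropped.
    notice: hidden > 0
      ? "- showing first " + shown.length + " of " + names.length + " metric names -"
      : null
  };
}

// Hypothetical helper: fills `select` from the capped list.
// `doc` is passed in (normally `document`) so the DOM calls stay in one place.
function populateInsertMetric(doc, select, names, limit) {
  var capped = capMetricNames(names, limit);
  var fragment = doc.createDocumentFragment();

  if (capped.notice !== null) {
    // A disabled entry tells the user that the list was truncated.
    var note = doc.createElement("option");
    note.text = capped.notice;
    note.disabled = true;
    fragment.appendChild(note);
  }

  capped.shown.forEach(function (name) {
    var el = doc.createElement("option");
    el.value = name;
    el.text = name;
    fragment.appendChild(el);
  });

  // Single append, as in the snippet above.
  select.appendChild(fragment);
  return capped;
}
```

Assuming the same surrounding variables as the snippet above, a call might look like `populateInsertMetric(document, self.insertMetric[0], json.data, 50000)`, reusing the 50000 threshold from the proposal. Whether a truncated list is more useful than a disabled one is a judgment call, since the metric a user wants may be past the cutoff.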