test-infra: triage: go.k8s.io/triage fails to render in Google Chrome due to excessively large cluster data

What happened:

Loading… parsing 0MB.

console:

VM106:1 Uncaught SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at interactive.js:338
(anonymous) @ interactive.js:338
setTimeout (async)
(anonymous) @ interactive.js:337
req.onload @ interactive.js:291
load (async)
get @ interactive.js:290
getData @ interactive.js:326
load @ interactive.js:365
(anonymous) @ index.html?test=Pods should support pod readiness gates:81

What you expected to happen:

The results should load.

How to reproduce it (as minimally and precisely as possible):

Load https://go.k8s.io/triage

Please provide links to example occurrences, if any:

https://go.k8s.io/triage

Anything else we need to know?:

https://kubernetes.slack.com/archives/C09QZ4DQB/p1625957726455500

About this issue

State: closed
Created 3 years ago
Reactions: 1
Comments: 25 (23 by maintainers)

Most upvoted comments

Update: it’s working in Firefox, not working in Chrome.

lambdanis on Jul 16, 2021

Hey so, when I visited go.k8s.io/triage, it counted up as far as 600MB before dropping to 0. That is way, way too high.

I think it’s ingesting some poison data (aka data that doesn’t cluster well and is causing it to slow way down). [Screenshot: Screen Shot 2021-07-15 at 11 44 24 AM]

spiffxp on Jul 15, 2021

It’s baaaack

BenTheElder on Oct 1, 2021

Let’s choose a known good/bad

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-test-infra-triage/1412216912305197056 - 2021-07-05, seems good
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-test-infra-triage/1415707578314264576 - 2021-07-15, latest bad

Some choice lines from each log

good-build-log.txt
122:Operation completed over 17 objects/267.6 MiB.
4088:I0706 01:20:17.561731     141 cluster.go:148] Finished locally clustering 3912 unique tests (1646334 failures) in 7m15.500468367s
31434:I0706 01:27:05.893098     141 cluster.go:322] Finished clustering 3912 unique tests (1605090 failures) into 3595 clusters in 6m48.331330116s
31436:I0706 01:37:30.856375     141 summarize.go:162] Finished rendering results in 10m24.963227678s

bad-build-log.txt
130:Operation completed over 21 objects/423.3 MiB.
5780:I0715 16:41:47.902850     142 cluster.go:148] Finished locally clustering 5519 unique tests (1965545 failures) in 15m43.101649281s
40597:I0715 17:30:58.467614     142 cluster.go:322] Finished clustering 5519 unique tests (1888136 failures) into 4944 clusters in 49m10.562568486s
40599:I0715 17:52:43.354063     142 summarize.go:162] Finished rendering results in 21m44.886376366s
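
To put the two logs side by side, the growth factors can be worked out from the figures quoted above. This is only a sketch: the numbers are copied by hand from the log lines, the durations were converted to seconds manually, and `ratio` is a helper name made up for illustration.

```shell
# ratio BAD GOOD: print BAD/GOOD rounded to two decimals.
# (hypothetical helper, not part of triage)
ratio() {
  LC_ALL=C awk -v bad="$1" -v good="$2" 'BEGIN { printf "%.2f\n", bad / good }'
}

ratio 423.3 267.6       # downloaded data in MiB
ratio 5519 3912         # unique tests
ratio 1965545 1646334   # failures seen by local clustering
ratio 4944 3595         # resulting clusters
ratio 2950.56 408.33    # global clustering time in seconds (49m10.56s vs 6m48.33s)
```

Read this way, the input data grew by roughly 1.6x and the failure count by roughly 1.2x, while the global clustering step took roughly 7x longer.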

$ for x in good bad; do grep cluster.go:121 $x-build-log.txt | cut -d, -f3-9 | sort > $x-failures.txt; done
$ comm -23 good-failures.txt bad-failures.txt | wc -l
     735
$ comm -12 good-failures.txt bad-failures.txt | wc -l
    3177

So while the total volume of failures went up by ~1.5x, the number of unique-to-bad failures is… pretty high. Maybe we could find out what those tests are, when they were introduced, etc.
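
One caveat on the commands above: `comm -23` prints the lines found only in the first file, so 735 counts entries that are in the good log but missing from the bad one. The 735 + 3177 = 3912 entries in the good file match the good run's unique test count. If the bad file likewise has one entry per unique test (an assumption; the source does not show that count), then 5519 - 3177 = 2342 entries would be unique to the bad run. A minimal sketch of listing them follows, using toy stand-in files with made-up contents and a hypothetical helper name `only_in_bad`:

```shell
# Toy stand-ins for good-failures.txt and bad-failures.txt (made-up contents).
tmp=$(mktemp -d)
printf '%s\n' test-a test-b test-c | sort > "$tmp/good.txt"
printf '%s\n' test-b test-c test-d test-e | sort > "$tmp/bad.txt"

# comm needs sorted input. -13 suppresses column 1 (only in the first file)
# and column 3 (in both), leaving the lines found only in the second file.
only_in_bad() {
  comm -13 "$1" "$2"
}

only_in_bad "$tmp/good.txt" "$tmp/bad.txt"
```

Against the real files, the equivalent call would be `comm -13 good-failures.txt bad-failures.txt`, and the resulting test names could then be searched for in the logs.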

spiffxp on Jul 15, 2021

/assign
I’ll take a look

spiffxp on Jul 15, 2021