test-infra: [Kettle] Process hangs when generating json.gz for 'all' table

What would you like to be added: Add more compute to the Kettle instances, or determine whether CPU is what is limiting speed and causing the lockup.

Why is this needed:

top - 10:34:45 up 2 days,  5:18,  0 users,  load average: 1.16, 1.19, 1.18
Tasks:   8 total,   2 running,   6 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.4 us,  0.3 sy,  0.0 ni, 86.2 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 53588024 total,  2290884 free,  7649364 used, 43647776 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 45492108 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    619 root      20   0 5952940 5.110g  42432 R 100.0 10.0   2897:56 pypy3
      1 root      20   0   18384   3096   2820 S   0.0  0.0   0:00.02 runner.sh
     40 root      20   0   26768   9056   5204 S   0.0  0.0   0:00.03 python3
    618 root      20   0    4636    836    768 S   0.0  0.0   0:00.00 sh
    620 root      20   0    4700    828    768 S   0.0  0.0   1:05.94 pv
    621 root      20   0    4792   1528   1252 S   0.0  0.0   0:01.64 gzip
    643 root      20   0   18512   3404   3028 S   0.0  0.0   0:00.00 bash
    652 root      20   0   36628   3092   2644 R   0.0  0.0   0:00.03 top

It seems that Kettle prod is hitting CPU limits when trying to build the JSON dump: the `pypy3` process is pinned at 100% of a single core. Update cycles take extremely long to complete, and the process now appears to freeze entirely, no longer updating and "catching" on specific builds; the logs end at that point.

Error while reading data, error message: JSON parsing error in row starting at position 605377307: Parser terminated before end of string

Error while reading data, error message: JSON parsing error in row starting at position 752722019: Parser terminated before end of string

Error while reading data, error message: JSON parsing error in row starting at position 1126381032: Parser terminated before end of string

ERROR:root:error on gs://pivotal-e2e-results/kubo-windows-2019/1553782223
Traceback (most recent call last):
  File "make_json.py", line 281, in make_rows
    yield rowid, row_for_build(path, started, finished, results)
  File "make_json.py", line 254, in row_for_build
    build = Build.generate(path, tests, started, finished, metadata, repos)
  File "make_json.py", line 94, in generate
    build = cls(path, tests)
  File "make_json.py", line 90, in __init__
    self.populate_path_to_job_and_number()
  File "make_json.py", line 112, in populate_path_to_job_and_number
    raise ValueError(f'unknown build path for {self.path} in known bucket paths')
ValueError: unknown build path for gs://pivotal-e2e-results/kubo-windows-2019/1553782223 in known bucket paths
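For context, the traceback above points at a path-prefix check in `populate_path_to_job_and_number`. A minimal sketch of that kind of check is below; `KNOWN_BUCKETS`, the bucket prefix, and the function name `job_and_number` are assumptions for illustration, not the actual `make_json.py` code:

```python
# Hypothetical sketch of the check that raises the ValueError above.
# KNOWN_BUCKETS is an assumed allow-list of GCS path prefixes.
KNOWN_BUCKETS = ['gs://kubernetes-jenkins/logs/']

def job_and_number(path):
    """Split a build path into (job, build number), or raise if the
    path does not start with any known bucket prefix."""
    for prefix in KNOWN_BUCKETS:
        if path.startswith(prefix):
            rest = path[len(prefix):].rstrip('/')
            job, _, number = rest.rpartition('/')
            return job, number
    raise ValueError(f'unknown build path for {path} in known bucket paths')
```

In other words, `gs://pivotal-e2e-results/...` fails here simply because that bucket is not in the configured prefix list, so those builds are skipped with the error logged.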

/area kettle
/assign

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (13 by maintainers)

Most upvoted comments

I think I will try something like this to avoid the STDOUT issues https://stackoverflow.com/questions/49534901/is-there-a-way-to-use-json-dump-with-gzip
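A minimal sketch of that approach: open the `.gz` target with `gzip.open` in text mode and let `json.dump` stream straight into it, so the full JSON string is never built in memory and no stdout pipe (`pv | gzip`) is involved. The function name `dump_json_gz` is mine, not Kettle's:

```python
import gzip
import json

def dump_json_gz(obj, path):
    # gzip.open in 'wt' mode wraps the compressed stream in a text
    # file object, so json.dump writes and compresses incrementally
    # instead of serializing everything into one giant string first.
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        json.dump(obj, f)
```

Reading it back is symmetric: `json.load(gzip.open(path, 'rt', encoding='utf-8'))`.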