test-infra: Pods are OOMKilling themselves

senlu@senlu:~/work/src/k8s.io/test-infra/prow$ kubectl get po -n=test-pods -a | grep OOM | wc -l
50

We probably want to set a memory limit for each job: regular e2e jobs use ~1Gi, but a bazel job can eat up ~7Gi.

"sacrifice child!", said by the node

/area prow /area jobs

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 22 (22 by maintainers)

Most upvoted comments

2017-11-14 23:30:02.823312: {'Completed': 238, 'Error': 112, 'Evicted': 2, 'Running': 209}

We’re in a pretty good place for the moment.

It seems we already have the resources fields in type.go; I’ll add them for the jobs that use bazel build.

This seems to still be a problem. #5457 was deployed before the weekend, right?

$ kubectl get po -n=test-pods -a | grep "OOMKilled" | wc -l
56

FYI, current status:

senlu@senlu:~/work/src/k8s.io/test-infra/prow$ kubectl top nodes
NAME                                  CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
gke-prow-default-pool-42819f20-bg3z   3605m        91%       6648Mi          53%       
gke-prow-default-pool-42819f20-2jmh   3532m        90%       8902Mi          71%       
gke-prow-default-pool-42819f20-1rjm   3249m        82%       7587Mi          61%       
gke-prow-default-pool-42819f20-c81m   3342m        85%       6546Mi          52%       
gke-prow-default-pool-42819f20-hmx1   2577m        65%       10963Mi         88%       
gke-prow-default-pool-42819f20-nlk6   3290m        83%       10185Mi         82%       
gke-prow-default-pool-42819f20-z1v4   80m          2%        5811Mi          46%       
gke-prow-default-pool-42819f20-frsc   2323m        59%       12538Mi         101%      
gke-prow-default-pool-42819f20-nh4x   345m         8%        6499Mi          52%       
gke-prow-default-pool-42819f20-v12h   1151m        29%       8257Mi          66%       
gke-prow-default-pool-42819f20-pbpz   3577m        91%       4425Mi          35%       
gke-prow-default-pool-42819f20-8lxf   3180m        81%       7657Mi          61%       
gke-prow-default-pool-42819f20-4j5c   530m         13%       10118Mi         81%       
gke-prow-default-pool-42819f20-pl3m   246m         6%        8003Mi          64%       
gke-prow-default-pool-42819f20-25fm   272m         6%        8132Mi          65%       
gke-prow-default-pool-42819f20-4nvp   3381m        86%       9984Mi          80%       
gke-prow-default-pool-42819f20-nwd2   80m          2%        6970Mi          56%       
gke-prow-default-pool-42819f20-m0wk   98m          2%        6385Mi          51%       
gke-prow-default-pool-42819f20-2l32   2671m        68%       9209Mi          74%       
gke-prow-default-pool-42819f20-4dc6   212m         5%        8109Mi          65%       
gke-prow-default-pool-42819f20-j3b8   283m         7%        9262Mi          74%       
gke-prow-default-pool-42819f20-kh1l   2959m        75%       10672Mi         86%       
gke-prow-default-pool-42819f20-dghc   108m         2%        3497Mi          28%       
gke-prow-default-pool-42819f20-28z7   114m         2%        8246Mi          66%       
gke-prow-default-pool-42819f20-2vp5   993m         25%       5592Mi          45%       
gke-prow-default-pool-42819f20-pc5n   2990m        76%       12065Mi         97%       
gke-prow-default-pool-42819f20-cmh7   188m         4%        7191Mi          57%       
gke-prow-default-pool-42819f20-4kjl   3388m        86%       5398Mi          43%       
gke-prow-default-pool-42819f20-7kk1   3958m        100%      6922Mi          55%       
gke-prow-default-pool-42819f20-3snv   3367m        85%       7845Mi          63%       
gke-prow-default-pool-42819f20-b8d8   2742m        69%       11205Mi         90%       
gke-prow-default-pool-42819f20-wz3r   3215m        82%       7935Mi          63%       
gke-prow-default-pool-42819f20-spc8   1101m        28%       11321Mi         91%       
gke-prow-default-pool-42819f20-jvtd   1755m        44%       7266Mi          58%       
gke-prow-default-pool-42819f20-svsn   172m         4%        1414Mi          11%       
gke-prow-default-pool-42819f20-q3k3   3350m        85%       2498Mi          20%       
gke-prow-default-pool-42819f20-xk8f   394m         10%       7162Mi          57%       
gke-prow-default-pool-42819f20-qrzf   284m         7%        992Mi           7%        
gke-prow-default-pool-42819f20-l0zx   1406m        35%       2144Mi          17%       
gke-prow-default-pool-42819f20-sxtk   2679m        68%       3301Mi          26%       
gke-prow-default-pool-42819f20-d2xp   2383m        60%       1113Mi          8%        
gke-prow-default-pool-42819f20-t779   404m         10%       2483Mi          20%       
gke-prow-default-pool-42819f20-s4gf   2735m        69%       10729Mi         86%       
gke-prow-default-pool-42819f20-v9zm   3257m        83%       8000Mi          64%       
gke-prow-default-pool-42819f20-m58t   3288m        83%       8708Mi          70%       
gke-prow-default-pool-42819f20-xf8k   265m         6%        10336Mi         83%       
gke-prow-default-pool-42819f20-wn4n   78m          1%        8999Mi          72%       
gke-prow-default-pool-42819f20-2bsd   198m         5%        6057Mi          48%       
gke-prow-default-pool-42819f20-mpp3   287m         7%        8714Mi          70%       
gke-prow-default-pool-42819f20-t5rd   3283m        83%       8523Mi          68%       
gke-prow-default-pool-42819f20-6r8w   2457m        62%       8427Mi          67%       
gke-prow-default-pool-42819f20-4tkh   316m         8%        2865Mi          23%       
gke-prow-default-pool-42819f20-g532   2398m        61%       9238Mi          74%       
gke-prow-default-pool-42819f20-7768   3483m        88%       4516Mi          36%       
gke-prow-default-pool-42819f20-zs96   1856m        47%       8527Mi          68%       
gke-prow-default-pool-42819f20-34vx   2311m        58%       1334Mi          10%       
gke-prow-default-pool-42819f20-9xfn   89m          2%        7234Mi          58%       
gke-prow-default-pool-42819f20-kt11   3741m        95%       5641Mi          45%       
gke-prow-default-pool-42819f20-kwsv   68m          1%        7879Mi          63%       
gke-prow-default-pool-42819f20-02sl   140m         3%        2672Mi          21%       
gke-prow-default-pool-42819f20-vw7s   3874m        98%       10005Mi         80%       
gke-prow-default-pool-42819f20-1rh9   3134m        79%       9354Mi          75%       
gke-prow-default-pool-42819f20-27rp   3233m        82%       8003Mi          64%       
gke-prow-default-pool-42819f20-5t9b   3531m        90%       3639Mi          29%       
gke-prow-default-pool-42819f20-qqgc   3997m        101%      9240Mi          74%       
gke-prow-default-pool-42819f20-fptg   407m         10%       6639Mi          53%       
gke-prow-default-pool-42819f20-sx26   3357m        85%       9818Mi          79%       
gke-prow-default-pool-42819f20-8h86   798m         20%       7891Mi          63%       
gke-prow-default-pool-42819f20-vj85   131m         3%        9127Mi          73%       
gke-prow-default-pool-42819f20-pzzv   84m          2%        6506Mi          52%       
gke-prow-default-pool-42819f20-4kqg   693m         17%       8760Mi          70%       
gke-prow-default-pool-42819f20-vw5l   2132m        54%       11611Mi         93%       
gke-prow-default-pool-42819f20-cw5p   231m         5%        8284Mi          66%       
gke-prow-default-pool-42819f20-ls8z   2102m        53%       10194Mi         82%       
gke-prow-default-pool-42819f20-wwgr   264m         6%        7205Mi          58%       
gke-prow-default-pool-42819f20-j8sq   2737m        69%       9195Mi          74%       
gke-prow-default-pool-42819f20-p3jz   290m         7%        6655Mi          53%       
gke-prow-default-pool-42819f20-j4bw   3354m        85%       10976Mi         88%       
gke-prow-default-pool-42819f20-96n1   3514m        89%       4960Mi          39%       
gke-prow-default-pool-42819f20-pv9n   3547m        90%       8574Mi          69%       
gke-prow-default-pool-42819f20-z495   3406m        86%       9855Mi          79%       
gke-prow-default-pool-42819f20-bnfq   2858m        72%       9878Mi          79%       
gke-prow-default-pool-42819f20-bv6m   2408m        61%       3066Mi          24%       
gke-prow-default-pool-42819f20-lg65   3585m        91%       8852Mi          71%       
gke-prow-default-pool-42819f20-d4cf   1943m        49%       9798Mi          78%       
gke-prow-default-pool-42819f20-zh2h   377m         9%        6144Mi          49%       
gke-prow-default-pool-42819f20-jgn8   110m         2%        6962Mi          56%       
gke-prow-default-pool-42819f20-rl9t   107m         2%        2331Mi          18%       
gke-prow-default-pool-42819f20-0q1j   3154m        80%       11291Mi         90%       
gke-prow-default-pool-42819f20-r9z8   96m          2%        6836Mi          55% 
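
With this many nodes it is hard to spot the ones near their memory ceiling by eye. A minimal sketch of a filter, assuming the five-column `kubectl top nodes` layout shown above; the function name `hot_nodes` and the default threshold of 85 are made up for illustration:

```shell
# Print the name and MEMORY% of every node at or above a threshold.
# Reads `kubectl top nodes` output on stdin; the threshold (percent) is
# the first argument and defaults to 85, an arbitrary value.
hot_nodes() {
  awk -v limit="${1:-85}" '
    # Skip the header row and any short or blank lines.
    $1 != "NAME" && NF >= 5 {
      pct = $5
      sub(/%/, "", pct)            # "101%" -> "101"
      if (pct + 0 >= limit) print $1, $5
    }'
}
```

Usage would be something like `kubectl top nodes | hot_nodes 90`, which lists only the nodes at 90% memory or more.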

https://kubernetes.io/docs/tasks/administer-cluster/memory-default-namespace/

We should probably add some configuration for this (a namespace-level memory default, as described in the link above), and also look at configuring some of the more intensive jobs (the ones that run builds) to request more memory.
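
A rough sketch of what a namespace default could look like, following the LimitRange approach from the linked docs. The object name and the 2Gi limit are placeholder assumptions; the 1Gi request is only a guess based on the ~1Gi figure for regular e2e jobs mentioned earlier. The function just prints the manifest so it can be reviewed before applying:

```shell
# Print a LimitRange manifest that gives containers without their own
# resources a default memory request and limit.
# Arguments (all optional, all hypothetical defaults):
#   $1 object name, $2 default request, $3 default limit
limitrange_manifest() {
  cat <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: ${1:-mem-limit-range}
spec:
  limits:
  - type: Container
    defaultRequest:
      memory: ${2:-1Gi}
    default:
      memory: ${3:-2Gi}
EOF
}
```

To try it: `limitrange_manifest | kubectl apply -n test-pods -f -`. The defaults only apply to containers that do not set their own memory request or limit, so the bazel jobs would still need explicit, larger values in their job configs.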