test-infra: prow fails to update job configmap due to length limit

What happened: Prow’s config-updater plugin failed because the job configmap grew past the 1048576-character ConfigMap size limit:

{
  "insertId": "tey3tsg1971nnw",
  "jsonPayload": {
    "author": "BenTheElder",
    "component": "hook",
    "error": "update config map err: ConfigMap \"job-config\" is invalid: []: Too long: must have at most 1048576 characters",
    "event-GUID": "7c77cca0-41db-11e9-83d0-bbff3a601bc3",
    "event-type": "pull_request",
    "level": "error",
    "msg": "Error handling PullRequestEvent.",
    "org": "kubernetes",
    "plugin": "config-updater",
    "pr": 11702,
    "repo": "test-infra",
    "url": "https://github.com/kubernetes/test-infra/pull/11702"
  },
  "labels": {
    "compute.googleapis.com/resource_name": "fluentd-gcp-v3.2.0-56c8g",
    "container.googleapis.com/namespace_name": "default",
    "container.googleapis.com/pod_name": "hook-587bd464fc-pgdwd",
    "container.googleapis.com/stream": "stderr"
  },
  "logName": "projects/k8s-prow/logs/hook",
  "receiveTimestamp": "2019-03-08T19:51:01.491289557Z",
  "resource": {
    "labels": {
      "cluster_name": "prow",
      "container_name": "hook",
      "instance_id": "7000980459144515921",
      "namespace_id": "default",
      "pod_id": "hook-587bd464fc-pgdwd",
      "project_id": "k8s-prow",
      "zone": "us-central1-f"
    },
    "type": "container"
  },
  "severity": "ERROR",
  "timestamp": "2019-03-08T19:50:55Z"
}

What you expected to happen: The config updater should not fail, so that we can keep updating job configs.

How to reproduce it (as minimally and precisely as possible): have a giant job config and merge a PR that updates the config.

Please provide links to example occurrences, if any: PR that triggered this: https://github.com/kubernetes/test-infra/pull/11702

/priority critical-urgent
/area prow

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 20 (19 by maintainers)

Most upvoted comments

gzip the job configs

just throwing this out there:

$ find ./config/jobs/ -name "*.yaml" -exec cat {} + > all-files
$ gzip < all-files > all-files.gz
$ stat --format=%s all-files
997270
$ stat --format=%s all-files.gz
69305

ratio is 14:1.
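
For concreteness, here is a minimal sketch of what the write side could look like: compress the aggregated config with gzip and store it in the ConfigMap’s BinaryData field (Data only allows UTF-8 strings, which gzip output is not). The file name, key, and namespace are illustrative, this assumes a recent client-go, and it is not the actual config-updater code:

package main

import (
	"bytes"
	"compress/gzip"
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// "all-files" is the concatenated job config from the shell snippet
	// above (illustrative; the real plugin assembles this from the PR).
	raw, err := os.ReadFile("all-files")
	if err != nil {
		panic(err)
	}

	// Compress with gzip; per the measurement above this shrinks the
	// payload by roughly 14x, well under the 1048576-character limit.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}

	kubeconfig, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(kubeconfig)
	if err != nil {
		panic(err)
	}

	// BinaryData carries arbitrary bytes, unlike Data.
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "job-config", Namespace: "default"},
		BinaryData: map[string][]byte{"job-config.yaml.gz": buf.Bytes()},
	}
	if _, err := client.CoreV1().ConfigMaps("default").Update(context.Background(), cm, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
	fmt.Printf("stored %d gzipped bytes (from %d raw)\n", buf.Len(), len(raw))
}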

Yeah I thought the downsides to gzip were:

  1. only delays the problem, does not solve it
  2. required code where sharding did not (but without negative globs you may want validation, which needs code)
  3. did not support incremental updates

If none of those are blockers, gzip seems ok

@cjwagner why was gzip rejected? Where was this conversation? Do you want to rearrange the configs and shard?

I don’t recall discussing this at length, and I’m definitely for it if we can make it work more easily. I was favoring config sharding as a temporary mitigation that didn’t require additional code, but if validation requires extra code then that isn’t really a selling point. One concern mentioned was that incremental updates might be harder if the config is gzipped, but I’m not sure that’s true?

@stevekuznetsov Is there some complexity to adding optional gzip to the config package that we’re missing?

gzip was considered at length by @cjwagner and me, but it’s complicated: we need to gzip on the input in the plugin as well as decompress on the output in every config loader. Sharding requires zero code changes and works great. That’s what we chose for simplicity, and I don’t really see why y’all can’t, either 😃
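
For the loader side, one way to keep that decompression cheap is to sniff the gzip magic bytes and decompress only when they are present, so the same code path accepts both plain and gzipped keys. A minimal sketch, not Prow’s actual loader; the mount path and key name are hypothetical:

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// maybeGunzip returns data decompressed if it starts with the gzip magic
// header (0x1f 0x8b) and unchanged otherwise, so plain and gzipped
// configs flow through the same loader.
func maybeGunzip(data []byte) ([]byte, error) {
	if len(data) < 2 || data[0] != 0x1f || data[1] != 0x8b {
		return data, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Hypothetical mount path for the job-config ConfigMap key.
	raw, err := os.ReadFile("/etc/job-config/job-config.yaml.gz")
	if err != nil {
		panic(err)
	}
	plain, err := maybeGunzip(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("loaded %d bytes of job config\n", len(plain))
}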