lithops: Catch HTTP 409 error for Code Engine

Recently, I have been seeing frequent HTTP 409 errors when processing data in bulk. I contacted IBM customer support to find out where the problem comes from and how it can be fixed. So far, the only response I've gotten is this:

It looks like you have some application code that interacts with a Code Engine project through the Kubernetes API endpoint.
There is a well-known bug in Kubernetes where concurrent create operations can lead to 409 errors of this kind.
You will have to retry them in your application code. In pseudocode, a retry would look like this:

  const MAX_RETRY = 5

  main() {
    myObject = { ... }
    createWithRetry(myObject)
  }

  createWithRetry(object) {
    for (i = 0; i < MAX_RETRY; i++) {
      sleepSeconds(i)
      try {
        return kubeClient.create(object)
      } catch (e) {
        if (!isRetryable(e) || i == MAX_RETRY - 1) {
          logError(e)
          throw e
        }
        logWarning("Retrying due to", e)
      }
    }
  }

  isRetryable(exception) {
    if (exception is ConflictError) {
      return true
    }

    // It may also make sense to retry other transient failures, such as network errors

    return false
  }
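For reference, the pseudocode above translates roughly into the Python sketch below. It is not part of the support reply or of lithops: `create_with_retry` and `is_retryable` are hypothetical helpers, and detecting a conflict by checking `status == 409` assumes an exception such as `kubernetes.client.exceptions.ApiException`, which exposes the HTTP status code as its `status` attribute. The `sleep` function is injectable so the retry loop can be exercised without real delays.

```python
import logging
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")

log = logging.getLogger(__name__)

MAX_RETRY = 5


def is_retryable(exc: Exception) -> bool:
    # Assumption: the exception carries the HTTP status in a `status`
    # attribute, as kubernetes.client.exceptions.ApiException does.
    # Only 409 Conflict is retried here; other transient errors
    # (e.g. network failures) could be added.
    return getattr(exc, "status", None) == 409


def create_with_retry(
    create: Callable[[Any], T],
    obj: Any,
    max_retry: int = MAX_RETRY,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call create(obj), retrying retryable errors with a linear back-off."""
    for attempt in range(max_retry):
        sleep(attempt)  # waits 0 s, 1 s, 2 s, ... as in the pseudocode
        try:
            return create(obj)
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_retry - 1:
                log.error("Giving up: %s", exc)
                raise
            log.warning("Retrying due to %s", exc)
    # Only reached when the loop body never ran.
    raise ValueError("max_retry must be at least 1")
```

In real code, `create` would be whatever call submits the object (for example a bound method of a Kubernetes API client), and `obj` its argument.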

Here is the original stack trace:

Traceback (most recent call last):
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/lithops.py", line 81, in _callback
    self._manager.annotate_lithops(ds=ds, del_first=msg.get('del_first', False))
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/dataset_manager.py", line 114, in annotate_lithops
    ServerAnnotationJob(executor, ds, perf).run()
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/annotation_job.py", line 321, in run
    self.db_formula_image_ids = self.pipe.store_images_to_s3(self.ds.id)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/pipeline.py", line 199, in store_images_to_s3
    return store_images_to_s3(
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/store_images.py", line 67, in store_images_to_s3
    results = executor.map(_upload_png_batch, [(cobj,) for cobj in png_cobjs], runtime_memory=512)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 292, in map
    raise exc
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 325, in run
    futures = executor.map(
  File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/executors.py", line 288, in map
    futures = self.invoker.run_job(job)
  File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 266, in run_job
    futures = self._run_job(job)
  File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 205, in _run_job
    raise e
kubernetes.client.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'audit-id': 'a082f7b4-70d3-4708-a29e-8ec22f965220', 'cache-control': 'no-cache, private', 'content-length': '326', 'content-type': 'application/json', 'date': 'Sat, 03 Sep 2022 03:19:13 GMT', 'x-kubernetes-pf-flowschema-uid': '05dc5f92-6c95-416b-b707-da00a57e0adc', 'x-kubernetes-pf-prioritylevel-uid': '8d40cc3d-a5c4-47ee-8978-ec5f7a197920'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on resourcequotas \"5s4f5qcqf4f\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"5s4f5qcqf4f","kind":"resourcequotas"},"code":409}
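The response body above is a Kubernetes `Status` object whose `details.kind` is `resourcequotas`, so the conflict was reported on a resource quota object, which fits the support explanation about concurrent create operations. As a sketch, a handler could inspect the body to tell this case apart from other 409 responses. `conflict_kind` is a hypothetical helper, and passing it `exc.status` and `exc.body` assumes the exception is a `kubernetes.client.exceptions.ApiException`.

```python
import json
from typing import Optional


def conflict_kind(status: Optional[int], body: Optional[str]) -> Optional[str]:
    """Return details.kind from a Kubernetes 409 Conflict Status body, else None."""
    if status != 409 or not body:
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(payload, dict):
        return None
    if payload.get("kind") != "Status" or payload.get("reason") != "Conflict":
        return None
    details = payload.get("details") or {}
    return details.get("kind")
```

Applied to the body from this traceback, the helper returns `"resourcequotas"`.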

Would it be possible to add handling for this exception to lithops? I am using lithops 2.7.0.
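Until lithops handles this itself, one possible caller-side workaround is to retry the whole call on a 409. The decorator below is a sketch under stated assumptions: `retry_on_conflict` is a hypothetical name, it detects conflicts through a `status` attribute as `ApiException` provides, and it assumes that re-running the wrapped call is safe, i.e. that the 409 was raised before any work was actually submitted. Unlike the linear delay in the support pseudocode, it uses exponential back-off with full jitter, which spreads out retries from concurrent callers.

```python
import functools
import random
import time
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def retry_on_conflict(
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[F], F]:
    """Retry the wrapped call when it raises an exception whose status is 409.

    Before retry n (n = 1, 2, ...) it sleeps a random time in
    [0, min(max_delay, base_delay * 2 ** (n - 1))] ("full jitter").
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Re-raise anything that is not a conflict, and give up
                    # after the last attempt.
                    if getattr(exc, "status", None) != 409 or attempt == max_attempts:
                        raise
                    cap = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(random.uniform(0, cap))

        return cast(F, wrapper)

    return decorator
```

For example, decorating a hypothetical helper that wraps the `executor.map` call in `store_images_to_s3` would retry that call up to `max_attempts` times when a conflict occurs.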

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (3 by maintainers)

Most upvoted comments

@JosepSampe Thanks, I would be grateful for a new release whenever possible.

@kpavel can you handle it? 😃