lithops: Catch HTTP 409 error for Code Engine
Recently, I have been seeing frequent HTTP 409 errors when processing data in bulk. I contacted IBM customer support to find out where the problem originates and how it can be fixed. So far, the only response I have received from them is this:
It looks like you have some application code that interacts with a Code Engine project through the Kubernetes API endpoint.
There is a fairly well-known bug in Kubernetes where concurrent create operations can lead to 409 errors of this kind.
You will have to retry them in your application code. A retry would look like this in pseudocode:
const MAX_RETRY = 5

main() {
    myObject = { ... }
    createWithRetry(myObject)
}

createWithRetry(object) {
    for (i = 0; i < MAX_RETRY; i++) {
        sleepSeconds(i)
        try {
            return kubeClient.create(object)
        } catch (e) {
            if (!isRetryable(e) || i == MAX_RETRY - 1) {
                logError(e)
                throw e
            }
            logWarning("Retrying due to", e)
        }
    }
}

isRetryable(exception) {
    if (exception is ConflictError) {
        return true
    }
    // It makes sense to retry more like network failures
    return false
}
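For reference, here is a minimal Python sketch of the same retry idea. It is only an illustration, not how lithops handles this: `call_with_retry` and `is_retryable` are hypothetical names, and the check relies on the assumption that the raised exception carries the HTTP code in a `status` attribute, as `kubernetes.client.exceptions.ApiException` does.

```python
import logging
import time

MAX_RETRY = 5
logger = logging.getLogger(__name__)


def is_retryable(exc):
    # Assumption: the exception exposes the HTTP status code as `.status`
    # (kubernetes.client.exceptions.ApiException does). 409 means Conflict.
    return getattr(exc, "status", None) == 409


def call_with_retry(func, *args, max_retry=MAX_RETRY, sleep=time.sleep, **kwargs):
    """Call func, retrying on HTTP 409 with a linear backoff (0 s, 1 s, 2 s, ...)."""
    for attempt in range(max_retry):
        sleep(attempt)
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Re-raise anything that is not a conflict, or the final failed attempt.
            if not is_retryable(exc) or attempt == max_retry - 1:
                logger.error("Giving up after %d attempt(s): %s", attempt + 1, exc)
                raise
            logger.warning("Retrying due to %s", exc)
```

A caller would wrap the Kubernetes create call, for example `call_with_retry(create_fn, namespace, body)`, where `create_fn` stands for whichever client method is being invoked. The `sleep` parameter exists only so the delay can be replaced in tests.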
The original stack trace I get:
Traceback (most recent call last):
File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/lithops.py", line 81, in _callback
self._manager.annotate_lithops(ds=ds, del_first=msg.get('del_first', False))
File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/dataset_manager.py", line 114, in annotate_lithops
ServerAnnotationJob(executor, ds, perf).run()
File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/annotation_job.py", line 321, in run
self.db_formula_image_ids = self.pipe.store_images_to_s3(self.ds.id)
File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/pipeline.py", line 199, in store_images_to_s3
return store_images_to_s3(
File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/store_images.py", line 67, in store_images_to_s3
results = executor.map(_upload_png_batch, [(cobj,) for cobj in png_cobjs], runtime_memory=512)
File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 292, in map
raise exc
File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 325, in run
futures = executor.map(
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/executors.py", line 288, in map
futures = self.invoker.run_job(job)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 266, in run_job
futures = self._run_job(job)
File "/opt/dev/miniconda3/envs/sm38/lib/python3.8/site-packages/lithops/invokers.py", line 205, in _run_job
raise e
kubernetes.client.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'audit-id': 'a082f7b4-70d3-4708-a29e-8ec22f965220', 'cache-control': 'no-cache, private', 'content-length': '326', 'content-type': 'application/json', 'date': 'Sat, 03 Sep 2022 03:19:13 GMT', 'x-kubernetes-pf-flowschema-uid': '05dc5f92-6c95-416b-b707-da00a57e0adc', 'x-kubernetes-pf-prioritylevel-uid': '8d40cc3d-a5c4-47ee-8978-ec5f7a197920'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on resourcequotas \"5s4f5qcqf4f\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"5s4f5qcqf4f","kind":"resourcequotas"},"code":409}
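The response body above is a Kubernetes Status object in JSON, so the kind and name of the conflicting resource can be read from it. A minimal sketch, assuming the body is available as a string (for example from the `body` attribute of the `ApiException`); `conflict_details` is a hypothetical helper name and the sample body is abridged from the trace above:

```python
import json


def conflict_details(body):
    """Return (kind, name) if body is a 409 Conflict Status object, else None."""
    try:
        status = json.loads(body)
    except (TypeError, ValueError):
        return None  # body is missing or not valid JSON
    if not isinstance(status, dict):
        return None
    if status.get("code") != 409 or status.get("reason") != "Conflict":
        return None
    details = status.get("details") or {}
    return details.get("kind"), details.get("name")


# Abridged version of the response body shown in the trace (message omitted).
sample_body = (
    '{"kind":"Status","apiVersion":"v1","status":"Failure","reason":"Conflict",'
    '"details":{"name":"5s4f5qcqf4f","kind":"resourcequotas"},"code":409}'
)
print(conflict_details(sample_body))  # ('resourcequotas', '5s4f5qcqf4f')
```

Here the conflict is on a `resourcequotas` object rather than on the object being created, which matches the concurrent-update explanation given by support.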
Is it possible to add handling of such exceptions to lithops? I use lithops 2.7.0.
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (3 by maintainers)
@JosepSampe Thanks, I will be grateful for the release of a new version when possible.
@kpavel can you handle it? 😃