appwrite: 🐛 Bug Report: Runtime Container Doesn't Exist but Executor Thinks it Does

👟 Reproduction steps

A few users have reported this problem, but I’m not sure why it happens. One can reproduce it by removing the runtime container via the Docker CLI and then trying to execute the function.

👍 Expected behavior

Container state and Executor are in sync

👎 Actual Behavior

It seems it’s possible for the Appwrite Executor’s table of active runtimes and the actual running runtime containers to go out of sync. When the executor thinks the container is still active, but it actually isn’t, it won’t spin up the container and executing the function will result in:

Message: An internal curl error has occurred within the executor! Error Msg: Could not resolve host: [container name]
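The mismatch described above can be illustrated with a small Python sketch. It is not how the Appwrite executor is implemented; `find_stale_runtimes` and `list_running_containers` are hypothetical helper names, and the only real interface assumed is the Docker CLI command `docker ps --format "{{.Names}}"`. The idea is simply to compare the names the executor believes are active against the containers Docker actually reports.

```python
import subprocess


def find_stale_runtimes(tracked, running):
    """Return tracked runtime names that have no matching running container.

    `tracked` stands in for the executor's table of active runtimes and
    `running` for the container names Docker reports (both hypothetical inputs).
    """
    running_set = set(running)
    return sorted(name for name in tracked if name not in running_set)


def list_running_containers():
    """List the names of running containers via the Docker CLI."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Any name returned by `find_stale_runtimes` is a runtime the executor would try to reach by hostname even though no such container exists, which matches the "Could not resolve host" symptom.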

From the logs, it seems the executor tries to remove the inactive runtime container, but fails:

appwrite-executor  | Executing Runtime: 62e95cd16f29990cd1ce-630f3976572d929432b8
appwrite-executor  | Function executed in 0.32679891586304 seconds, status: completed
appwrite-executor  | Running maintenance task ...
appwrite-executor  | Inactive Runtime deletion failed: Docker Error: 
appwrite-executor  | Executing Runtime: 62e95cd16f29990cd1ce-630f3976572d929432b8
appwrite-executor  | [Error] Type: Exception
appwrite-executor  | [Error] Message: An internal curl error has occurred within the executor! Error Msg: Could not resolve host: 62e95cd16f29990cd1ce-630f3976572d929432b8
appwrite-executor  | [Error] File: /usr/src/code/app/executor.php
appwrite-executor  | [Error] Line: 536

Although there was an error, it seems the container does successfully get removed. So, subsequent attempts to remove the runtime container result in:

appwrite-executor  | Inactive Runtime deletion failed: Docker Error: Error: No such container: 62e95cd16f29990cd1ce-630f3976572d929432b8

Restarting the appwrite-executor or re-deploying the function usually gets things working again.

🎲 Appwrite version

Version 0.15.x

đŸ’» Operating system

Linux

đŸ§± Your Environment

No response

👀 Have you spent some time to check if this issue has been raised before?

  • I checked and didn’t find similar issue

🏱 Have you read the Code of Conduct?

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 9
  • Comments: 43 (5 by maintainers)

Most upvoted comments

Yeah, my whole app in production is being broken by this stupid error. Is there any way I can solve it myself?

One more thing that may be helpful: I deleted an old function (~150 days old) that was working fine, and when I deployed it again, the same issue started in that function too!

Can someone please help me?

I use a Hetzner server with 4 GB of RAM and 2 cores, and the Appwrite server version is 0.15.3.

I have made a workaround using an external Python Docker container that can restart the executor container or even run a systemctl reboot on the host.

I think it could even be done in pure shell using curl.

It just looks for a Docker error in the Appwrite function logs and, if it finds one, triggers a reboot.
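A minimal Python sketch of that kind of watchdog is shown below. It is not the commenter's actual script: the function names are hypothetical, the two patterns are taken from the error messages quoted in this issue, and the container name `appwrite-executor` is an assumption based on the log prefix above. It uses only the standard `docker restart` CLI command.

```python
import re
import subprocess

# Patterns taken from the error messages quoted in this issue.
ERROR_PATTERNS = (
    re.compile(r"Could not resolve host: (\S+)"),
    re.compile(r"No such container: (\S+)"),
)


def find_broken_runtimes(log_lines):
    """Return the set of runtime names mentioned in matching error lines."""
    broken = set()
    for line in log_lines:
        for pattern in ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                broken.add(match.group(1))
    return broken


def restart_executor(container_name="appwrite-executor"):
    """Restart the executor container (the name is an assumption)."""
    subprocess.run(["docker", "restart", container_name], check=True)


def check_and_restart(log_lines):
    """Restart the executor if any known error appears; return the matches."""
    broken = find_broken_runtimes(log_lines)
    if broken:
        restart_executor()
    return broken
```

The log lines could come from something like `docker logs --since 5m appwrite-executor`, run on a schedule. Restarting only works around the desync; it does not address why the executor's runtime table goes stale.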

I have the same problem. The most frustrating part is that the crashes prevent you and your users from accessing your instance, so you have to restart it every time it crashes. I’m thinking about using an alternative for functions, like Firebase or Amazon.

When the error occurred for you, were you creating the function with the same ID as before? That is, is its ID the same after recreating the function?

One possible conclusion I’m reaching is that if you delete a function and create it again with the same ID, that causes this curl error. There was a similar problem a long time ago (6 to 12 months ago, in cloud functions). I don’t remember exactly, but it was related to a cloud function that had not been deleted correctly.

Yes. I always use the same id

Same happens to me when the function is not used for some time. Using Dart-2.17, few ms execution time, payload is sent via flutter sdk or document added/updated event. Very annoying to have some features break all of a sudden but usually redeploying the function fixes it for a while. Using Appwrite 1.2.

The problem happened again. The function had not been used for a few weeks, and when it was triggered it was in processing status. At that time the other cloud functions did not respond. I had to delete the function and deploy it again, after which all functions became responsive. I’m waiting for authorization to share the logs.