cvat: Gateway Timeout (504) error when running with SAM

My actions before raising this issue

  • Read/searched the docs
  • Searched past issues

Steps to Reproduce (for bugs)

  1. Downloaded cvat, am on commit ad534b2ac32f57.

  2. Installed NVIDIA container toolkit.

  3. Followed Serverless Setup steps.

  4. Installed nuctl by following guide here. Verified that nuclio is version 1.8.14.

  5. Ran the command to deploy the SAM nuclio function as described here: cd serverless && ./deploy_gpu.sh pytorch/facebookresearch/sam/nuclio/

  6. Checked that the nuclio function is running properly: nuctl get function reports the SAM function in STATE ready.

  7. Launched CVAT in serverless mode using docker-compose -f docker-compose.yml -f components/serverless/docker-compose.serverless.yml up -d.

  8. Opened a CVAT task, selected “Segment Anything” from AI tools, and clicked on the image. I get a “Waiting a response from Segment Anything.” message. After a while I get a 504 timeout error: Failed to load resource: the server responded with a status of 504 (Gateway Timeout). Clicking on the link in the browser console shows the REST API call (image below).

[image: screenshot of the REST API call]

Current Behaviour

It seems that the CVAT instance is unable to communicate with the nuclio SAM function. I have verified in the nuclio dashboard that the SAM function is running.

Your Environment

  • Git commit hash: ad534b2a
  • Docker version 23.0.4
  • Are you using Docker Swarm or Kubernetes? Regular docker
  • Operating System and version (e.g. Linux, Windows, MacOS): Linux 22.04

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 4
  • Comments: 20 (5 by maintainers)

Most upvoted comments

@whom-da dawg, thanks for this comment. I finally got things working after struggling for weeks! Because I have openvpn and ssh servers installed on my machine, I enabled the firewall which is disabled in Ubuntu by default. And so I had to issue sudo ufw allow 32772/tcp. A little more info is at the issue that I posted, #6087
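Before suspecting the firewall, it may help to confirm that one is actually enabled on the Docker host. A minimal sketch, assuming ufw is the firewall in use (as in the comment above); ufw_is_active is a hypothetical helper name, and it only inspects the text that sudo ufw status prints:

```shell
# Hypothetical helper: succeeds if the `ufw status` text on stdin
# reports an active firewall ("Status: active").
ufw_is_active() {
    grep -q '^Status: active'
}

# Usage on the Docker host (needs root, so it is left as a comment):
#   sudo ufw status | ufw_is_active && echo "ufw is active, check its rules"
```

If this reports an active firewall, the port the nuclio function publishes on the host is the next thing to check.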

I’ve got exactly the same issue. The weird thing is that I got it to work on CPU on one computer, then followed the exact same steps on another, and there it doesn’t work.

In the docker logs I’ve found this:

2023-04-20 09:15:55,536 DEBG 'runserver' stderr output:
[Thu Apr 20 09:15:55.536775 2023] [wsgi:error] [pid 337:tid 140222251001408] [remote 172.19.0.6:48534] WARNING:django.request:Not Found: /api/functions/requests/


cvat_server             | 2023-04-20 09:09:43,395 DEBG 'runserver' stderr output:
cvat_server             | [Thu Apr 20 09:09:43.395696 2023] [wsgi:error] [pid 337:tid 140222267786816] [remote 172.19.0.6:45738] [2023-04-20 09:09:43,395] ERROR django.request: Service Unavailable: /api/lambda/functions/pth.facebookresearch.sam.vit_h
cvat_server             | 
cvat_server             | 2023-04-20 09:09:43,396 DEBG 'runserver' stderr output:
cvat_server             | [Thu Apr 20 09:09:43.395858 2023] [wsgi:error] [pid 337:tid 140222267786816] [remote 172.19.0.6:45738] ERROR:django.request:Service Unavailable: /api/lambda/functions/pth.facebookresearch.sam.vit_h

I found no further errors in the nuclio container and the sam container.

I’ve tested with another serverless function and there I got the same issue. Currently I run with the env CVAT_HOST, but I got the same behaviour without.

So I suspect that there might be some communication issues between the cvat containers, but I don’t really see a clear way to debug this.
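One small debugging aid is to list which serverless functions the CVAT server reports as unavailable. A minimal sketch, assuming the server container is named cvat_server (as the log prefix above suggests) and that the log lines look like the ones quoted here; unavailable_functions is a hypothetical helper name:

```shell
# Hypothetical helper: read CVAT server log text on stdin and print the
# unique function names that appear in "Service Unavailable" lines.
unavailable_functions() {
    sed -n 's#.*Service Unavailable: /api/lambda/functions/\([^ ]*\).*#\1#p' | sort -u
}

# Usage on the Docker host (container name is an assumption):
#   docker logs cvat_server 2>&1 | unavailable_functions
```

This does not explain why a function is unreachable; it only narrows down which functions the server failed to reach.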

I’m having the exact same issue on cvat 2.4.3, both with SAM and YOLOv5.

Logs seem to indicate that the cvat server is not able to communicate with the serverless container (example for YOLOv5):

...

cvat_server               | 2023-05-02 08:20:36,954 DEBG 'runserver' stderr output:
cvat_server               | [Tue May 02 08:20:36.954546 2023] [wsgi:error] [pid 147:tid 140425650320960] [remote 172.23.0.2:57610] ERROR:django.request:Service Unavailable: /api/lambda/functions/ultralytics-yolov5

...

Logs of the YOLO container (SAM is similar to the previous comment):

(base) tlips@paard:~/Documents/cvat/serverless$ docker logs nuclio-nuclio-ultralytics-yolov5 
23.05.02 08:16:24.371                 processor (I) Starting processor {"version": "Label: 1.8.14, Git commit: cbb0774230996a3eb4621c1a2079e2317578005b, OS: linux, Arch: amd64, Go version: go1.17.8"}
23.05.02 08:16:24.371                 processor (D) Read configuration {"config": "{\n    \"metadata\": {\n        \"name\": \"ultralytics-yolov5\",\n        \"namespace\": \"nuclio\",\n        \"labels\": {\n            \"nuclio.io/project-name\": \"cvat\"\n        },\n        \"annotations\": {\n            \"framework\": \"pytorch\",\n            \"name\": \"YOLO v5\",\n            \"spec\": \"[\\n  { \\\"id\\\": 0, \\\"name\\\": \\\"person\\\" },\\n  { \\\"id\\\": 1, \\\"name\\\": \\\"bicycle\\\" },\\n  { \\\"id\\\": 2, \\\"name\\\": \\\"car\\\" },\\n  { \\\"id\\\": 3, \\\"name\\\": \\\"motorbike\\\" },\\n  { \\\"id\\\": 4, \\\"name\\\": \\\"aeroplane\\\" },\\n  { \\\"id\\\": 5, \\\"name\\\": \\\"bus\\\" },\\n  { \\\"id\\\": 6, \\\"name\\\": \\\"train\\\" },\\n  { \\\"id\\\": 7, \\\"name\\\": \\\"truck\\\" },\\n  { \\\"id\\\": 8, \\\"name\\\": \\\"boat\\\" },\\n  { \\\"id\\\": 9, \\\"name\\\": \\\"traffic light\\\" },\\n  { \\\"id\\\": 10, \\\"name\\\": \\\"fire hydrant\\\" },\\n  { \\\"id\\\": 11, \\\"name\\\": \\\"stop sign\\\" },\\n  { \\\"id\\\": 12, \\\"name\\\": \\\"parking meter\\\" },\\n  { \\\"id\\\": 13, \\\"name\\\": \\\"bench\\\" },\\n  { \\\"id\\\": 14, \\\"name\\\": \\\"bird\\\" },\\n  { \\\"id\\\": 15, \\\"name\\\": \\\"cat\\\" },\\n  { \\\"id\\\": 16, \\\"name\\\": \\\"dog\\\" },\\n  { \\\"id\\\": 17, \\\"name\\\": \\\"horse\\\" },\\n  { \\\"id\\\": 18, \\\"name\\\": \\\"sheep\\\" },\\n  { \\\"id\\\": 19, \\\"name\\\": \\\"cow\\\" },\\n  { \\\"id\\\": 20, \\\"name\\\": \\\"elephant\\\" },\\n  { \\\"id\\\": 21, \\\"name\\\": \\\"bear\\\" },\\n  { \\\"id\\\": 22, \\\"name\\\": \\\"zebra\\\" },\\n  { \\\"id\\\": 23, \\\"name\\\": \\\"giraffe\\\" },\\n  { \\\"id\\\": 24, \\\"name\\\": \\\"backpack\\\" },\\n  { \\\"id\\\": 25, \\\"name\\\": \\\"umbrella\\\" },\\n  { \\\"id\\\": 26, \\\"name\\\": \\\"handbag\\\" },\\n  { \\\"id\\\": 27, \\\"name\\\": \\\"tie\\\" },\\n  { \\\"id\\\": 28, \\\"name\\\": \\\"suitcase\\\" },\\n  { \\\"id\\\": 29, 
\\\"name\\\": \\\"frisbee\\\" },\\n  { \\\"id\\\": 30, \\\"name\\\": \\\"skis\\\" },\\n  { \\\"id\\\": 31, \\\"name\\\": \\\"snowboard\\\" },\\n  { \\\"id\\\": 32, \\\"name\\\": \\\"sports ball\\\" },\\n  { \\\"id\\\": 33, \\\"name\\\": \\\"kite\\\" },\\n  { \\\"id\\\": 34, \\\"name\\\": \\\"baseball bat\\\" },\\n  { \\\"id\\\": 35, \\\"name\\\": \\\"baseball glove\\\" },\\n  { \\\"id\\\": 36, \\\"name\\\": \\\"skateboard\\\" },\\n  { \\\"id\\\": 37, \\\"name\\\": \\\"surfboard\\\" },\\n  { \\\"id\\\": 38, \\\"name\\\": \\\"tennis racket\\\" },\\n  { \\\"id\\\": 39, \\\"name\\\": \\\"bottle\\\" },\\n  { \\\"id\\\": 40, \\\"name\\\": \\\"wine glass\\\" },\\n  { \\\"id\\\": 41, \\\"name\\\": \\\"cup\\\" },\\n  { \\\"id\\\": 42, \\\"name\\\": \\\"fork\\\" },\\n  { \\\"id\\\": 43, \\\"name\\\": \\\"knife\\\" },\\n  { \\\"id\\\": 44, \\\"name\\\": \\\"spoon\\\" },\\n  { \\\"id\\\": 45, \\\"name\\\": \\\"bowl\\\" },\\n  { \\\"id\\\": 46, \\\"name\\\": \\\"banana\\\" },\\n  { \\\"id\\\": 47, \\\"name\\\": \\\"apple\\\" },\\n  { \\\"id\\\": 48, \\\"name\\\": \\\"sandwich\\\" },\\n  { \\\"id\\\": 49, \\\"name\\\": \\\"orange\\\" },\\n  { \\\"id\\\": 50, \\\"name\\\": \\\"broccoli\\\" },\\n  { \\\"id\\\": 51, \\\"name\\\": \\\"carrot\\\" },\\n  { \\\"id\\\": 52, \\\"name\\\": \\\"hot dog\\\" },\\n  { \\\"id\\\": 53, \\\"name\\\": \\\"pizza\\\" },\\n  { \\\"id\\\": 54, \\\"name\\\": \\\"donut\\\" },\\n  { \\\"id\\\": 55, \\\"name\\\": \\\"cake\\\" },\\n  { \\\"id\\\": 56, \\\"name\\\": \\\"chair\\\" },\\n  { \\\"id\\\": 57, \\\"name\\\": \\\"sofa\\\" },\\n  { \\\"id\\\": 58, \\\"name\\\": \\\"pottedplant\\\" },\\n  { \\\"id\\\": 59, \\\"name\\\": \\\"bed\\\" },\\n  { \\\"id\\\": 60, \\\"name\\\": \\\"diningtable\\\" },\\n  { \\\"id\\\": 61, \\\"name\\\": \\\"toilet\\\" },\\n  { \\\"id\\\": 62, \\\"name\\\": \\\"tvmonitor\\\" },\\n  { \\\"id\\\": 63, \\\"name\\\": \\\"laptop\\\" },\\n  { \\\"id\\\": 64, \\\"name\\\": \\\"mouse\\\" },\\n  { \\\"id\\\": 65, \\\"name\\\": 
\\\"remote\\\" },\\n  { \\\"id\\\": 66, \\\"name\\\": \\\"keyboard\\\" },\\n  { \\\"id\\\": 67, \\\"name\\\": \\\"cell phone\\\" },\\n  { \\\"id\\\": 68, \\\"name\\\": \\\"microwave\\\" },\\n  { \\\"id\\\": 69, \\\"name\\\": \\\"oven\\\" },\\n  { \\\"id\\\": 70, \\\"name\\\": \\\"toaster\\\" },\\n  { \\\"id\\\": 71, \\\"name\\\": \\\"sink\\\" },\\n  { \\\"id\\\": 72, \\\"name\\\": \\\"refrigerator\\\" },\\n  { \\\"id\\\": 73, \\\"name\\\": \\\"book\\\" },\\n  { \\\"id\\\": 74, \\\"name\\\": \\\"clock\\\" },\\n  { \\\"id\\\": 75, \\\"name\\\": \\\"vase\\\" },\\n  { \\\"id\\\": 76, \\\"name\\\": \\\"scissors\\\" },\\n  { \\\"id\\\": 77, \\\"name\\\": \\\"teddy bear\\\" },\\n  { \\\"id\\\": 78, \\\"name\\\": \\\"hair drier\\\" },\\n  { \\\"id\\\": 79, \\\"name\\\": \\\"toothbrush\\\" }\\n]\\n\",\n            \"type\": \"detector\"\n        }\n    },\n    \"spec\": {\n        \"description\": \"YOLO v5 via pytorch hub\",\n        \"handler\": \"main:handler\",\n        \"runtime\": \"python:3.6\",\n        \"resources\": {\n            \"limits\": {\n                \"nvidia.com/gpu\": \"1\"\n            },\n            \"requests\": {\n                \"cpu\": \"25m\",\n                \"memory\": \"1Mi\"\n            }\n        },\n        \"image\": \"cvat/ultralytics-yolov5:latest\",\n        \"targetCPU\": 75,\n        \"triggers\": {\n            \"myHttpTrigger\": {\n                \"class\": \"\",\n                \"kind\": \"http\",\n                \"name\": \"myHttpTrigger\",\n                \"maxWorkers\": 1,\n                \"workerAvailabilityTimeoutMilliseconds\": 10000,\n                \"attributes\": {\n                    \"maxRequestBodySize\": 33554432\n                }\n            }\n        },\n        \"volumes\": [\n            {\n                \"volume\": {\n                    \"name\": \"volume-1\",\n                    \"hostPath\": {\n                        \"path\": \"/home/tlips/Documents/cvat/serverless/common\"\n                
    }\n                },\n                \"volumeMount\": {\n                    \"name\": \"volume-1\",\n                    \"mountPath\": \"/opt/nuclio/common\"\n                }\n            }\n        ],\n        \"build\": {\n            \"functionConfigPath\": \"pytorch/ultralytics/yolov5/nuclio//function-gpu.yaml\",\n            \"image\": \"cvat/ultralytics-yolov5\",\n            \"baseImage\": \"ultralytics/yolov5:latest\",\n            \"directives\": {\n                \"preCopy\": [\n                    {\n                        \"kind\": \"USER\",\n                        \"value\": \"root\"\n                    },\n                    {\n                        \"kind\": \"RUN\",\n                        \"value\": \"apt update \\u0026\\u0026 apt install --no-install-recommends -y libglib2.0-0\"\n                    },\n                    {\n                        \"kind\": \"WORKDIR\",\n                        \"value\": \"/opt/nuclio\"\n                    }\n                ]\n            },\n            \"codeEntryType\": \"image\",\n            \"timestamp\": 1683015382\n        },\n        \"platform\": {\n            \"attributes\": {\n                \"mountMode\": \"volume\",\n                \"restartPolicy\": {\n                    \"maximumRetryCount\": 3,\n                    \"name\": \"always\"\n                }\n            }\n        },\n        \"readinessTimeoutSeconds\": 120,\n        \"securityContext\": {},\n        \"eventTimeout\": \"30s\"\n    },\n    \"PlatformConfig\": null\n}", "platformConfig": "{\n    \"kind\": \"local\",\n    \"webAdmin\": {\n        \"enabled\": true,\n        \"listenAddress\": \":8081\"\n    },\n    \"healthCheck\": {\n        \"enabled\": true,\n        \"listenAddress\": \":8082\"\n    },\n    \"logger\": {\n        \"sinks\": {\n            \"stdout\": {\n                \"kind\": \"stdout\"\n            }\n        },\n        \"system\": [\n            {\n                \"level\": 
\"debug\",\n                \"sink\": \"stdout\"\n            }\n        ],\n        \"functions\": [\n            {\n                \"level\": \"debug\",\n                \"sink\": \"stdout\"\n            }\n        ]\n    },\n    \"metrics\": {},\n    \"scaleToZero\": {\n        \"multiTargetStrategy\": \"random\"\n    },\n    \"autoScale\": {},\n    \"cronTriggerCreationMode\": \"processor\",\n    \"functionReadinessTimeout\": \"2m0s\",\n    \"ingressConfig\": {},\n    \"kube\": {\n        \"defaultServiceType\": \"ClusterIP\",\n        \"defaultFunctionPodResources\": {\n            \"requests\": {},\n            \"limits\": {}\n        }\n    },\n    \"local\": {\n        \"FunctionContainersHealthinessEnabled\": false,\n        \"FunctionContainersHealthinessTimeout\": 5000000000,\n        \"FunctionContainersHealthinessInterval\": 30000000000\n    },\n    \"imageRegistryOverrides\": {},\n    \"opa\": {\n        \"address\": \"127.0.0.1:8181\",\n        \"clientKind\": \"nop\",\n        \"requestTimeout\": 10,\n        \"permissionQueryPath\": \"/v1/data/iguazio/authz/allow\",\n        \"permissionFilterPath\": \"/v1/data/iguazio/authz/filter_allowed\"\n    },\n    \"streamMonitoring\": {\n        \"webapiURL\": \"http://v3io-webapi:8081\",\n        \"v3ioRequestConcurrency\": 64\n    }\n}"}
23.05.02 08:16:24.371 cessor.healthcheck.server (I) Listening {"listenAddress": ":8082"}
23.05.02 08:16:24.371            processor.http (D) Creating worker pool {"num": 1}
23.05.02 08:16:24.371 sor.http.w0.python.logger (D) Creating listener socket {"path": "/tmp/nuclio-rpc-ch8cdm4kja0bes7paqb0.sock"}
23.05.02 08:16:24.372 sor.http.w0.python.logger (W) Python 3.6 runtime is deprecated and will soon not be supported. Please migrate your code and use Python 3.7 runtime (`python:3.7`) or higher
23.05.02 08:16:24.372 sor.http.w0.python.logger (D) Using Python wrapper script path {"path": "/opt/nuclio/_nuclio_wrapper.py"}
23.05.02 08:16:24.372 sor.http.w0.python.logger (D) Using Python handler {"handler": "main:handler"}
23.05.02 08:16:24.372 sor.http.w0.python.logger (D) Using Python executable {"path": "/opt/conda/bin/python3"}
23.05.02 08:16:24.372 sor.http.w0.python.logger (D) Setting PYTHONPATH {"value": "PYTHONPATH=/opt/nuclio"}
23.05.02 08:16:24.372 sor.http.w0.python.logger (D) Running wrapper {"command": "/opt/conda/bin/python3 -u /opt/nuclio/_nuclio_wrapper.py --handler main:handler --socket-path /tmp/nuclio-rpc-ch8cdm4kja0bes7paqb0.sock --platform-kind local --namespace nuclio --worker-id 0 --trigger-kind http --trigger-name myHttpTrigger --decode-event-strings"}
/opt/nuclio/_nuclio_wrapper.py:395: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
23.05.02 08:16:26.091 sor.http.w0.python.logger (I) Wrapper connected {"wid": 0, "pid": 19}
23.05.02 08:16:26.091 sor.http.w0.python.logger (D) Waiting for start
{"datetime": "2023-05-02 08:16:26,091", "level": "info", "message": "Replacing logger output", "with": {"handler_name": "default", "worker_id": "0"}}
23.05.02 08:16:26.091 sor.http.w0.python.logger (I) Init context...  0% {"worker_id": "0"}
/opt/conda/lib/python3.10/site-packages/torch/hub.py:286: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /root/.cache/torch/hub/master.zip
requirements: /root/.cache/torch/hub/requirements.txt not found, check failed.
YOLOv5 🚀 2023-5-2 Python-3.10.9 torch-2.0.0 CUDA:0 (NVIDIA GeForce RTX 3090, 24260MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.1M/14.1M [00:01<00:00, 8.44MB/s]

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 
23.05.02 08:16:36.209 sor.http.w0.python.logger (I) Init context...100% {"worker_id": "0"}
23.05.02 08:16:36.209 sor.http.w0.python.logger (D) Started
23.05.02 08:16:36.209                 processor (I) Starting event timeout watcher {"timeout": "30s"}
23.05.02 08:16:36.209 .webadmin.server.triggers (D) Registered custom route {"routeName": "triggers", "stream": false, "pattern": "/{id}/stats", "method": "GET"}
23.05.02 08:16:36.209 processor.webadmin.server (D) Registered resource {"name": "triggers"}
23.05.02 08:16:36.209                 processor (W) No metric sinks configured, metrics will not be published
23.05.02 08:16:36.209                 processor (D) Starting triggers {"triggersError": "json: unsupported value: encountered a cycle via *http.http"}
23.05.02 08:16:36.214            processor.http (I) Starting {"listenAddress": ":8080", "readBufferSize": 16384, "maxRequestBodySize": 33554432, "reduceMemoryUsage": false, "cors": null}
23.05.02 08:16:36.214 processor.webadmin.server (I) Listening {"listenAddress": ":8081"}
23.05.02 08:16:36.214                 processor (D) Processor started
 

Same issue with same logs… Not working with GPU or CPU.

For those still having “ERROR django.request: Service Unavailable” issues, I found yaochenglouis’s answer at #2641 worked for me. Was just a firewall issue - do “ufw allow” on the port the Nuclio function is listening on.
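To find the port to allow, one option is to read it from docker port. A minimal sketch, assuming the function container publishes its HTTP trigger port on the host and that docker port prints lines such as 8080/tcp -> 0.0.0.0:32772; published_ports and ufw_allow_commands are hypothetical helper names, and the commands are only printed, not executed:

```shell
# Hypothetical helper: read `docker port <container>` output on stdin
# and print the unique host port numbers (the digits after the last colon).
published_ports() {
    sed -n 's/.*:\([0-9][0-9]*\)$/\1/p' | sort -u
}

# Hypothetical helper: print (do not run) one ufw command per host port,
# so the rule can be reviewed before it is applied.
ufw_allow_commands() {
    published_ports | while read -r port; do
        printf 'sudo ufw allow %s/tcp\n' "$port"
    done
}

# Usage on the Docker host (container name taken from the logs in this thread):
#   docker port nuclio-nuclio-pth.facebookresearch.sam.vit_h | ufw_allow_commands
```

Printing the command instead of running it keeps the firewall change an explicit, manual step.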

I’m also having the same issue. No “AI Tool”, neither the SAM interactor nor the YOLO detectors, works on GPU or CPU on my desktop. However, they work fine on my laptop, where I used identical installation steps. The laptop has lesser hardware in every respect than my desktop.

My desktop has openvpn and ssh servers installed. They aren’t running in docker. I don’t think this should affect anything, but I can’t think of why else my computer would differ significantly from any other.

The operation simply times out, with the error message shown in the following image: [image: cvat_issue]

Here are logs from nuclio-nuclio-pth.facebookresearch.sam.vit_h container, on both my laptop and desktop. This one is the only one that looked significantly different between the desktop and laptop, at least to my eye. I can see two calls to the call handler on my laptop which correspond to the two times I used SAM. On my desktop, there are no lines corresponding to the call handler. So it seems the container built from nuclio-nuclio-pth.facebookresearch.sam.vit_h is not communicating with the others properly?

Desktop:

serverless$ docker logs nuclio-nuclio-pth.facebookresearch.sam.vit_h | tail
/opt/nuclio/_nuclio_wrapper.py:395: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
/opt/nuclio/_nuclio_wrapper.py:395: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
23.04.30 15:38:52.392 sor.http.w0.python.logger (I) Init context...100% {"worker_id": "0"}
23.04.30 15:38:52.392 sor.http.w0.python.logger (D) Started
23.04.30 15:38:52.392                 processor (I) Starting event timeout watcher {"timeout": "30s"}
23.04.30 15:38:52.392 .webadmin.server.triggers (D) Registered custom route {"routeName": "triggers", "stream": false, "pattern": "/{id}/stats", "method": "GET"}
23.04.30 15:38:52.392 processor.webadmin.server (D) Registered resource {"name": "triggers"}
23.04.30 15:38:52.392                 processor (W) No metric sinks configured, metrics will not be published
23.04.30 15:38:52.392                 processor (D) Starting triggers {"triggersError": "json: unsupported value: encountered a cycle via *http.http"}
23.04.30 15:38:52.393            processor.http (I) Starting {"listenAddress": ":8080", "readBufferSize": 16384, "maxRequestBodySize": 33554432, "reduceMemoryUsage": false, "cors": null}
23.04.30 15:38:52.393 processor.webadmin.server (I) Listening {"listenAddress": ":8081"}
23.04.30 15:38:52.393                 processor (D) Processor started

Laptop:

serverless$ docker logs nuclio-nuclio-pth.facebookresearch.sam.vit_h  | tail
/opt/nuclio/_nuclio_wrapper.py:395: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
/opt/nuclio/_nuclio_wrapper.py:395: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
23.04.30 15:33:17.291                 processor (I) Starting event timeout watcher {"timeout": "30s"}
23.04.30 15:33:17.293 .webadmin.server.triggers (D) Registered custom route {"routeName": "triggers", "stream": false, "pattern": "/{id}/stats", "method": "GET"}
23.04.30 15:33:17.293 processor.webadmin.server (D) Registered resource {"name": "triggers"}
23.04.30 15:33:17.294                 processor (W) No metric sinks configured, metrics will not be published
23.04.30 15:33:17.294                 processor (D) Starting triggers {"triggersError": "json: unsupported value: encountered a cycle via *http.http"}
23.04.30 15:33:17.307            processor.http (I) Starting {"listenAddress": ":8080", "readBufferSize": 16384, "maxRequestBodySize": 33554432, "reduceMemoryUsage": false, "cors": null}
23.04.30 15:33:17.310 processor.webadmin.server (I) Listening {"listenAddress": ":8081"}
23.04.30 15:33:17.310                 processor (D) Processor started
23.04.30 15:35:48.483 sor.http.w0.python.logger (I) call handler {"worker_id": "0"}
23.04.30 15:37:04.823 sor.http.w1.python.logger (I) call handler {"worker_id": "1"}
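
The desktop/laptop comparison above comes down to whether any “call handler” lines show up in the function container’s log. A minimal sketch for counting them, assuming the log format shown above; count_handler_calls is a hypothetical helper name:

```shell
# Hypothetical helper: count "call handler" lines in nuclio function log
# text read on stdin. grep -c exits non-zero when the count is 0, so the
# `|| true` keeps the helper from signalling failure in that case.
count_handler_calls() {
    grep -c 'call handler' || true
}

# Usage on the Docker host (container name taken from the logs above):
#   docker logs nuclio-nuclio-pth.facebookresearch.sam.vit_h 2>&1 | count_handler_calls
```

A count of 0 after clicking in the CVAT UI suggests the request never reached the function, which is consistent with a blocked port rather than a failing model.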

Supplementary Info

To install, I followed these steps.

  1. Installed CVAT, using these instructions. I followed them only up to the git clone.
  2. I downloaded nuctl as described here, 2nd bullet. I created a project for cvat as described there, but I didn’t deploy any of the models.
  3. I followed these instructions to deploy the segment anything model, i.e. this command, cd serverless && ./deploy_cpu.sh pytorch/facebookresearch/sam/nuclio/
  4. Finally, I used the docker up command from back here, 1st bullet to get cvat and nuclio running.
  5. I go to localhost:8080 in Chrome, create a project and task, and upload an image (I used this).
  6. Once I select Segment Anything in the interactors under AI Tools and click somewhere on the image, it buffers for tens of seconds and then returns an error message about timing out. If I monitor htop while it is running, I see nothing happening after I click.
  • Docker version (e.g. Docker 17.0.05): 23.0.5
  • Are you using Docker Swarm or Kubernetes? No
  • Operating System and version (e.g. Linux, Windows, MacOS): Ubuntu 20.04
