llama-gpt: Unable to build CUDA-enabled image, missing Makefile

Attaching to llama-gpt-llama-gpt-api-cuda-ggml-1, llama-gpt-llama-gpt-ui-1
llama-gpt-llama-gpt-ui-1             | [INFO  wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | [INFO  wait]  docker-compose-wait 2.12.1
llama-gpt-llama-gpt-ui-1             | [INFO  wait] ---------------------------
llama-gpt-llama-gpt-ui-1             | [DEBUG wait] Starting with configuration:
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Hosts to be waiting for: [llama-gpt-api-cuda-ggml:8000]
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Paths to be waiting for: []
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Timeout before failure: 3600 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - TCP connection timeout before retry: 5 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time before checking for hosts/paths availability: 0 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time once all hosts/paths are available: 0 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time between retries: 1 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Checking availability of host [llama-gpt-api-cuda-ggml:8000]
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | ==========
llama-gpt-llama-gpt-api-cuda-ggml-1  | == CUDA ==
llama-gpt-llama-gpt-api-cuda-ggml-1  | ==========
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | CUDA Version 12.1.1
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
llama-gpt-llama-gpt-api-cuda-ggml-1  | By pulling and using the container, you accept the terms and conditions of this license:
llama-gpt-llama-gpt-api-cuda-ggml-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | /models/llama-2-7b-chat.bin model found.
llama-gpt-llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'.  Stop.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 12
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1 exited with code 0
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | /models/llama-2-7b-chat.bin model found.
llama-gpt-llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'.  Stop.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 12
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-llama-gpt-api-cuda-ggml-1 exited with code 0
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | /models/llama-2-7b-chat.bin model found.
llama-gpt-llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'.  Stop.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 12
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1 exited with code 132
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | /models/llama-2-7b-chat.bin model found.
llama-gpt-llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'.  Stop.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 12
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-llama-gpt-api-cuda-ggml-1 exited with code 132

Run on Ubuntu Server 22.04.3. It looks like there isn’t a Makefile included in the CUDA container that the corresponding run.sh is looking for, and the API container keeps exiting (with codes 0 and 132) while the UI waits for it.
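
A quick way to check whether the file really is missing from the built image (a sketch only; the image name is inferred from the container names in the log above, and the container’s working directory is an assumption, so this just lists whatever directory the image drops you into):

# Sketch: list the image's default directory and look for a Makefile.
# Image name assumed from the compose/container names in the log above.
docker run --rm --entrypoint sh llama-gpt-llama-gpt-api-cuda-ggml \
  -c 'pwd; ls -la; test -f Makefile && echo "Makefile present" || echo "no Makefile here"'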

About this issue

  • State: open
  • Created 8 months ago
  • Reactions: 1
  • Comments: 17

Most upvoted comments

Any update please? It keeps spamming make: *** No rule to make target ‘build’.

@tstechnologies the Makefile error will always be there, since there is no such file. I’m not sure whether it’s actually meant to exist or not. I also haven’t been able to get this working on Arch, and started using ollama instead, which has been flawless. It definitely appears to be related to deps and drivers.

You’ll notice the loop happening because the CUDA image keeps crashing when the server attempts to start.
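
To confirm the crash loop from the host, a minimal sketch (the container name is copied from the compose logs above; exit status 132 is 128 + signal 4, i.e. SIGILL/illegal instruction, so the process is being killed on startup rather than exiting cleanly):

# Sketch: show the API container's last status and exit code.
# Container name taken from the logs in this issue.
docker ps -a --filter name=llama-gpt-llama-gpt-api-cuda-ggml-1
docker inspect -f '{{.State.Status}} exit={{.State.ExitCode}}' llama-gpt-llama-gpt-api-cuda-ggml-1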

BIG UPS for that recommendation holy moly. ollama-webui has been the solution I’ve been seeking

For anyone here having issues: update ARG CUDA_IMAGE="12.3.1-devel-ubuntu22.04" to match the CUDA version you have running on your system. The default is 12.1.1.

❯ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0
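
If you’d rather script the change than edit the file by hand, a minimal sketch (the path cuda/ggml.Dockerfile is an assumption based on the “ggml.Dockerfile” build definition shown in the compose output, and it assumes the ARG line sits at the top of that file; verify both in your checkout first):

# Sketch: point the CUDA base image at the version nvcc reports, then rebuild.
# Dockerfile path and ARG placement are assumptions; adjust to your checkout.
sed -i 's/^ARG CUDA_IMAGE=.*/ARG CUDA_IMAGE="12.3.1-devel-ubuntu22.04"/' cuda/ggml.Dockerfile
./run.sh --with-cuda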

@Wh1t3Fox Sure. I just noticed the Makefile error still appears, but things just work. I literally just removed all Docker containers and cloned the repo again to get this output:

~/Programs/llama-gpt$ ./run.sh --with-cuda
No model value provided. Defaulting to 7b. If you want to change the model, exit the script and use --model to provide the model value.
Supported models are 7b, 13b, 70b, code-7b, code-13b, code-34b.
[+] Building 1.4s (30/30) FINISHED                                                                                                                                                                  docker:default
 => [llama-gpt-ui internal] load build definition from Dockerfile                                                                                                                                             0.0s
 => => transferring dockerfile: 859B                                                                                                                                                                          0.0s
 => [llama-gpt-ui internal] load .dockerignore                                                                                                                                                                0.0s
 => => transferring context: 82B                                                                                                                                                                              0.0s
 => [llama-gpt-api-cuda-ggml internal] load .dockerignore                                                                                                                                                     0.0s
 => => transferring context: 2B                                                                                                                                                                               0.0s
 => [llama-gpt-api-cuda-ggml internal] load build definition from ggml.Dockerfile                                                                                                                             0.0s
 => => transferring dockerfile: 958B                                                                                                                                                                          0.0s
 => [llama-gpt-ui internal] load metadata for ghcr.io/ufoscout/docker-compose-wait:latest                                                                                                                     0.7s
 => [llama-gpt-ui internal] load metadata for docker.io/library/node:19-alpine                                                                                                                                1.2s
 => [llama-gpt-api-cuda-ggml internal] load metadata for docker.io/nvidia/cuda:12.1.1-devel-ubuntu22.04                                                                                                       1.2s
 => [llama-gpt-api-cuda-ggml 1/5] FROM docker.io/nvidia/cuda:12.1.1-devel-ubuntu22.04@sha256:7012e535a47883527d402da998384c30b936140c05e2537158c80b8143ee7425                                                 0.0s
 => [llama-gpt-api-cuda-ggml internal] load build context                                                                                                                                                     0.0s
 => => transferring context: 3.62kB                                                                                                                                                                           0.0s
 => [llama-gpt-ui base 1/3] FROM docker.io/library/node:19-alpine@sha256:8ec543d4795e2e85af924a24f8acb039792ae9fe8a42ad5b4bf4c277ab34b62e                                                                     0.0s
 => [llama-gpt-ui internal] load build context                                                                                                                                                                0.1s
 => => transferring context: 1.28MB                                                                                                                                                                           0.0s
 => [llama-gpt-ui] FROM ghcr.io/ufoscout/docker-compose-wait:latest@sha256:ee1b58447dcf9ae2aaf84e5904ffc00ed5a983bf986535b19aeb6f2d4a7ceb8a                                                                   0.0s
 => CACHED [llama-gpt-api-cuda-ggml 2/5] RUN apt-get update && apt-get upgrade -y     && apt-get install -y git build-essential     python3 python3-pip gcc wget     ocl-icd-opencl-dev opencl-headers clinf  0.0s
 => CACHED [llama-gpt-api-cuda-ggml 3/5] COPY . .                                                                                                                                                             0.0s
 => CACHED [llama-gpt-api-cuda-ggml 4/5] RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings                                        0.0s
 => CACHED [llama-gpt-api-cuda-ggml 5/5] RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78                                                                                0.0s
 => [llama-gpt-api-cuda-ggml] exporting to image                                                                                                                                                              0.0s
 => => exporting layers                                                                                                                                                                                       0.0s
 => => writing image sha256:66bb4b0e40422bc1e57962061a253e7354b1673361d6f82f89dc521992b47272                                                                                                                  0.0s
 => => naming to docker.io/library/llama-gpt-llama-gpt-api-cuda-ggml                                                                                                                                          0.0s
 => CACHED [llama-gpt-ui base 2/3] WORKDIR /app                                                                                                                                                               0.0s
 => CACHED [llama-gpt-ui base 3/3] COPY package*.json ./                                                                                                                                                      0.0s
 => CACHED [llama-gpt-ui dependencies 1/1] RUN npm ci                                                                                                                                                         0.0s
 => CACHED [llama-gpt-ui production 3/9] COPY --from=dependencies /app/node_modules ./node_modules                                                                                                            0.0s
 => CACHED [llama-gpt-ui build 1/2] COPY . .                                                                                                                                                                  0.0s
 => CACHED [llama-gpt-ui build 2/2] RUN npm run build                                                                                                                                                         0.0s
 => CACHED [llama-gpt-ui production 4/9] COPY --from=build /app/.next ./.next                                                                                                                                 0.0s
 => CACHED [llama-gpt-ui production 5/9] COPY --from=build /app/public ./public                                                                                                                               0.0s
 => CACHED [llama-gpt-ui production 6/9] COPY --from=build /app/package*.json ./                                                                                                                              0.0s
 => CACHED [llama-gpt-ui production 7/9] COPY --from=build /app/next.config.js ./next.config.js                                                                                                               0.0s
 => CACHED [llama-gpt-ui production 8/9] COPY --from=build /app/next-i18next.config.js ./next-i18next.config.js                                                                                               0.0s
 => CACHED [llama-gpt-ui production 9/9] COPY --from=ghcr.io/ufoscout/docker-compose-wait:latest /wait /wait                                                                                                  0.0s
 => [llama-gpt-ui] exporting to image                                                                                                                                                                         0.0s
 => => exporting layers                                                                                                                                                                                       0.0s
 => => writing image sha256:54f25f18841b9f9e211026f055d2acd5d7400cd229148d550b390c13c71b2f58                                                                                                                  0.0s
 => => naming to docker.io/library/llama-gpt-llama-gpt-ui                                                                                                                                                     0.0s
[+] Running 2/2
 ✔ Container llama-gpt-llama-gpt-api-cuda-ggml-1  Created                                                                                                                                                     0.1s 
 ✔ Container llama-gpt-llama-gpt-ui-1             Created                                                                                                                                                     0.1s 
Attaching to llama-gpt-llama-gpt-api-cuda-ggml-1, llama-gpt-llama-gpt-ui-1
llama-gpt-llama-gpt-ui-1             | [INFO  wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | [INFO  wait]  docker-compose-wait 2.12.1
llama-gpt-llama-gpt-ui-1             | [INFO  wait] ---------------------------
llama-gpt-llama-gpt-ui-1             | [DEBUG wait] Starting with configuration:
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Hosts to be waiting for: [llama-gpt-api-cuda-ggml:8000]
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Paths to be waiting for: []
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Timeout before failure: 3600 seconds 
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - TCP connection timeout before retry: 5 seconds 
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time before checking for hosts/paths availability: 0 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time once all hosts/paths are available: 0 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time between retries: 1 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Checking availability of host [llama-gpt-api-cuda-ggml:8000]
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | ==========
llama-gpt-llama-gpt-api-cuda-ggml-1  | == CUDA ==
llama-gpt-llama-gpt-api-cuda-ggml-1  | ==========
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | CUDA Version 12.1.1
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
llama-gpt-llama-gpt-api-cuda-ggml-1  | By pulling and using the container, you accept the terms and conditions of this license:
llama-gpt-llama-gpt-api-cuda-ggml-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | Model file not found. Downloading...
llama-gpt-llama-gpt-api-cuda-ggml-1  | curl is not installed. Installing...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
llama-gpt-llama-gpt-api-cuda-ggml-1  | Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1576 kB]
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:7 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1304 kB]
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Fetched 3217 kB in 2s (1546 kB/s)
llama-gpt-llama-gpt-api-cuda-ggml-1  | Reading package lists...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Reading package lists...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Building dependency tree...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Reading state information...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | The following additional packages will be installed:
llama-gpt-llama-gpt-api-cuda-ggml-1  |   libcurl4
llama-gpt-llama-gpt-api-cuda-ggml-1  | The following NEW packages will be installed:
llama-gpt-llama-gpt-api-cuda-ggml-1  |   curl libcurl4
llama-gpt-llama-gpt-api-cuda-ggml-1  | 0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Need to get 483 kB of archives.
llama-gpt-llama-gpt-api-cuda-ggml-1  | After this operation, 1260 kB of additional disk space will be used.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 libcurl4 amd64 7.81.0-1ubuntu1.15 [289 kB]
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 curl amd64 7.81.0-1ubuntu1.15 [194 kB]
llama-gpt-llama-gpt-api-cuda-ggml-1  | debconf: delaying package configuration, since apt-utils is not installed
llama-gpt-llama-gpt-api-cuda-ggml-1  | Fetched 483 kB in 1s (403 kB/s)
llama-gpt-llama-gpt-api-cuda-ggml-1  | Selecting previously unselected package libcurl4:amd64.
(Reading database ... 18739 files and directories currently installed.)
llama-gpt-llama-gpt-api-cuda-ggml-1  | Preparing to unpack .../libcurl4_7.81.0-1ubuntu1.15_amd64.deb ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Unpacking libcurl4:amd64 (7.81.0-1ubuntu1.15) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Selecting previously unselected package curl.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Preparing to unpack .../curl_7.81.0-1ubuntu1.15_amd64.deb ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Unpacking curl (7.81.0-1ubuntu1.15) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Setting up libcurl4:amd64 (7.81.0-1ubuntu1.15) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Setting up curl (7.81.0-1ubuntu1.15) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Processing triggers for libc-bin (2.35-0ubuntu3.5) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
llama-gpt-llama-gpt-api-cuda-ggml-1  |                                  Dload  Upload   Total   Spent    Left  Speed
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
100  1258  100  1258    0     0   2274      0 --:--:-- --:--:-- --:--:--  2278
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
100 3616M  100 3616M    0     0  21.7M      0  0:02:45  0:02:45 --:--:-- 20.3M
llama-gpt-llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'.  Stop.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 12
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-llama-gpt-api-cuda-ggml-1  | ggml_init_cublas: found 1 CUDA devices:
llama-gpt-llama-gpt-api-cuda-ggml-1  |   Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | /usr/local/lib/python3.10/dist-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-cuda-ggml-1  |   warnings.warn(
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama.cpp: loading model from /models/llama-2-7b-chat.bin
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: format     = ggjt v3 (latest)
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_vocab    = 32000
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_ctx      = 4096
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_embd     = 4096
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_mult     = 5504
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_head     = 32
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_head_kv  = 32
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_layer    = 32
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_rot      = 128
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_gqa      = 1
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: rnorm_eps  = 5.0e-06
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_ff       = 11008
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: freq_base  = 10000.0
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: freq_scale = 1
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: model size = 7B
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: ggml ctx size =    0.08 MB
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: using CUDA for GPU acceleration
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: mem required  = 3055.79 MB (+ 2048.00 MB per state)
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 512 MB VRAM for the scratch buffer
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: offloading 10 repeating layers to GPU
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: offloaded 10/35 layers to GPU
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: total VRAM used: 1598 MB
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_new_context_with_model: kv self size  = 2048.00 MB
llama-gpt-llama-gpt-api-cuda-ggml-1  | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | INFO:     Started server process [1]
llama-gpt-llama-gpt-api-cuda-ggml-1  | INFO:     Waiting for application startup.
llama-gpt-llama-gpt-api-cuda-ggml-1  | INFO:     Application startup complete.
llama-gpt-llama-gpt-api-cuda-ggml-1  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] is now available!
llama-gpt-llama-gpt-ui-1             | [INFO  wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | [INFO  wait] docker-compose-wait - Everything's fine, the application can now start!
llama-gpt-llama-gpt-ui-1             | [INFO  wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | 
llama-gpt-llama-gpt-ui-1             | > ai-chatbot-starter@0.1.0 start
llama-gpt-llama-gpt-ui-1             | > next start
llama-gpt-llama-gpt-ui-1             | 
llama-gpt-llama-gpt-ui-1             | ready - started server on 0.0.0.0:3000, url: http://localhost:3000