server: pinned_memory_manager Killed

Description I want to deploy Triton server via Azure Kubernetes Service. The target node is ND96asr v4, which is equipped with 8 A100 GPUs. Triton server fails to start up even without loading any models.

Triton Information

  • triton: nvcr.io/nvidia/tritonserver:21.07-py3
  • azure: ND96asr v4

To Reproduce

  1. Prepare the cluster. To create the cluster, follow the procedure in the Azure GPU cluster article: https://docs.microsoft.com/ja-jp/azure/aks/gpu-cluster.
az aks nodepool add \
   --resource-group myResourceGroup \
   --cluster-name myAKSCluster \
   --name gpunp \
   --node-count 1 \
   --node-vm-size Standard_NC6 \
   --node-taints sku=gpu:NoSchedule \
   --aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true
  2. Deploy via deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-triton-ft
  namespace: modules-gpt3-6b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: sample
        image: nvcr.io/nvidia/tritonserver:21.07-py3
        command: ["/bin/sh"]
        args: ["-c", "while true; do sleep 10;done"]
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
  3. Log in to the pod and run mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/

  4. Confirm the output

root@sample2-7cb48985d9-lgzfc:/opt/tritonserver# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/
I0404 12:44:49.449929 92 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450370 92 metrics.cc:290] Collecting metrics for GPU 1: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450406 92 metrics.cc:290] Collecting metrics for GPU 2: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450431 92 metrics.cc:290] Collecting metrics for GPU 3: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450454 92 metrics.cc:290] Collecting metrics for GPU 4: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450483 92 metrics.cc:290] Collecting metrics for GPU 5: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450504 92 metrics.cc:290] Collecting metrics for GPU 6: NVIDIA A100-SXM4-40GB
I0404 12:44:49.450531 92 metrics.cc:290] Collecting metrics for GPU 7: NVIDIA A100-SXM4-40GB
I0404 12:44:50.485665 92 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 12:44:50.485729 92 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 12:44:50.485738 92 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 12:44:51.056099: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 12:44:51.247146 92 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 12:44:51.247200 92 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.247209 92 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 12:44:51.247216 92 tensorflow.cc:2209] backend configuration:
{}
I0404 12:44:51.249647 92 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 12:44:51.249678 92 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.249687 92 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 12:44:51.343681 92 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 12:44:51.343707 92 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 12:44:51.343715 92 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node sample2-7cb48985d9-lgzfc exited on signal 9 (Killed).
--------------------------------------------------------------------------

When starting up without mpirun, "Killed" is observed as well.

root@sample2-7cb48985d9-lgzfc:/opt/tritonserver# tritonserver --model-repository=/a
I0404 12:57:33.566547 197 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566814 197 metrics.cc:290] Collecting metrics for GPU 1: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566832 197 metrics.cc:290] Collecting metrics for GPU 2: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566844 197 metrics.cc:290] Collecting metrics for GPU 3: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566856 197 metrics.cc:290] Collecting metrics for GPU 4: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566870 197 metrics.cc:290] Collecting metrics for GPU 5: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566880 197 metrics.cc:290] Collecting metrics for GPU 6: NVIDIA A100-SXM4-40GB
I0404 12:57:33.566893 197 metrics.cc:290] Collecting metrics for GPU 7: NVIDIA A100-SXM4-40GB
I0404 12:57:34.057968 197 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 12:57:34.058020 197 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.058025 197 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 12:57:34.267157: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 12:57:34.351845 197 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 12:57:34.351893 197 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.351908 197 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 12:57:34.351912 197 tensorflow.cc:2209] backend configuration:
{}
I0404 12:57:34.353170 197 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 12:57:34.353190 197 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.353200 197 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 12:57:34.376199 197 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 12:57:34.376221 197 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 12:57:34.376225 197 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
Killed

Expected behavior Triton starts up successfully. The following is the output on a node with 1 GPU.

root@gpt1b:/workspace# mpirun -n 1 --allow-run-as-root tritonserver --model-repository=/a
I0404 11:55:52.082112 69 metrics.cc:290] Collecting metrics for GPU 0: Tesla V100-PCIE-16GB
I0404 11:55:52.375557 69 libtorch.cc:998] TRITONBACKEND_Initialize: pytorch
I0404 11:55:52.375599 69 libtorch.cc:1008] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.375605 69 libtorch.cc:1014] 'pytorch' TRITONBACKEND API version: 1.4
2022-04-04 11:55:52.524003: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0404 11:55:52.570841 69 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0404 11:55:52.570874 69 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.570880 69 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0404 11:55:52.570884 69 tensorflow.cc:2209] backend configuration:
{}
I0404 11:55:52.573942 69 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0404 11:55:52.573973 69 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.573979 69 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0404 11:55:52.595485 69 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0404 11:55:52.595508 69 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0404 11:55:52.595513 69 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
I0404 11:55:53.062644 69 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f945c000000' with size 268435456
I0404 11:55:53.063056 69 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0404 11:55:53.063869 69 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0404 11:55:53.063923 69 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0404 11:55:53.063941 69 server.cc:586]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0404 11:55:53.064038 69 tritonserver.cc:1718]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.12.0                                                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /a                                                                                                                                                                                     |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                               |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0404 11:55:53.065759 69 grpc_server.cc:4072] Started GRPCInferenceService at 0.0.0.0:8001
I0404 11:55:53.065984 69 http_server.cc:2795] Started HTTPService at 0.0.0.0:8000
I0404 11:55:53.107932 69 sagemaker_server.cc:134] Started Sagemaker HTTPService at 0.0.0.0:8080
I0404 11:55:53.160626 69 http_server.cc:162] Started Metrics Service at 0.0.0.0:8002

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

[44517.301476] memory: usage 131072kB, limit 131072kB, failcnt 15691

Try adding more memory to your K8s Pod via deployment.yaml to avoid OOM.
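A minimal sketch of how the container in the deployment.yaml above could request more memory; the values here are illustrative examples, not tuned recommendations.

```yaml
# Illustrative fragment for the container spec in deployment.yaml above:
# give the pod an explicit memory request/limit so the kernel OOM killer
# does not terminate Triton at startup (example values only).
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "4Gi"
```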

Marking it as a bug and will investigate more into why triton is going OOM.
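For reference, a quick sanity check of the numbers already in this issue: the default pinned-memory pool size from the startup table (268435456 bytes) alone exceeds the 131072 kB pod limit reported in the kernel message above.

```shell
#!/bin/sh
# Compare Triton's default pinned-memory pool size (from the startup
# options table) with the pod memory limit from the dmesg line above.
pinned_pool_bytes=268435456            # default --pinned-memory-pool-byte-size
pod_limit_bytes=$(( 131072 * 1024 ))   # 131072 kB limit from the OOM message
echo "pinned pool: $(( pinned_pool_bytes / 1048576 )) MiB"
echo "pod limit:   $(( pod_limit_bytes / 1048576 )) MiB"
```

The pinned pool alone is 256 MiB against a 128 MiB limit, which is consistent with the OOM kill.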

I see. One more experiment. Can you try this command?

tritonserver --model-repository=/workspace --pinned-memory-pool-byte-size=0 --cuda-memory-pool-byte-size=0:0 --cuda-memory-pool-byte-size=1:0 --cuda-memory-pool-byte-size=2:0 --cuda-memory-pool-byte-size=3:0 --cuda-memory-pool-byte-size=4:0 --cuda-memory-pool-byte-size=5:0 --cuda-memory-pool-byte-size=6:0 --cuda-memory-pool-byte-size=7:0

Do you still see the failure? Read more about --cuda-memory-pool-byte-size from here: https://github.com/triton-inference-server/server/blob/main/src/main.cc#L555
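If typing out the long command is error-prone, the same flag list can be generated in the shell (8 GPUs assumed, matching the ND96asr v4 node):

```shell
#!/bin/sh
# Build the same flag list as the long command above: disable the pinned
# pool and the per-device CUDA pool on all 8 GPUs.
flags="--pinned-memory-pool-byte-size=0"
for i in 0 1 2 3 4 5 6 7; do
  flags="$flags --cuda-memory-pool-byte-size=$i:0"
done
echo "$flags"
# then: tritonserver --model-repository=/workspace $flags
```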

64 MB should not be a problem for 40GB GPUs and a 900GB machine. Most likely it is an issue with your environment; these experiments are meant to narrow that down.

So Triton will attempt to load some backend (framework) libraries when it starts (e.g. I0404 11:55:52.570841 69 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow). If you remove all the backends shipped in /opt/tritonserver/backends/ (rm -r /opt/tritonserver/backends/*), then Triton will have to start without any backends:

I0408 23:50:11.814732 817 server.cc:576] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

We can narrow down the scope depending on whether Triton starts successfully.
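A sketch of that experiment, using mv rather than rm so the libraries can be restored afterwards; shown here on a scratch directory standing in for /opt/tritonserver/backends (the filenames are just placeholders):

```shell
#!/bin/sh
# Move the backend shared libraries aside so Triton starts with no
# backends; demonstrated on a throwaway directory so it is safe to run
# anywhere. Inside the container, point "backends" at
# /opt/tritonserver/backends instead.
backends=$(mktemp -d)
touch "$backends/libtriton_pytorch.so" "$backends/libtriton_tensorflow1.so"
disabled=$(mktemp -d)
mv "$backends"/* "$disabled"/
ls -A "$backends"    # empty: Triton would now start with no backends
```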

I don’t know why it causes OOM even when no model is loaded. Would it also be worth trying to remove the backend shared libraries so Triton starts without loading any framework libraries?