serving: Unexpectedly high CPU utilization of Docker container, even with no models loaded

System Information

  • Host OS: Microsoft Windows 10 Pro (Version 10.0.19041 Build 19041)
  • TensorFlow Serving installed from Docker: tensorflow/serving:latest-devel and ubuntu:18.04
  • Docker Desktop: 2.3.0.4 (46911) (using Linux containers)

The Problem

TensorFlow Serving's Docker container uses significantly more CPU than expected or desirable.

Evaluation

docker stats shows the following:

CONTAINER ID        NAME                                            CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
0e5fca607055        linguistic-intelligence_tensorflow-servings_1   14.75%              24.26MiB / 12.43GiB   0.19%               1.19kB / 0B         0B / 0B             29

Notice that the container is reported as using 14.75% CPU, and this is after running for 15 minutes.

And here's the issue: this is with zero models loaded. In this state, sustained CPU utilization hovers around 6–17%. (With models loaded, sustained CPU utilization is around 25%.)
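
Aside: docker stats streams continuously; to capture a one-off snapshot like the table above, Docker's standard --no-stream flag can be used (the container name is the one from this setup):

docker stats --no-stream linguistic-intelligence_tensorflow-servings_1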

Expected Behaviour

Of course, since TensorFlow Serving has been delegated zero work to do, it's expected that CPU utilization would be minimal.

Exact Steps to Reproduce

Running the Docker Container:

docker-compose up tensorflow-servings

docker-compose.yaml:

  tensorflow-servings:
    build:
      context: .
      dockerfile: tensorflow-servings.dockerfile
    image: tensorflow-servings
    ports:
      - 8500:8500
      - 8501:8501
    volumes:
      - ./ml-models/:/models/:ro
      - ./ml-servings-config/:/config/:ro
    deploy:
      resources:
        limits:
          memory: 4Gb
    environment:
      - MODEL_CONFIG_FILE_POLL_WAIT_SECONDS=0

tensorflow-servings.dockerfile:

ARG TF_SERVING_VERSION=latest
ARG TF_SERVING_BUILD_IMAGE=tensorflow/serving:${TF_SERVING_VERSION}-devel

FROM ${TF_SERVING_BUILD_IMAGE} as build_image
FROM ubuntu:18.04

ARG TF_SERVING_VERSION_GIT_BRANCH=master
ARG TF_SERVING_VERSION_GIT_COMMIT=head

LABEL tensorflow_serving_github_branchtag=${TF_SERVING_VERSION_GIT_BRANCH}
LABEL tensorflow_serving_github_commit=${TF_SERVING_VERSION_GIT_COMMIT}

RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install TF Serving pkg
COPY --from=build_image /usr/local/bin/tensorflow_model_server /usr/bin/tensorflow_model_server

# PORTS:
# gRPC
EXPOSE 8500
# REST
EXPOSE 8501

# Set where models should be stored in the container
ENV MODEL_BASE_PATH=/models/

# Embed the models
COPY ./ml-models /models
COPY ./ml-servings-config /config

# Create a script that runs the model server so we can use environment variables
# while also passing in arguments from the docker command line
RUN echo '#!/bin/bash \n\n\
tensorflow_model_server \
--allow_version_labels_for_unavailable_models=true \
--batching_parameters_file=/config/batching_parameters.txt \
--enable_batching=true \
--model_base_path=${MODEL_BASE_PATH} \
--model_config_file=/config/all_models.config \
--model_config_file_poll_wait_seconds=60 \
--monitoring_config_file=/config/monitoring_config.txt \
--port=8500 \
--rest_api_port=8501 \
--rest_api_timeout_in_ms=30000 \
"$@"' > /usr/bin/tf_serving_entrypoint.sh \
&& chmod +x /usr/bin/tf_serving_entrypoint.sh

ENTRYPOINT ["/usr/bin/tf_serving_entrypoint.sh"]
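
For reference, the image can also be built and run without docker-compose; the commands below are a sketch that mirrors the compose service definition above (image tag, ports, and mount paths taken from it):

docker build -t tensorflow-servings -f tensorflow-servings.dockerfile .
docker run --rm -p 8500:8500 -p 8501:8501 \
    -v "$PWD/ml-models:/models:ro" \
    -v "$PWD/ml-servings-config:/config:ro" \
    tensorflow-servings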

all_models.config:

model_config_list {

}
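
The model_config_list is intentionally empty, so the server loads no models at all. For contrast, a populated config would look roughly like the following (the model name and path here are hypothetical, not from this setup):

model_config_list {
  config {
    name: "some_model"
    base_path: "/models/some_model"
    model_platform: "tensorflow"
  }
}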

Logs from starting the container. Note the "Adding/updating models." line repeating once per minute, which matches the hard-coded --model_config_file_poll_wait_seconds=60 in the entrypoint; the MODEL_CONFIG_FILE_POLL_WAIT_SECONDS=0 environment variable appears to have no effect, since the entrypoint script never reads it.

$ docker-compose up tensorflow-servings
WARNING: Some services (natural-language-server, tensorflow-servings) use the 'deploy' key, which will be ignored. Compose does not support 'deploy' configuration - use `docker stack deploy` to deploy to a swarm.
Creating linguistic-intelligence_tensorflow-servings_1 ... done
Attaching to linguistic-intelligence_tensorflow-servings_1
tensorflow-servings_1      | 2020-09-07 09:27:05.197858: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:27:05.204117: I tensorflow_serving/model_servers/server.cc:367] Running gRPC ModelServer at 0.0.0.0:8500 ...
tensorflow-servings_1      | 2020-09-07 09:27:05.207537: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | [warn] getaddrinfo: address family for nodename not supported
tensorflow-servings_1      | [evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
tensorflow-servings_1      | 2020-09-07 09:27:05.215164: I tensorflow_serving/model_servers/server.cc:387] Exporting HTTP/REST API at:localhost:8501 ...
tensorflow-servings_1      | 2020-09-07 09:28:05.205505: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:29:05.206871: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:30:05.206192: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:31:05.206607: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:32:05.207374: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:33:05.207560: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:34:05.207728: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:35:05.206796: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:36:05.213456: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:37:05.207654: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:38:05.208601: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:39:05.208775: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:40:05.209190: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:41:05.209411: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:42:05.210525: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:43:05.210102: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
tensorflow-servings_1      | 2020-09-07 09:44:05.210034: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.

batching_parameters.txt:

max_batch_size { value: 64 }        # upper bound on tasks per batch
batch_timeout_micros { value: 0 }   # do not wait to fill a batch; execute immediately
num_batch_threads { value: 4 }      # number of batch scheduler threads
pad_variable_length_inputs: true    # pad inputs within a batch to a common shape

Thanks for your attention.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 15 (5 by maintainers)

Most upvoted comments

I tried to find the cause so this can be solved; here is what I found.

perf shows that the serving binary spends its CPU calling into libstdc++ (a sketch of the profiling commands follows the gdb backtrace below):

-   96.42%     0.00%  tensorflow_mode  libstdc++.so.6.0.28      [.] 0x00007efc03096d84
   - 0x7efc03096d84
      - 93.85% 0x562e4362cbf8
         - 84.45% 0x562e42038fb1
            - 84.15% 0x562e48342359
               - 84.15% 0x562e48342d33
                  - 57.28% 0x562e48342873
                     - 57.28% 0x562e48343fa9
                        - 47.28% 0x7efbfaec289d
                           - 46.57% system_call_fastpath
                              - 45.26% sys_futex
                                 - 43.58% do_futex
                                    - 40.88% futex_wait
                                       - 34.10% futex_wait_queue_me
                                          - 28.93% schedule
                                             - 28.30% __schedule
                                                - 11.27% apic_timer_interrupt
                                                   + smp_apic_timer_interrupt
                                                + 8.94% finish_task_switch
                                                + 5.44% deactivate_task
                                                  0.55% _raw_qspin_lock
                                               0.63% deactivate_task
                                          + 4.41% hrtimer_start_range_ns
                                       + 2.18% futex_wait_setup
                                       + 1.73% apic_timer_interrupt
                                         1.05% hash_futex
                                         0.72% hrtimer_init
                                      1.94% _raw_qspin_lock
                                    + 0.76% apic_timer_interrupt
                                   0.97% futex_wait
                                1.31% copy_user_enhanced_fast_string
                          1.52% 0x7efbfaec28af
                          1.43% 0x7efbfaec28b6
                          1.39% 0x7efbfafde4cb
                          1.05% 0x562e41cad6a0
                        + 0.84% 0x562e48342ffd
                          0.80% 0x562e48343000
                          0.59% 0x7efbfaec28a6
                  + 11.50% 0x562e48342766
                  + 4.76% 0x562e483427e8
                  + 2.72% 0x562e4834289c

and gdb resolves the function names:

[Switching to thread 3 (Thread 0x7efbfa59d700 (LWP 29))]
#0  0x00007efbfaec289d in syscall () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007efbfaec289d in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000562e48342ff3 in nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) ()
#2  0x0000562e48343fa9 in nsync::nsync_sem_wait_with_cancel_(nsync::waiter*, timespec, nsync::nsync_note_s_*) ()
#3  0x0000562e48342873 in nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) ()
#4  0x0000562e48342d33 in nsync::nsync_cv_wait_with_deadline(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*, timespec, nsync::nsync_note_s_*) ()
#5  0x0000562e48342359 in tensorflow::internal::wait_until_system_clock(tensorflow::internal::CVData*, tensorflow::internal::MuData*, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) ()
#6  0x0000562e42038fb1 in tensorflow::serving::SharedBatchScheduler<tensorflow::serving::BatchingSessionTask>::ThreadLogic() ()
#7  0x0000562e4362cbf8 in tensorflow::serving::PeriodicFunction::RunLoop(long long) ()
#8  0x00007efc03096d84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007efbfafd0609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007efbfaec9293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
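
For anyone who wants to reproduce this, a rough sketch of the standard perf/gdb invocations that would produce output like the above (assuming perf, gdb, and pidof are available inside the container):

# Sample the serving process for 30 seconds, recording call graphs
perf record -g -p "$(pidof tensorflow_model_server)" -- sleep 30
perf report

# Attach gdb to resolve the raw addresses to symbol names
gdb -p "$(pidof tensorflow_model_server)"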

The call stack appears to be:

  • tensorflow/core/kernels/batching_util/shared_batch_scheduler.h
    • SharedBatchScheduler<TaskType>::ThreadLogic
      • WaitForMilliseconds(&l, &schedulable_batch_cv_, kTimeoutMillis), where kTimeoutMillis is 1 ms
  • tensorflow/core/platform/mutex.h
    • WaitForMilliseconds inline
  • tensorflow/core/platform/default/mutex.h (not sure)
    • wait_for
  • tensorflow/core/platform/default/mutex.cc
    • wait_until_system_clock

I cannot figure out why this leads to such high CPU utilization; I hope someone can help.

Hi @ericmclachlan, sorry for the inconvenience. We have an internal task tracking this resource-inefficiency issue and will keep you posted when it gets fixed. Meanwhile, feel free to contribute to the repo directly if you have any ideas about the batch scheduler.

Eric is right. The unexpectedly high CPU utilization is caused by the batching configuration: the four scheduler threads keep waking up (on the 1 ms timeout seen in the backtrace above) to poll for requests, and thus consume CPU even when the server is idle. CPU usage is proportional to num_batch_threads.
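
Based on that, a hedged workaround sketch until the scheduler is fixed: either start the server without batching, or lower the scheduler thread count. All flag and parameter names below come from the reproduction earlier in this issue:

# Option 1: run without batching (omit --enable_batching and
# --batching_parameters_file; --enable_batching defaults to false,
# so no batch scheduler threads are created)
tensorflow_model_server \
    --model_config_file=/config/all_models.config \
    --port=8500 \
    --rest_api_port=8501

# Option 2: keep batching but reduce the idle cost by lowering the
# thread count in /config/batching_parameters.txt:
#   num_batch_threads { value: 1 }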