tensorflow: Cannot confine TensorFlow C API to no more than 1 thread in total

The TensorFlow C API creates at least one thread on each of the available CPUs. The available instructions/guidelines have no effect when I try to confine the TensorFlow C API to a single CPU. I have 8 CPUs and want the TensorFlow C API to use only one of them, creating one and only one thread. How can I confine TensorFlow to one CPU (and one thread) out of the available CPUs?

Processor (lscpu command on Ubuntu):

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel® Core™ i5-8250U CPU @ 1.60GHz
Stepping: 10
CPU MHz: 1122.143
CPU max MHz: 3400.0000
CPU min MHz: 400.0000
BogoMIPS: 3600.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-7

The following code can reduce the number of threads to 1 per core and the top command cou

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS 7.6.1810
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
TensorFlow installed from (source or binary): https://www.tensorflow.org/install/lang_c
TensorFlow version (use command below): TensorFlow C API 1.15.0
Python version: 3.6
Bazel version (if compiling from source): No
GCC/Compiler version (if compiling from source): 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)
CUDA/cuDNN version: No
GPU model and memory: No


Describe the current behavior

The TensorFlow C API creates multiple threads, at least one on each of the available CPUs.

Describe the expected behavior

One and only one thread for TensorFlow, no matter how many CPUs, cores, or sockets are present.

Standalone code to reproduce the issue

Graph = TF_NewGraph();
Status = TF_NewStatus();
SessionOpts = TF_NewSessionOptions();

// limit number of threads
uint8_t intra_op_parallelism_threads = 1;
uint8_t inter_op_parallelism_threads = 1;
uint8_t device_count = 1;
// Serialized ConfigProto: device_count {"CPU": 1}, then the two thread-pool sizes.
uint8_t config[15] = {0xa, 0x7, 0xa, 0x3, 0x43, 0x50, 0x55, 0x10, device_count,
                      0x10, intra_op_parallelism_threads,
                      0x28, inter_op_parallelism_threads,
                      0x40, 0x1};
// Only the first 13 bytes are passed, so the trailing 0x40, 0x1 pair is not sent.
TF_SetConfig(SessionOpts, (void*)config, 13, Status);

if (TF_GetCode(Status) != TF_OK)
    std::cout << "\nERROR: " << TF_Message(Status);
RunOpts = NULL;

// load model
Session = TF_LoadSessionFromSavedModel(SessionOpts, RunOpts, saved_model_dir, &tags, ntags, Graph, NULL, Status);
if (TF_GetCode(Status) != TF_OK) {
    std::cout << "\nERROR: Failed to load SavedModel." << TF_Message(Status);
    return -1;
}
assert(TF_GetCode(Status) == TF_OK);
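The byte array passed to TF_SetConfig is a hand-serialized ConfigProto message, which makes it easy to get a byte or a length wrong. As a rough illustration, the sketch below rebuilds the first 13 bytes with plain protobuf wire encoding and no TensorFlow dependency. It assumes the ConfigProto field numbers device_count = 1, intra_op_parallelism_threads = 2 and inter_op_parallelism_threads = 5; the helper names (encode_varint, varint_field, bytes_field, build_config) are hypothetical and exist only for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0 (varint), followed by the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    """Tag with wire type 2 (length-delimited), length prefix, then payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def build_config(device: str, device_count: int, intra: int, inter: int) -> bytes:
    # A map entry is a nested message: key is field 1, value is field 2.
    entry = bytes_field(1, device.encode("ascii")) + varint_field(2, device_count)
    return (
        bytes_field(1, entry)      # device_count map (assumed field 1)
        + varint_field(2, intra)   # intra_op_parallelism_threads (assumed field 2)
        + varint_field(5, inter)   # inter_op_parallelism_threads (assumed field 5)
    )


config = build_config("CPU", device_count=1, intra=1, inter=1)
print(", ".join(hex(b) for b in config))
print("length:", len(config))
```

Generating the bytes this way, or serializing a ConfigProto from the Python API and copying the result, avoids a mismatch between the array contents and the length argument given to TF_SetConfig.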


About this issue

State: closed
Created 4 years ago
Comments: 18 (7 by maintainers)

Links to this issue

python - Keras/TF CPU creating too many threads - Stack Overflow

Most upvoted comments

Some implicit behavior may be creating more threads than expected: in the case below I got 22 threads with inter/intra set to 1, and the count rose to 40 when I set each of them to 10. These two options do control the number of worker threads, but TF also creates other threads, which makes the output of top confusing.

My cmd

numactl -C 0-7 -l python conv.py

And I got

Total number of threads: 
22

After increasing inter/intra to 10, I got

Total number of threads: 
40

My case conv.py:

import tensorflow as tf
import os

# reduce number of threads
os.environ['TF_NUM_INTEROP_THREADS'] = '1' 
os.environ['TF_NUM_INTRAOP_THREADS'] = '1' 

print('\n\nTotal number of threads: ')
os.system( "top -H -b -n1 | grep python | wc -l")

def my_model():
  layer_input = tf.random.uniform((1, 2, 3, 3), minval=0, maxval=100, dtype=tf.float32)
  weights = tf.random.uniform((1, 2, 3, 3), minval=0, maxval=100, dtype=tf.float32)

  with tf.compat.v1.variable_scope("output_1"):
    outputs_1 = tf.nn.conv2d(
        input=layer_input, filters=weights, strides=1,
        # padding=[[0, 0], [1, 1], [1, 1], [0, 0]])
        padding='SAME')
    outputs_1 = tf.nn.bias_add(outputs_1, tf.constant([1, 1, 1], dtype=tf.float32))

  return outputs_1

with tf.compat.v1.Session() as sess:
  loss = my_model()
  print('\n\nTotal number of threads: ')
  os.system( "top -H -b -n1 | grep python | wc -l")
  # initialize_all_variables() is deprecated; global_variables_initializer() replaces it
  init = tf.compat.v1.global_variables_initializer()
  sess.run(init)
  sess.run((loss))
  print('\n\nTotal number of threads: ')
  os.system( "top -H -b -n1 | grep python | wc -l")

Zantares on Oct 13, 2020
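The `top -H -b -n1 | grep python | wc -l` measurement in conv.py counts thread lines for every process whose name matches python, not only the script itself, so other Python processes on the machine can inflate the total. Below is a minimal sketch of a per-process count, assuming Linux with /proc mounted. The helper name count_native_threads is hypothetical and not part of the original script.

```python
import os
import threading


def count_native_threads() -> int:
    """Return the number of OS-level threads in the current process."""
    try:
        # On Linux, each entry under /proc/self/task is one thread of this process.
        return len(os.listdir("/proc/self/task"))
    except OSError:
        # Fallback where /proc is unavailable: counts Python-level threads only,
        # so threads created by native libraries would be missed.
        return threading.active_count()


print("Total number of threads:", count_native_threads())
```

Calling this helper in place of the three os.system lines would report only the threads owned by the script's own process.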

@Zantares @roywei Do you agree with the reason given at the following link? It explains that TensorFlow cannot be single-threaded. https://stackoverflow.com/questions/48696900/why-tensorflow-creates-so-many-cpu-threads

Not exactly. That answer says that on GPU the CUDA runtime and the GRPC engine create multiple threads, but as I said, CPU execution can run in “single thread” mode.

You can definitely use “single thread” mode on CPU with the Python API, but it may be more complex with the C API. Because I don’t have your dataset and can’t run your case directly, I don’t know what the issue is in your case. If there is no better solution, you could roughly modify the TF source code to set every thread count to 1.

Zantares on Oct 12, 2020
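The numactl -C 0-7 command in the comment above restricts which CPUs the process may run on. The same idea can be sketched from inside a Python process with os.sched_setaffinity, assuming Linux. This does not reduce the number of threads a library creates; it only confines where they are scheduled. On Linux the mask applies to the calling thread and is inherited by threads created afterwards, so the call would need to happen before TensorFlow is imported. The helper name pin_to_single_cpu is hypothetical.

```python
import os


def pin_to_single_cpu(cpu: int = 0) -> set:
    """Restrict the caller to one CPU and return the resulting affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    allowed = os.sched_getaffinity(0)
    if cpu not in allowed:
        # The requested CPU is outside the current mask; use the lowest allowed one.
        cpu = min(allowed)
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if hasattr(os, "sched_setaffinity"):
    print("Running on CPUs:", pin_to_single_cpu(0))
```

With the affinity set to a single CPU, all inherited threads share that CPU, which addresses the "only one CPU" part of the question but not the "only one thread" part.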