tensorflow: Loading model in Android and No OpKernel was registered to support Op error

I encountered a problem when using a self-trained face-recognition model to run inference on the Android platform (using the C++ API, just like the Android demo). The error looks like this:

06-05 16:25:11.322 28605-28605/jp.narr.tensorflowmnist I/native: tensorflow_jni.cc:196 End computing.
06-05 16:25:11.322 28605-28605/jp.narr.tensorflowmnist E/native: tensorflow_jni.cc:199 Error during inference: Invalid argument: No OpKernel was registered to support Op 'Inv' with these attrs
                                                                      [[Node: incept5b/in4_conv1x1_55/batch_norm/moments/moments_1/divisor = Inv[T=DT_FLOAT](incept5b/in4_conv1x1_55/batch_norm/moments/moments/Const)]]
06-05 16:25:11.322 28605-28605/jp.narr.tensorflowmnist A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x10 in tid 28605 (tensorflowmnist)
06-05 16:25:11.423 186-186/? I/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** 

It is similar to issue #1269.

I don’t understand why this causes an error. All the other layers (from incept3a to incept5a) have almost the same structure, but they produce no error…

Could anyone give me some advice? Thanks a lot!

The structure of the model I use is like this:


# _conv, _mpool, _apool, _inception, and _affine are helper functions defined
# elsewhere in the training code (not shown here).
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops

def inference_nn4_max_pool_96(images, pool_type, use_lrn, keep_probability, phase_train=True):
  conv1 = _conv(images, 3, 64, 7, 7, 2, 2, 'SAME', 'conv1_7x7', phase_train=phase_train, use_batch_norm=True)
  pool1 = _mpool(conv1,  3, 3, 2, 2, 'SAME')
  if use_lrn:
    lrn1 = tf.nn.local_response_normalization(pool1, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
  else:
    lrn1 = pool1
  conv2 = _conv(lrn1,  64, 64, 1, 1, 1, 1, 'SAME', 'conv2_1x1', phase_train=phase_train, use_batch_norm=True)
  conv3 = _conv(conv2,  64, 192, 3, 3, 1, 1, 'SAME', 'conv3_3x3', phase_train=phase_train, use_batch_norm=True)
  if use_lrn:
    lrn2 = tf.nn.local_response_normalization(conv3, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
  else:
    lrn2 = conv3
  pool3 = _mpool(lrn2,  3, 3, 2, 2, 'SAME')

  incept3a = _inception(pool3,    192, 1, 64, 96, 128, 16, 32, 3, 32, 1, 'MAX', 'incept3a', phase_train=phase_train, use_batch_norm=True)
  incept3b = _inception(incept3a, 256, 1, 64, 96, 128, 32, 64, 3, 64, 1, pool_type, 'incept3b', phase_train=phase_train, use_batch_norm=True)
  incept3c = _inception(incept3b, 320, 2, 0, 128, 256, 32, 64, 3, 0, 2, 'MAX', 'incept3c', phase_train=phase_train, use_batch_norm=True)

  incept4a = _inception(incept3c, 640, 1, 256, 96, 192, 32, 64, 3, 128, 1, pool_type, 'incept4a', phase_train=phase_train, use_batch_norm=True)
  incept4b = _inception(incept4a, 640, 1, 224, 112, 224, 32, 64, 3, 128, 1, pool_type, 'incept4b', phase_train=phase_train, use_batch_norm=True)
  incept4c = _inception(incept4b, 640, 1, 192, 128, 256, 32, 64, 3, 128, 1, pool_type, 'incept4c', phase_train=phase_train, use_batch_norm=True)
  incept4d = _inception(incept4c, 640, 1, 160, 144, 288, 32, 64, 3, 128, 1, pool_type, 'incept4d', phase_train=phase_train, use_batch_norm=True)
  incept4e = _inception(incept4d, 640, 2, 0, 160, 256, 64, 128, 3, 0, 2, 'MAX', 'incept4e', phase_train=phase_train, use_batch_norm=True)

  incept5a = _inception(incept4e, 1024, 1, 384, 192, 384, 0, 0, 3, 128, 1, pool_type, 'incept5a', phase_train=phase_train, use_batch_norm=True)
  incept5b = _inception(incept5a, 896, 1, 384, 192, 384, 0, 0, 3, 128, 1, 'MAX', 'incept5b', phase_train=phase_train, use_batch_norm=True)
  pool6 = _apool(incept5b,  3, 3, 1, 1, 'VALID')

  resh1 = tf.reshape(pool6, [-1, 896])
  affn1 = _affine(resh1, 896, 128)
  # Note: this cond() is what adds Switch/Merge nodes to the graph when
  # phase_train is fed as a boolean placeholder.
  if keep_probability < 1.0:
    affn1 = control_flow_ops.cond(phase_train,
                                  lambda: tf.nn.dropout(affn1, keep_probability),
                                  lambda: affn1)
  norm = tf.nn.l2_normalize(affn1, 1, 1e-10, name='embeddings')

  return norm
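
Before hunting through the model layer by layer, it helps to list which op types the serialized graph actually contains, since the mobile build only registers a subset of kernels. A minimal TF 1.x-style sketch (the graph.pb path is a placeholder):

import tensorflow as tf

# Parse the serialized GraphDef and print every distinct op type it uses.
# Seeing 'Inv', 'Switch', or 'Merge' in this list means the stripped-down
# Android kernel registry must supply those kernels, or they must be
# removed from the graph before deployment.
graph_def = tf.GraphDef()
with tf.gfile.GFile('graph.pb', 'rb') as f:  # placeholder path
    graph_def.ParseFromString(f.read())
print(sorted({node.op for node in graph_def.node}))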

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 29 (9 by maintainers)

Most upvoted comments

@Lucky94, same to you. I solved this issue on Android by changing only the TF_CALL_bool macros at lines L#98 and L#125, and nowhere else.

Your name made me lucky, love you guys~!

Hi, @Lucky94. Since I changed the TF_CALL_bool macros and rebuilt the .so file for Android, errors like "Could not create TensorFlow Session: Invalid argument: No OpKernel was registered to support Op ‘Switch’ with these attrs." have never occurred again. The model I use also has to feed phase_train as an input, and it works. So maybe you should check whether you actually replaced the newer .so file in your Android application.

@Lucky94: I could not get it working with the suggested edits to register_types.h. Eventually I was able to get my hands on a .pb file with the problematic nodes removed, but I realize that doesn’t help you much.

Hopefully somebody will be able to provide an alternative solution soon. Sorry!

Can you be more specific about implementing this fix? It is not working for me. I’ve tried changing line 94 from #define TF_CALL_string(m) to #define TF_CALL_string(m) m(string).

I still get the same “No OpKernel was registered to support Op ‘Switch’” error.

I’ve also tried changing line 98 from #define TF_CALL_bool(m) to #define TF_CALL_bool(m) m(bool), to no avail…

@leftstone2015, in your comment you said to add m(string) to the TF_CALL_string(m) macro, but you did not specify which one. There is a second one on line 121, and when I try to add m(string) to that one my bazel build fails with:

tensorflow/core/kernels/concat_lib_cpu.cc:68:44: error: duplicate explicit instantiation of
  'void tensorflow::ConcatCPU(tensorflow::DeviceBase*,
       const std::vector<std::unique_ptr<typename tensorflow::TTypes<T, 2>::ConstMatrix> >&,
       typename tensorflow::TTypes<T, 2>::Matrix*)
   [with T = std::basic_string<char>;
    typename tensorflow::TTypes<T, 2>::ConstMatrix = Eigen::TensorMap<Eigen::Tensor<const std::basic_string<char>, 2, 1, int>, 16, Eigen::MakePointer>;
    typename tensorflow::TTypes<T, 2>::Matrix = Eigen::TensorMap<Eigen::Tensor<std::basic_string<char>, 2, 1, int>, 16, Eigen::MakePointer>]' [-fpermissive]
     typename TTypes<T, 2>::Matrix* output);
                                            ^
tensorflow/core/kernels/concat_lib_cpu.cc:79:1: note: in expansion of macro 'REGISTER'
 REGISTER(string);

I have been struggling with this bug for a while now, and from what I’ve read in similar posts, the only solution is to freeze your model without is_training. Unfortunately this is not an option for me as I do not have any way of modifying my graph.pb file. So any help would be appreciated!
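
For anyone who can regenerate their graph, here is a hedged sketch of the “freeze without is_training” approach mentioned above. It assumes the _conv/_inception helpers accept a plain Python False for phase_train (so no cond()/Switch nodes are built), 96x96 RGB inputs per the function name, and hypothetical checkpoint and output paths:

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Graph().as_default() as graph:
    # 96x96 RGB input, inferred from the function name (assumption).
    images = tf.placeholder(tf.float32, [None, 96, 96, 3], name='input')
    # phase_train=False is a Python constant rather than a placeholder, and
    # keep_probability=1.0 skips the dropout cond(), so no Switch/Merge ops
    # are added to the graph.
    embeddings = inference_nn4_max_pool_96(images, 'MAX', False, 1.0,
                                           phase_train=False)
    with tf.Session(graph=graph) as sess:
        tf.train.Saver().restore(sess, 'model.ckpt')  # hypothetical path
        # 'embeddings' is the name given to the l2_normalize node above.
        frozen = graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), ['embeddings'])
        with tf.gfile.GFile('frozen_inference.pb', 'wb') as f:
            f.write(frozen.SerializeToString())

Pinning phase_train at graph-construction time keeps the conditional branches out of the GraphDef entirely; feeding it as a placeholder at inference time is what pulls Switch/Merge (and the batch-norm moments path with its Inv op) into the graph in the first place.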

It’s been fixed. The solution is as follows: check the file control_flow_ops.cc, where ‘Switch’ is registered. TF_CALL_ALL_TYPES is invoked to register ‘Switch’, and TF_CALL_ALL_TYPES(m) in turn calls TF_CALL_string(m), but TF_CALL_string(m) is a no-op on mobile platforms. So modify TF_CALL_string(m) in register_types.h (tensorflow/core/framework) to call m(string) on all platforms.