tensorflow: Loading model in Android and No OpKernel was registered to support Op error

I encountered a problem when using a self-trained face-recognition model to run inference on the Android platform (using the C++ API, just like the Android demo). The error looks like this:

06-05 16:25:11.322 28605-28605/jp.narr.tensorflowmnist I/native: tensorflow_jni.cc:196 End computing.
06-05 16:25:11.322 28605-28605/jp.narr.tensorflowmnist E/native: tensorflow_jni.cc:199 Error during inference: Invalid argument: No OpKernel was registered to support Op 'Inv' with these attrs
                                                                      [[Node: incept5b/in4_conv1x1_55/batch_norm/moments/moments_1/divisor = Inv[T=DT_FLOAT](incept5b/in4_conv1x1_55/batch_norm/moments/moments/Const)]]
06-05 16:25:11.322 28605-28605/jp.narr.tensorflowmnist A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x10 in tid 28605 (tensorflowmnist)
06-05 16:25:11.423 186-186/? I/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** 

It is similar to issue #1269.

I don’t understand why this causes an error. All the other layers (from incept3a to incept5a) have almost the same structure, but they produce no error…

Could anyone give me some advice? Thanks a lot!

The structure of the model I use is like this:


# _conv, _mpool, _apool, _inception, and _affine are helper functions defined
# elsewhere in the training code (not shown here).
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops

def inference_nn4_max_pool_96(images, pool_type, use_lrn, keep_probability, phase_train=True):
  conv1 = _conv(images, 3, 64, 7, 7, 2, 2, 'SAME', 'conv1_7x7', phase_train=phase_train, use_batch_norm=True)
  pool1 = _mpool(conv1,  3, 3, 2, 2, 'SAME')
  if use_lrn:
    lrn1 = tf.nn.local_response_normalization(pool1, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
  else:
    lrn1 = pool1
  conv2 = _conv(lrn1,  64, 64, 1, 1, 1, 1, 'SAME', 'conv2_1x1', phase_train=phase_train, use_batch_norm=True)
  conv3 = _conv(conv2,  64, 192, 3, 3, 1, 1, 'SAME', 'conv3_3x3', phase_train=phase_train, use_batch_norm=True)
  if use_lrn:
    lrn2 = tf.nn.local_response_normalization(conv3, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
  else:
    lrn2 = conv3
  pool3 = _mpool(lrn2,  3, 3, 2, 2, 'SAME')

  incept3a = _inception(pool3,    192, 1, 64, 96, 128, 16, 32, 3, 32, 1, 'MAX', 'incept3a', phase_train=phase_train, use_batch_norm=True)
  incept3b = _inception(incept3a, 256, 1, 64, 96, 128, 32, 64, 3, 64, 1, pool_type, 'incept3b', phase_train=phase_train, use_batch_norm=True)
  incept3c = _inception(incept3b, 320, 2, 0, 128, 256, 32, 64, 3, 0, 2, 'MAX', 'incept3c', phase_train=phase_train, use_batch_norm=True)

  incept4a = _inception(incept3c, 640, 1, 256, 96, 192, 32, 64, 3, 128, 1, pool_type, 'incept4a', phase_train=phase_train, use_batch_norm=True)
  incept4b = _inception(incept4a, 640, 1, 224, 112, 224, 32, 64, 3, 128, 1, pool_type, 'incept4b', phase_train=phase_train, use_batch_norm=True)
  incept4c = _inception(incept4b, 640, 1, 192, 128, 256, 32, 64, 3, 128, 1, pool_type, 'incept4c', phase_train=phase_train, use_batch_norm=True)
  incept4d = _inception(incept4c, 640, 1, 160, 144, 288, 32, 64, 3, 128, 1, pool_type, 'incept4d', phase_train=phase_train, use_batch_norm=True)
  incept4e = _inception(incept4d, 640, 2, 0, 160, 256, 64, 128, 3, 0, 2, 'MAX', 'incept4e', phase_train=phase_train, use_batch_norm=True)

  incept5a = _inception(incept4e, 1024, 1, 384, 192, 384, 0, 0, 3, 128, 1, pool_type, 'incept5a', phase_train=phase_train, use_batch_norm=True)
  incept5b = _inception(incept5a, 896, 1, 384, 192, 384, 0, 0, 3, 128, 1, 'MAX', 'incept5b', phase_train=phase_train, use_batch_norm=True)
  pool6 = _apool(incept5b,  3, 3, 1, 1, 'VALID')

  resh1 = tf.reshape(pool6, [-1, 896])
  affn1 = _affine(resh1, 896, 128)
  # Note: this cond() is what adds Switch/Merge nodes to the graph when
  # phase_train is fed as a boolean placeholder.
  if keep_probability < 1.0:
    affn1 = control_flow_ops.cond(phase_train,
                                  lambda: tf.nn.dropout(affn1, keep_probability),
                                  lambda: affn1)
  norm = tf.nn.l2_normalize(affn1, 1, 1e-10, name='embeddings')

  return norm
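
Before hunting through the model layer by layer, it helps to list which op types the serialized graph actually contains, since the mobile build only registers a subset of kernels. A minimal TF 1.x-style sketch (the graph.pb path is a placeholder):

import tensorflow as tf

# Parse the serialized GraphDef and print every distinct op type it uses.
# Seeing 'Inv', 'Switch', or 'Merge' in this list means the stripped-down
# Android kernel registry must supply those kernels, or they must be
# removed from the graph before deployment.
graph_def = tf.GraphDef()
with tf.gfile.GFile('graph.pb', 'rb') as f:  # placeholder path
    graph_def.ParseFromString(f.read())
print(sorted({node.op for node in graph_def.node}))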

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 29 (9 by maintainers)

Most upvoted comments

@Lucky94, same to you. I solved this issue on Android by changing only the TF_CALL_bool macros at lines L#98 and L#125, and nowhere else.

Your name made me lucky, love you guys~!

Hi, @Lucky94. Since I changed the TF_CALL_bool macros and rebuilt the .so file for Android, errors like "Could not create TensorFlow Session: Invalid argument: No OpKernel was registered to support Op ‘Switch’ with these attrs." have never occurred again. The model I use also has to feed phase_train as an input, and it works. So maybe you should check whether you actually replaced the newer .so file in your Android application.

@Lucky94: I could not get it working with the suggested edits to register_types.h. Eventually I was able to get my hands on a .pb file with the problematic nodes removed, but I realize that doesn’t help you much.

Hopefully somebody will be able to provide an alternative solution soon. Sorry!

Can you be more specific about implementing this fix? It is not working for me. I’ve tried changing line 94 from #define TF_CALL_string(m) to #define TF_CALL_string(m) m(string).

I still get the same “No OpKernel was registered to support Op ‘Switch’” error.

I’ve also tried changing line 98 from #define TF_CALL_bool(m) to #define TF_CALL_bool(m) m(bool), to no avail…

@leftstone2015, in your comment you said to add m(string) to the TF_CALL_string(m) macro, but you did not specify which one. There is a second one on line 121, and when I try to add m(string) to that one my bazel build fails with:

tensorflow/core/kernels/concat_lib_cpu.cc:68:44: error: duplicate explicit instantiation of
  'void tensorflow::ConcatCPU(tensorflow::DeviceBase*,
       const std::vector<std::unique_ptr<typename tensorflow::TTypes<T, 2>::ConstMatrix> >&,
       typename tensorflow::TTypes<T, 2>::Matrix*)
   [with T = std::basic_string<char>;
    typename tensorflow::TTypes<T, 2>::ConstMatrix = Eigen::TensorMap<Eigen::Tensor<const std::basic_string<char>, 2, 1, int>, 16, Eigen::MakePointer>;
    typename tensorflow::TTypes<T, 2>::Matrix = Eigen::TensorMap<Eigen::Tensor<std::basic_string<char>, 2, 1, int>, 16, Eigen::MakePointer>]' [-fpermissive]
     typename TTypes<T, 2>::Matrix* output);
                                            ^
tensorflow/core/kernels/concat_lib_cpu.cc:79:1: note: in expansion of macro 'REGISTER'
 REGISTER(string);

I have been struggling with this bug for a while now, and from what I’ve read in similar posts, the only solution is to freeze your model without is_training. Unfortunately this is not an option for me as I do not have any way of modifying my graph.pb file. So any help would be appreciated!
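
For anyone who can regenerate their graph, here is a hedged sketch of the “freeze without is_training” approach mentioned above. It assumes the _conv/_inception helpers accept a plain Python False for phase_train (so no cond()/Switch nodes are built), 96x96 RGB inputs per the function name, and hypothetical checkpoint and output paths:

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Graph().as_default() as graph:
    # 96x96 RGB input, inferred from the function name (assumption).
    images = tf.placeholder(tf.float32, [None, 96, 96, 3], name='input')
    # phase_train=False is a Python constant rather than a placeholder, and
    # keep_probability=1.0 skips the dropout cond(), so no Switch/Merge ops
    # are added to the graph.
    embeddings = inference_nn4_max_pool_96(images, 'MAX', False, 1.0,
                                           phase_train=False)
    with tf.Session(graph=graph) as sess:
        tf.train.Saver().restore(sess, 'model.ckpt')  # hypothetical path
        # 'embeddings' is the name given to the l2_normalize node above.
        frozen = graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), ['embeddings'])
        with tf.gfile.GFile('frozen_inference.pb', 'wb') as f:
            f.write(frozen.SerializeToString())

Pinning phase_train at graph-construction time keeps the conditional branches out of the GraphDef entirely; feeding it as a placeholder at inference time is what pulls Switch/Merge (and the batch-norm moments path with its Inv op) into the graph in the first place.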

It’s been fixed. The solution is as follows: check the file control_flow_ops.cc, where ‘Switch’ is registered. TF_CALL_ALL_TYPES is invoked to register ‘Switch’, and TF_CALL_ALL_TYPES(m) in turn calls TF_CALL_string(m), but TF_CALL_string(m) is a no-op on mobile platforms. So modify TF_CALL_string(m) in register_types.h (tensorflow/core/framework) to call m(string) on all platforms.