tensorflow: Error message for running tf.nn.max_pool_with_argmax() on CPU
Running tf.nn.max_pool_with_argmax() on CPU gives a very obscure error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'MaxPoolWithArgmax' with these attrs. Registered devices: [CPU], Registered kernels: <no registered kernels>
From this line:
I think it would be useful for the error message to mention that tf.nn.max_pool_with_argmax() is only implemented for GPU.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 2
- Comments: 56 (7 by maintainers)
Is there an alternative to tf.nn.max_pool_with_argmax? I only have CPUs.
Is there an alternative to tf.nn.max_pool_with_argmax? I only want to test my trained model in a CPU environment.
Yes, it would be great to have tf.nn.max_pool_with_argmax working on CPU.
This op seems very useful for fast encoder-decoder architectures (I see it in two segmentation networks, ENet and SegNet) that were designed for real-time image segmentation. So presumably people care about performance with a view to running these networks on CPUs or mobile devices.
@tfboyd any chance to have it available on CPU as well? I'd like to contribute if I could, but I have never worked on the TensorFlow source code. Should this issue be reopened, or should we file a new one?
I was able to make a workaround for CPU. Warning: it is slow, bloated, and basically a last resort. But I can get ENet to run on my CPU at 5 s/image… Right now it only handles 2x2 max pooling, but you could change the numbers and build on this example to generalize it.
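The original code blocks from that workaround are not preserved in this thread. As a rough illustration of the idea, here is a minimal NumPy sketch (not the poster's actual code) of 2x2 max pooling that also records the flat index of each winning element, which is the pair of outputs the missing GPU-only op would produce:

```python
import numpy as np

def max_pool_2x2_with_argmax(x):
    """2x2 max pooling over a 2D array that also returns flat indices
    into x, mimicking the (value, argmax) pair that
    tf.nn.max_pool_with_argmax computes on GPU. Illustrative sketch only."""
    h, w = x.shape
    out = np.empty((h // 2, w // 2), dtype=x.dtype)
    argmax = np.empty((h // 2, w // 2), dtype=np.int64)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            window = x[i:i + 2, j:j + 2]
            k = int(np.argmax(window))            # 0..3 within the 2x2 window
            out[i // 2, j // 2] = window.flat[k]
            # convert the in-window position back to a flat index into x
            argmax[i // 2, j // 2] = (i + k // 2) * w + (j + k % 2)
    return out, argmax

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [4., 5., 6., 7.]])
pooled, idx = max_pool_2x2_with_argmax(x)
# pooled -> [[4. 8.], [9. 7.]], idx -> [[5 7], [8 15]]
```

A real TF-graph workaround would express the same computation with ops that have CPU kernels (reshapes, comparisons, tf.where, etc.), which is why it ends up slow and bloated.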
Also, that part of the GPU kernel is on this line. Simply copying that code from the GPU kernel to the CPU kernel may not be so bad for this operation. It is a nice operation to have, and even without optimization it would probably still be much faster than what I ended up doing 😛
@mmpinso No, if you look at the code, the LaunchMaxPoolingWithArgmax function calls the CUDA code to compute max pooling with argmax on GPU. The problem is the unpooling operation that uses the scatter function, and it looks like everybody needs the unpooling operation. Could somebody implement this operation for CPU and share the code?
Here is a sample in Python; we need to implement it in C++:
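(The original sample was not preserved in this thread. As a hedged sketch of what such an unpooling looks like: it scatters each pooled value back to the position recorded in the argmax tensor. The helper name and NumPy formulation below are mine, not the missing code; the TF versions people share do the same thing with tf.scatter_nd.)

```python
import numpy as np

def unpool_with_argmax(pooled, argmax, out_shape):
    """Inverse of max pooling with argmax: scatter each pooled value
    back to the flat position recorded for it, zeros elsewhere.
    NumPy analogue of the tf.scatter_nd-based unpool. Sketch only."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[argmax.ravel()] = pooled.ravel()   # fancy indexing on the flat view
    return out

pooled = np.array([[4., 8.], [9., 7.]])
argmax = np.array([[5, 7], [8, 15]])            # flat indices into a 4x4 input
restored = unpool_with_argmax(pooled, argmax, (4, 4))
# restored has 4, 8, 9, 7 at positions (1,1), (1,3), (2,0), (3,3), zeros elsewhere
```

A C++ kernel would follow the same pattern: one pass over the pooled tensor, writing each value to the index its argmax entry names.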
@DenisN03, hi. Since ENet seems to be much faster than SegNet on desktop, and I am trying to run it on Android in real time, latency is the most important thing for me. @mmpinso You are ahead of me; I haven't succeeded in running ENet on CPU even on my desktop. I just looked at the full maxpooling_op.cc shared by @saeed68gm, and there are some differences from mine. I will try more in the next few days, and once there's some progress I'll sync up 😃
@tfboyd it seems this is still not working on CPUs. Can you reopen this?
@DenisN03 We applied the solution I mentioned before and we have it running. Inference time is around 3 seconds on a Huawei P10 Lite.
@liangxiao05 did you get the following error? No OpKernel was registered to support Op ‘ScatterNd’ with these attrs.
[[Node: ENet/unpool/ScatterNd = ScatterNd[T=DT_FLOAT, Tindices=DT_INT32](ENet/unpool/transpose, ENet/unpool/Reshape_2, ENet/unpool/ScatterNd/shape)]]
@saeed68gm, could this be because of the following, which is commented out inside Compute() for MaxPoolingGradWithArgmaxOp (lines 1069-1070)?
// SpatialMaxPoolWithArgMaxHelper<CPUDevice, T>(
//     context, grad_out, &argmax, grad_in, params, padding_);
The intuition is that ScatterNd is used by unpool, which relies on the pooling indices computed by max_pool_with_argmax during the downsampling phase. If those indices were not computed, or were computed incorrectly, the unpool operation that consumes them will produce wrong results.
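To make that dependency concrete, here is a tiny NumPy illustration (hypothetical values, a single 2x2 window): unpooling only restores the max to its original place if the recorded index is right, and a corrupted index silently puts the value elsewhere.

```python
import numpy as np

# One 2x2 window: max pooling records the winning flat position,
# and unpooling scatters the pooled value back to exactly that position.
x = np.array([[1., 4.],
              [2., 3.]])
k = int(np.argmax(x))        # flat index of the max: 1
pooled = x.flat[k]           # 4.0

unpooled = np.zeros_like(x)
unpooled.flat[k] = pooled    # correct index -> max restored in place

wrong = np.zeros_like(x)
wrong.flat[0] = pooled       # a wrong index puts the value elsewhere
```

So if the argmax output of the pooling op is missing or wrong on CPU, the ScatterNd-based unpool downstream is corrupted even when ScatterNd itself works.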
@saeed68gm Ok, I think I understood the context: I should use this custom TF version only to build the Android inference library and run the model on the phone. So I'll install it in a separate env and keep my usual TF installation in another one for training and everything else. I revised the config accordingly and now it compiles. ( @liangxiao05 I haven't forgotten about the inference time )
@mmpinso, @liangxiao05 why do you run ENet instead of SegNet? Did you run ENet on CPU?
@saeed68gm, thanks, I will follow your instructions next. @mmpinso, same as you, I'm working with ENet on Android. So what latency do you get when running it on your mobile?