PINTO_model_zoo: Request: FaceMesh-with-Attention model conversion (unsupported custom ops)

MediaPipe has released a new FaceMesh-with-Attention model.

It's essentially the old FaceMesh model augmented with three additional attention models that refine the results, all inside a single TFLite model.

I’ve tried converting it:

tflite2tensorflow --model_path face_landmark_with_attention.tflite --flatc_path ./flatc --schema_path schema.fbs --output_pb

but it fails with

RuntimeError: Encountered unresolved custom op: Landmarks2TransformMatrix.Node number 192 (Landmarks2TransformMatrix) failed to prepare.

It seems that the TFLite model uses custom ops to link the different execution paths inside it, which is beyond me…

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 31 (22 by maintainers)

Most upvoted comments

FYI, I've just added FaceMesh attention to https://vladmandic.github.io/human.
It works nicely, but it is ~25% slower than the FaceMesh + Iris models combined.

I've also added a mapping of the new attention keypoints back to the original mesh keypoints,
plus remapping of the z-coordinate, since the augmented data is 2D only
(in https://github.com/vladmandic/human/blob/main/src/face/attention.ts).
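A minimal NumPy sketch of that remapping idea, assuming each attention head returns 2D (x, y) keypoints and a hypothetical `index_map` table giving the mesh index that each refined keypoint replaces. Here z is simply kept from the mesh point being replaced; this is an assumption for illustration and may differ from what attention.ts actually does.

```python
import numpy as np


def merge_refined_landmarks(mesh, refined, index_map):
    """Overwrite selected mesh keypoints with refined attention keypoints.

    mesh:      (N, 3) array of x, y, z mesh landmarks.
    refined:   (M, 2) array of x, y landmarks from one attention head.
    index_map: length-M sequence (hypothetical mapping table); index_map[i]
               is the mesh index that refined[i] corresponds to.
    """
    merged = np.array(mesh, dtype=np.float64, copy=True)
    idx = np.asarray(index_map, dtype=int)
    # The attention output is 2D only, so replace x and y and keep the
    # z value of the mesh keypoint being replaced (assumed strategy).
    merged[idx, :2] = np.asarray(refined, dtype=np.float64)
    return merged
```

The original mesh array is copied, so the unrefined mesh stays available for comparison.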

Anyhow, here are the results from https://github.com/vladmandic/human-motion: [image]

@KenjiAsaba @mayerjTNG Thank you for your cooperation. All have been merged into the main branch. 👍

https://github.com/PINTO0309/tflite2tensorflow

I completed the implementation of the custom ops. Here is a demo video. It works, but the performance is not very good… Inference took 50 ms with CUDA on an RTX 3070 using ONNX Runtime.

@mayerjTNG @KenjiAsaba Thanks so much for all your efforts; you guys are very much appreciated. I will first try to merge the custom ops into TensorFlow Lite and build them into a .whl.

https://github.com/PINTO0309/TensorflowLite-bin#2-tensorflow-v230-version-or-later https://github.com/PINTO0309/Tensorflow-bin#build-parameter

After successfully building the .whl, I will load the MediaPipe .tflite containing the custom ops with tflite2tensorflow and try to convert them to semi-standard ops. 😄

Hi @mayerjTNG, I've been working on the same topic for a while, and just yesterday I managed to compile TFLite with the custom operators.

My TensorFlow source code is in this branch: https://github.com/KenjiAsaba/tensorflow/tree/mediapipe_20220320_customOp, and my compiled TFLite build is here: https://github.com/KenjiAsaba/tensorflow/releases/tag/v2.8.0-withMediaPipeCustomOp

Hi, @wwdok “actual.bmp” is a 192x192 pixel color image I used as input to the original model to generate the test landmark data at line 49. For your test, you can use any 192x192 pixel image, and compare the result before and after your modification.
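A small helper for that before/after comparison, as a sketch: it assumes both runs produce landmark arrays of the same shape, and the default tolerance is an arbitrary placeholder, not a value from this thread.

```python
import numpy as np


def compare_landmarks(reference, candidate, atol=1e-3):
    """Summarize how far `candidate` landmarks drift from `reference`."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    abs_err = np.abs(reference - candidate)
    return {
        "max_abs_error": float(abs_err.max()),
        "mean_abs_error": float(abs_err.mean()),
        # Element-wise |ref - cand| <= atol + rtol * |cand| (NumPy's default rtol).
        "allclose": bool(np.allclose(reference, candidate, atol=atol)),
    }
```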

Just tested the model, it runs like a charm! Love it! Great work, guys! 😃 Again, thank you very much for your help. I hope I'll be able to be of more use and repay the favour next time! Take care!

I also successfully loaded face_landmark_with_attention.tflite.

The process of applying the MediaPipe patch is a bit laborious. 💦

Hi @PINTO0309, @mayerjTNG, I am also trying to implement the custom ops. It is still a work in progress, but here is my code for your information. So far I have successfully exported saved_model.pb and ONNX, but I have not tested them yet.

Hi @PINTO0309, the issue here seems to be that my implementation expects a 3x3 (2D) transformation matrix, while the MediaPipe implementation uses 4x4 (3D) transformation matrices (most likely for compatibility). In the original implementation they then discard the last two rows and do the matrix multiplication via individual dot products. A workaround is to slice away the third dimension and ignore shearing transformations, like this:

import tensorflow as tf

def transform_landmarks_2d(landmarks, transformation_3d):
    # Keep the top-left 2x3 block (rows 0-1, columns 0-2) of each 4x4 matrix.
    transformation_2d = transformation_3d[...,:2,:3]

    # Split x/y from any extra channels, then append w = 1 to every (x, y).
    landmarks_xy, landmarks_residual = landmarks[..., :2], landmarks[..., 2:]
    landmarks_xyw = tf.pad(landmarks_xy, [[0, 0], [0, 0], [0, 1]], constant_values=1)

    # Copy the matrix once per landmark: (B, 2, 3) -> (B, N, 2, 3).
    number_of_landmarks = tf.shape(landmarks_xy)[-2]
    broadcasted_matrix = tf.repeat(
        tf.expand_dims(transformation_2d, axis=1),
        number_of_landmarks, axis=1
    )
    # (B, N, 2, 3) @ (B, N, 3, 1) -> (B, N, 2, 1) -> (B, N, 2)
    transformed_landmarks_xyw = tf.reshape(
        tf.matmul(
            broadcasted_matrix,
            tf.expand_dims(landmarks_xyw, axis=-1)
        ),
        (-1, number_of_landmarks, 2)
    )
    transformed_landmarks_xy = transformed_landmarks_xyw[..., :2]
    # Reattach the untouched extra channels (if any).
    return tf.concat([transformed_landmarks_xy, landmarks_residual], axis=-1)

Using the function above gives me the following output:

$ python test.py 
2022-03-25 14:42:05.424769: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Model: "model"
__________________________________________________________________________________________________
Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)           [(1, 80, 2)]         0           []                               
                                                                                                 
input_2 (InputLayer)           [(1, 4, 4)]          0           []                               
                                                                                                 
tf.__operators__.getitem_1 (Sl  (1, 80, 2)          0           ['input_1[0][0]']                
icingOpLambda)                                                                                   
                                                                                                 
tf.__operators__.getitem (Slic  (1, 2, 3)           0           ['input_2[0][0]']                
ingOpLambda)                                                                                     
                                                                                                 
tf.compat.v1.shape (TFOpLambda  (3,)                0           ['tf.__operators__.getitem_1[0][0
)                                                               ]']                              
                                                                                                 
tf.expand_dims (TFOpLambda)    (1, 1, 2, 3)         0           ['tf.__operators__.getitem[0][0]'
                                                                ]                                
                                                                                                 
tf.__operators__.getitem_3 (Sl  ()                  0           ['tf.compat.v1.shape[0][0]']     
icingOpLambda)                                                                                   
                                                                                                 
tf.compat.v1.pad (TFOpLambda)  (1, 80, 3)           0           ['tf.__operators__.getitem_1[0][0
                                                                ]']                              
                                                                                                 
tf.repeat (TFOpLambda)         (1, 80, 2, 3)        0           ['tf.expand_dims[0][0]',         
                                                                 'tf.__operators__.getitem_3[0][0
                                                                ]']                              
                                                                                                 
tf.expand_dims_1 (TFOpLambda)  (1, 80, 3, 1)        0           ['tf.compat.v1.pad[0][0]']       
                                                                                                 
tf.linalg.matmul (TFOpLambda)  (1, 80, 2, 1)        0           ['tf.repeat[0][0]',              
                                                                 'tf.expand_dims_1[0][0]']       
                                                                                                 
tf.reshape (TFOpLambda)        (1, 80, 2)           0           ['tf.linalg.matmul[0][0]',       
                                                                 'tf.__operators__.getitem_3[0][0
                                                                ]']                              
                                                                                                 
tf.__operators__.getitem_4 (Sl  (1, 80, 2)          0           ['tf.reshape[0][0]']             
icingOpLambda)                                                                                   
                                                                                                 
tf.__operators__.getitem_2 (Sl  (1, 80, 0)          0           ['input_1[0][0]']                
icingOpLambda)                                                                                   
                                                                                                 
tf.concat (TFOpLambda)         (1, 80, 2)           0           ['tf.__operators__.getitem_4[0][0
                                                                ]',                              
                                                                 'tf.__operators__.getitem_2[0][0
                                                                ]']                              
                                                                                                 
==================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
__________________________________________________________________________________________________

This looks about right, though the solution still has a lot of room for improvement.
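The `[..., :2, :3]` slice keeps columns 0-2 of the 4x4 matrix. If MediaPipe stores these matrices row-major with the x/y translation in column 3 (the usual homogeneous layout, but an assumption here that should be checked against the op's source), the 2D affine part is rows 0-1 with columns 0, 1 and 3, and that slice would pick up the z column in place of the translation. One possible refinement, sketched in NumPy under that assumption, selects those columns and uses a broadcasting `einsum` instead of `tf.repeat` + `matmul`; the function name is hypothetical.

```python
import numpy as np


def transform_landmarks_from_4x4(landmarks, matrix_4x4):
    """Apply the 2D affine part of (B, 4, 4) matrices to (B, N, C) landmarks.

    Assumes row-major homogeneous matrices with translation in column 3.
    Only x and y are transformed; extra channels pass through unchanged.
    """
    landmarks = np.asarray(landmarks, dtype=np.float32)
    matrix_4x4 = np.asarray(matrix_4x4, dtype=np.float32)
    # Rows 0-1 (x and y outputs), columns 0, 1 and 3 (x, y, translation).
    affine_2x3 = matrix_4x4[..., :2, :][..., [0, 1, 3]]
    xy, rest = landmarks[..., :2], landmarks[..., 2:]
    # Homogeneous coordinates: append a constant 1 to every (x, y).
    xyw = np.concatenate([xy, np.ones_like(xy[..., :1])], axis=-1)
    # out[b, n, i] = sum_j affine[b, i, j] * xyw[b, n, j]; broadcasting
    # replaces the per-landmark matrix copies made by tf.repeat.
    new_xy = np.einsum("bij,bnj->bni", affine_2x3, xyw)
    return np.concatenate([new_xy, rest], axis=-1)
```

In TensorFlow the same column selection could be written as `tf.gather(m[..., :2, :], [0, 1, 3], axis=-1)` and the product as `tf.einsum("bij,bnj->bni", affine, xyw)`.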

As mentioned previously, my suggestions were pretty much taken from our current code base, with no attempt yet to adjust them to the exact input specs of the MediaPipe ops. I also honestly didn't expect you guys to come up with a solution this quickly, so again, you are the best. 😃 I'll happily whip up some clean implementations for the three missing layers, but unfortunately I'm away from my proper workstation for the weekend and won't be able to work on this properly until Monday. 😃

Hi, first of all, thanks @PINTO0309 for all the work you've done for the community; you've been a real lifesaver for me more than once! 😃

I'm currently attempting to build something similar to MediaPipe Holistic in pure TensorFlow so that it runs properly on desktop GPUs. Along the way, I've done a lot of reverse engineering of MediaPipe functionality and reimplemented several things that come awfully close to the missing operations in question. I've also found that the standard FaceMesh without attention just doesn't cut it for me in terms of quality.

I found this issue a couple of weeks ago while trying to convert the TFLite model. While this is obviously not great news for me, I was still determined to get it to work: first compile TFLite with the custom ops, then write custom layers for the ops and iterate by trial and error until it finally converts properly. I followed your tutorial to add the three layers to TFLite. I've been at it for a bit over a week now, getting the ops to compile and registering them with the runtime, but no matter what I try, the final .whl still seems to be missing the custom ops. The problem is that I don't really know much about what I'm doing, and I'm starting to think about throwing in the towel on this one, having failed at the supposedly “easy” part of integrating existing layers from MediaPipe back into TensorFlow.

So my question is: is there any chance you could compile a version of TensorFlow / the TFLite runtime that supports the custom ops? I reckon you know quite a lot about all of this by now. 😃 I'd be more than happy to take it from there and file a PR with the custom layer implementations in TensorFlow. To show you that I'm not bullshitting, here's a draft of the layers:

transform_landmarks

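    import tensorflow as tf

    def transform_landmarks_2d(landmarks, transformation):
        # transformation: (B, 3, 3) homogeneous 2D matrices.
        # Split x/y from any extra channels, then append w = 1 to every (x, y).
        landmarks_xy, landmarks_residual = landmarks[..., :2], landmarks[..., 2:]
        landmarks_xyw = tf.pad(landmarks_xy, [[0, 0], [0, 0], [0, 1]], constant_values=1)

        # Copy the matrix once per landmark: (B, 3, 3) -> (B, N, 3, 3).
        number_of_landmarks = tf.shape(landmarks_xyw)[-2]
        broadcasted_matrix = tf.repeat(tf.expand_dims(transformation, axis=1), number_of_landmarks, axis=1)

        # (B, N, 3, 3) @ (B, N, 3, 1) -> (B, N, 3, 1) -> (B, N, 3)
        transformed_landmarks_xyw = tf.reshape(tf.matmul(broadcasted_matrix, tf.expand_dims(landmarks_xyw, axis=-1)),
                                               (-1, number_of_landmarks, 3))

        # Drop w (assumes an affine matrix, so w stays 1) and reattach extra channels.
        transformed_landmarks_xy = transformed_landmarks_xyw[..., :2]
        return tf.concat([transformed_landmarks_xy, landmarks_residual], axis=-1)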

transform_tensor_bilinear (using tensorflow-addons)

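    import tensorflow as tf
    import tensorflow_addons as tfa

    def crop_image(self, image, resolution):
        # tfa.image.transform expects transforms that map output (crop) pixels
        # to input (image) pixels, hence the inverse of the crop matrix.
        # self.get_crop_matrix is the author's own helper (not shown).
        crop_transformation = tfa.image.transform_ops.matrices_to_flat_transforms(
            tf.linalg.inv(self.get_crop_matrix(resolution)))
        batch_dimension = tf.shape(crop_transformation)[:-1]

        # Broadcast the (H, W, C) image to one copy per crop transform.
        broadcasted_image = tf.broadcast_to(image,
                                            tf.concat([batch_dimension, tf.shape(image)[-3:]], axis=-1))

        cropped_image = tfa.image.transform(
            broadcasted_image,
            crop_transformation,
            interpolation="BILINEAR",
            output_shape=resolution,
            fill_mode="nearest",
        )
        return cropped_image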
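Because `tfa.image.transform` maps output pixels back to input pixels, the crop is driven by the inverse of the crop matrix. The bilinear sampling step underneath can be sketched in plain NumPy for a single 3x3 output-to-input affine matrix. This illustrates the technique only; it is not MediaPipe's TransformTensorBilinear kernel, the function name is hypothetical, and borders are handled by clamping coordinates (close in spirit to `fill_mode="nearest"`).

```python
import numpy as np


def affine_crop_bilinear(image, output_to_input, out_h, out_w):
    """Crop an (H, W, C) image through a 3x3 output->input affine matrix."""
    image = np.asarray(image, dtype=np.float32)
    h, w = image.shape[:2]
    # Pixel grid of the output crop, in homogeneous (x, y, 1) form.
    ys, xs = np.mgrid[0:out_h, 0:out_w].astype(np.float32)
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (out_h, out_w, 3)
    src = grid @ np.asarray(output_to_input, dtype=np.float32).T
    # Clamp source coordinates to the image so border pixels are repeated.
    sx = np.clip(src[..., 0], 0, w - 1)
    sy = np.clip(src[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets act as bilinear weights.
    fx = (sx - x0)[..., None]
    fy = (sy - y0)[..., None]
    top = image[y0, x0] * (1 - fx) + image[y0, x1] * fx
    bottom = image[y1, x0] * (1 - fx) + image[y1, x1] * fx
    return top * (1 - fy) + bottom * fy
```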

landmarks_to_transform_matrix: This one is interesting. It first estimates an axis-aligned (AA) bounding box of the landmarks, then rotates it, applies some scale to it, and computes the corresponding transformation matrix. I have a couple of hundred lines of code that do all of this, but I don't really see the benefit of posting it here in its rudimentary state.
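To make that description concrete, here is a hypothetical NumPy sketch of such an op; it is neither the author's code nor MediaPipe's. The rotation angle and scale are passed in directly (a real op would estimate the angle, e.g. from reference landmarks), and the box is made square using its larger side. It returns a 3x3 matrix mapping crop pixels to image pixels.

```python
import numpy as np


def landmarks_to_transform_matrix(landmarks_xy, rotation_rad, scale, crop_size):
    """Hypothetical sketch: 3x3 matrix taking crop pixel (u, v, 1) to image (x, y, 1).

    landmarks_xy: (N, 2) landmark coordinates in image pixels.
    rotation_rad: rotation applied to the box (given here, not estimated).
    scale:        factor applied to the box side length (e.g. for margin).
    crop_size:    side length of the square crop in pixels.
    """
    pts = np.asarray(landmarks_xy, dtype=np.float64)
    # 1. Axis-aligned bounding box of the landmarks.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    center = (lo + hi) / 2.0
    # Square box from the larger side, enlarged by the scale factor.
    side = float((hi - lo).max()) * scale
    # 2. Crop pixels -> centred -> scaled to box size -> rotated -> moved to box centre.
    to_centered = np.array([[1, 0, -crop_size / 2], [0, 1, -crop_size / 2], [0, 0, 1]])
    s = side / crop_size
    scaling = np.diag([s, s, 1.0])
    c, si = np.cos(rotation_rad), np.sin(rotation_rad)
    rotation = np.array([[c, -si, 0], [si, c, 0], [0, 0, 1]])
    translation = np.array([[1, 0, center[0]], [0, 1, center[1]], [0, 0, 1]])
    return translation @ rotation @ scaling @ to_centered
```

Composing the steps right to left means crop coordinates are first centred, then scaled, rotated, and moved to the box centre; a matrix in this crop-to-image direction is what an output-to-input sampler such as `tfa.image.transform` expects, while its inverse maps landmarks from image space into the crop.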

I thought so too, thanks for confirming. I'm closing this request.