addons: tfa.activations.mish doesn't work in Keras

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • TensorFlow version and how it was installed (source or binary): 2.1
  • TensorFlow-Addons version and how it was installed (source or binary): 0.7.1, installed via pip
  • Python version: 3.7
  • Is GPU used? (yes/no): yes

Describe the bug

When using tfa.activations.mish in Keras, training halts at the beginning.

Train for 353 steps, validate for 40 steps

Learning rate: 0.001 Epoch 1/60 10/353 […] - ETA: 2:58:34 - loss: 8.9578 - dense_1_loss: 4.9835 - dense_2_loss: 2.2109 - dense_3_loss: 1.7634 - dense_1_accuracy: 0.0195 - dense_2_accuracy: 0.1937 - dense_3_accuracy: 0.4625
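
For anyone who wants to check without the full model below, a minimal sketch along these lines (hypothetical toy data, not the original training set) can be used to isolate the activation:

import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

# Random toy data, sized arbitrarily for the sketch.
x = np.random.rand(256, 32, 32, 3).astype('float32')
y = np.random.randint(0, 10, size=(256,))

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', input_shape=(32, 32, 3)),
    tf.keras.layers.Activation(tfa.activations.mish),  # the op under test
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(x, y, epochs=1)  # on the affected versions, progress stalls early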

Code to reproduce the issue

import tensorflow as tf
from tensorflow import keras
import tensorflow_addons as tfa
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Dense, Conv2D, Flatten, MaxPool2D, Dropout,
                                     BatchNormalization, Input, Activation,
                                     AveragePooling2D)
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import get_custom_objects

class Mish(Activation):
    """Wrapper that lets tfa.activations.mish be referenced by name ('Mish')."""

    def __init__(self, activation, **kwargs):
        super(Mish, self).__init__(activation, **kwargs)
        self.__name__ = 'Mish'

# Register the wrapper so layers can use activation='Mish' by name.
get_custom_objects().update({'Mish': Mish(tfa.activations.mish)})

def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='Mish',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=None)  # was l2(1e-4); re-enable for weight decay

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x

def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder [b]

    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D, also known as the
    bottleneck layer.
    First shortcut connection per layer is 1 x 1 Conv2D.
    Second and onwards shortcut connection is identity.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Feature map sizes:
    conv1  : 32x32,  16
    stage 0: 32x32,  64
    stage 1: 16x16, 128
    stage 2:  8x8,  256

    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (e.g. 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 32
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True,
                     kernel_size=5,
                     strides=2)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'Mish'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2    # downsample

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])

        num_filters_in = num_filters_out

    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('Mish')(x)
    x = AveragePooling2D(pool_size=8)(x)
    #x = keras.layers.GlobalAveragePooling2D()(x)
    y = Flatten()(x)
    y = Dense(512, activation='Mish', kernel_initializer='he_normal')(y)

    out = Dense(168, activation='softmax', kernel_initializer='he_normal',
                dtype='float32', name='dense_1')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=out)
    return model

# Model parameter
# ----------------------------------------------------------------------------
#           |      | 200-epoch | Orig Paper| 200-epoch | Orig Paper| sec/epoch
# Model     |  n   | ResNet v1 | ResNet v1 | ResNet v2 | ResNet v2 | GTX1080Ti
#           |v1(v2)| %Accuracy | %Accuracy | %Accuracy | %Accuracy | v1 (v2)
# ----------------------------------------------------------------------------
# ResNet20  | 3 (2)| 92.16     | 91.25     | -----     | -----     | 35 (---)
# ResNet32  | 5(NA)| 92.46     | 92.49     | NA        | NA        | 50 ( NA)
# ResNet44  | 7(NA)| 92.50     | 92.83     | NA        | NA        | 70 ( NA)
# ResNet56  | 9 (6)| 92.71     | 93.03     | 93.01     | NA        | 90 (100)
# ResNet110 |18(12)| 92.65     | 93.39+-.16 | 93.15     | 93.63     | 165(180)
# ResNet164 |27(18)| -----     | 94.07     | -----     | 94.54     | ---(---)
# ResNet1001| (111)| -----     | 92.39     | -----     | 95.08+-.14| ---(---)
# ---------------------------------------------------------------------------
n = 2

# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 2

# IMG_SIZE and N_CHANNELS come from the data pipeline in the full script;
# example values are used here so the snippet runs standalone.
IMG_SIZE = 128
N_CHANNELS = 3
input_shape = [IMG_SIZE, IMG_SIZE, N_CHANNELS]

# Computed depth from supplied model parameter n
depth = n * 9 + 2
model_type = 'ResNet%dv%d' % (depth, version)

model = resnet_v2(input_shape=input_shape, depth=depth)

Other info / logs

When using Activation('Addons>mish'), I have the same problem: training halts at the beginning.
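
For reference, a minimal sketch of that registered-name path (assuming importing tensorflow_addons registers its activations as Keras custom objects):

import tensorflow as tf
import tensorflow_addons as tfa  # the import registers 'Addons>mish' with Keras

# Same stall occurs whether the op is passed directly or looked up by name.
layer = tf.keras.layers.Activation('Addons>mish')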

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

Hey @seanpmorgan, can you link that RFC when it's available?

Also, just for absolute clarity: what you’re saying is that Mish is not yet available for usage with Keras. Is that correct?

Thanks.

Hi @willbattel, no, it is still available, but the implementation is going to be pure Python ops (probably in Addons 0.12).
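
For context, mish is defined as x * tanh(softplus(x)), so building it from plain TensorFlow ops is straightforward. A minimal sketch of what such an implementation could look like (not the actual Addons code):

import tensorflow as tf

def mish(x):
    # mish(x) = x * tanh(softplus(x)), composed from plain TF ops rather than
    # the fused custom-op kernel used by earlier tfa releases.
    return x * tf.math.tanh(tf.math.softplus(x))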

Sorry for the delay; the full code uses 10 GB of data, so I tried to make a CIFAR-10 case with my model. Will be back soon.

I mean the program code. 😄 Although remote access to an isolated virtual machine with the full-blown model deployed would be even better.