addons: tfa.activations.mish doesn't work in Keras
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- TensorFlow version and how it was installed (source or binary): 2.1
- TensorFlow-Addons version and how it was installed (source or binary): 7.1, pip installed
- Python version: 3.7
- Is GPU used? (yes/no): yes
Describe the bug
When using tfa.activations.mish in Keras, training halts at the beginning.
Train for 353 steps, validate for 40 steps
Learning rate: 0.001 Epoch 1/60 10/353 […] - ETA: 2:58:34 - loss: 8.9578 - dense_1_loss: 4.9835 - dense_2_loss: 2.2109 - dense_3_loss: 1.7634 - dense_1_accuracy: 0.0195 - dense_2_accuracy: 0.1937 - dense_3_accuracy: 0.4625
Code to reproduce the issue
import tensorflow as tf
from tensorflow import keras
import tensorflow_addons as tfa
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPool2D, Dropout, BatchNormalization, Input, Activation, AveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import get_custom_objects

class Mish(Activation):
    def __init__(self, activation, **kwargs):
        super(Mish, self).__init__(activation, **kwargs)
        self.__name__ = 'Mish'

# Register the wrapper so layers can refer to the activation by the string name 'Mish'.
get_custom_objects().update({'Mish': Mish(tfa.activations.mish)})
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='Mish',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=None)  # l2(1e-4))  # change to weight decay
    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x
def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder [b]

    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D, also known as the
    bottleneck layer.
    The first shortcut connection per layer is a 1 x 1 Conv2D.
    The second and onwards shortcut connection is identity.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Feature map sizes:
        conv1  : 32x32, 16
        stage 0: 32x32, 64
        stage 1: 16x16, 128
        stage 2:  8x8,  256

    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 32
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True,
                     kernel_size=5,
                     strides=2)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'Mish'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2  # downsample

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])

        num_filters_in = num_filters_out

    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('Mish')(x)
    x = AveragePooling2D(pool_size=8)(x)
    # x = keras.layers.GlobalAveragePooling2D()(x)
    y = Flatten()(x)
    y = Dense(512, activation='Mish', kernel_initializer='he_normal')(y)
    out = Dense(168, activation='softmax', kernel_initializer='he_normal',
                dtype='float32', name='dense_1')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=out)
    return model
# Model parameter
# ----------------------------------------------------------------------------
# | | 200-epoch | Orig Paper| 200-epoch | Orig Paper| sec/epoch
# Model | n | ResNet v1 | ResNet v1 | ResNet v2 | ResNet v2 | GTX1080Ti
# |v1(v2)| %Accuracy | %Accuracy | %Accuracy | %Accuracy | v1 (v2)
# ----------------------------------------------------------------------------
# ResNet20 | 3 (2)| 92.16 | 91.25 | ----- | ----- | 35 (---)
# ResNet32 | 5(NA)| 92.46 | 92.49 | NA | NA | 50 ( NA)
# ResNet44 | 7(NA)| 92.50 | 92.83 | NA | NA | 70 ( NA)
# ResNet56 | 9 (6)| 92.71 | 93.03 | 93.01 | NA | 90 (100)
# ResNet110 |18(12)| 92.65 | 93.39+-.16 | 93.15 | 93.63 | 165(180)
# ResNet164 |27(18)| ----- | 94.07 | ----- | 94.54 | ---(---)
# ResNet1001| (111)| ----- | 92.39 | ----- | 95.08+-.14| ---(---)
# ---------------------------------------------------------------------------
n = 2
# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 2
# IMG_SIZE and N_CHANNELS are defined elsewhere in the original script
input_shape = [IMG_SIZE, IMG_SIZE, N_CHANNELS]
# Computed depth from supplied model parameter n
depth = n * 9 + 2
model_type = 'ResNet%dv%d' % (depth, version)

model = resnet_v2(input_shape=input_shape, depth=depth)
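For comparison, the same activation can also be wired in without the custom Mish wrapper by passing the Addons op as a plain callable. The sketch below (layer sizes are arbitrary, chosen only for illustration) is meant to help separate whether the slowdown comes from the string registration or from the mish op itself:

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.layers import Input, Dense, Activation
from tensorflow.keras.models import Model

# Pass tfa.activations.mish directly as a callable; no custom-object registration needed.
inp = Input(shape=(32,))
h = Dense(64)(inp)
h = Activation(tfa.activations.mish)(h)
out = Dense(10, activation='softmax')(h)
toy_model = Model(inputs=inp, outputs=out)
toy_model.compile(optimizer='adam', loss='categorical_crossentropy')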
Other info / logs
When using Activation('Addons>mish'), I have the same problem: training halts at the beginning.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 16 (7 by maintainers)
Hi @willbattel, no, it is still available, but the implementation is going to be pure Python ops (probably in Addons 0.12).
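For reference, a composite-op fallback along those lines is just the standard mish formula, mish(x) = x * tanh(softplus(x)), expressed with stock TensorFlow ops. This is a minimal sketch, not the actual Addons implementation:

import tensorflow as tf

def mish_composite(x):
    # mish(x) = x * tanh(softplus(x)), built only from standard TF ops
    return x * tf.math.tanh(tf.math.softplus(x))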
I mean the program code. 😄 Although remote access to an isolated virtual machine with the full-blown model deployed would be even better.
cc: @digantamisra98