keras: Accuracy, fmeasure, precision, and recall all the same for binary classification problem (cut and paste example provided)
Keras 1.2.2, tensorflow-gpu 0.12.1
Example code to show the issue:
'''Trains a simple convnet on the MNIST dataset.
Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''
#from __future__ import print_function
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
batch_size = 128
nb_classes = 10
nb_epoch = 12
# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# make 2 categories
y_train = y_train>=5
y_test = y_test>=5
if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, 2)
Y_test = np_utils.to_categorical(y_test, 2)
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy', 'f1score', 'precision', 'recall'])
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
This yields the following output:
Using TensorFlow backend.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN Black
major: 3 minor: 5 memoryClockRate (GHz) 0.98
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 5.85GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:01:00.0)
128/60000 [..............................] - ETA: 1686s - loss: 0.7091 - acc: 0.4688 - fmeasure: 0.4687 - precision: 0.4688 - recall: 0.4688
384/60000 [..............................] - ETA: 567s - loss: 0.6981 - acc: 0.4922 - fmeasure: 0.4922 - precision: 0.4922 - recall: 0.4922
640/60000 [..............................] - ETA: 343s - loss: 0.6845 - acc: 0.5609 - fmeasure: 0.5609 - precision: 0.5609 - recall: 0.5609
1024/60000 [..............................] - ETA: 217s - loss: 0.6654 - acc: 0.6143 - fmeasure: 0.6143 - precision: 0.6143 - recall: 0.6143
1408/60000 [..............................] - ETA: 159s - loss: 0.6427 - acc: 0.6456 - fmeasure: 0.6456 - precision: 0.6456 - recall: 0.6456
1792/60000 [..............................] - ETA: 126s - loss: 0.6226 - acc: 0.6629 - fmeasure: 0.6629 - precision: 0.6629 - recall: 0.6629
About this issue
- State: closed
- Created 7 years ago
- Comments: 60 (2 by maintainers)
For those who come here later: since Keras 2.0, the metrics fmeasure, precision, and recall have been removed.
If you want to use them, you can check the history of the repo or add this code:
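As a hedged stand-in for the removed metrics, the simplified NumPy sketch below mirrors the batch-wise formulas of the Keras 1.x `precision`, `recall`, and `fmeasure` metrics: round the clipped products, sum over every element of the batch tensor, and divide with a small epsilon (check the repository history for the exact removed code, which also special-cases batches with no positives). The helper names `batch_precision`, `batch_recall`, `batch_fmeasure`, and `report`, and the sample batch, are hypothetical. The usage encodes the same five predictions once as a single sigmoid column and once as two softmax columns with one-hot targets.

```python
import numpy as np

EPS = 1e-7  # small constant standing in for K.epsilon()


def batch_precision(y_true, y_pred):
    # Sums over EVERY element of the batch tensor, whatever its shape.
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    predicted_positives = np.sum(np.round(np.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + EPS)


def batch_recall(y_true, y_pred):
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    possible_positives = np.sum(np.round(np.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + EPS)


def batch_fmeasure(y_true, y_pred):
    # F1 = harmonic mean of the batch precision and recall.
    p = batch_precision(y_true, y_pred)
    r = batch_recall(y_true, y_pred)
    return 2 * p * r / (p + r + EPS)


def report(y_true, y_pred):
    return {
        "precision": batch_precision(y_true, y_pred),
        "recall": batch_recall(y_true, y_pred),
        "fmeasure": batch_fmeasure(y_true, y_pred),
    }


# Hypothetical batch: true labels and predicted probability of class 1.
labels = np.array([1, 1, 0, 0, 1], dtype=float)
prob_pos = np.array([0.9, 0.4, 0.6, 0.7, 0.8])
accuracy = np.mean((prob_pos > 0.5) == labels)

# One sigmoid unit: shape (n, 1).
single = report(labels[:, None], prob_pos[:, None])

# Two softmax units with one-hot targets: shape (n, 2).
onehot_true = np.stack([1 - labels, labels], axis=1)
onehot_pred = np.stack([1 - prob_pos, prob_pos], axis=1)
two_unit = report(onehot_true, onehot_pred)

print("accuracy:", round(float(accuracy), 4))
print("1 unit: ", {k: round(float(v), 4) for k, v in single.items()})
print("2 units:", {k: round(float(v), 4) for k, v in two_unit.items()})
```

Both encodings describe the same five predictions (accuracy 0.4). With one sigmoid column, precision, recall, and F-measure come out at about 0.5, 0.667, and 0.571; with the two-column one-hot encoding all three equal the accuracy, 0.4, because every row contributes exactly one predicted positive and one actual positive to the sums.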
I’ve created a pull request to solve the problem (https://github.com/netrack/keras-metrics/pull/4); I hope it will be accepted soon. For those who want to use a custom method, I corrected unnir’s code as follows:
Same problem. I customized the metrics precision, recall, and F1-measure. model.fit_generator and model.evaluate_generator also give the same precision, recall, and F1-measure.
keras==2.0.0 on Mac OS Sierra 10.12.4
Epoch 8/10
0s - loss: 0.0269 - binary_accuracy: 0.8320 - f1score: 0.8320 - precision: 0.8320 - recall: 0.8320
Epoch 9/10
0s - loss: 0.0488 - binary_accuracy: 0.6953 - f1score: 0.6953 - precision: 0.6953 - recall: 0.6953
Epoch 10/10
0s - loss: 0.0457 - binary_accuracy: 0.7148 - f1score: 0.7148 - precision: 0.7148 - recall: 0.7148
Start to evaluate.
binary_accuracy: 76.06% f1score: 76.06% precision: 76.06% recall: 76.06%
I am also seeing the same scores coming through for custom metrics. The code below gave the following output for one epoch:
@nsarafianos Only do this per batch, as the values are reported on a per-batch basis by Keras callbacks. Once you’re trained, you can just use model.predict to go over the complete test set and compute your metrics in full.
@hbb21st Hello, I had the same problem. In my case it was caused by using softmax in a binary classification problem with an output dimension of 2 ([0,1] or [1,0]). When I changed the output dimension to 1 ([0] or [1]) with a sigmoid activation function, it worked just fine.
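Following the suggestion above to evaluate on the complete test set, here is a minimal NumPy sketch that computes the metrics once over all samples instead of per batch. It assumes `probs` holds `model.predict` output (shape (n,) or (n, 1) for a sigmoid unit, or (n, 2) for softmax with column 1 as the positive class) and `y` holds 0/1 labels; the helpers `to_labels` and `full_set_metrics` and the sample arrays are hypothetical. It also contrasts the full-set precision with the mean of per-batch precisions, since batch-wise values averaged over an epoch need not equal the full-set value.

```python
import numpy as np


def to_labels(probs, threshold=0.5):
    """Turn model.predict-style output into hard 0/1 labels.

    (n, 2) softmax output -> argmax over the two columns;
    (n,) or (n, 1) sigmoid output -> compare with the threshold.
    """
    probs = np.asarray(probs)
    if probs.ndim == 2 and probs.shape[1] == 2:
        return np.argmax(probs, axis=1)
    return (probs.reshape(-1) > threshold).astype(int)


def full_set_metrics(y, probs):
    """Accuracy, plus precision/recall/F1 for the positive class (label 1)."""
    y = np.asarray(y).reshape(-1).astype(int)
    pred = to_labels(probs)
    tp = int(np.sum((pred == 1) & (y == 1)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": float(np.mean(pred == y)), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels and sigmoid outputs, e.g. probs = model.predict(X_test).
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
probs = np.array([0.8, 0.7, 0.9, 0.6, 0.3, 0.2, 0.4, 0.6])

overall = full_set_metrics(y, probs)
# The same predictions in two-column softmax form.
overall_2col = full_set_metrics(y, np.stack([1 - probs, probs], axis=1))

# Precision computed separately on each batch of 4, for comparison.
batch_size = 4
batch_precisions = [
    full_set_metrics(y[i:i + batch_size], probs[i:i + batch_size])["precision"]
    for i in range(0, len(y), batch_size)
]

print(overall)
print("mean of per-batch precision:", float(np.mean(batch_precisions)))
```

On these made-up values the full-set precision is 0.6 (recall 0.75, F1 about 0.667, accuracy 0.625), while the two batches give precisions of 0.75 and 0.0, whose mean is 0.375. Converting the two-column softmax output to labels with argmax gives the same numbers as the single sigmoid column, so the positive-class metrics no longer collapse into accuracy.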
@unnir If I use ‘binary_crossentropy’, the custom precision is correct, but when I use ‘categorical_crossentropy’, it has the same problem that @moming2k described.
Any update yet? @unnir
Any update yet? @unnir, did you find anything?
@unnir I did not mean that they do not work; what I was trying to say is that the numbers I get don’t make much sense to me. I have indeed normalized my data prior to feeding it into the neural network, and I am doing cross-validation to tune hyperparameters.
The relevant metrics are no longer supported in Keras 2.x. Closing for good housekeeping.
EQUALITY PROBLEM
I ran into exactly the same problem with another dataset (accuracy, precision, recall, and f1score all equal to each other, on both the training set and the validation set, for a balanced task), which made me look into this. Let’s call it the EQUALITY PROBLEM.
I use TensorFlow 1.13.1 with tf.keras 2.2.4-tf.
I have combined all the replies, tried all the code above, and finally came up with two versions. The first version defines precision, recall, and f1score as above. The second version uses the precision, recall, and f1score defined in keras-metrics (which depends on keras).
CONCLUSION:
The following are the results of the first version. When I try “categorical classification using softmax with one-hot output”, I HAVE the EQUALITY PROBLEM. However, when I try “binary classification using sigmoid with 0-1 vector output”, I DO NOT have the EQUALITY PROBLEM.
Here is all my code:
For the “categorical classification using softmax with one-hot output”, I get the following results, which show that I have the EQUALITY PROBLEM.
For the “binary classification using sigmoid with 0-1 vector output”, I get the following results, which show that I DO NOT have the EQUALITY PROBLEM.
I find this very interesting, but I don’t know the reason. Can anyone explain why this happens? Thank you!
I got it. I tried a binary classification on Google’s servers (Colab). It is all about how many units you have in the last layer: with only one unit everything is okay, but with two units it does not work. On the other hand, as far as I know, using two units with a softmax activation function for binary classification (probably what you are doing as well) is often suggested for better convergence. You can check my code below; I will also create a post under this keras-vis library issue. https://colab.research.google.com/drive/1lmQ-hWcN4tsGMicd4dKnSjeTD-BdgJuE Best
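Why the unit count matters can be made precise, assuming the batch-wise formulas of the Keras 1.x metrics (every entry of the batch tensor is summed, predictions are rounded at 0.5, and the small epsilon is ignored). Take a batch of n samples with one-hot targets y_{ic} ∈ {0,1} and softmax outputs ŷ_{ic} whose two entries per row sum to 1 (and are never exactly 0.5), let t_i be the true class of sample i, and write PP, AP, and TP for the predicted-positive, actual-positive, and true-positive counts:

```latex
\begin{aligned}
\mathrm{PP} &= \sum_{i=1}^{n}\sum_{c=1}^{2}\operatorname{round}(\hat y_{ic}) = n
  &&\text{(exactly one entry per row exceeds 0.5)}\\
\mathrm{AP} &= \sum_{i=1}^{n}\sum_{c=1}^{2} y_{ic} = n
  &&\text{(one-hot targets)}\\
\mathrm{TP} &= \sum_{i=1}^{n}\sum_{c=1}^{2}\operatorname{round}(y_{ic}\,\hat y_{ic})
  = \#\{\, i : \hat y_{i t_i} > 0.5 \,\} = n\cdot\mathrm{acc}\\
P &= \frac{\mathrm{TP}}{\mathrm{PP}} = \mathrm{acc},\qquad
R = \frac{\mathrm{TP}}{\mathrm{AP}} = \mathrm{acc},\qquad
F_1 = \frac{2PR}{P+R} = \mathrm{acc}
\end{aligned}
```

With a single sigmoid unit the sums run over one column only, so PP counts predicted positives and AP counts actual positives; neither is pinned to n, and precision and recall separate from accuracy.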
I have the same issue: the custom metrics give the same results for binary classification on unbalanced data, and I am quite sure there is nothing wrong with the model. It looks like the best way is to use the Keras metrics rather than implementing them on the backend. Let me know if any of you understands what’s wrong here.
@baharian I guess it has nothing to do with metrics. Do you have the result for the loss too?
Did you run the code I provided?
metrics.py is a month old. I just did the PyPI pull to get Keras 1.2.2. I can’t see how that can be the issue.