keras: Can't use roc_auc_score (or any other numpy-based metric function) as a custom metric?

#1732 says we can't directly optimise AUC because it is not differentiable. But can't we have it just as a metric?

I just hoped this might work (without properly understanding the internals…), but I get an error at compile time.

from sklearn import metrics
from keras import backend as K

def auc(y_true, y_pred):
    return metrics.roc_auc_score(K.eval(y_true), K.eval(y_pred))

and the error is…

  File "/Users/gnu/Dropbox/codes/msd_tagging/my_keras_model_essence.py", line 66, in build_convnet_model
    model.compile(loss=loss_function, optimizer=optimiser, metrics=metrics)
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/keras/models.py", line 343, in compile
    **kwargs)
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/keras/engine/training.py", line 642, in compile
    self.metrics.append(metric_fn(y_true, y_pred))
  File "/Users/gnu/Dropbox/codes/msd_tagging/my_metrics.py", line 9, in auc
    return metrics.roc_auc_score(K.eval(y_true), K.eval(y_pred))
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 71, in eval
    return x.eval()
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/theano/gof/graph.py", line 520, in eval
    self._fn_cache[inputs] = theano.function(inputs, self)
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/theano/compile/function.py", line 320, in function
    output_keys=output_keys)
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 479, in pfunc
    output_keys=output_keys)
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/theano/compile/function_module.py", line 1776, in orig_function
    output_keys=output_keys).create(
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/theano/compile/function_module.py", line 1428, in __init__
    accept_inplace)
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/theano/compile/function_module.py", line 177, in std_fgraph
    update_mapping=update_mapping)
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/theano/gof/fg.py", line 171, in __init__
    self.__import_r__(output, reason="init")
  File "/Users/gnu/anaconda/lib/python2.7/site-packages/theano/gof/fg.py", line 367, in __import_r__
    raise MissingInputError("Undeclared input", variable)
theano.gof.fg.MissingInputError: ('Undeclared input', dense_1_target)

The error comes from the K.eval(y_true) call in the metric function (line 9 of my_metrics.py in the traceback above).

It would be convenient if numpy-based metric functions could be used, if that isn't possible right now. At the moment I compute it manually after each iteration, but that means redundant predictions and I can't take advantage of callbacks.
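For concreteness, a minimal sketch of that manual loop, with illustrative names only (a compiled model, training arrays x_train/y_train, held-out arrays x_val/y_val, and a hypothetical n_epochs):

from sklearn.metrics import roc_auc_score

for epoch in range(n_epochs):
    model.fit(x_train, y_train, epochs=1, verbose=0)
    y_score = model.predict(x_val)                  # redundant forward pass every iteration
    print(epoch, roc_auc_score(y_val, y_score))     # numpy-based metric computed outside the graph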

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 1
  • Comments: 36 (6 by maintainers)

Most upvoted comments

Metrics functions must be symbolic functions (built with the Keras backend, or with Theano/TensorFlow).

Also, ROC AUC is not a metric that can be accumulated over mini-batches; it has to be computed on all the data at once.

The right thing to do is to run predictions on all of your test data at the end of an epoch, then run the sklearn function on your predictions, and display the result. You can do this in a callback.

import numpy as np
import tensorflow as tf
from keras import backend as K

#-----------------------------------------------------------------------------------------------------------------------------------------------------
# AUC for a binary classifier
def auc(y_true, y_pred):   
    ptas = tf.stack([binary_PTA(y_true,y_pred,k) for k in np.linspace(0, 1, 1000)],axis=0)
    pfas = tf.stack([binary_PFA(y_true,y_pred,k) for k in np.linspace(0, 1, 1000)],axis=0)
    pfas = tf.concat([tf.ones((1,)) ,pfas],axis=0)
    binSizes = -(pfas[1:]-pfas[:-1])
    s = ptas*binSizes
    return K.sum(s, axis=0)

#-----------------------------------------------------------------------------------------------------------------------------------------------------
# PFA, prob false alert for binary classifier
def binary_PFA(y_true, y_pred, threshold=K.variable(value=0.5)):
    y_pred = K.cast(y_pred >= threshold, 'float32')
    # N = total number of negative labels
    N = K.sum(1 - y_true)
    # FP = total number of false alerts, alerts from the negative class labels
    FP = K.sum(y_pred - y_pred * y_true)    
    return FP/N
#-----------------------------------------------------------------------------------------------------------------------------------------------------
# P_TA prob true alerts for binary classifier
def binary_PTA(y_true, y_pred, threshold=K.variable(value=0.5)):
    y_pred = K.cast(y_pred >= threshold, 'float32')
    # P = total number of positive labels
    P = K.sum(y_true)
    # TP = total number of correct alerts, alerts from the positive class labels
    TP = K.sum(y_pred * y_true)    
    return TP/P

from sklearn.metrics import roc_auc_score
import keras


class roc_callback(keras.callbacks.Callback):
    def __init__(self,training_data,validation_data):
        
        self.x = training_data[0]
        self.y = training_data[1]
        self.x_val = validation_data[0]
        self.y_val = validation_data[1]
        
    
    def on_train_begin(self, logs={}):
        return
 
    def on_train_end(self, logs={}):
        return
 
    def on_epoch_begin(self, epoch, logs={}):
        return
 
    def on_epoch_end(self, epoch, logs={}):        
        y_pred = self.model.predict(self.x)
        roc = roc_auc_score(self.y, y_pred)      
        
        y_pred_val = self.model.predict(self.x_val)
        roc_val = roc_auc_score(self.y_val, y_pred_val)      
        
        print('\rroc-auc: %s - roc-auc_val: %s' % (str(round(roc,4)),str(round(roc_val,4))),end=100*' '+'\n')
        return
 
    def on_batch_begin(self, batch, logs={}):
        return
 
    def on_batch_end(self, batch, logs={}):
        return   


callbacks=[roc_callback(training_data=training_data,validation_data=validation_data)] 
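A hedged usage sketch (reusing the training_data/validation_data tuples above; the epoch count is illustrative):

model.fit(training_data[0], training_data[1],
          validation_data=validation_data,
          epochs=10,
          callbacks=callbacks)   # the roc_callback list defined above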

@JoshuaC3 The way to make @jamartinh's solution work with fit_generator is by making these changes:

  1. In __init__, you should pass in the train and val generators instead.

  2. In on_epoch_end, replace model.predict with model.predict_generator.

Here’s a sketch where only the AUROC of the val dataset is calculated at the end of every epoch:

class roc_callback(keras.callbacks.Callback):
    def __init__(self, val_gen):
        self.val_gen = val_gen
        self.val_reports = []

    def on_epoch_end(self, epoch, logs={}):        
        y_pred = self.model.predict_generator(self.val_gen, .....)
        y_true = self.val_gen.y
        val_roc = roc_auc_score(y_true , y_pred)
        self.val_reports.append(val_roc)

@fchollet

Also, ROC AUC is not a metric that can be accumulated over mini-batches; it has to be computed on all the data at once.

But it's reasonable to have this metric computed on the validation set at the end of each epoch.

The right thing to do is to run predictions on all of your test data at the end of an epoch, then run the sklearn function on your predictions, and display the result. You can do this in a callback.

Correct, but the problem is that since it's not in the Keras metrics form, you don't get the output in TensorBoard. Am I correct?
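One possible workaround (a sketch, not something from this thread): have the callback write its score into the logs dict in on_epoch_end, so that callbacks running after it in the callbacks list (e.g. keras.callbacks.TensorBoard or CSVLogger) and the training history can pick it up. The sketch below would replace on_epoch_end in the roc_callback class defined above; the key name val_auc is purely illustrative:

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        y_pred_val = self.model.predict(self.x_val)
        # Writing into `logs` makes the value visible to callbacks that run
        # after this one (TensorBoard, CSVLogger, ...) and to model.history.
        logs['val_auc'] = roc_auc_score(self.y_val, y_pred_val)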

@solensolen are you sure the metrics method is working? I'm using both methods, and the validation AUROC I get from the metrics method increases with each epoch. Towards the end of training, the metrics AUROC is way higher than the callback one (FYI my dataset is pretty imbalanced).

I thought it was as @isaacgerg said, that it is calculated batch-by-batch and averaged, which is why it is so high, since some batches may not contain the under-represented class. So I made my validation data the same size as my batch size so that it is calculated in one go, and I am still seeing the same phenomenon. Here is a sample of the output (batch size is 1000).

def auc_roc(y_true, y_pred):
    value, update_op = tf.metrics.auc(y_true, y_pred)
    K.get_session().run(tf.local_variables_initializer())
    return update_op

Train on 44088 samples, validate on 1000 samples
Epoch 1/15
44088/44088 [==============================] - 3s 78us/step - loss: 0.6872 - acc: 0.5674 - auc_roc: 0.5267 - val_loss: 0.6672 - val_acc: 0.6770 - val_auc_roc: 0.5473

AUC (from callback) - 0.5823
.
.
.
.

Epoch 5/15
44088/44088 [==============================] - 2s 55us/step - loss: 0.3236 - acc: 0.8919 - auc_roc: 0.9386 - val_loss: 0.5749 - val_acc: 0.7430 - val_auc_roc: 0.9279

AUC (from callback) - 0.5860

The val_auc_roc is calculated by passing the auc_roc function above to model.compile as a metric, and the AUC (from callback) uses the roc_callback class defined in an earlier post, with only the validation-data AUC calculated. In epoch 1 the values are similar, but in later epochs val_auc_roc is way higher. Can anyone explain why val_auc_roc is so different from the AUC (from callback)?

@fchollet

def auc(y_true, y_pred):   
    keras.backend.get_session().run(tf.global_variables_initializer())
    #return K.variable(value=tf.contrib.metrics.streaming_auc(y_pred, y_true)[0], dtype='float32')
    return tf.contrib.metrics.streaming_auc(y_pred, y_true)[0]

I don’t see why this yields “tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value auc/true_positives…”
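Possibly related: the tf.metrics.auc snippet earlier in this thread runs tf.local_variables_initializer() rather than the global initializer, since the streaming AUC ops keep their counters (auc/true_positives and friends) as local variables. A sketch along those lines, offered as an assumption rather than a verified fix:

def auc(y_true, y_pred):
    value, update_op = tf.contrib.metrics.streaming_auc(y_pred, y_true)
    # The streaming counters live in the LOCAL_VARIABLES collection, so
    # initialise local (not global) variables before they are read.
    K.get_session().run(tf.local_variables_initializer())
    return update_op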

@jamartinh - how do I get this to work when validation_data is a flow_from_directory generator?

I get the error

TypeError: Error when checking model : data should be a Numpy array, or list/dict of Numpy arrays.
Found: (array([[[[ 0.29411766,  0.34117648,  0.4039216 ], ...

@NickYi1990 By default the batch size is 32. Since the ROC AUC must use all the validation data to be correct, you need to give a batch size equal to the size of your validation set. In your case:

model.fit(X_train.values, y_train.values, validation_split=0.2, epochs=1, verbose=1, batch_size=X_train.values.shape[0] // 5)

@NickYi1990 I had the same issue. I figured out that some of the thresholds had 0 true positives / false negatives, which led to a division by zero in return FP / N. A decent solution is to change return FP / N into return FP / (N + 1):

import tensorflow as tf
import keras.backend as K
import numpy as np


# -----------------------------------------------------------------------------------------------------------------------------------------------------
# AUC for a binary classifier
def auc(y_true, y_pred):
    ptas = tf.stack([binary_PTA(y_true, y_pred, k) for k in np.linspace(0, 1, 1000)], axis=0)
    pfas = tf.stack([binary_PFA(y_true, y_pred, k) for k in np.linspace(0, 1, 1000)], axis=0)
    pfas = tf.concat([tf.ones((1,)), pfas], axis=0)
    binSizes = -(pfas[1:] - pfas[:-1])
    s = ptas * binSizes
    return K.sum(s, axis=0)


# -----------------------------------------------------------------------------------------------------------------------------------------------------
# PFA, prob false alert for binary classifier
def binary_PFA(y_true, y_pred, threshold=K.variable(value=0.5)):
    y_pred = K.cast(y_pred >= threshold, 'float32')
    # N = total number of negative labels
    N = K.sum(1 - y_true)
    # FP = total number of false alerts, alerts from the negative class labels
    FP = K.sum(y_pred - y_pred * y_true)
    return FP / (N + 1)


# -----------------------------------------------------------------------------------------------------------------------------------------------------
# P_TA prob true alerts for binary classifier
def binary_PTA(y_true, y_pred, threshold=K.variable(value=0.5)):
    y_pred = K.cast(y_pred >= threshold, 'float32')
    # P = total number of positive labels
    P = K.sum(y_true)
    # TP = total number of correct alerts, alerts from the positive class labels
    TP = K.sum(y_pred * y_true)
    return TP / (P + 1)
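A hedged usage sketch: with these functions defined, the approximate AUC can be passed to compile like any other Keras metric (the optimizer, loss, data names and epoch/batch settings below are illustrative):

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[auc])   # evaluated per batch and averaged over the epoch

model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10, batch_size=128)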

Technically, ROC AUC can be calculated on a mini-batch as long as we have y_true and y_pred. The only concern should be that too small a batch size will reduce the accuracy of this metric and make it less meaningful. Am I right? @fchollet

This works:

def auc(y_true, y_pred):   
    ptas = tf.stack([binary_PTA(y_true,y_pred,k) for k in np.linspace(0, 1, 1000)],axis=0)
    pfas = tf.stack([binary_PFA(y_true,y_pred,k) for k in np.linspace(0, 1, 1000)],axis=0)
    pfas = tf.concat([tf.ones((1,)) ,pfas],axis=0)
    binSizes = -(pfas[1:]-pfas[:-1])
    s = ptas*binSizes
    return K.sum(s, axis=0)