tensorflow: Difference in training accuracy and loss using gradientTape vs model.fit with binary_accuracy: A bug?
Hi all,
I am running a training loop using GradientTape, which works well; however, I am getting different training accuracy metrics when training with the GradientTape loop versus a straight model.fit call. I apologise if this should be a question for Stack Overflow, but to the best of my knowledge the parameters are the same and should therefore produce exactly the same results (or at least very close ones). I therefore think there may be a bug, and if anyone can help me elucidate this I would really appreciate it!
I have prepared a sequential model as follows:
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=64, input_dim=5078, activation="relu"))
model.add(tf.keras.layers.Dense(units=32, activation="relu"))
model.add(tf.keras.layers.Dense(units=100, activation="relu"))
model.add(tf.keras.layers.Dense(units=24, activation="sigmoid"))
and for the model.fit method, I compile and fit as follows:
model.compile(optimizer="Adam", loss="binary_crossentropy", metrics=["acc"])
model.fit(X_train, y_train,
          batch_size=32,
          epochs=100, verbose=1,
          validation_split=0.15,
          shuffle=True)
This works well and produces the following results. (Please note that 100 epochs is overkill and the model overfits; this is just to keep the epoch count the same as in the GradientTape loop. Normally there would be an early-stopping callback, roughly as sketched below.)
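For reference, this is roughly the early-stopping setup I would normally use (just a sketch; the monitor and patience values here are illustrative and were not used for the runs below):

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # stop when the validation loss stops improving
    patience=10,                 # allow a few epochs without improvement
    restore_best_weights=True)   # roll back to the best epoch seen

# passed to training via model.fit(..., callbacks=[early_stop])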
The model metrics are as follows:
32/119 [=======>......................] - ETA: 0s - loss: 0.0699 - acc: 0.9753
119/119 [==============================] - 0s 168us/sample - **loss: 0.0668** - acc: **0.9779** - val_loss: **0.2350** - val_acc: **0.9048**
This is the expected behaviour (minus the overfitting). Now, when I create the GradientTape loop as follows, the accuracy metrics are off by about 4-5% over the same 100 epochs, and the reason I suspect a bug is that I believe I am using the appropriate metrics:
import numpy as np
from sklearn.model_selection import train_test_split

def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

## Further split train data into a training set and a validation set
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.15, random_state=1)

## Run autodiff on the model
n_epochs = 100
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = tf.keras.optimizers.Adam()
loss = tf.keras.losses.BinaryCrossentropy()
metricLoss = tf.keras.metrics.BinaryCrossentropy()
metricsAcc = tf.keras.metrics.BinaryAccuracy()
val_acc_metric = tf.keras.metrics.BinaryAccuracy()
val_acc_loss = tf.keras.metrics.BinaryCrossentropy()

train_loss_results = []
train_accuracy_results = []
validation_loss_results = []
validation_accuracy_results = []

# Loop over epochs
for epoch in range(n_epochs):
    print("Epoch {}/{}".format(epoch, n_epochs))
    # Loop over batches
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train.values, y_train)
        # GradientTape autodiff
        with tf.GradientTape() as tape:
            y_pred = model(X_batch, training=True)
            loss_values = loss(y_batch, y_pred)
        gradients = tape.gradient(loss_values, model.trainable_weights)
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))
        metricLoss(y_batch, y_pred)
        metricsAcc.update_state(y_batch, y_pred)
        # Loss and accuracy
        train_loss_results.append(loss_values)
        train_accuracy_results.append(metricsAcc.result())
        # Read out training results
        readout = 'Epoch {}, Training loss: {}, Training accuracy: {}'
        print(readout.format(epoch + 1, loss_values,
                             metricsAcc.result() * 100))
    metricsAcc.reset_states
    # Run a validation loop at the end of each epoch
    for valbatch in range(1 + n_steps + 1):
        X_batchVal, y_batchVal = random_batch(X_val.values, y_val)
        val_logits = model(X_batchVal)
        # Update val metrics
        val_acc_metric(y_batchVal, val_logits)
        val_acc = val_acc_metric.result()
        val_acc_metric.update_state(y_batchVal, val_logits)
        val_loss = val_acc_loss(y_batchVal, val_logits)
        validation_loss_results.append(val_loss)
        validation_accuracy_results.append(val_acc_metric.result())
        # Read out validation results
        print('Validation loss:', float(val_loss), 'Validation acc: %s' % (float(val_acc * 100),))
    val_acc_metric.reset_states()
When I run this code it works fine, and the iterations update the accuracy and loss states; however, the training accuracy is much lower than with the model.fit method, also after running for 100 epochs. Here is the final-epoch output (within each epoch the readout is printed once per batch):
Epoch 100, Training loss: 0.027735430747270584, Training accuracy: 93.6534423828125
Epoch 100, Training loss: 0.03832387551665306, Training accuracy: 93.67249298095703
Epoch 100, Training loss: 0.035500235855579376, Training accuracy: 93.69097900390625
Validation loss: 0.3204055726528168 Validation acc: 90.36458587646484
Validation loss: 0.32066160440444946 Validation acc: 89.71354675292969
Validation loss: 0.32083287835121155 Validation acc: 90.49479675292969
Validation loss: 0.3209479749202728 Validation acc: 90.10416412353516
Validation loss: 0.32088229060173035 Validation acc: 90.625
As you can see, the training accuracy is ~4-5% lower than with the model.fit method. The loss records fine, and the validation metrics look pretty much like the validation metrics from the model.fit method.
Additionally, when I plot accuracy and loss for both the model.fit and GradientTape methods (roughly as sketched below), the shapes of the curves look pretty much the same, and they both begin to overfit at similar points! But again, there is a huge discrepancy in the training accuracy.
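For reference, this is roughly how I produce those curves (just a sketch; it assumes the fit call was stored as history = model.fit(...) and that the tape-loop lists were filled as in the code above):

import matplotlib.pyplot as plt

fit_acc = history.history["acc"]                        # one value per epoch from model.fit
tape_acc = [float(a) for a in train_accuracy_results]   # one value per recorded step from the tape loop

plt.plot(fit_acc, label="model.fit acc")
plt.plot(np.linspace(0, len(fit_acc), len(tape_acc)), tape_acc, label="GradientTape acc")  # stretch to the same x-axis
plt.xlabel("epoch")
plt.ylabel("binary accuracy")
plt.legend()
plt.show()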
I have specified the Adam optimizer as well as the binary_crossentropy loss in both model.fit and the GradientTape loop. For model.fit, when I specify 'accuracy' or 'acc' for metrics, my understanding is that it will call on binary_accuracy to calculate the accuracy. So, as far as I am aware, the parameters are similar enough that the results should be fairly similar.
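To double-check that assumption, I believe compiling with the metric spelled out explicitly should reproduce the same numbers as the 'acc' string (a sketch of the check; my understanding, not confirmed, is that 'acc' with a binary_crossentropy loss maps to BinaryAccuracy with its default 0.5 threshold):

model.compile(optimizer="Adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.5)])  # explicit metric instead of "acc"
print(model.evaluate(X_train, y_train, verbose=0))  # compare against the numbers reported with metrics=["acc"]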
Additionally, when I call model.compile and model.evaluate after training the model with GradientTape, just to confirm the evaluation, the results are slightly different again and look more like the model.fit method:
**Training**
model.compile(optimizer=optimizer, loss=tf.keras.losses.binary_crossentropy, metrics=['acc'])
print('\n', model.evaluate(X_train, y_train, verbose=1)[1])
32/101 [========>.....................] - ETA: 0s - loss: 0.0336 - acc: 0.9948
101/101 [==============================] - 0s 307us/sample - **loss: 0.0330 - acc: 0.9942**
**Validation**
model.compile(optimizer=optimizer, loss=tf.keras.losses.binary_crossentropy, metrics=['acc'])
print('\n', model.evaluate(X_val, y_val, verbose=1)[1])
18/18 [==============================] - 0s 111us/sample - **loss: 0.3879 - acc: 0.9028**
Now model.evaluate shows a loss and accuracy that are very similar to the model.fit method when I call evaluate on X_train and y_train, which is why I suspect a bug. Interestingly, the model.evaluate results on the validation data look similar to the GradientTape loop, which leaves me really confused, as I am therefore unsure of the true training accuracy and loss!
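If it helps, this is the cross-check I can run to get a single training-accuracy number outside of both loops (just a sketch; it pushes the whole training set through the trained model once and scores it with a fresh BinaryAccuracy metric):

check_acc = tf.keras.metrics.BinaryAccuracy()
y_pred_train = model(X_train.values, training=False)    # single forward pass over the full training set
check_acc.update_state(y_train, y_pred_train)
print("Full training-set binary accuracy:", float(check_acc.result()))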
If anyone can help I would really appreciate it… I am happy to provide further code upstream of the model, etc. Again, apologies if this is not a bug, but this really does seem like incorrect behaviour to me…
@jvishnuvardhan
Many thanks for the message! I will do this and report back ASAP.