keras: Error InvalidArgumentError: Incompatible shapes when using accuracy metric, sparse_categorical_crossentropy, and batch size > 1

⚠️⚠️⚠️
I found out that this issue only happens in Keras versions newer than 2.2.2 (so in 2.2.3 and 2.2.4, the latest as of now).

I downgraded to version 2.2.2 (and TensorFlow 1.10.0) and the error doesn’t happen anymore. Still, this should be fixed, because I want to be able to use the latest TensorFlow T__T
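
For reference, a minimal sketch of that downgrade (assuming a pip-based installation; versions as stated above):

pip install keras==2.2.2 tensorflow==1.10.0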


I found an issue when trying to fit an RNN model with the sparse_categorical_crossentropy loss and metrics=["accuracy"]. I created a simple example in order to reproduce this error consistently.

This is the input data: a simple Fibonacci series where, given a 3-number sequence, the model tries to predict the following 3 numbers.

import numpy as np

# 3-step input sequences and their 3-step continuations
x = np.array([[1, 1, 2], [1, 2, 3], [2, 3, 5], [3, 5, 8], [5, 8, 13], [8, 13, 21]])
y = np.array([[3, 5, 8], [5, 8, 13], [8, 13, 21], [13, 21, 34], [21, 34, 55], [34, 55, 89]])
y = y.reshape((-1, y.shape[1], 1))  # sparse labels with shape (6, 3, 1)

It’s just a silly example, so I treated the inputs as tokens, as if it were a text-to-text network.

Now, here’s the model I used:

from keras.models import Model
from keras.layers import Input, Embedding, Bidirectional, GRU, TimeDistributed, Dense, Activation
from keras.losses import sparse_categorical_crossentropy

input_layer = Input(shape=x.shape[1:])
rnn = Embedding(90, 200)(input_layer)
rnn = Bidirectional(GRU(64, return_sequences=True))(rnn)
rnn = TimeDistributed(Dense(90))(rnn)  # a 90-way softmax per time step
rnn = Activation("softmax")(rnn)

model = Model(inputs=input_layer, outputs=rnn)
model.compile(loss=sparse_categorical_crossentropy, optimizer="adam", metrics=['accuracy'])

model.summary()

It doesn’t really matter what kind of model I use; the important thing is that these four things are true:

  • The model predicts a time series with shape (BatchSize, SeriesLength, VocabSize); in this case the shape is (3, 3, 90), as the numbers are treated as tokens, so there are 90 possible values (0 to 89).
  • The model uses sparse_categorical_crossentropy as its loss function
  • The model uses accuracy as one of its metrics
  • The batch size is bigger than 1 (if it’s 1, everything works 😮 ; see the shape sketch right after this list)
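
To make the error message concrete, here’s a minimal NumPy sketch of the kind of shape clash being reported; this only illustrates the [9] vs. [3,3] comparison, it is not a claim about Keras internals:

import numpy as np

y_true = np.zeros((3, 3))    # sparse labels kept as (batch, steps) = (3, 3)
y_pred_ids = np.zeros(9)     # argmax'ed predictions flattened to (batch * steps,) = (9,)
try:
    np.equal(y_true, y_pred_ids)
except ValueError as e:
    print(e)                 # operands could not be broadcast together: (3,3) vs (9,)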

Then, I just fit the model.

model.fit(x, y, epochs=1000, batch_size=3)

After the first batch is processed, when trying to calculate the accuracy, I get the following error:

InvalidArgumentError: Incompatible shapes: [9] vs. [3,3]
	 [[{{node metrics_16/acc/Equal}} = Equal[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](metrics_16/acc/Reshape, metrics_16/acc/Cast)]]

If I remove the accuracy metric, the model is able to train and predict without any issue (except that I have no feedback about how the model is performing).

I had just built an identical model in a notebook from a Udacity Nanodegree and there was no such error, so it’s probably related to the Keras version, the TensorFlow version (I’m using the latest of both), or something else in my installation, in which case maybe you won’t be able to reproduce it on your machine.

Does anybody have any idea why this is happening? Thank you.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 24
  • Comments: 41

Most upvoted comments

Changing my metric from sparse_categorical_accuracy to categorical_accuracy avoids the error.

It looks like the issue has been fixed in the latest master, and it will most likely be included in the next release, 2.2.5 (hopefully soon). Until then, you can update to the HEAD of master from pip by doing:

pip3 install git+https://github.com/keras-team/keras.git -U

Even when you set the batch size to 1, you still get the “InvalidArgumentError: Incompatible shapes” error while evaluating the model; it should already have been raised during training.

Or you can define customized accuracy:

from keras import backend as K

def custom_sparse_categorical_accuracy(y_true, y_pred):
    # y_true: (batch, steps, 1) sparse labels; y_pred: (batch, steps, vocab) softmax.
    # K.max squeezes the trailing label dim; K.argmax picks the predicted token id.
    return K.cast(K.equal(K.max(y_true, axis=-1),
                          K.cast(K.argmax(y_pred, axis=-1), K.floatx())),
                  K.floatx())

and then use metrics=[custom_sparse_categorical_accuracy] along with loss='sparse_categorical_crossentropy'.
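
A minimal sketch of that, reusing the compile and fit calls from the original post:

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=[custom_sparse_categorical_accuracy])
model.fit(x, y, epochs=1000, batch_size=3)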

It’s an issue with the Python 2.7 version.

No, it’s not. I was using Python 3.6.5 and had the issue. It disappeared when downgrading to Keras 2.2.2 with TensorFlow 1.10.0.

There shouldn’t be a need to use K and perform the transformations yourself; that’s exactly what Keras should be doing when using the sparse_categorical_crossentropy loss and the accuracy metric (and it did, up to version 2.2.2).

I am able to execute the code on Python 3.x, so it’s an issue with the Python 2.7 version. You can work around it by creating a custom metric function to get the accuracy. I would recommend using the abstract Keras backend K, as it has a lot of helpful methods.

def get_predictions(x_test):
    preds = model.predict(x_test)                   # (n, 3, 90) softmax outputs
    y_pred = [np.argmax(i, axis=1) for i in preds]  # predicted token id per time step
    y_pred = np.array(y_pred)
    y_pred = y_pred.reshape((-1, y.shape[1], 1))    # match the sparse label shape
    return y_pred

Then you can flatten the arrays of true and predicted values for an easy comparison. You can use sklearn’s accuracy_score function:

from sklearn.metrics import accuracy_score

y_pred = get_predictions(x)
accuracy_score(y.flatten(), y_pred.flatten())

The issue is that when we add the metric in the model.compile method for the RNN model, there’s an error while training, i.e. Incompatible shapes: [6] vs. [2,3], because the arrays need to be flattened for the accuracy metric. There’s one more issue: when we set the batch size to 1, it doesn’t raise an error while training, but it does raise one during evaluation.

The problem seems to be with the Keras callback function while implementing the metric.

Using flow_from_directory for the training data I get this: Found 166105 images belonging to 84 classes. Passing it into a neural network generates this error:

Versions:

tensorflow: '2.1.0' (CPU)
keras: '2.2.4-tf'
Python: 3.7.6

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam

train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=40,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest'
)

train_generator = train_datagen.flow_from_directory(
        '/Users/royakash/Desktop/Images',
        target_size=(100, 100),
        batch_size=1,
        class_mode='categorical'
)

def create_model():
    model = tf.keras.models.Sequential([
#         tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(100, 100, 1)),
#         tf.keras.layers.MaxPooling2D(2, 2),
#         tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
#         tf.keras.layers.MaxPooling2D(2,2),
#         tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
#         tf.keras.layers.MaxPooling2D(2,2),
#         tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
#         tf.keras.layers.MaxPooling2D(2,2),
#         tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(84, activation='softmax')
    ])

    return model

model = create_model()
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])

history = model.fit(train_generator, epochs=30, verbose=2)

Generating error:

	 [[node metrics/sparse_categorical_accuracy/Equal (defined at <ipython-input-6-9c043135f786>:3) ]] [Op:__inference_distributed_function_1074]

Function call stack:
distributed_function
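
One observation about the snippet above (a guess, not a confirmed diagnosis): flow_from_directory with class_mode='categorical' yields one-hot labels, while sparse_categorical_crossentropy and sparse_categorical_accuracy expect integer labels. A minimal sketch of a consistent pairing:

train_generator = train_datagen.flow_from_directory(
        '/Users/royakash/Desktop/Images',
        target_size=(100, 100),
        batch_size=1,
        class_mode='sparse'   # integer class ids, matching the sparse loss/metric
)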

If you are using an RNN, be careful with the return_sequences argument of your layers. Check your last LSTM layer: you should not pass return_sequences=True to it.

When you set return_sequences=True on an LSTM layer, its per-step outputs are meant to be the inputs of the next LSTM layer. However, there is no LSTM layer after your last one.

If you drop that argument from the last layer, you will probably get through that error cleanly.
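
A minimal sketch of that advice (hypothetical layer sizes, using tf.keras):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(10, 8)),  # feeds the next LSTM step by step
    LSTM(32),                                              # last LSTM: no return_sequences
    Dense(5, activation='softmax'),
])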

Hope this helps.

I’m not sure, but I’d say that using sparse_categorical_crossentropy instead of categorical_crossentropy has benefits beyond being able to keep labels in their tokenized format. I might be wrong, but the sparse keyword suggests the loss is specifically made for cases where one-hot vectors are so big that the model gets a very weak gradient to learn from (like NLP models where your word dictionary contains a few hundred thousand words).
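
For what it’s worth, a quick NumPy sketch of the size argument for sparse labels (illustrative numbers only):

import numpy as np

vocab = 200_000                 # a large NLP vocabulary
batch, steps = 32, 50
sparse_bytes = batch * steps * np.dtype(np.int32).itemsize             # ~6.4 KB of integer labels
one_hot_bytes = batch * steps * vocab * np.dtype(np.float32).itemsize  # ~1.28 GB of one-hot labels
print(sparse_bytes, one_hot_bytes)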

Anyway, I already know those suggestions; I tried a lot of things before posting this issue, which is why I was so explicit about the conditions under which it happens. Plus, as I’ve said multiple times, I got it working by downgrading Keras and TensorFlow.

I submitted this issue so it could get fixed, not so I could find a way to bypass it. Please stop suggesting hacky or alternative solutions; the only real solution here is to fix the bug causing the issue.

One more way to work around it is to convert the y labels into one-hot vectors; then we can use the categorical_crossentropy loss and the categorical_accuracy metric for the model.

from keras.utils.np_utils import to_categorical

size_of_vocabulary = 90  # the example treats the numbers 0-89 as tokens
categorical_y_labels = to_categorical(y, num_classes=size_of_vocabulary)

The model:

input_layer = Input(shape=x.shape[1:])
rnn = Embedding(90, 200)(input_layer)
rnn = Bidirectional(GRU(64, return_sequences=True))(rnn)
rnn = TimeDistributed(Dense(90))(rnn)
rnn = Activation("softmax")(rnn)

model = Model(inputs=input_layer, outputs=rnn)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['categorical_accuracy'])

Training of the model:

Epoch 1/50
6/6 [==============================] - 1s 130ms/step - loss: 4.4943 - categorical_accuracy: 0.0000e+00
Epoch 2/50
6/6 [==============================] - 0s 2ms/step - loss: 4.4343 - categorical_accuracy: 0.7222
Epoch 3/50
6/6 [==============================] - 0s 3ms/step - loss: 4.3718 - categorical_accuracy: 0.8333
Epoch 4/50
6/6 [==============================] - 0s 3ms/step - loss: 4.3009 - categorical_accuracy: 0.8889
Epoch 5/50
6/6 [==============================] - 0s 3ms/step - loss: 4.2230 - categorical_accuracy: 0.9444
Epoch 6/50
6/6 [==============================] - 0s 3ms/step - loss: 4.1343 - categorical_accuracy: 0.9444
Epoch 7/50
6/6 [==============================] - 0s 3ms/step - loss: 4.0317 - categorical_accuracy: 0.9444

For predictions:

def get_predictions(x_test):
    preds = model.predict(x_test)                    # (n, 3, 90) softmax outputs
    y_pred = [np.argmax(i, axis=1) for i in preds]   # predicted token id per time step
    y_pred = np.array(y_pred)
    y_pred = to_categorical(y_pred, num_classes=90)  # back to one-hot, matching the labels
    return y_pred

Try this and let me know if you come across any errors.

Hi, I’m very sorry. I haven’t solved this problem. I’m still learning.

------------------ Original Email ------------------
From: “Arjun-Arvindakshan” <notifications@github.com>
Sent: Thursday, October 31, 2019, 7:58 PM
To: “keras-team/keras” <keras@noreply.github.com>
Cc: “如若初见” <903258755@qq.com>; “Comment” <comment@noreply.github.com>
Subject: Re: [keras-team/keras] Error InvalidArgumentError: Incompatible shapes when using accuracy metric, sparse_categorical_crossentropy, and batch size > 1 (#11749)

Even when you set the batch size as 1, you would get the error “InvalidArgumentError: Incompatible shapes” while evaluating the model. It should have raised an error at the time of training process.

Hi, I seem to encounter the same problem, and I can’t seem to figure it out. Could you make it a bit clearer? Thank you.


@fchollet, I recommend we always have a regression test for sparse_categorical_crossentropy that includes 3D output, such as RNN predictions and transformer nets.
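
For instance, a hypothetical sketch of such a regression test (illustrative names and sizes, not the actual Keras test suite):

import numpy as np
from keras.models import Model
from keras.layers import (Input, Embedding, Bidirectional, GRU,
                          TimeDistributed, Dense, Activation)

def test_sparse_categorical_accuracy_on_3d_output():
    x = np.random.randint(0, 90, size=(6, 3))
    y = np.random.randint(0, 90, size=(6, 3, 1))

    inputs = Input(shape=(3,))
    h = Embedding(90, 16)(inputs)
    h = Bidirectional(GRU(8, return_sequences=True))(h)
    h = TimeDistributed(Dense(90))(h)
    model = Model(inputs, Activation('softmax')(h))
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])

    # Must not raise "InvalidArgumentError: Incompatible shapes"
    # with a batch size larger than 1.
    model.fit(x, y, epochs=1, batch_size=3, verbose=0)
    model.evaluate(x, y, batch_size=3, verbose=0)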