tensorflow: [TFLite C++] Signature calculating CategoricalCrossentropy loss produces wrong result
Issue Type
Bug
Have you reproduced the bug with TF nightly?
Yes
Source
source
Tensorflow Version
2.13
Custom Code
Yes
OS Platform and Distribution
Windows 10
Mobile device
No response
Python version
No response
Bazel version
5.3.0
GCC/Compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current Behaviour?
I’ve created a simple model in Python (TF version 2.10) and converted it to tflite. The model has two signatures, one for inference and the other for training. When I run those signatures in Python, everything works correctly: I get good inference results and a sensible training loss. When I load the converted tflite model with the C++ TFLite API (built from source, from branch r2.13) and run those signatures, inference works as intended and training works as intended (the accuracy on the test set rises steadily), but the reported loss is totally random. At first I thought the loss might be accumulated, since it climbs to five digits, but that is not the case: it rises and falls in a random fashion. It looks like there is a bug in the C++ TFLite implementation of the ops used for the CategoricalCrossentropy calculation.
I’ve tried building TensorFlow from r2.12 and r2.13 and get the same behavior. I also tried r2.10, but then I couldn’t even run the signatures with the C++ TFLite API; I was getting a bunch of segmentation faults. I couldn’t find any documentation on which backward-pass ops are available in the C++ TFLite API; maybe some of the ops used in the CategoricalCrossentropy loss calculation are not yet available, or there is a bug in their implementation.
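For reference, the set of ops that actually ends up in the converted graph (including the gradient ops behind the train signature) can be listed with TFLite's model analyzer. This is only a minimal sketch; "model.tflite" is a placeholder for my converted file:

import tensorflow as tf

# Print the operator breakdown of the converted model.
# "model.tflite" is a placeholder path, not part of the original report.
tf.lite.experimental.Analyzer.analyze(model_path="model.tflite")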
Standalone code to reproduce the issue
Here is the Python code I am using to create the model with signatures:
import tensorflow as tf

IMG_SIZE = 28

class Model(tf.Module):
    def __init__(self):
        self.model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE), name='flatten'),
            tf.keras.layers.Dense(
                units=10,
                kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05),
                bias_initializer=tf.keras.initializers.Ones(),
                name='dense'
            ),
        ])
        opt = tf.keras.optimizers.SGD(learning_rate=0.1)
        loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
        self.model.compile(optimizer=opt, loss=loss_fn, metrics=['accuracy'])

    # The `train` function takes a batch of input images and labels.
    @tf.function(input_signature=[
        tf.TensorSpec([32, IMG_SIZE, IMG_SIZE], tf.float32),
        tf.TensorSpec([32, 10], tf.float32),
    ])
    def train(self, x, y):
        with tf.GradientTape() as tape:
            prediction = self.model(x)
            loss = self.model.loss(y, prediction)
        gradients = tape.gradient(loss, self.model.trainable_variables)
        self.model.optimizer.apply_gradients(
            zip(gradients, self.model.trainable_variables))
        result = {"loss": loss}
        return result

    @tf.function(input_signature=[
        tf.TensorSpec([1, IMG_SIZE, IMG_SIZE], tf.float32),
    ])
    def infer(self, x):
        logits = self.model(x)
        probabilities = tf.nn.softmax(logits, axis=-1)
        return {
            "output": probabilities,
            "logits": logits
        }
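For completeness, the conversion follows the standard on-device-training recipe. This is only a minimal sketch: the saved-model directory and output path are placeholders, and the exact flag set in my script may differ slightly:

m = Model()
tf.saved_model.save(
    m, "saved_model_dir",  # placeholder directory
    signatures={
        'train': m.train.get_concrete_function(),
        'infer': m.infer.get_concrete_function(),
    })

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # TFLite builtin ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF ops where needed
]
converter.experimental_enable_resource_variables = True  # needed for the trainable variables
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:  # placeholder output path
    f.write(tflite_model)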
And here is the C++ code I am using to run the tflite model:
#include <iostream>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(tflite_model_path);
if (model == nullptr)
{
    std::cout << "Failed to load model" << std::endl;
    return;
}
tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder builder(*model, resolver);
std::unique_ptr<tflite::Interpreter> interpreter;
builder(&interpreter);
if (interpreter == nullptr)
{
    std::cout << "Failed to create interpreter" << std::endl;
    return;
}
if (interpreter->AllocateTensors() != kTfLiteOk)
{
    std::cout << "Failed to allocate interpreter tensors" << std::endl;
    return;
}
tflite::SignatureRunner* train_runner = interpreter->GetSignatureRunner("train");
TfLiteTensor* input_data_tensor = train_runner->input_tensor(train_runner->input_names()[0]);
float* input_data = input_data_tensor->data.f;
TfLiteTensor* input_labels_tensor = train_runner->input_tensor(train_runner->input_names()[1]);
float* input_labels = input_labels_tensor->data.f;
// Here I fill in the input data and labels, code redacted for brevity.
if (train_runner->Invoke() != kTfLiteOk)
{
    std::cout << "Error invoking train interpreter signature" << std::endl;
    return;
}
const TfLiteTensor* output_tensor = train_runner->output_tensor(train_runner->output_names()[0]);
float* output = output_tensor->data.f;
std::cout << "Training finished with loss: " << output[0] << std::endl;
Please let me know if you need more details or the full source code.
Relevant log output
Here are the losses from batch to batch; as you can see, they are far too high and essentially random. I repeat: the model is training correctly, which I can see because the accuracy on the test set is steadily rising, so these loss values do not make sense.
Training of batch 1 finished with loss: 172.813
Training of batch 2 finished with loss: 30406.2
Training of batch 3 finished with loss: 35372.7
Training of batch 4 finished with loss: 30955.9
Training of batch 5 finished with loss: 30645.5
Training of batch 6 finished with loss: 39069.4
Training of batch 7 finished with loss: 25181.5
Training of batch 8 finished with loss: 28106.7
Training of batch 9 finished with loss: 12969.1
Training of batch 10 finished with loss: 3079.69
Training of batch 11 finished with loss: 3693.12
Training of batch 12 finished with loss: 3314.77
Training of batch 13 finished with loss: 4591.12
Training of batch 14 finished with loss: 5880.76
Training of batch 15 finished with loss: 5654.75
Training of batch 16 finished with loss: 10133.1
Training of batch 17 finished with loss: 9301.94
Training of batch 18 finished with loss: 11654.5
Training of batch 19 finished with loss: 11827.8
Training of batch 20 finished with loss: 22028.1
Training of batch 21 finished with loss: 8553.58
About this issue
- State: open
- Created a year ago
- Comments: 16
Do you even know how cross-entropy loss is calculated, the math behind it? Are you aware how big the mistakes a model would have to make to push the loss into the thousands, how badly it would have to diverge instead of giving 87% accuracy on the whole test set? Have you ever seen a loss larger than two digits in the successful training of any ML model known to mankind?
That aside, don’t you find it suspicious that the loss is so much smaller only in the first batch?
Have you heard of the MNIST dataset? Are you just ignoring the fact that this is the classic MNIST dataset and not some random dataset, and that the model is a simple one-layer neural network used in all of the TensorFlow examples, with well-known expected loss/accuracy results? I’ve purposely used the simplest model here for ease of demonstration, but you keep pretending we are talking about training GPT… You say “It’s hard to say w/o more context” when I’ve given you literally all the context possible, yet you still talk in hypotheticals.
The accuracy is measured on a whole separate dataset used for testing (the MNIST test set). Can you explain how the model can achieve 87% accuracy on the test set but have a four-digit loss on the training set?
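Just to spell out the magnitudes, here is a back-of-the-envelope check (rough numbers, nothing specific to my model):

import numpy as np

# Cross-entropy of a 10-class model that predicts a uniform distribution:
print(-np.log(1.0 / 10.0))  # ~2.30 per example
# Loss of a correct prediction made with 87% confidence:
print(-np.log(0.87))        # ~0.14 per example
# A mean loss around 30000 over a batch of 32 would mean the model assigns
# probabilities on the order of exp(-30000) to the true classes, which is
# absurd for a model hitting 87% accuracy on the test set.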
I am creating an untrained model, converting it to tflite, and then training it side by side in both Python and C++, on the same dataset, calling the same signature functions. Of course, I am training the Python model with Python and the tflite model with the C++ TFLite API, because how else would I do it? You can’t train a tflite model in Python, and I know you will now say “gotcha, those are different models!”, but more on that below.
Yes, I am aware of that. I’ve worked on a couple of ML tool implementations, and I know it should not give exactly the same results, but they should be close, not differ by a factor of 1000, for Christ’s sake. I still cannot believe what I am reading; how can someone so confidently ignore the obvious… You must just be waiting for me to give up so you can close this issue. The scale of the difference in loss values between Python and C++ TFLite doesn’t bother you at all? How do you even test the C++ TFLite implementation, what are you comparing it against if not the Python TensorFlow results?