tensorflow: Keras models train correctly with or without the tf.function decorator, but custom models do not
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on a mobile device:
- TensorFlow installed from (source or binary):
- TensorFlow version (use command below):
- Python version:
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the current behavior
Describe the expected behavior
Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 22 (11 by maintainers)
Please do not do softmax and then cross-entropy; use softmax_cross_entropy_with_logits instead.
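To illustrate the advice above, here is a minimal sketch (not code from the thread; the tensors are made up for the example) contrasting a manual softmax-then-log cross-entropy with the fused op, which avoids the numerical instability of taking log of a softmax output:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])   # raw model outputs, no softmax applied
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot targets

# Numerically unstable: explicit softmax followed by manual cross-entropy.
probs = tf.nn.softmax(logits)
manual_loss = -tf.reduce_sum(labels * tf.math.log(probs), axis=-1)

# Numerically stable fused op operating directly on logits.
fused_loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

print(float(manual_loss[0]), float(fused_loss[0]))  # same value here, but the fused op stays finite for extreme logits
```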
Hi, I was able to replicate the issue, thanks for clarifying. I have two distinct answers, and I will start with the obvious but unsatisfactory one before moving on to what appears to be an actual issue.
First, tf.function decorations propagate: functions called from within a decorated function are traced as well, so decorating the outer training step alone should be sufficient.
Second, for some reason, tf.function decoration appears to change the way gradients are computed, which might be the cause of the model's lack of convergence. I do not get why, but here is the test I ran (a sketch of it follows this comment): when the training step is not decorated, the computed gradients tend to be very sparse, i.e. there are a lot of zero values, resulting in most weights not being updated during the training step. I do not know why this is the case; it would seem that part of the computation is not properly tracked.
Now, the reason why the Keras Model trains better is also that, in spite of my forcing the use of random normal weight initializers, its initial weights (when not forcefully replaced as in the previous test) are smaller than those generated in the custom model, which seems to result in smoother initial predictions and may explain why it is easier to train.
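The original test snippet was not preserved in this thread, so the following is a sketch of such a with/without-tf.function gradient comparison; the toy model, shapes, and variable names are illustrative assumptions, not the reporter's code:

```python
import tensorflow as tf

# Toy model standing in for the custom model under discussion.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(3),
])
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

def compute_grads(x, y):
    # Compute the loss under the tape and return gradients w.r.t. all weights.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    return tape.gradient(loss, model.trainable_variables)

# Same function, once traced by tf.function and once left eager.
compute_grads_fn = tf.function(compute_grads)

x = tf.random.normal((32, 10))
y = tf.one_hot(tf.random.uniform((32,), maxval=3, dtype=tf.int32), 3)

for name, fn in [("eager", compute_grads), ("tf.function", compute_grads_fn)]:
    grads = fn(x, y)
    # Fraction of exactly-zero entries per gradient tensor; sparse gradients
    # leave the corresponding weights untouched by the optimizer step.
    zero_frac = [float(tf.reduce_mean(tf.cast(tf.equal(g, 0.0), tf.float32)))
                 for g in grads]
    print(name, "fraction of zero gradient entries per variable:", zero_frac)
```

Printing the fraction of exactly-zero gradient entries per variable makes the sparsity difference described above directly observable.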
@Saduf2019 Decorating with @tf.function may indeed have benefits as to execution runtime; however, it should not have any effect on the accuracy reached, unless there is either a TensorFlow bug or some error-inducing side effect within @miladtoutounchian's code.
@miladtoutounchian I have run the code you shared on TF 2.1, with and without @tf.function, and did not face any issues; please find the gist for the same. The same code runs without any issues on nightly as well. In case you are still facing the issue, please share a gist where the error is seen, along with error logs if any.