tensorflow: Gradient computation erroneously returns None
In [5]: tf.gradients(tf.constant(5), tf.Variable(0))
Out[5]: [None]
The derivative of 5 with respect to x should be 0.
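For contrast, here is a minimal sketch (using the TF 1.x graph API from the report, with float values so the gradient is well defined) of the connected vs. unconnected cases:

```python
import tensorflow as tf

x = tf.Variable(0.0)

# Connected: the output depends on x, so a real gradient tensor comes back.
print(tf.gradients(x * x, x))             # [<tf.Tensor ...>], i.e. 2*x

# Unconnected: a constant has no graph path to x, so we get [None], not [0].
print(tf.gradients(tf.constant(5.0), x))  # [None]
```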
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 15
- Comments: 20 (10 by maintainers)
Commits related to this issue
- Merge pull request #783 from dweekly/master Update sum_of_squares (TF 0.10) to mean_squared_error (TF 0.12) — committed to tarasglek/tensorflow by nealwu 7 years ago
- Eager/g3doc: Gradients with respect to constants are None and not 0. Same behavior as tf.gradients() for graphs. Some discussion of this choice in #783 PiperOrigin-RevId: 190096919 — committed to benoitsteiner/tensorflow by asimshankar 6 years ago
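The eager-mode behavior noted in the second commit can be reproduced with tf.GradientTape (a sketch assuming TF 2.x-style eager execution):

```python
import tensorflow as tf

x = tf.Variable(0.0)

with tf.GradientTape() as tape:
    y = tf.constant(5.0)  # y never touches x

# Unconnected gradients are None in eager mode too, matching tf.gradients().
print(tape.gradient(y, x))  # None
```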
If it’s called “gradients” then I expect it to compute gradients… The gradient of e.g. 5 with respect to some vector v is most certainly the zero vector.
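In symbols: for any constant c, ∇_v c = 0, the zero vector with the same shape as v.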
Also, just from a practicality standpoint, you should be able to compute gradients and then perform mathematical operations on them without having to worry about something unexpectedly becoming a non-Tensor and causing an exception to be raised. In some TF code I wrote recently I had to write a function like the work-around shown further down to avoid this bug.
Actually, there’s a wrinkle: None is used to indicate a variety of different things; (3) in particular would be very bad to replace with zeros.
Hmm, I get the efficiency argument. I agree that a keyword arg could be a suitable workaround, e.g.:
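For instance (a hypothetical sketch; return_zeros is just the name proposed further down in this thread, not a real argument, and ys and xs are placeholders):

```python
# Hypothetical keyword, not an actual tf.gradients() argument.
grads = tf.gradients(ys, xs, return_zeros=True)
# With return_zeros=True, unconnected entries would come back as
# properly-shaped zero tensors instead of None; the default False
# would keep today's cheaper behavior.
```

(For what it's worth, much later TensorFlow releases did add an option along these lines: tf.gradients(ys, xs, unconnected_gradients=tf.UnconnectedGradients.ZERO).)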
I’m not sure this isn’t the desired behavior. Returning None makes it explicit that there is no graph connection between the two. I’ve been using this work-around:
def replace_none_with_zero(grads):
    # Swap each unconnected (None) gradient for a scalar zero.
    return [0 if g is None else g for g in grads]
grads = replace_none_with_zero(tf.gradients([grads[0]], [x, y]))
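A shape-preserving variant of that work-around (a sketch: the scalar 0 above can break downstream ops that expect the gradient's shape, so tf.zeros_like on the matching variable is safer):

```python
def replace_none_with_zeros_like(grads, variables):
    # Substitute a zero tensor shaped like the variable for each
    # unconnected (None) gradient.
    return [tf.zeros_like(v) if g is None else g
            for g, v in zip(grads, variables)]

grads = replace_none_with_zeros_like(tf.gradients([grads[0]], [x, y]), [x, y])
```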
It looks like there are hundreds of direct uses of tf.gradients within Google, so I don’t think a silent performance-breaking change is okay. If we’re going to change the default behavior, I think the only way would be to make the special Zeros class and give it suitable arithmetic overloads. That way, anyone who doesn’t realize this and treats it as a normal tensor, in a way that doesn’t take advantage of the zeros, will get an exception. For now, how about a return_zeros argument that defaults to False?
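A rough sketch of that Zeros idea (hypothetical; no such class exists in TensorFlow):

```python
class Zeros(object):
    """Hypothetical symbolic zero gradient: free in arithmetic, loud otherwise."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype

    # Arithmetic that can exploit the zero stays cheap:
    def __add__(self, other):
        return other   # 0 + x == x, no dense tensor allocated
    __radd__ = __add__

    def __mul__(self, other):
        return self    # 0 * x == 0, still symbolic
    __rmul__ = __mul__

    # Anything else that expects a real Tensor fails fast instead of silently:
    def __getattr__(self, name):
        raise TypeError("Zeros is not a Tensor; materialize it with "
                        "tf.zeros(shape, dtype) first.")
```

Misuse then surfaces as an exception rather than as a silent dense-zeros slowdown, which is the property the comment above is after.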
Then I take it the official documentation of tf.gradients() is erroneous?
There doesn’t seem to be such a keyword argument though.
I think I’ve just run into this. =\
Yep, I agree that this is a bug; my comment was poorly worded. @mrry: Do we use the None feature anywhere that changing this would break? It could certainly cause some code to get slower, which is a potential concern.
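The slowdown concern is concrete: silently materializing zeros would mean allocating a dense tensor for every unconnected variable. A sketch of the cost, assuming a large embedding-sized variable:

```python
big = tf.Variable(tf.zeros([50000, 512]))  # ~100 MB of float32
loss = tf.constant(1.0)                    # does not depend on big

# Today this is free:
print(tf.gradients(loss, big))             # [None]
# If zeros were returned implicitly, a 50000x512 zero tensor would be
# built and flow through any downstream arithmetic for no benefit.
```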