tensorflow: Backprop through conv2d with large tensors fails
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- TensorFlow installed from (source or binary): source
- TensorFlow version: 1.1.0
- Python version: 3.5.2
- Bazel version (if compiling from source): 0.4.5
- CUDA/cuDNN version: CUDA Version 8.0.61
- GPU model and memory: GeForce GTX 1080, 8GB
- Exact command to reproduce:
```python
import tensorflow as tf

# Input: batch of 720 single-channel 400 x 600 images (NCHW layout)
proj = tf.Variable(tf.random_normal([720, 1, 400, 600], stddev=2))
# 1-D filter of width 401, one input channel, one output channel
kernel = tf.Variable(tf.random_normal([1, 401, 1, 1], stddev=.5), trainable=True)

proj = tf.nn.conv2d(
    input=proj,
    filter=kernel,
    strides=[1, 1, 1, 1],
    padding='SAME',
    data_format='NCHW',
    name='ramlak-filter'
)

# Backprop through the convolution, feeding the output itself in as the
# upstream gradient
grad = tf.gradients(proj, kernel, proj)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))
```
Describe the problem
Backpropagation through conv2d fails for large tensors with the error shown in the log below; the code above reproduces it. Lowering the batch size in proj from 720 to 100 eliminates the error. The issue has been confirmed by another user on Stack Overflow.
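Until this is fixed, a workaround consistent with the batch-size observation is to compute the filter gradient over smaller batch chunks and sum the results (the filter gradient is a sum over the batch anyway). Below is a minimal sketch of that idea against the shapes from the repro above; the chunk count of 8 is an arbitrary choice, not something verified in this thread:

```python
import tensorflow as tf

inp = tf.Variable(tf.random_normal([720, 1, 400, 600], stddev=2))
kernel = tf.Variable(tf.random_normal([1, 401, 1, 1], stddev=.5), trainable=True)

# Split the batch into 8 chunks of 90 so each Conv2DBackpropFilter launch
# needs less scratch space, then sum the per-chunk filter gradients.
chunk_grads = []
for chunk in tf.split(inp, num_or_size_splits=8, axis=0):
    out = tf.nn.conv2d(chunk, kernel, strides=[1, 1, 1, 1],
                       padding='SAME', data_format='NCHW')
    chunk_grads.append(tf.gradients(out, kernel, out)[0])
grad = tf.add_n(chunk_grads)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))
```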
Source code / logs
```
NotFoundError (see above for traceback): No algorithm without scratch worked!
	 [[Node: gradients/ramlak-filter_grad/Conv2DBackpropFilter = Conv2DBackpropFilter[T=DT_FLOAT, data_format="NCHW", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable/read, gradients/ramlak-filter_grad/Shape_1, ramlak-filter)]]
```
Please find the full log attached: [log.txt](https://github.com/tensorflow/tensorflow/files/1128268/log.txt)
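One knob that may be relevant here (an untested assumption on my part, not something confirmed in this thread): TensorFlow bounds the cuDNN scratch space its GPU convolution kernels may allocate via the TF_CUDNN_WORKSPACE_LIMIT_IN_MB environment variable, and the error above says cuDNN found no algorithm that works within the scratch it was allowed. Raising the limit may or may not help for this particular failure, but trying it would look like this:

```python
import os

# TF_CUDNN_WORKSPACE_LIMIT_IN_MB is read by TensorFlow's GPU convolution
# kernels; set it before the first convolution runs. 4096 is an arbitrary
# example value, not a recommendation from this thread.
os.environ['TF_CUDNN_WORKSPACE_LIMIT_IN_MB'] = '4096'

import tensorflow as tf  # import after setting the variable, to be safe
```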
About this issue
- State: closed
- Created 7 years ago
- Comments: 29 (1 by maintainers)
Hi @ma0ho, some updates from our side:
Just updated to TF 1.2.1 (built from source), but the issue persists.
The same error occurs with conv3d when one dimension of the filter is set to 1. The real error for me is out of memory; when I change the filter size, the error disappears.
@yzhwang: No, exactly as given in the example: I'm using a 1-D filter of width 401 with one input and one output channel… I'm sure this is not an OOM problem.
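For reference, a sketch of the conv3d variant described two comments up; every shape below is an assumption for illustration (the comment only says that one dimension of the filter is set to 1):

```python
import tensorflow as tf

# Hypothetical conv3d repro: NDHWC input, filter depth set to 1.
# All shapes here are guesses, not taken from the comment above.
vol = tf.Variable(tf.random_normal([32, 16, 128, 128, 1], stddev=2))
kern = tf.Variable(tf.random_normal([1, 9, 9, 1, 1], stddev=.5), trainable=True)
out = tf.nn.conv3d(vol, kern, strides=[1, 1, 1, 1, 1], padding='SAME')
grad = tf.gradients(out, kern, out)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))
```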