tensorflow: TypeError: broadcast() takes 1 positional argument but 2 were given
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): b'v1.3.0-rc1-6044-g0b80606' 1.4.0 (2 days ago)
- Python version: 3.6
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
- Exact command to reproduce:
Source code / logs
I ran tensorflow/benchmarks and got the following error.
Traceback (most recent call last):
  File "tf_cnn_benchmarks.py", line 47, in <module>
    tf.app.run()
  File "/MYHOME/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "tf_cnn_benchmarks.py", line 43, in main
    bench.run()
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 956, in run
    return self._benchmark_cnn()
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1046, in _benchmark_cnn
    self._build_model_single_session())
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1563, in _build_model_single_session
    all_top_5_ops, phase_train)
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1370, in _build_fetches
    self.variable_mgr.preprocess_device_grads(device_grads))
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/variable_mgr.py", line 386, in preprocess_device_grads
    agg_small_grads_max_group=self._agg_small_grads_max_group)
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/allreduce.py", line 332, in sum_gradients_all_reduce
    if is_hierarchical else aux_device_groups[group_index], num_shards))
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/allreduce.py", line 236, in sum_grad_and_var_all_reduce
    tf.add)
  File "/MYHOME/.local/lib/python3.6/site-packages/tensorflow/contrib/all_reduce/python/all_reduce.py", line 780, in build_nccl_then_ring
    return _build_nccl_hybrid(input_tensors, red_op, upper_level_f)
  File "/MYHOME/.local/lib/python3.6/site-packages/tensorflow/contrib/all_reduce/python/all_reduce.py", line 748, in _build_nccl_hybrid
    send_op, dst_tensors = nccl.broadcast(level_2_output[w], dst_devices)
TypeError: broadcast() takes 1 positional argument but 2 were given
The reason is that the invocation of nccl.broadcast differs from its signature:
https://github.com/tensorflow/tensorflow/blob/17e725c0558581cba19bd6c409698b2c3f88efe5/tensorflow/contrib/all_reduce/python/all_reduce.py#L748
https://github.com/tensorflow/tensorflow/blob/17e725c0558581cba19bd6c409698b2c3f88efe5/tensorflow/contrib/nccl/python/ops/nccl_ops.py#L173-L182
The problem still exists in the current HEAD.
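The mismatch can be reproduced without TensorFlow. The snippet below is only a sketch: `broadcast` is a hypothetical stand-in that mirrors the one-argument signature linked above (it is not the real nccl op), and the device strings are made up for illustration. It calls the stand-in with two positional arguments, the way the failing line in all_reduce.py does, and captures the resulting TypeError.

```python
import inspect


def broadcast(tensor):
    """Hypothetical stand-in with the same one-argument shape as nccl.broadcast."""
    return tensor


def call_like_all_reduce(fn):
    """Call fn with two positional arguments, as all_reduce.py line 748 does."""
    try:
        # Placeholder values (assumptions); only the argument count matters here.
        fn("level_2_output", ["/gpu:0", "/gpu:1"])
    except TypeError as exc:
        return str(exc)
    return None


# The captured message matches the one at the end of the traceback.
error_message = call_like_all_reduce(broadcast)

# inspect.signature shows which parameters the callee actually accepts.
param_names = list(inspect.signature(broadcast).parameters)

print(error_message)
print(param_names)
```

Comparing `param_names` against the arguments at the call site is a quick way to confirm that caller and callee have drifted apart.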
About this issue
- State: closed
- Created 7 years ago
- Comments: 21 (14 by maintainers)
@poxvoculi thanks. I had already tried the same fix as yours before this issue. Yesterday I tried your code, but it failed again with the same error (using tf_benchmark with variable_update: distributed_all_reduce, all_reduce_spec: nccl/xring):
UnimplementedError (see above for traceback): This op should be replaced during graph optimization.
caused by
  File "python2.7/site-packages/tensorflow/contrib/all_reduce/python/all_reduce.py", line 781, in build_nccl_then_ring
    return _build_nccl_hybrid(input_tensors, red_op, upper_level_f)
  File "python2.7/site-packages/tensorflow/contrib/all_reduce/python/all_reduce.py", line 749, in _build_nccl_hybrid
    broadcast_src = nccl.broadcast(array_ops.identity(level_2_output[w]))
  File "python2.7/site-packages/tensorflow/contrib/nccl/python/ops/nccl_ops.py", line 187, in broadcast
    return gen_nccl_ops.nccl_broadcast(input=tensor, shape=tensor.shape)
  File "python2.7/site-packages/tensorflow/contrib/nccl/ops/gen_nccl_ops.py", line 98, in nccl_broadcast
    "NcclBroadcast", input=input, shape=shape, name=name)
I guess the graph rewrite may happen in the nccl send/recv ops.
IndexedSlices are a sparse representation, right? Yes. Sorry, I said nccl in the issue above, but it is actually the contrib/all_reduce module. I have fixed that error in the issue content now, thanks.