tensorflow: TypeError: broadcast() takes 1 positional argument but 2 were given

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow installed from (source or binary):source
  • TensorFlow version (use command below): b'v1.3.0-rc1-6044-g0b80606' 1.4.0 (built from source 2 days ago)
  • Python version: 3.6
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:
  • Exact command to reproduce:

Source code / logs

I ran tensorflow/benchmarks, and got the following error.

Traceback (most recent call last):
  File "tf_cnn_benchmarks.py", line 47, in <module>
    tf.app.run()
  File "/MYHOME/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "tf_cnn_benchmarks.py", line 43, in main
    bench.run()
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 956, in run
    return self._benchmark_cnn()
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1046, in _benchmark_cnn
    self._build_model_single_session())
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1563, in _build_model_single_session
    all_top_5_ops, phase_train)
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1370, in _build_fetches
    self.variable_mgr.preprocess_device_grads(device_grads))
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/variable_mgr.py", line 386, in preprocess_device_grads
    agg_small_grads_max_group=self._agg_small_grads_max_group)
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/allreduce.py", line 332, in sum_gradients_all_reduce
    if is_hierarchical else aux_device_groups[group_index], num_shards))
  File "/MYHOME/tf-benchmarks/scripts/tf_cnn_benchmarks/allreduce.py", line 236, in sum_grad_and_var_all_reduce
    tf.add)
  File "/MYHOME/.local/lib/python3.6/site-packages/tensorflow/contrib/all_reduce/python/all_reduce.py", line 780, in build_nccl_then_ring
    return _build_nccl_hybrid(input_tensors, red_op, upper_level_f)
  File "/MYHOME/.local/lib/python3.6/site-packages/tensorflow/contrib/all_reduce/python/all_reduce.py", line 748, in _build_nccl_hybrid
    send_op, dst_tensors = nccl.broadcast(level_2_output[w], dst_devices)
TypeError: broadcast() takes 1 positional argument but 2 were given

The reason is that the invocation of nccl.broadcast does not match its signature: the call site passes two positional arguments, while the function accepts only one.

Call site: https://github.com/tensorflow/tensorflow/blob/17e725c0558581cba19bd6c409698b2c3f88efe5/tensorflow/contrib/all_reduce/python/all_reduce.py#L748

Signature: https://github.com/tensorflow/tensorflow/blob/17e725c0558581cba19bd6c409698b2c3f88efe5/tensorflow/contrib/nccl/python/ops/nccl_ops.py#L173-L182

The problem still exists at the current HEAD.
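As a minimal illustration of the mismatch (this is a standalone sketch, not the real TensorFlow code: the stub `broadcast` and the placeholder values stand in for `nccl.broadcast` and its arguments), a function declared with one positional parameter raises exactly this TypeError when the caller still passes two:

```python
# Hypothetical stand-in for the updated nccl.broadcast signature,
# which (per the linked nccl_ops.py) takes only the tensor.
def broadcast(tensor):
    return tensor  # placeholder for the real NcclBroadcast op

level_2_output = ["grad_tensor"]   # placeholder values for illustration
dst_devices = ["/gpu:1", "/gpu:2"]

# all_reduce.py still uses the old two-argument calling convention:
try:
    broadcast(level_2_output[0], dst_devices)
except TypeError as e:
    print(e)  # broadcast() takes 1 positional argument but 2 were given
```

Python binds positional arguments strictly against the declared parameters, so the stale call site fails at graph-construction time rather than inside NCCL itself.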

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 21 (14 by maintainers)

Commits related to this issue

Most upvoted comments

@poxvoculi thanks, I had already tried the same fix as yours before filing this issue. Yesterday I tried your code, but it failed again with the same error (using tf_benchmark with variable_update: distributed_all_reduce and all_reduce_spec: nccl/xring):

UnimplementedError (see above for traceback): This op should be replaced during graph optimization.

Caused by:

  File "python2.7/site-packages/tensorflow/contrib/all_reduce/python/all_reduce.py", line 781, in build_nccl_then_ring
    return _build_nccl_hybrid(input_tensors, red_op, upper_level_f)
  File "python2.7/site-packages/tensorflow/contrib/all_reduce/python/all_reduce.py", line 749, in _build_nccl_hybrid
    broadcast_src = nccl.broadcast(array_ops.identity(level_2_output[w]))
  File "python2.7/site-packages/tensorflow/contrib/nccl/python/ops/nccl_ops.py", line 187, in broadcast
    return gen_nccl_ops.nccl_broadcast(input=tensor, shape=tensor.shape)
  File "python2.7/site-packages/tensorflow/contrib/nccl/ops/gen_nccl_ops.py", line 98, in nccl_broadcast
    "NcclBroadcast", input=input, shape=shape, name=name)

My guess is that the graph rewrite is supposed to happen for the NCCL send/recv ops but isn't being applied.

IndexedSlices are a sparse representation, right? Yes. Excuse me for saying "nccl" in the issue above; it's actually the contrib/all_reduce module. I've fixed that error in the issue text now, thanks.