tt-metal: Unable to handle large tensors

Large tensors cannot be moved to device. Likewise, when a large tensor is produced on device by an operation, it cannot be moved back to host.

import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc

def test_large_slicing(device):
    torch_a = torch.rand((1, 1, 42, 250880), dtype=torch.bfloat16)
    torch_output = torch_a[:, :, -1, :]
    a = ttnn.from_torch(torch_a)
    a = ttnn.to_device(a, device)
    tt_output = a[:, :, -1, :]
    tt_output = ttnn.from_device(tt_output)
    tt_output = ttnn.to_torch(tt_output)
    assert_with_pcc(torch_output, tt_output, 0.9999)

Moving a large tensor to host with ttl_tensor.cpu() raises:


Exception has occurred: RuntimeError
TT_ASSERT @ tt_metal/impl/dispatch/command_queue.cpp:317: padded_page_size <= consumer_cb_size
info:
Page is too large to fit in consumer buffer
backtrace:
 --- void tt::assert::tt_assert<char [44]>(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, char const*, char const (&) [44])
 --- tt::tt_metal::EnqueueReadBufferCommand::assemble_device_command(unsigned int)
 --- tt::tt_metal::EnqueueReadBufferCommand::process()
 --- tt::tt_metal::CommandQueue::enqueue_command(tt::tt_metal::Command&, bool)
 --- tt::tt_metal::CommandQueue::enqueue_read_buffer(tt::tt_metal::Buffer&, std::vector<unsigned int, std::allocator<unsigned int> >&, bool)
 --- tt::tt_metal::EnqueueReadBuffer(tt::tt_metal::CommandQueue&, tt::tt_metal::Buffer&, std::vector<unsigned int, std::allocator<unsigned int> >&, bool)
 --- std::vector<bfloat16, std::allocator<bfloat16> > tt::tt_metal::tensor_impl::read_data_from_device<bfloat16>(tt::tt_metal::Tensor const&, unsigned int)
 --- /home/ubuntu/git/tt-metal/tt_eager/tt_lib/_C.so(+0x925f65) [0x7f1665e60f65]
 --- std::_Function_handler<tt::tt_metal::Tensor (tt::tt_metal::Tensor const&), tt::tt_metal::Tensor (*)(tt::tt_metal::Tensor const&)>::_M_invoke(std::_Any_data const&, tt::tt_metal::Tensor const&)
 --- std::function<tt::tt_metal::Tensor (tt::tt_metal::Tensor const&)>::operator()(tt::tt_metal::Tensor const&) const
 --- tt::tt_metal::tensor_impl::to_host_wrapper(tt::tt_metal::Tensor const&)
 --- tt::tt_metal::Tensor::cpu() const
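
A rough back-of-the-envelope sketch of why the assert fires. Assumptions (not stated in the source): in row-major layout a buffer page spans one innermost row, so the page size is the last dimension times the element size; the actual paging scheme and the consumer buffer size are device- and layout-specific.

```python
# Page-size arithmetic for the failing tensor, under the assumption
# that one page covers one innermost row in row-major layout.
last_dim = 250880            # innermost dimension of torch_a
bfloat16_bytes = 2           # bfloat16 is 2 bytes per element
page_size = last_dim * bfloat16_bytes
print(page_size)             # 501760 bytes, roughly 490 KiB per page
# The command queue asserts padded_page_size <= consumer_cb_size, so a
# page this large aborts the read whenever it exceeds the consumer
# command buffer on the target device.
```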

About this issue

  • State: open
  • Created 7 months ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

@davorchap, @abhullar-tt actually supports this in her completion queue PR.