tensorflow: Getting "Dst tensor is not initialized." when really the problem is out of GPU memory

Below is the stack trace we sometimes get when trying to use TensorFlow on a GPU that is already occupied by another process. Debugging would be easier if the error message said something about memory.

@zheng-xq

tf.version: ‘0.12.1-1934-g27fca7d-dirty’ (nightly from last week)
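For anyone landing here with the same symptom: the error usually means the GPU allocator could not get enough memory, often because another process already holds most of the card. Below is a minimal sketch of the usual TF 1.x-era workaround, assuming the graph itself fits in whatever memory is actually free; the ConfigProto/GPUOptions fields exist in this 0.12 nightly and later, but the rest is illustrative and not the original memory_test.py.

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing almost all of it up front,
# or cap this process's share explicitly; either makes sharing a busy GPU viable.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.3  # alternative: hard cap

with tf.Session(config=config) as sess:
    x = tf.zeros([160])  # same shape as the failing zeros_1266 constant
    print(sess.run(x))

This only eases the memory pressure; it does not change the misleading "Dst tensor is not initialized." message itself.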

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:04:00.0
Total memory: 11.90GiB
Free memory: 381.44MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:04:00.0)
Traceback (most recent call last):
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: zeros_1266 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [160] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "memory_test.py", line 87, in <module>
    profile_densenet(False)
  File "memory_test.py", line 65, in profile_densenet
    sess.run(net.initializer, {net.x_init: trainx[:init_batch_size]})
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: zeros_1266 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [160] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'zeros_1266', defined at:
  File "memory_test.py", line 87, in <module>
    profile_densenet(False)
  File "memory_test.py", line 59, in profile_densenet
    net = densenet_lib.densenet(init_batch_size, batch_size, layers_per_block, filters_per_layer, save_memory=save_memory)
  File "/home/yaroslav/openai.git/densenet/densenet.py", line 183, in densenet
    optimizer = nn.adamax_updates(all_params, loss, lr=tf_lr)
  File "/home/yaroslav/openai.git/densenet/nn.py", line 41, in adamax_updates
    mg = tf.Variable(tf.zeros(int_shape(p)), p.name + '_adamax_mg')
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1376, in zeros
    output = constant(zero, shape=shape, dtype=dtype, name=name)
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/yaroslav/.conda/envs/tim-jan17/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Dst tensor is not initialized.
	 [[Node: zeros_1266 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [160] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 9
  • Comments: 17 (8 by maintainers)

Most upvoted comments

Sounds like you are running out of GPU memory

On Jan 26, 2017 10:33 AM, “Atul Acharya” notifications@github.com wrote:

Hi @yaroslavvb, @zheng-xq

I’m getting this Dst Tensor Not Initialized error.

(See my comment, the last one, in aymericdamien/TensorFlow-Examples#38: https://github.com/aymericdamien/TensorFlow-Examples/issues/38.)

I’m reproducing the stack trace here in case it helps diagnose the issue:

$ python imagenet_inference.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 305.92MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 1, Chunks in use: 0 97.01MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 144.00MiB was 128.00MiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60500 of size 139520
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82800 of size 1228800
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700bae800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700baec00 of size 3538944
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0ec00 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0f200 of size 2654208
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197200 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197800 of size 1769472
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701347800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x701347c00 of size 101725184
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 512 totalling 512B
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1024 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1536 totalling 3.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 139520 totalling 136.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1228800 totalling 1.17MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1769472 totalling 1.69MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2654208 totalling 2.53MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3538944 totalling 3.38MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 8.91MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: Limit: 111063040 InUse: 9337856 MaxInUse: 9337856 NumAllocs: 11 MaxAllocSize: 3538944

W tensorflow/core/common_runtime/bfc_allocator.cc:274] *********___________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 144.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:965] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]…>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]…>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "imagenet_inference.py", line 19, in <module>
    sess.run(init)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]…>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'Variable_10/initial_value', defined at:
  File "imagenet_inference.py", line 16, in <module>
    probs = AlexNet(x, feature_extract=False)
  File "/Users/aa/Developer/courses/self_driving_carnd/traffic-signs/CarND-Alexnet-Feature-Extraction/alexnet.py", line 139, in AlexNet
    fc6W = tf.Variable(net_data["fc6"][0])
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
    expected_shape=expected_shape)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 333, in _init_from_args
    initial_value, name="initial_value", dtype=dtype)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]…>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Here’s deviceQuery successfully reporting seeing the GPU:

$ echo $CUDA_HOME
/usr/local/cuda
$ echo $CUDA_VISIBLE_DEVICES

$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 750M"
  CUDA Driver Version / Runtime Version: 8.0 / 8.0
  CUDA Capability Major/Minor version number: 3.0
  Total amount of global memory: 2048 MBytes (2147024896 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
  GPU Max Clock rate: 926 MHz (0.93 GHz)
  Memory Clock rate: 2508 Mhz
  Memory Bus Width: 128-bit
  L2 Cache Size: 262144 bytes
  Maximum Texture Dimension Size (x,y,z): 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers: 1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers: 2D=(16384, 16384), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 1 copy engine(s)
  Run time limit on kernels: Yes
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GT 750M
Result = PASS

$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

Device 0: GeForce GT 750M
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes): 33554432   Bandwidth(MB/s): 3633.5

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes): 33554432   Bandwidth(MB/s): 6343.5

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
  Transfer Size (Bytes): 33554432   Bandwidth(MB/s): 42554.1

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

My System:

MacBook Pro (Retina, 15-inch, Late 2013)
2.3 GHz Intel Core i7
16 GB 1600 MHz DDR3
NVIDIA GeForce GT 750M 2048 MB


From System Report > Graphics > NVIDIA GeForce GT 750M:

Chipset Model: NVIDIA GeForce GT 750M
Type: GPU
Bus: PCIe
PCIe Lane Width: x8
VRAM (Total): 2048 MB
Vendor: NVIDIA (0x10de)
Device ID: 0x0fe9
Revision ID: 0x00a2
ROM Revision: 3776
gMux Version: 4.0.8 [3.2.8]
Displays:
  Color LCD:
    Display Type: Retina LCD
    Resolution: 2880 x 1800 Retina
    Retina: Yes
    Pixel Depth: 32-Bit Color (ARGB8888)
    Main Display: Yes
    Mirror: Off
    Online: Yes
    Built-In: Yes

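The allocator dump above shows only ~306MiB free on a 2GiB GT 750M, while the failing constant (the 9216x4096 fc6 weights) alone needs 144MiB, so this InternalError really is an out-of-memory condition. Below is a minimal sketch of two standard ways to sidestep it on such a small GPU; it is illustrative only, and fc6_weights stands in for net_data["fc6"][0] from alexnet.py.

import numpy as np
import tensorflow as tf

# Option 1: hide the GPU entirely so the whole graph runs on the CPU
# (must be set before the first Session is created):
#   import os; os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Option 2: keep the GPU but pin the oversized weights to host memory.
fc6_weights = np.zeros((9216, 4096), dtype=np.float32)  # ~144 MiB as float32

with tf.device("/cpu:0"):
    fc6W = tf.Variable(fc6_weights, name="fc6W")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())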

Still experiencing this issue with a recent version (068fd9c936dbf8c9ace9edae9e7bb9e64256d381). I can confirm this is due to an OOM issue.

tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: _arg_q_actions_0_1/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_339__arg_q_actions_0_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
	 [[Node: loss/assert_broadcastable/AssertGuard/Assert/Switch/_33 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_91_loss/assert_broadcastable/AssertGuard/Assert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
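For confirming that one of these InternalErrors is really an OOM, recent TF 1.x releases can attach an allocation report to the error via RunOptions. A minimal sketch follows; the placeholder and op are made-up stand-ins, not the poster's actual q_actions graph.

import tensorflow as tf

q_actions = tf.placeholder(tf.int32, [None], name="q_actions")
total = tf.reduce_sum(q_actions)

# If the run fails with OOM, the raised error now includes a list of live
# tensors and their sizes instead of only "Dst tensor is not initialized."
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    print(sess.run(total, feed_dict={q_actions: [0, 1, 2]}, options=run_opts))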

I am getting this in TF 1.9.0 when I increase my test set to a large number of samples (likely a memory issue); see the sketch below.
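When the failure only appears once the test set gets large, the usual culprit is feeding the entire set in a single sess.run, which has to hold every activation for every sample at once. A minimal sketch of evaluating in fixed-size chunks instead; the placeholder graph and test_x array are dummies standing in for the real model.

import numpy as np
import tensorflow as tf

# Dummy stand-ins; only the chunked-evaluation pattern matters here.
x = tf.placeholder(tf.float32, [None, 10])
model_output = tf.reduce_sum(x, axis=1)
test_x = np.random.rand(100000, 10).astype(np.float32)

batch = 256  # pick a size that fits comfortably in GPU memory
with tf.Session() as sess:
    outputs = []
    for start in range(0, len(test_x), batch):
        outputs.append(sess.run(model_output, feed_dict={x: test_x[start:start + batch]}))
    preds = np.concatenate(outputs, axis=0)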

If this is not solved yet: free memory by deleting previously created objects that are no longer needed (embeddings, models, etc.) and forcing a garbage-collection pass, e.g.:

import gc, time
del all_embs  # plus any other unused objects, e.g. (model_names…), (model_input_names)
gc.collect()
time.sleep(10)

I am also getting the same error in TF 1.8.0 and in the latest version. My machine has 3x NVIDIA Tesla P40 22GB.

    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: conv4_22_1x1_increase/Conv2D-0-1-TransposeNCHWToNHWC-LayoutOptimizer/_2025 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3008_conv4_22_1x1_increase/Conv2D-0-1-TransposeNCHWToNHWC-LayoutOptimizer", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]
	 [[Node: adversarial/Mean/_3087 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_5391_adversarial/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Yeah, verified I have a CL that fixes this, submitting it internally now.