tensorflow: tf.distribute.MirroredStrategy hangs in an infinite polling loop with 4 GPUs
System information
- A physical tower with 4 GPUs running Ubuntu 18.04 over Kubernetes
- 256 GB of RAM
- TensorFlow: tested on tf-nightly-gpu-2.0-preview, from 2.0.0.dev20190902 to 2.0.0.dev20190918
- Python 3.6.8
- CUDA 10.0, cuDNN 7.6.3.30 (also tested with cuDNN 7.5.0.56)
- 4× NVIDIA GeForce GTX 1080 Ti
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 53%   70C    P2    79W / 250W |  10889MiB / 11178MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 52%   69C    P2    76W / 250W |  10893MiB / 11178MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 48%   65C    P2    78W / 250W |  10889MiB / 11178MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 45%   62C    P2    76W / 250W |  10893MiB / 11178MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
Problem
I run the following sample code:
#!/usr/bin/env python3
import sys

import tensorflow as tf


def main():
    batch_size = 12
    features_shape = 372, 558, 3
    labels = 10
    sample = tf.random.uniform(features_shape)

    def with_shape(t, shape):
        t = tf.squeeze(t)
        t.set_shape(shape)
        return t

    ds_train = tf.data.Dataset.from_tensors([sample]) \
        .map(lambda s: (s, tf.ones((labels,)))) \
        .repeat().batch(batch_size) \
        .map(lambda s, l: (with_shape(s, (batch_size,) + features_shape),
                           with_shape(l, (batch_size, labels))))
    ds_val = tf.data.Dataset.from_tensors([sample]) \
        .map(lambda s: (s, tf.ones((labels,)))) \
        .repeat().batch(batch_size).take(10) \
        .map(lambda s, l: (with_shape(s, (batch_size,) + features_shape),
                           with_shape(l, (batch_size, labels))))

    with tf.distribute.MirroredStrategy().scope():
        model = tf.keras.applications.DenseNet121(
            weights=None, input_shape=features_shape, classes=labels)
        model.build((batch_size,) + features_shape)
        model.summary()
        optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)
        cross_entropy = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)
        model.compile(optimizer=optimizer, loss=cross_entropy, metrics=["accuracy"])
        model.fit(ds_train, validation_data=ds_val, epochs=1, steps_per_epoch=100)


if __name__ == "__main__":
    sys.exit(main())
It outputs the following log and then hangs; I killed the process after at least 9 hours:
log
2019-09-19 11:22:16.548532: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-09-19 11:22:16.553080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1632] Found device 0 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62 pciBusID: 0000:02:00.0
2019-09-19 11:22:16.554064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1632] Found device 1 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62 pciBusID: 0000:03:00.0
2019-09-19 11:22:16.555051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1632] Found device 2 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62 pciBusID: 0000:82:00.0
2019-09-19 11:22:16.555890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1632] Found device 3 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62 pciBusID: 0000:83:00.0
2019-09-19 11:22:16.556021: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-09-19 11:22:16.556046: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-09-19 11:22:16.556062: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-09-19 11:22:16.556079: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-09-19 11:22:16.556095: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-09-19 11:22:16.556111: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-09-19 11:22:16.556127: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-19 11:22:16.562745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Adding visible gpu devices: 0, 1, 2, 3
2019-09-19 11:22:16.562815: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-09-19 11:22:16.566634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1173] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-19 11:22:16.566650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1179]      0 1 2 3
2019-09-19 11:22:16.566657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 0:   N Y N N
2019-09-19 11:22:16.566661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 1:   Y N N N
2019-09-19 11:22:16.566666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 2:   N N N Y
2019-09-19 11:22:16.566670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 3:   N N Y N
2019-09-19 11:22:16.571630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1318] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10470 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-09-19 11:22:16.573706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1318] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10470 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-09-19 11:22:16.575382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1318] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10470 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1)
2019-09-19 11:22:16.576566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1318] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10470 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)
WARNING:tensorflow:Entity <function main.<locals>.<lambda> at 0x7fe776f021e0> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: expected exactly one node node, found []
2019-09-19 11:22:17.393146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1632] Found device 0 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62 pciBusID: 0000:02:00.0
2019-09-19 11:22:17.394380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1632] Found device 1 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62 pciBusID: 0000:03:00.0
2019-09-19 11:22:17.395221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1632] Found device 2 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62 pciBusID: 0000:82:00.0
2019-09-19 11:22:17.396088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1632] Found device 3 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62 pciBusID: 0000:83:00.0
2019-09-19 11:22:17.396168: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-09-19 11:22:17.396202: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-09-19 11:22:17.396218: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-09-19 11:22:17.396233: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-09-19 11:22:17.396263: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-09-19 11:22:17.396278: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-09-19 11:22:17.396293: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-19 11:22:17.402450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Adding visible gpu devices: 0, 1, 2, 3
2019-09-19 11:22:17.402599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1173] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-19 11:22:17.402611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1179]      0 1 2 3
2019-09-19 11:22:17.402619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 0:   N Y N N
2019-09-19 11:22:17.402625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 1:   Y N N N
2019-09-19 11:22:17.402631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 2:   N N N Y
2019-09-19 11:22:17.402637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 3:   N N Y N
2019-09-19 11:22:17.407338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1318] Created TensorFlow device (/device:GPU:0 with 10470 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-09-19 11:22:17.408425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1318] Created TensorFlow device (/device:GPU:1 with 10470 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-09-19 11:22:17.409430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1318] Created TensorFlow device (/device:GPU:2 with 10470 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1)
2019-09-19 11:22:17.410293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1318] Created TensorFlow device (/device:GPU:3 with 10470 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)

Model: "densenet121"
__________________________________________________________________________________________________
Layer (type)                      Output Shape           Param #   Connected to
==================================================================================================
input_1 (InputLayer)              [(None, 372, 558, 3)]  0
zero_padding2d (ZeroPadding2D)    (None, 378, 564, 3)    0         input_1[0][0]
conv1/conv (Conv2D)               (None, 186, 279, 64)   9408      zero_padding2d[0][0]
conv1/bn (BatchNormalization)     (None, 186, 279, 64)   256       conv1/conv[0][0]
conv1/relu (Activation)           (None, 186, 279, 64)   0         conv1/bn[0][0]
zero_padding2d_1 (ZeroPadding2D)  (None, 188, 281, 64)   0         conv1/relu[0][0]
pool1 (MaxPooling2D)              (None, 93, 140, 64)    0         zero_padding2d_1[0][0]
conv2_block1_0_bn (BatchNormalization)   (None, 93, 140, 64)   256     pool1[0][0]
conv2_block1_0_relu (Activation)         (None, 93, 140, 64)   0       conv2_block1_0_bn[0][0]
conv2_block1_1_conv (Conv2D)             (None, 93, 140, 128)  8192    conv2_block1_0_relu[0][0]
conv2_block1_1_bn (BatchNormalization)   (None, 93, 140, 128)  512     conv2_block1_1_conv[0][0]
conv2_block1_1_relu (Activation)         (None, 93, 140, 128)  0       conv2_block1_1_bn[0][0]
conv2_block1_2_conv (Conv2D)             (None, 93, 140, 32)   36864   conv2_block1_1_relu[0][0]
conv2_block1_concat (Concatenate)        (None, 93, 140, 96)   0       pool1[0][0], conv2_block1_2_conv[0][0]
conv2_block2_0_bn (BatchNormalization)   (None, 93, 140, 96)   384     conv2_block1_concat[0][0]
conv2_block2_0_relu (Activation)         (None, 93, 140, 96)   0       conv2_block2_0_bn[0][0]
conv2_block2_1_conv (Conv2D)             (None, 93, 140, 128)  12288   conv2_block2_0_relu[0][0]
conv2_block2_1_bn (BatchNormalization)   (None, 93, 140, 128)  512     conv2_block2_1_conv[0][0]
conv2_block2_1_relu (Activation)         (None, 93, 140, 128)  0       conv2_block2_1_bn[0][0]
conv2_block2_2_conv (Conv2D)             (None, 93, 140, 32)   36864   conv2_block2_1_relu[0][0]
conv2_block2_concat (Concatenate)        (None, 93, 140, 128)  0       conv2_block1_concat[0][0], conv2_block2_2_conv[0][0]
conv2_block3_0_bn (BatchNormalization)   (None, 93, 140, 128)  512     conv2_block2_concat[0][0]
conv2_block3_0_relu (Activation)         (None, 93, 140, 128)  0       conv2_block3_0_bn[0][0]
conv2_block3_1_conv (Conv2D)             (None, 93, 140, 128)  16384   conv2_block3_0_relu[0][0]
conv2_block3_1_bn (BatchNormalization)   (None, 93, 140, 128)  512     conv2_block3_1_conv[0][0]
conv2_block3_1_relu (Activation)         (None, 93, 140, 128)  0       conv2_block3_1_bn[0][0]
conv2_block3_2_conv (Conv2D)             (None, 93, 140, 32)   36864   conv2_block3_1_relu[0][0]
conv2_block3_concat (Concatenate)        (None, 93, 140, 160)  0       conv2_block2_concat[0][0], conv2_block3_2_conv[0][0]
conv2_block4_0_bn (BatchNormalization)   (None, 93, 140, 160)  640     conv2_block3_concat[0][0]
conv2_block4_0_relu (Activation)         (None, 93, 140, 160)  0       conv2_block4_0_bn[0][0]
conv2_block4_1_conv (Conv2D)             (None, 93, 140, 128)  20480   conv2_block4_0_relu[0][0]
conv2_block4_1_bn (BatchNormalization)   (None, 93, 140, 128)  512     conv2_block4_1_conv[0][0]
conv2_block4_1_relu (Activation)         (None, 93, 140, 128)  0       conv2_block4_1_bn[0][0]
conv2_block4_2_conv (Conv2D)             (None, 93, 140, 32)   36864   conv2_block4_1_relu[0][0]
conv2_block4_concat (Concatenate)        (None, 93, 140, 192)  0       conv2_block3_concat[0][0], conv2_block4_2_conv[0][0]
conv2_block5_0_bn (BatchNormalization)   (None, 93, 140, 192)  768     conv2_block4_concat[0][0]
conv2_block5_0_relu (Activation)         (None, 93, 140, 192)  0       conv2_block5_0_bn[0][0]
conv2_block5_1_conv (Conv2D)             (None, 93, 140, 128)  24576   conv2_block5_0_relu[0][0]
conv2_block5_1_bn (BatchNormalization)   (None, 93, 140, 128)  512     conv2_block5_1_conv[0][0]
conv2_block5_1_relu (Activation)         (None, 93, 140, 128)  0       conv2_block5_1_bn[0][0]
conv2_block5_2_conv (Conv2D)             (None, 93, 140, 32)   36864   conv2_block5_1_relu[0][0]
conv2_block5_concat (Concatenate)        (None, 93, 140, 224)  0       conv2_block4_concat[0][0], conv2_block5_2_conv[0][0]
conv2_block6_0_bn (BatchNormalization)   (None, 93, 140, 224)  896     conv2_block5_concat[0][0]
conv2_block6_0_relu (Activation)         (None, 93, 140, 224)  0       conv2_block6_0_bn[0][0]
conv2_block6_1_conv (Conv2D)             (None, 93, 140, 128)  28672   conv2_block6_0_relu[0][0]
conv2_block6_1_bn (BatchNormalization)   (None, 93, 140, 128)  512     conv2_block6_1_conv[0][0]
conv2_block6_1_relu (Activation)         (None, 93, 140, 128)  0       conv2_block6_1_bn[0][0]
conv2_block6_2_conv (Conv2D)             (None, 93, 140, 32)   36864   conv2_block6_1_relu[0][0]
conv2_block6_concat (Concatenate)        (None, 93, 140, 256)  0       conv2_block5_concat[0][0], conv2_block6_2_conv[0][0]
pool2_bn (BatchNormalization)            (None, 93, 140, 256)  1024    conv2_block6_concat[0][0]
pool2_relu (Activation)                  (None, 93, 140, 256)  0       pool2_bn[0][0]
pool2_conv (Conv2D)                      (None, 93, 140, 128)  32768   pool2_relu[0][0]
pool2_pool (AveragePooling2D)            (None, 46, 70, 128)   0       pool2_conv[0][0]
conv3_block1_0_bn (BatchNormalization)   (None, 46, 70, 128)   512     pool2_pool[0][0]
conv3_block1_0_relu (Activation)         (None, 46, 70, 128)   0       conv3_block1_0_bn[0][0]
conv3_block1_1_conv (Conv2D)             (None, 46, 70, 128)   16384   conv3_block1_0_relu[0][0]
conv3_block1_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block1_1_conv[0][0]
conv3_block1_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block1_1_bn[0][0]
conv3_block1_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block1_1_relu[0][0]
conv3_block1_concat (Concatenate)        (None, 46, 70, 160)   0       pool2_pool[0][0], conv3_block1_2_conv[0][0]
conv3_block2_0_bn (BatchNormalization)   (None, 46, 70, 160)   640     conv3_block1_concat[0][0]
conv3_block2_0_relu (Activation)         (None, 46, 70, 160)   0       conv3_block2_0_bn[0][0]
conv3_block2_1_conv (Conv2D)             (None, 46, 70, 128)   20480   conv3_block2_0_relu[0][0]
conv3_block2_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block2_1_conv[0][0]
conv3_block2_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block2_1_bn[0][0]
conv3_block2_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block2_1_relu[0][0]
conv3_block2_concat (Concatenate)        (None, 46, 70, 192)   0       conv3_block1_concat[0][0], conv3_block2_2_conv[0][0]
conv3_block3_0_bn (BatchNormalization)   (None, 46, 70, 192)   768     conv3_block2_concat[0][0]
conv3_block3_0_relu (Activation)         (None, 46, 70, 192)   0       conv3_block3_0_bn[0][0]
conv3_block3_1_conv (Conv2D)             (None, 46, 70, 128)   24576   conv3_block3_0_relu[0][0]
conv3_block3_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block3_1_conv[0][0]
conv3_block3_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block3_1_bn[0][0]
conv3_block3_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block3_1_relu[0][0]
conv3_block3_concat (Concatenate)        (None, 46, 70, 224)   0       conv3_block2_concat[0][0], conv3_block3_2_conv[0][0]
conv3_block4_0_bn (BatchNormalization)   (None, 46, 70, 224)   896     conv3_block3_concat[0][0]
conv3_block4_0_relu (Activation)         (None, 46, 70, 224)   0       conv3_block4_0_bn[0][0]
conv3_block4_1_conv (Conv2D)             (None, 46, 70, 128)   28672   conv3_block4_0_relu[0][0]
conv3_block4_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block4_1_conv[0][0]
conv3_block4_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block4_1_bn[0][0]
conv3_block4_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block4_1_relu[0][0]
conv3_block4_concat (Concatenate)        (None, 46, 70, 256)   0       conv3_block3_concat[0][0], conv3_block4_2_conv[0][0]
conv3_block5_0_bn (BatchNormalization)   (None, 46, 70, 256)   1024    conv3_block4_concat[0][0]
conv3_block5_0_relu (Activation)         (None, 46, 70, 256)   0       conv3_block5_0_bn[0][0]
conv3_block5_1_conv (Conv2D)             (None, 46, 70, 128)   32768   conv3_block5_0_relu[0][0]
conv3_block5_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block5_1_conv[0][0]
conv3_block5_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block5_1_bn[0][0]
conv3_block5_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block5_1_relu[0][0]
conv3_block5_concat (Concatenate)        (None, 46, 70, 288)   0       conv3_block4_concat[0][0], conv3_block5_2_conv[0][0]
conv3_block6_0_bn (BatchNormalization)   (None, 46, 70, 288)   1152    conv3_block5_concat[0][0]
conv3_block6_0_relu (Activation)         (None, 46, 70, 288)   0       conv3_block6_0_bn[0][0]
conv3_block6_1_conv (Conv2D)             (None, 46, 70, 128)   36864   conv3_block6_0_relu[0][0]
conv3_block6_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block6_1_conv[0][0]
conv3_block6_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block6_1_bn[0][0]
conv3_block6_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block6_1_relu[0][0]
conv3_block6_concat (Concatenate)        (None, 46, 70, 320)   0       conv3_block5_concat[0][0], conv3_block6_2_conv[0][0]
conv3_block7_0_bn (BatchNormalization)   (None, 46, 70, 320)   1280    conv3_block6_concat[0][0]
conv3_block7_0_relu (Activation)         (None, 46, 70, 320)   0       conv3_block7_0_bn[0][0]
conv3_block7_1_conv (Conv2D)             (None, 46, 70, 128)   40960   conv3_block7_0_relu[0][0]
conv3_block7_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block7_1_conv[0][0]
conv3_block7_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block7_1_bn[0][0]
conv3_block7_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block7_1_relu[0][0]
conv3_block7_concat (Concatenate)        (None, 46, 70, 352)   0       conv3_block6_concat[0][0], conv3_block7_2_conv[0][0]
conv3_block8_0_bn (BatchNormalization)   (None, 46, 70, 352)   1408    conv3_block7_concat[0][0]
conv3_block8_0_relu (Activation)         (None, 46, 70, 352)   0       conv3_block8_0_bn[0][0]
conv3_block8_1_conv (Conv2D)             (None, 46, 70, 128)   45056   conv3_block8_0_relu[0][0]
conv3_block8_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block8_1_conv[0][0]
conv3_block8_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block8_1_bn[0][0]
conv3_block8_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block8_1_relu[0][0]
conv3_block8_concat (Concatenate)        (None, 46, 70, 384)   0       conv3_block7_concat[0][0], conv3_block8_2_conv[0][0]
conv3_block9_0_bn (BatchNormalization)   (None, 46, 70, 384)   1536    conv3_block8_concat[0][0]
conv3_block9_0_relu (Activation)         (None, 46, 70, 384)   0       conv3_block9_0_bn[0][0]
conv3_block9_1_conv (Conv2D)             (None, 46, 70, 128)   49152   conv3_block9_0_relu[0][0]
conv3_block9_1_bn (BatchNormalization)   (None, 46, 70, 128)   512     conv3_block9_1_conv[0][0]
conv3_block9_1_relu (Activation)         (None, 46, 70, 128)   0       conv3_block9_1_bn[0][0]
conv3_block9_2_conv (Conv2D)             (None, 46, 70, 32)    36864   conv3_block9_1_relu[0][0]
conv3_block9_concat (Concatenate)        (None, 46, 70, 416)   0       conv3_block8_concat[0][0], conv3_block9_2_conv[0][0]
conv3_block10_0_bn (BatchNormalization)  (None, 46, 70, 416)   1664    conv3_block9_concat[0][0]
conv3_block10_0_relu (Activation)        (None, 46, 70, 416)   0       conv3_block10_0_bn[0][0]
conv3_block10_1_conv (Conv2D)            (None, 46, 70, 128)   53248   conv3_block10_0_relu[0][0]
conv3_block10_1_bn (BatchNormalization)  (None, 46, 70, 128)   512     conv3_block10_1_conv[0][0]
conv3_block10_1_relu (Activation)        (None, 46, 70, 128)   0       conv3_block10_1_bn[0][0]
conv3_block10_2_conv (Conv2D)            (None, 46, 70, 32)    36864   conv3_block10_1_relu[0][0]
conv3_block10_concat (Concatenate)       (None, 46, 70, 448)   0       conv3_block9_concat[0][0], conv3_block10_2_conv[0][0]
conv3_block11_0_bn (BatchNormalization)  (None, 46, 70, 448)   1792    conv3_block10_concat[0][0]
conv3_block11_0_relu (Activation)        (None, 46, 70, 448)   0       conv3_block11_0_bn[0][0]
conv3_block11_1_conv (Conv2D)            (None, 46, 70, 128)   57344   conv3_block11_0_relu[0][0]
conv3_block11_1_bn (BatchNormalization)  (None, 46, 70, 128)   512     conv3_block11_1_conv[0][0]
conv3_block11_1_relu (Activation)        (None, 46, 70, 128)   0       conv3_block11_1_bn[0][0]
conv3_block11_2_conv (Conv2D)            (None, 46, 70, 32)    36864   conv3_block11_1_relu[0][0]
conv3_block11_concat (Concatenate)       (None, 46, 70, 480)   0       conv3_block10_concat[0][0], conv3_block11_2_conv[0][0]
conv3_block12_0_bn (BatchNormalization)  (None, 46, 70, 480)   1920    conv3_block11_concat[0][0]
conv3_block12_0_relu (Activation)        (None, 46, 70, 480)   0       conv3_block12_0_bn[0][0]
conv3_block12_1_conv (Conv2D)            (None, 46, 70, 128)   61440   conv3_block12_0_relu[0][0]
conv3_block12_1_bn (BatchNormalization)  (None, 46, 70, 128)   512     conv3_block12_1_conv[0][0]
conv3_block12_1_relu (Activation)        (None, 46, 70, 128)   0       conv3_block12_1_bn[0][0]
conv3_block12_2_conv (Conv2D)            (None, 46, 70, 32)    36864   conv3_block12_1_relu[0][0]
conv3_block12_concat (Concatenate)       (None, 46, 70, 512)   0       conv3_block11_concat[0][0], conv3_block12_2_conv[0][0]
pool3_bn (BatchNormalization)            (None, 46, 70, 512)   2048    conv3_block12_concat[0][0]
pool3_relu (Activation)                  (None, 46, 70, 512)   0       pool3_bn[0][0]
pool3_conv (Conv2D)                      (None, 46, 70, 256)   131072  pool3_relu[0][0]
pool3_pool (AveragePooling2D)            (None, 23, 35, 256)   0
pool3_conv[0][0] __________________________________________________________________________________________________ conv4_block1_0_bn (BatchNormali (None, 23, 35, 256) 1024 pool3_pool[0][0] __________________________________________________________________________________________________ conv4_block1_0_relu (Activation (None, 23, 35, 256) 0 conv4_block1_0_bn[0][0] __________________________________________________________________________________________________ conv4_block1_1_conv (Conv2D) (None, 23, 35, 128) 32768 conv4_block1_0_relu[0][0] __________________________________________________________________________________________________ conv4_block1_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block1_1_conv[0][0] __________________________________________________________________________________________________ conv4_block1_1_relu (Activation (None, 23, 35, 128) 0 conv4_block1_1_bn[0][0] __________________________________________________________________________________________________ conv4_block1_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block1_1_relu[0][0] __________________________________________________________________________________________________ conv4_block1_concat (Concatenat (None, 23, 35, 288) 0 pool3_pool[0][0] conv4_block1_2_conv[0][0] __________________________________________________________________________________________________ conv4_block2_0_bn (BatchNormali (None, 23, 35, 288) 1152 conv4_block1_concat[0][0] __________________________________________________________________________________________________ conv4_block2_0_relu (Activation (None, 23, 35, 288) 0 conv4_block2_0_bn[0][0] __________________________________________________________________________________________________ conv4_block2_1_conv (Conv2D) (None, 23, 35, 128) 36864 conv4_block2_0_relu[0][0] __________________________________________________________________________________________________ conv4_block2_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block2_1_conv[0][0] 
__________________________________________________________________________________________________ conv4_block2_1_relu (Activation (None, 23, 35, 128) 0 conv4_block2_1_bn[0][0] __________________________________________________________________________________________________ conv4_block2_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block2_1_relu[0][0] __________________________________________________________________________________________________ conv4_block2_concat (Concatenat (None, 23, 35, 320) 0 conv4_block1_concat[0][0] conv4_block2_2_conv[0][0] __________________________________________________________________________________________________ conv4_block3_0_bn (BatchNormali (None, 23, 35, 320) 1280 conv4_block2_concat[0][0] __________________________________________________________________________________________________ conv4_block3_0_relu (Activation (None, 23, 35, 320) 0 conv4_block3_0_bn[0][0] __________________________________________________________________________________________________ conv4_block3_1_conv (Conv2D) (None, 23, 35, 128) 40960 conv4_block3_0_relu[0][0] __________________________________________________________________________________________________ conv4_block3_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block3_1_conv[0][0] __________________________________________________________________________________________________ conv4_block3_1_relu (Activation (None, 23, 35, 128) 0 conv4_block3_1_bn[0][0] __________________________________________________________________________________________________ conv4_block3_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block3_1_relu[0][0] __________________________________________________________________________________________________ conv4_block3_concat (Concatenat (None, 23, 35, 352) 0 conv4_block2_concat[0][0] conv4_block3_2_conv[0][0] __________________________________________________________________________________________________ conv4_block4_0_bn (BatchNormali (None, 23, 35, 352) 1408 
conv4_block3_concat[0][0] __________________________________________________________________________________________________ conv4_block4_0_relu (Activation (None, 23, 35, 352) 0 conv4_block4_0_bn[0][0] __________________________________________________________________________________________________ conv4_block4_1_conv (Conv2D) (None, 23, 35, 128) 45056 conv4_block4_0_relu[0][0] __________________________________________________________________________________________________ conv4_block4_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block4_1_conv[0][0] __________________________________________________________________________________________________ conv4_block4_1_relu (Activation (None, 23, 35, 128) 0 conv4_block4_1_bn[0][0] __________________________________________________________________________________________________ conv4_block4_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block4_1_relu[0][0] __________________________________________________________________________________________________ conv4_block4_concat (Concatenat (None, 23, 35, 384) 0 conv4_block3_concat[0][0] conv4_block4_2_conv[0][0] __________________________________________________________________________________________________ conv4_block5_0_bn (BatchNormali (None, 23, 35, 384) 1536 conv4_block4_concat[0][0] __________________________________________________________________________________________________ conv4_block5_0_relu (Activation (None, 23, 35, 384) 0 conv4_block5_0_bn[0][0] __________________________________________________________________________________________________ conv4_block5_1_conv (Conv2D) (None, 23, 35, 128) 49152 conv4_block5_0_relu[0][0] __________________________________________________________________________________________________ conv4_block5_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block5_1_conv[0][0] __________________________________________________________________________________________________ conv4_block5_1_relu (Activation (None, 23, 35, 128) 0 
conv4_block5_1_bn[0][0] __________________________________________________________________________________________________ conv4_block5_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block5_1_relu[0][0] __________________________________________________________________________________________________ conv4_block5_concat (Concatenat (None, 23, 35, 416) 0 conv4_block4_concat[0][0] conv4_block5_2_conv[0][0] __________________________________________________________________________________________________ conv4_block6_0_bn (BatchNormali (None, 23, 35, 416) 1664 conv4_block5_concat[0][0] __________________________________________________________________________________________________ conv4_block6_0_relu (Activation (None, 23, 35, 416) 0 conv4_block6_0_bn[0][0] __________________________________________________________________________________________________ conv4_block6_1_conv (Conv2D) (None, 23, 35, 128) 53248 conv4_block6_0_relu[0][0] __________________________________________________________________________________________________ conv4_block6_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block6_1_conv[0][0] __________________________________________________________________________________________________ conv4_block6_1_relu (Activation (None, 23, 35, 128) 0 conv4_block6_1_bn[0][0] __________________________________________________________________________________________________ conv4_block6_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block6_1_relu[0][0] __________________________________________________________________________________________________ conv4_block6_concat (Concatenat (None, 23, 35, 448) 0 conv4_block5_concat[0][0] conv4_block6_2_conv[0][0] __________________________________________________________________________________________________ conv4_block7_0_bn (BatchNormali (None, 23, 35, 448) 1792 conv4_block6_concat[0][0] __________________________________________________________________________________________________ conv4_block7_0_relu 
(Activation (None, 23, 35, 448) 0 conv4_block7_0_bn[0][0] __________________________________________________________________________________________________ conv4_block7_1_conv (Conv2D) (None, 23, 35, 128) 57344 conv4_block7_0_relu[0][0] __________________________________________________________________________________________________ conv4_block7_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block7_1_conv[0][0] __________________________________________________________________________________________________ conv4_block7_1_relu (Activation (None, 23, 35, 128) 0 conv4_block7_1_bn[0][0] __________________________________________________________________________________________________ conv4_block7_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block7_1_relu[0][0] __________________________________________________________________________________________________ conv4_block7_concat (Concatenat (None, 23, 35, 480) 0 conv4_block6_concat[0][0] conv4_block7_2_conv[0][0] __________________________________________________________________________________________________ conv4_block8_0_bn (BatchNormali (None, 23, 35, 480) 1920 conv4_block7_concat[0][0] __________________________________________________________________________________________________ conv4_block8_0_relu (Activation (None, 23, 35, 480) 0 conv4_block8_0_bn[0][0] __________________________________________________________________________________________________ conv4_block8_1_conv (Conv2D) (None, 23, 35, 128) 61440 conv4_block8_0_relu[0][0] __________________________________________________________________________________________________ conv4_block8_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block8_1_conv[0][0] __________________________________________________________________________________________________ conv4_block8_1_relu (Activation (None, 23, 35, 128) 0 conv4_block8_1_bn[0][0] __________________________________________________________________________________________________ conv4_block8_2_conv 
(Conv2D) (None, 23, 35, 32) 36864 conv4_block8_1_relu[0][0] __________________________________________________________________________________________________ conv4_block8_concat (Concatenat (None, 23, 35, 512) 0 conv4_block7_concat[0][0] conv4_block8_2_conv[0][0] __________________________________________________________________________________________________ conv4_block9_0_bn (BatchNormali (None, 23, 35, 512) 2048 conv4_block8_concat[0][0] __________________________________________________________________________________________________ conv4_block9_0_relu (Activation (None, 23, 35, 512) 0 conv4_block9_0_bn[0][0] __________________________________________________________________________________________________ conv4_block9_1_conv (Conv2D) (None, 23, 35, 128) 65536 conv4_block9_0_relu[0][0] __________________________________________________________________________________________________ conv4_block9_1_bn (BatchNormali (None, 23, 35, 128) 512 conv4_block9_1_conv[0][0] __________________________________________________________________________________________________ conv4_block9_1_relu (Activation (None, 23, 35, 128) 0 conv4_block9_1_bn[0][0] __________________________________________________________________________________________________ conv4_block9_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block9_1_relu[0][0] __________________________________________________________________________________________________ conv4_block9_concat (Concatenat (None, 23, 35, 544) 0 conv4_block8_concat[0][0] conv4_block9_2_conv[0][0] __________________________________________________________________________________________________ conv4_block10_0_bn (BatchNormal (None, 23, 35, 544) 2176 conv4_block9_concat[0][0] __________________________________________________________________________________________________ conv4_block10_0_relu (Activatio (None, 23, 35, 544) 0 conv4_block10_0_bn[0][0] 
__________________________________________________________________________________________________ conv4_block10_1_conv (Conv2D) (None, 23, 35, 128) 69632 conv4_block10_0_relu[0][0] __________________________________________________________________________________________________ conv4_block10_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block10_1_conv[0][0] __________________________________________________________________________________________________ conv4_block10_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block10_1_bn[0][0] __________________________________________________________________________________________________ conv4_block10_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block10_1_relu[0][0] __________________________________________________________________________________________________ conv4_block10_concat (Concatena (None, 23, 35, 576) 0 conv4_block9_concat[0][0] conv4_block10_2_conv[0][0] __________________________________________________________________________________________________ conv4_block11_0_bn (BatchNormal (None, 23, 35, 576) 2304 conv4_block10_concat[0][0] __________________________________________________________________________________________________ conv4_block11_0_relu (Activatio (None, 23, 35, 576) 0 conv4_block11_0_bn[0][0] __________________________________________________________________________________________________ conv4_block11_1_conv (Conv2D) (None, 23, 35, 128) 73728 conv4_block11_0_relu[0][0] __________________________________________________________________________________________________ conv4_block11_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block11_1_conv[0][0] __________________________________________________________________________________________________ conv4_block11_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block11_1_bn[0][0] __________________________________________________________________________________________________ conv4_block11_2_conv (Conv2D) (None, 23, 35, 32) 36864 
conv4_block11_1_relu[0][0] __________________________________________________________________________________________________ conv4_block11_concat (Concatena (None, 23, 35, 608) 0 conv4_block10_concat[0][0] conv4_block11_2_conv[0][0] __________________________________________________________________________________________________ conv4_block12_0_bn (BatchNormal (None, 23, 35, 608) 2432 conv4_block11_concat[0][0] __________________________________________________________________________________________________ conv4_block12_0_relu (Activatio (None, 23, 35, 608) 0 conv4_block12_0_bn[0][0] __________________________________________________________________________________________________ conv4_block12_1_conv (Conv2D) (None, 23, 35, 128) 77824 conv4_block12_0_relu[0][0] __________________________________________________________________________________________________ conv4_block12_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block12_1_conv[0][0] __________________________________________________________________________________________________ conv4_block12_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block12_1_bn[0][0] __________________________________________________________________________________________________ conv4_block12_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block12_1_relu[0][0] __________________________________________________________________________________________________ conv4_block12_concat (Concatena (None, 23, 35, 640) 0 conv4_block11_concat[0][0] conv4_block12_2_conv[0][0] __________________________________________________________________________________________________ conv4_block13_0_bn (BatchNormal (None, 23, 35, 640) 2560 conv4_block12_concat[0][0] __________________________________________________________________________________________________ conv4_block13_0_relu (Activatio (None, 23, 35, 640) 0 conv4_block13_0_bn[0][0] __________________________________________________________________________________________________ 
conv4_block13_1_conv (Conv2D) (None, 23, 35, 128) 81920 conv4_block13_0_relu[0][0] __________________________________________________________________________________________________ conv4_block13_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block13_1_conv[0][0] __________________________________________________________________________________________________ conv4_block13_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block13_1_bn[0][0] __________________________________________________________________________________________________ conv4_block13_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block13_1_relu[0][0] __________________________________________________________________________________________________ conv4_block13_concat (Concatena (None, 23, 35, 672) 0 conv4_block12_concat[0][0] conv4_block13_2_conv[0][0] __________________________________________________________________________________________________ conv4_block14_0_bn (BatchNormal (None, 23, 35, 672) 2688 conv4_block13_concat[0][0] __________________________________________________________________________________________________ conv4_block14_0_relu (Activatio (None, 23, 35, 672) 0 conv4_block14_0_bn[0][0] __________________________________________________________________________________________________ conv4_block14_1_conv (Conv2D) (None, 23, 35, 128) 86016 conv4_block14_0_relu[0][0] __________________________________________________________________________________________________ conv4_block14_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block14_1_conv[0][0] __________________________________________________________________________________________________ conv4_block14_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block14_1_bn[0][0] __________________________________________________________________________________________________ conv4_block14_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block14_1_relu[0][0] 
__________________________________________________________________________________________________ conv4_block14_concat (Concatena (None, 23, 35, 704) 0 conv4_block13_concat[0][0] conv4_block14_2_conv[0][0] __________________________________________________________________________________________________ conv4_block15_0_bn (BatchNormal (None, 23, 35, 704) 2816 conv4_block14_concat[0][0] __________________________________________________________________________________________________ conv4_block15_0_relu (Activatio (None, 23, 35, 704) 0 conv4_block15_0_bn[0][0] __________________________________________________________________________________________________ conv4_block15_1_conv (Conv2D) (None, 23, 35, 128) 90112 conv4_block15_0_relu[0][0] __________________________________________________________________________________________________ conv4_block15_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block15_1_conv[0][0] __________________________________________________________________________________________________ conv4_block15_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block15_1_bn[0][0] __________________________________________________________________________________________________ conv4_block15_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block15_1_relu[0][0] __________________________________________________________________________________________________ conv4_block15_concat (Concatena (None, 23, 35, 736) 0 conv4_block14_concat[0][0] conv4_block15_2_conv[0][0] __________________________________________________________________________________________________ conv4_block16_0_bn (BatchNormal (None, 23, 35, 736) 2944 conv4_block15_concat[0][0] __________________________________________________________________________________________________ conv4_block16_0_relu (Activatio (None, 23, 35, 736) 0 conv4_block16_0_bn[0][0] __________________________________________________________________________________________________ conv4_block16_1_conv (Conv2D) (None, 23, 
35, 128) 94208 conv4_block16_0_relu[0][0] __________________________________________________________________________________________________ conv4_block16_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block16_1_conv[0][0] __________________________________________________________________________________________________ conv4_block16_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block16_1_bn[0][0] __________________________________________________________________________________________________ conv4_block16_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block16_1_relu[0][0] __________________________________________________________________________________________________ conv4_block16_concat (Concatena (None, 23, 35, 768) 0 conv4_block15_concat[0][0] conv4_block16_2_conv[0][0] __________________________________________________________________________________________________ conv4_block17_0_bn (BatchNormal (None, 23, 35, 768) 3072 conv4_block16_concat[0][0] __________________________________________________________________________________________________ conv4_block17_0_relu (Activatio (None, 23, 35, 768) 0 conv4_block17_0_bn[0][0] __________________________________________________________________________________________________ conv4_block17_1_conv (Conv2D) (None, 23, 35, 128) 98304 conv4_block17_0_relu[0][0] __________________________________________________________________________________________________ conv4_block17_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block17_1_conv[0][0] __________________________________________________________________________________________________ conv4_block17_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block17_1_bn[0][0] __________________________________________________________________________________________________ conv4_block17_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block17_1_relu[0][0] __________________________________________________________________________________________________ conv4_block17_concat 
(Concatena (None, 23, 35, 800) 0 conv4_block16_concat[0][0] conv4_block17_2_conv[0][0] __________________________________________________________________________________________________ conv4_block18_0_bn (BatchNormal (None, 23, 35, 800) 3200 conv4_block17_concat[0][0] __________________________________________________________________________________________________ conv4_block18_0_relu (Activatio (None, 23, 35, 800) 0 conv4_block18_0_bn[0][0] __________________________________________________________________________________________________ conv4_block18_1_conv (Conv2D) (None, 23, 35, 128) 102400 conv4_block18_0_relu[0][0] __________________________________________________________________________________________________ conv4_block18_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block18_1_conv[0][0] __________________________________________________________________________________________________ conv4_block18_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block18_1_bn[0][0] __________________________________________________________________________________________________ conv4_block18_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block18_1_relu[0][0] __________________________________________________________________________________________________ conv4_block18_concat (Concatena (None, 23, 35, 832) 0 conv4_block17_concat[0][0] conv4_block18_2_conv[0][0] __________________________________________________________________________________________________ conv4_block19_0_bn (BatchNormal (None, 23, 35, 832) 3328 conv4_block18_concat[0][0] __________________________________________________________________________________________________ conv4_block19_0_relu (Activatio (None, 23, 35, 832) 0 conv4_block19_0_bn[0][0] __________________________________________________________________________________________________ conv4_block19_1_conv (Conv2D) (None, 23, 35, 128) 106496 conv4_block19_0_relu[0][0] 
__________________________________________________________________________________________________ conv4_block19_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block19_1_conv[0][0] __________________________________________________________________________________________________ conv4_block19_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block19_1_bn[0][0] __________________________________________________________________________________________________ conv4_block19_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block19_1_relu[0][0] __________________________________________________________________________________________________ conv4_block19_concat (Concatena (None, 23, 35, 864) 0 conv4_block18_concat[0][0] conv4_block19_2_conv[0][0] __________________________________________________________________________________________________ conv4_block20_0_bn (BatchNormal (None, 23, 35, 864) 3456 conv4_block19_concat[0][0] __________________________________________________________________________________________________ conv4_block20_0_relu (Activatio (None, 23, 35, 864) 0 conv4_block20_0_bn[0][0] __________________________________________________________________________________________________ conv4_block20_1_conv (Conv2D) (None, 23, 35, 128) 110592 conv4_block20_0_relu[0][0] __________________________________________________________________________________________________ conv4_block20_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block20_1_conv[0][0] __________________________________________________________________________________________________ conv4_block20_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block20_1_bn[0][0] __________________________________________________________________________________________________ conv4_block20_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block20_1_relu[0][0] __________________________________________________________________________________________________ conv4_block20_concat (Concatena (None, 23, 35, 896) 0 
conv4_block19_concat[0][0] conv4_block20_2_conv[0][0]
conv4_block21_0_bn (BatchNormal (None, 23, 35, 896) 3584 conv4_block20_concat[0][0]
conv4_block21_0_relu (Activatio (None, 23, 35, 896) 0 conv4_block21_0_bn[0][0]
conv4_block21_1_conv (Conv2D) (None, 23, 35, 128) 114688 conv4_block21_0_relu[0][0]
conv4_block21_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block21_1_conv[0][0]
conv4_block21_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block21_1_bn[0][0]
conv4_block21_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block21_1_relu[0][0]
conv4_block21_concat (Concatena (None, 23, 35, 928) 0 conv4_block20_concat[0][0] conv4_block21_2_conv[0][0]
conv4_block22_0_bn (BatchNormal (None, 23, 35, 928) 3712 conv4_block21_concat[0][0]
conv4_block22_0_relu (Activatio (None, 23, 35, 928) 0 conv4_block22_0_bn[0][0]
conv4_block22_1_conv (Conv2D) (None, 23, 35, 128) 118784 conv4_block22_0_relu[0][0]
conv4_block22_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block22_1_conv[0][0]
conv4_block22_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block22_1_bn[0][0]
conv4_block22_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block22_1_relu[0][0]
conv4_block22_concat (Concatena (None, 23, 35, 960) 0 conv4_block21_concat[0][0] conv4_block22_2_conv[0][0]
conv4_block23_0_bn (BatchNormal (None, 23, 35, 960) 3840 conv4_block22_concat[0][0]
conv4_block23_0_relu (Activatio (None, 23, 35, 960) 0 conv4_block23_0_bn[0][0]
conv4_block23_1_conv (Conv2D) (None, 23, 35, 128) 122880 conv4_block23_0_relu[0][0]
conv4_block23_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block23_1_conv[0][0]
conv4_block23_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block23_1_bn[0][0]
conv4_block23_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block23_1_relu[0][0]
conv4_block23_concat (Concatena (None, 23, 35, 992) 0 conv4_block22_concat[0][0] conv4_block23_2_conv[0][0]
conv4_block24_0_bn (BatchNormal (None, 23, 35, 992) 3968 conv4_block23_concat[0][0]
conv4_block24_0_relu (Activatio (None, 23, 35, 992) 0 conv4_block24_0_bn[0][0]
conv4_block24_1_conv (Conv2D) (None, 23, 35, 128) 126976 conv4_block24_0_relu[0][0]
conv4_block24_1_bn (BatchNormal (None, 23, 35, 128) 512 conv4_block24_1_conv[0][0]
conv4_block24_1_relu (Activatio (None, 23, 35, 128) 0 conv4_block24_1_bn[0][0]
conv4_block24_2_conv (Conv2D) (None, 23, 35, 32) 36864 conv4_block24_1_relu[0][0]
conv4_block24_concat (Concatena (None, 23, 35, 1024) 0 conv4_block23_concat[0][0] conv4_block24_2_conv[0][0]
pool4_bn (BatchNormalization) (None, 23, 35, 1024) 4096 conv4_block24_concat[0][0]
pool4_relu (Activation) (None, 23, 35, 1024) 0 pool4_bn[0][0]
pool4_conv (Conv2D) (None, 23, 35, 512) 524288 pool4_relu[0][0]
pool4_pool (AveragePooling2D) (None, 11, 17, 512) 0 pool4_conv[0][0]
conv5_block1_0_bn (BatchNormali (None, 11, 17, 512) 2048 pool4_pool[0][0]
conv5_block1_0_relu (Activation (None, 11, 17, 512) 0 conv5_block1_0_bn[0][0]
conv5_block1_1_conv (Conv2D) (None, 11, 17, 128) 65536 conv5_block1_0_relu[0][0]
conv5_block1_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block1_1_conv[0][0]
conv5_block1_1_relu (Activation (None, 11, 17, 128) 0 conv5_block1_1_bn[0][0]
conv5_block1_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block1_1_relu[0][0]
conv5_block1_concat (Concatenat (None, 11, 17, 544) 0 pool4_pool[0][0] conv5_block1_2_conv[0][0]
conv5_block2_0_bn (BatchNormali (None, 11, 17, 544) 2176 conv5_block1_concat[0][0]
conv5_block2_0_relu (Activation (None, 11, 17, 544) 0 conv5_block2_0_bn[0][0]
conv5_block2_1_conv (Conv2D) (None, 11, 17, 128) 69632 conv5_block2_0_relu[0][0]
conv5_block2_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block2_1_conv[0][0]
conv5_block2_1_relu (Activation (None, 11, 17, 128) 0 conv5_block2_1_bn[0][0]
conv5_block2_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block2_1_relu[0][0]
conv5_block2_concat (Concatenat (None, 11, 17, 576) 0 conv5_block1_concat[0][0] conv5_block2_2_conv[0][0]
conv5_block3_0_bn (BatchNormali (None, 11, 17, 576) 2304 conv5_block2_concat[0][0]
conv5_block3_0_relu (Activation (None, 11, 17, 576) 0 conv5_block3_0_bn[0][0]
conv5_block3_1_conv (Conv2D) (None, 11, 17, 128) 73728 conv5_block3_0_relu[0][0]
conv5_block3_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block3_1_conv[0][0]
conv5_block3_1_relu (Activation (None, 11, 17, 128) 0 conv5_block3_1_bn[0][0]
conv5_block3_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block3_1_relu[0][0]
conv5_block3_concat (Concatenat (None, 11, 17, 608) 0 conv5_block2_concat[0][0] conv5_block3_2_conv[0][0]
conv5_block4_0_bn (BatchNormali (None, 11, 17, 608) 2432 conv5_block3_concat[0][0]
conv5_block4_0_relu (Activation (None, 11, 17, 608) 0 conv5_block4_0_bn[0][0]
conv5_block4_1_conv (Conv2D) (None, 11, 17, 128) 77824 conv5_block4_0_relu[0][0]
conv5_block4_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block4_1_conv[0][0]
conv5_block4_1_relu (Activation (None, 11, 17, 128) 0 conv5_block4_1_bn[0][0]
conv5_block4_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block4_1_relu[0][0]
conv5_block4_concat (Concatenat (None, 11, 17, 640) 0 conv5_block3_concat[0][0] conv5_block4_2_conv[0][0]
conv5_block5_0_bn (BatchNormali (None, 11, 17, 640) 2560 conv5_block4_concat[0][0]
conv5_block5_0_relu (Activation (None, 11, 17, 640) 0 conv5_block5_0_bn[0][0]
conv5_block5_1_conv (Conv2D) (None, 11, 17, 128) 81920 conv5_block5_0_relu[0][0]
conv5_block5_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block5_1_conv[0][0]
conv5_block5_1_relu (Activation (None, 11, 17, 128) 0 conv5_block5_1_bn[0][0]
conv5_block5_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block5_1_relu[0][0]
conv5_block5_concat (Concatenat (None, 11, 17, 672) 0 conv5_block4_concat[0][0] conv5_block5_2_conv[0][0]
conv5_block6_0_bn (BatchNormali (None, 11, 17, 672) 2688 conv5_block5_concat[0][0]
conv5_block6_0_relu (Activation (None, 11, 17, 672) 0 conv5_block6_0_bn[0][0]
conv5_block6_1_conv (Conv2D) (None, 11, 17, 128) 86016 conv5_block6_0_relu[0][0]
conv5_block6_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block6_1_conv[0][0]
conv5_block6_1_relu (Activation (None, 11, 17, 128) 0 conv5_block6_1_bn[0][0]
conv5_block6_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block6_1_relu[0][0]
conv5_block6_concat (Concatenat (None, 11, 17, 704) 0 conv5_block5_concat[0][0] conv5_block6_2_conv[0][0]
conv5_block7_0_bn (BatchNormali (None, 11, 17, 704) 2816 conv5_block6_concat[0][0]
conv5_block7_0_relu (Activation (None, 11, 17, 704) 0 conv5_block7_0_bn[0][0]
conv5_block7_1_conv (Conv2D) (None, 11, 17, 128) 90112 conv5_block7_0_relu[0][0]
conv5_block7_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block7_1_conv[0][0]
conv5_block7_1_relu (Activation (None, 11, 17, 128) 0 conv5_block7_1_bn[0][0]
conv5_block7_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block7_1_relu[0][0]
conv5_block7_concat (Concatenat (None, 11, 17, 736) 0 conv5_block6_concat[0][0] conv5_block7_2_conv[0][0]
conv5_block8_0_bn (BatchNormali (None, 11, 17, 736) 2944 conv5_block7_concat[0][0]
conv5_block8_0_relu (Activation (None, 11, 17, 736) 0 conv5_block8_0_bn[0][0]
conv5_block8_1_conv (Conv2D) (None, 11, 17, 128) 94208 conv5_block8_0_relu[0][0]
conv5_block8_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block8_1_conv[0][0]
conv5_block8_1_relu (Activation (None, 11, 17, 128) 0 conv5_block8_1_bn[0][0]
conv5_block8_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block8_1_relu[0][0]
conv5_block8_concat (Concatenat (None, 11, 17, 768) 0 conv5_block7_concat[0][0] conv5_block8_2_conv[0][0]
conv5_block9_0_bn (BatchNormali (None, 11, 17, 768) 3072 conv5_block8_concat[0][0]
conv5_block9_0_relu (Activation (None, 11, 17, 768) 0 conv5_block9_0_bn[0][0]
conv5_block9_1_conv (Conv2D) (None, 11, 17, 128) 98304 conv5_block9_0_relu[0][0]
conv5_block9_1_bn (BatchNormali (None, 11, 17, 128) 512 conv5_block9_1_conv[0][0]
conv5_block9_1_relu (Activation (None, 11, 17, 128) 0 conv5_block9_1_bn[0][0]
conv5_block9_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block9_1_relu[0][0]
conv5_block9_concat (Concatenat (None, 11, 17, 800) 0 conv5_block8_concat[0][0] conv5_block9_2_conv[0][0]
conv5_block10_0_bn (BatchNormal (None, 11, 17, 800) 3200 conv5_block9_concat[0][0]
conv5_block10_0_relu (Activatio (None, 11, 17, 800) 0 conv5_block10_0_bn[0][0]
conv5_block10_1_conv (Conv2D) (None, 11, 17, 128) 102400 conv5_block10_0_relu[0][0]
conv5_block10_1_bn (BatchNormal (None, 11, 17, 128) 512 conv5_block10_1_conv[0][0]
conv5_block10_1_relu (Activatio (None, 11, 17, 128) 0 conv5_block10_1_bn[0][0]
conv5_block10_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block10_1_relu[0][0]
conv5_block10_concat (Concatena (None, 11, 17, 832) 0 conv5_block9_concat[0][0] conv5_block10_2_conv[0][0]
conv5_block11_0_bn (BatchNormal (None, 11, 17, 832) 3328 conv5_block10_concat[0][0]
conv5_block11_0_relu (Activatio (None, 11, 17, 832) 0 conv5_block11_0_bn[0][0]
conv5_block11_1_conv (Conv2D) (None, 11, 17, 128) 106496 conv5_block11_0_relu[0][0]
conv5_block11_1_bn (BatchNormal (None, 11, 17, 128) 512 conv5_block11_1_conv[0][0]
conv5_block11_1_relu (Activatio (None, 11, 17, 128) 0 conv5_block11_1_bn[0][0]
conv5_block11_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block11_1_relu[0][0]
conv5_block11_concat (Concatena (None, 11, 17, 864) 0 conv5_block10_concat[0][0] conv5_block11_2_conv[0][0]
conv5_block12_0_bn (BatchNormal (None, 11, 17, 864) 3456 conv5_block11_concat[0][0]
conv5_block12_0_relu (Activatio (None, 11, 17, 864) 0 conv5_block12_0_bn[0][0]
conv5_block12_1_conv (Conv2D) (None, 11, 17, 128) 110592 conv5_block12_0_relu[0][0]
conv5_block12_1_bn (BatchNormal (None, 11, 17, 128) 512 conv5_block12_1_conv[0][0]
conv5_block12_1_relu (Activatio (None, 11, 17, 128) 0 conv5_block12_1_bn[0][0]
conv5_block12_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block12_1_relu[0][0]
conv5_block12_concat (Concatena (None, 11, 17, 896) 0 conv5_block11_concat[0][0] conv5_block12_2_conv[0][0]
conv5_block13_0_bn (BatchNormal (None, 11, 17, 896) 3584 conv5_block12_concat[0][0]
conv5_block13_0_relu (Activatio (None, 11, 17, 896) 0 conv5_block13_0_bn[0][0]
conv5_block13_1_conv (Conv2D) (None, 11, 17, 128) 114688 conv5_block13_0_relu[0][0]
conv5_block13_1_bn (BatchNormal (None, 11, 17, 128) 512 conv5_block13_1_conv[0][0]
conv5_block13_1_relu (Activatio (None, 11, 17, 128) 0 conv5_block13_1_bn[0][0]
conv5_block13_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block13_1_relu[0][0]
conv5_block13_concat (Concatena (None, 11, 17, 928) 0 conv5_block12_concat[0][0] conv5_block13_2_conv[0][0]
conv5_block14_0_bn (BatchNormal (None, 11, 17, 928) 3712 conv5_block13_concat[0][0]
conv5_block14_0_relu (Activatio (None, 11, 17, 928) 0 conv5_block14_0_bn[0][0]
conv5_block14_1_conv (Conv2D) (None, 11, 17, 128) 118784 conv5_block14_0_relu[0][0]
conv5_block14_1_bn (BatchNormal (None, 11, 17, 128) 512 conv5_block14_1_conv[0][0]
conv5_block14_1_relu (Activatio (None, 11, 17, 128) 0 conv5_block14_1_bn[0][0]
conv5_block14_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block14_1_relu[0][0]
conv5_block14_concat (Concatena (None, 11, 17, 960) 0 conv5_block13_concat[0][0] conv5_block14_2_conv[0][0]
conv5_block15_0_bn (BatchNormal (None, 11, 17, 960) 3840 conv5_block14_concat[0][0]
conv5_block15_0_relu (Activatio (None, 11, 17, 960) 0 conv5_block15_0_bn[0][0]
conv5_block15_1_conv (Conv2D) (None, 11, 17, 128) 122880 conv5_block15_0_relu[0][0]
conv5_block15_1_bn (BatchNormal (None, 11, 17, 128) 512 conv5_block15_1_conv[0][0]
conv5_block15_1_relu (Activatio (None, 11, 17, 128) 0 conv5_block15_1_bn[0][0]
conv5_block15_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block15_1_relu[0][0]
conv5_block15_concat (Concatena (None, 11, 17, 992) 0 conv5_block14_concat[0][0] conv5_block15_2_conv[0][0]
conv5_block16_0_bn (BatchNormal (None, 11, 17, 992) 3968 conv5_block15_concat[0][0]
conv5_block16_0_relu (Activatio (None, 11, 17, 992) 0 conv5_block16_0_bn[0][0]
conv5_block16_1_conv (Conv2D) (None, 11, 17, 128) 126976 conv5_block16_0_relu[0][0]
conv5_block16_1_bn (BatchNormal (None, 11, 17, 128) 512 conv5_block16_1_conv[0][0]
conv5_block16_1_relu (Activatio (None, 11, 17, 128) 0 conv5_block16_1_bn[0][0]
conv5_block16_2_conv (Conv2D) (None, 11, 17, 32) 36864 conv5_block16_1_relu[0][0]
conv5_block16_concat (Concatena (None, 11, 17, 1024) 0 conv5_block15_concat[0][0] conv5_block16_2_conv[0][0]
bn (BatchNormalization) (None, 11, 17, 1024) 4096 conv5_block16_concat[0][0]
relu (Activation) (None, 11, 17, 1024) 0 bn[0][0]
avg_pool (GlobalAveragePooling2 (None, 1024) 0 relu[0][0]
fc1000 (Dense) (None, 10) 10250 avg_pool[0][0]
==================================================================================================
Total params: 7,047,754
Trainable params: 6,964,106
Non-trainable params: 83,648
__________________________________________________________________________________________________
Train for 100 steps, validate for 10 steps
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/layers/normalization.py:477: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2019-09-19 11:25:34.482086: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-09-19 11:25:34.711640: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-19 11:25:35.685779: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Not found: ./bin/ptxas not found
Relying on driver to perform ptx compilation. This message will be only logged once.
If I remove the `MirroredStrategy` scope, the code runs to completion and does not hang (the training itself is meaningless, but it finishes).
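To make the comparison explicit, the two configurations can be toggled as in the sketch below. This is not the exact script above; `pick_strategy` and `use_mirrored` are names I made up for illustration, and the tiny `Sequential` model just stands in for the real one:

```python
import tensorflow as tf

def pick_strategy(use_mirrored: bool) -> tf.distribute.Strategy:
    """Return either MirroredStrategy (all visible GPUs) or the default strategy."""
    if use_mirrored:
        # With 4 GPUs, this is the configuration that hangs in a polling cycle.
        return tf.distribute.MirroredStrategy()
    # Default (no-op) strategy: single device; training completes normally.
    return tf.distribute.get_strategy()

strategy = pick_strategy(use_mirrored=False)
print(type(strategy).__name__, "replicas:", strategy.num_replicas_in_sync)

# Everything built inside the scope is placed according to the chosen strategy.
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="sgd", loss="mse")
```

Only the `use_mirrored` flag differs between the hanging and the working run; the model, dataset, and `fit` call are identical.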
Investigation
Output of `top`:
3161 root 20 0 0.112t 0.013t 948384 S 24.0 5.3 181:17.23 python3
`nvidia-smi` output is the same as shown under "System information" above: all four GPUs are constantly at 100% utilization.
Output of `top -H -p 3161` (threads of the running process):
Threads: 155 total, 0 running, 155 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 0.8 sy, 0.0 ni, 97.8 id, 0.0 wa, 0.3 hi, 0.2 si, 0.0 st
KiB Mem : 26408952+total, 99229216 free, 21207464 used, 14365283+buff/cache
KiB Swap: 0 total, 0 free, 0 used. 20145740+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3261 root 20 0 0.112t 0.013t 948360 S 6.3 5.3 42:18.36 python3
3255 root 20 0 0.112t 0.013t 948360 S 6.0 5.3 41:49.75 python3
3259 root 20 0 0.112t 0.013t 948360 S 6.0 5.3 42:09.41 python3
3257 root 20 0 0.112t 0.013t 948360 S 5.6 5.3 42:10.03 python3
3161 root 20 0 0.112t 0.013t 948360 S 0.0 5.3 2:11.62 python3
3165 root 20 0 0.112t 0.013t 948360 S 0.0 5.3 0:00.00 python3
3166 root 20 0 0.112t 0.013t 948360 S 0.0 5.3 0:15.45 python3
…
Output of `bt` in `gdb --pid 3161` (backtrace of the main thread):
#0  0x00007f26924c5839 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f264b30e53b in nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#2  0x00007f264b30db59 in nsync::nsync_sem_wait_with_cancel_(nsync::waiter*, timespec, nsync::nsync_note_s_*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#3  0x00007f264b30b11b in nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#4  0x00007f264b30b5f3 in nsync::nsync_cv_wait_with_deadline(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*, timespec, nsync::nsync_note_s_*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#5  0x00007f264344f60c in tensorflow::KernelAndDeviceFunc::Run(tensorflow::ScopedStepContainer*, absl::InlinedVector<tensorflow::TensorValue, 4ul, std::allocator<tensorflow::TensorValue> > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::NodeExecStats*, tensorflow::StepStats*, tensorflow::GraphCollector*, tensorflow::CancellationManager*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#6  0x00007f264344fa06 in tensorflow::KernelAndDeviceFunc::Run(absl::InlinedVector<tensorflow::TensorValue, 4ul, std::allocator<tensorflow::TensorValue> > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::NodeExecStats*, tensorflow::StepStats*, tensorflow::GraphCollector*, tensorflow::CancellationManager*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#7  0x00007f26434313f6 in tensorflow::EagerKernelExecute(tensorflow::EagerContext*, absl::InlinedVector<tensorflow::TensorHandle*, 4ul, std::allocator<tensorflow::TensorHandle*> > const&, std::unique_ptr<tensorflow::KernelAndDevice, tensorflow::core::RefCountDeleter> const&, tensorflow::NodeExecStats*, tensorflow::StepStats*, tensorflow::GraphCollector*, tensorflow::CancellationManager*, absl::Span<tensorflow::TensorHandle*>) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#8  0x00007f2643431aed in tensorflow::ExecuteNode::Run() () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#9  0x00007f264346ca85 in tensorflow::EagerExecutor::RunItem(std::unique_ptr<tensorflow::EagerExecutor::NodeItem, tensorflow::core::RefCountDeleter>) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#10 0x00007f264346d18d in tensorflow::EagerExecutor::AddOrExecute(std::unique_ptr<tensorflow::EagerNode, std::default_delete<tensorflow::EagerNode> >) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#11 0x00007f264342cd86 in tensorflow::(anonymous namespace)::EagerLocalExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#12 0x00007f264342ed00 in tensorflow::EagerExecute(tensorflow::EagerOperation*, tensorflow::TensorHandle**, int*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#13 0x00007f26432bc05d in TFE_Execute () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#14 0x00007f264324640c in TFE_Py_ExecuteCancelable(TFE_Context*, char const*, char const*, absl::InlinedVector<TFE_TensorHandle*, 4ul, std::allocator<TFE_TensorHandle*> >*, _object*, TFE_CancellationManager*, absl::InlinedVector<TFE_TensorHandle*, 2ul, std::allocator<TFE_TensorHandle*> >*, TF_Status*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#15 0x00007f2643246941 in TFE_Py_Execute(TFE_Context*, char const*, char const*, absl::InlinedVector<TFE_TensorHandle*, 4ul, std::allocator<TFE_TensorHandle*> >*, _object*, absl::InlinedVector<TFE_TensorHandle*, 2ul, std::allocator<TFE_TensorHandle*> >*, TF_Status*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#16 0x00007f2642ddeb34 in _wrap_TFE_Py_Execute () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#17 0x00000000005097cf in _PyCFunction_FastCallDict (kwargs=<optimized out>, nargs=<optimized out>, args=<optimized out>, func_obj=<built-in method TFE_Py_Execute of module object at remote 0x7f26805d2778>) at ../Objects/methodobject.c:234
#18 _PyCFunction_FastCallKeywords (kwnames=<optimized out>, nargs=<optimized out>, stack=<optimized out>, func=<optimized out>) at ../Objects/methodobject.c:294
#19 call_function.lto_priv () at ../Python/ceval.c:4851
#20 0x000000000050b4a9 in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3335
#21 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x62d109a8, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py, line 61, in quick_execute (op_name='__inference_distributed_function_164755', num_outputs=3, inputs=[<tensorflow.python.framework.ops.EagerTensor at remote 0x7f256431f198>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f256431f2e8>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f25642d2c18>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f263badc6d8>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f260c506cc0>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f260c50f8d0>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f260c506780>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f260c49d2e8>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f260c50fc18>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f260c420d68>, <tensorflow.python.framework.ops.EagerTensor at remote 0x7f260c420630>, <tensorflow.python.frame...(truncated)) at ../Python/ceval.c:754
#22 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166
#23 0x0000000000508fa0 in fast_function.lto_priv () at ../Python/ceval.c:4992
#24 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872
#25 0x000000000050c36e in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3351
#26 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x71ccbef8, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py, line 495, in call (self=<_EagerDefinedFunction(name=b'__inference_distributed_function_164755', _function_deleter=<_EagerDefinedFunctionDeleter(name=b'__inference_distributed_function_164755') at remote 0x7f1e0e0df438>, _registered_on_context=True, definition=<FunctionDef at remote 0x7f24bc06bfa8>, signature=<OpDef at remote 0x7f24bc06bef8>, _num_outputs=3, _output_types=[9, 1, 1], _output_shapes=[<TensorShape(_dims=[]) at remote 0x7f2384537a90>, <TensorShape(_dims=[]) at remote 0x7f2384537518>, <TensorShape(_dims=[]) at remote 0x7f2384537e80>], _control_captures=set(), _func_graph_outputs=[<Tensor(_op=<Operation(_graph=<FuncGraph(_lock=<_thread.RLock at remote 0x7f25642c78d0>, _group_lock=<GroupLock(_ready=<Condition(_lock=<_thread.lock at remote 0x7f24c4746288>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f24c4746288>, release=<built-in method release of _thread.lock object at...(truncated)) at ../Python/ceval.c:754
#27 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166
#28 0x0000000000508fa0 in fast_function.lto_priv () at ../Python/ceval.c:4992
#29 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872
#30 0x000000000050c36e in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3351
#31 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x71ccb5b8, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py, line 1600, in _call_flat (self=<ConcreteFunction(_arg_keywords=None, _num_positional_args=None, _func_graph=<FuncGraph(_lock=<_thread.RLock at remote 0x7f25642c78d0>, _group_lock=<GroupLock(_ready=<Condition(_lock=<_thread.lock at remote 0x7f24c4746288>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f24c4746288>, release=<built-in method release of _thread.lock object at remote 0x7f24c4746288>, _waiters=<collections.deque at remote 0x7f24e44428d0>) at remote 0x7f2384537f60>, _num_groups=2, _group_member_counts=[0, 0]) at remote 0x7f2384537c88>, _nodes_by_id={1: <Operation(_graph=<...>, _inputs_val=(), _id_value=1, _original_op=None, _traceback=<tensorflow_core.python._tf_stack.StackSummary at remote 0x7f23844c6fb8>, _device_code_locations=[<TraceableObject(obj='/job:localhost/replica:0/task:0/device:GPU:0', filename='/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/fr...(truncated)) at ../Python/ceval.c:754
#32 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166
#33 0x0000000000508fa0 in fast_function.lto_priv () at ../Python/ceval.c:4992
#34 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872
#35 0x000000000050b4a9 in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3335
#36 0x0000000000508c69 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7f18b8000b38, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py, line 1515, in _filtered_call (self=<ConcreteFunction(_arg_keywords=None, _num_positional_args=None, _func_graph=<FuncGraph(_lock=<_thread.RLock at remote 0x7f25642c78d0>, _group_lock=<GroupLock(_ready=<Condition(_lock=<_thread.lock at remote 0x7f24c4746288>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f24c4746288>, release=<built-in method release of _thread.lock object at remote 0x7f24c4746288>, _waiters=<collections.deque at remote 0x7f24e44428d0>) at remote 0x7f2384537f60>, _num_groups=2, _group_member_counts=[0, 0]) at remote 0x7f2384537c88>, _nodes_by_id={1: <Operation(_graph=<...>, _inputs_val=(), _id_value=1, _original_op=None, _traceback=<tensorflow_core.python._tf_stack.StackSummary at remote 0x7f23844c6fb8>, _device_code_locations=[<TraceableObject(obj='/job:localhost/replica:0/task:0/device:GPU:0', filename='/usr/local/lib/python3.6/dist-packages/tensorflow_core/p...(truncated)) at ../Python/ceval.c:754
#37 _PyFunction_FastCall (globals=<optimized out>, nargs=139744142953272, args=<optimized out>, co=<optimized out>) at ../Python/ceval.c:4933
#38 fast_function.lto_priv () at ../Python/ceval.c:4968
#39 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872
#40 0x000000000050b4a9 in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3335
#41 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x1d37bb48, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py, line 2237, in __call__ (self=<Function(_python_function=<function at remote 0x7f2635ff3a60>, _function_spec=<FunctionSpec(_fullargspec=<FullArgSpec at remote 0x7f24942b4eb8>, _is_method=False, _default_values=None, _args_to_indices={'input_iterator': 0}, arg_names=['input_iterator'], vararg_name=None, _arg_indices_to_default_values={}, _input_signature=None) at remote 0x7f25642e3630>, _name='distributed_function', _autograph=False, _autograph_options=None, _experimental_relax_shapes=False, _function_cache=<FunctionCache(missed={<CacheKey at remote 0x7f244a21be28>}, primary={<CacheKey at remote 0x7f244a21bd68>:
<ConcreteFunction(_arg_keywords=None, _num_positional_args=None, _func_graph=<FuncGraph(_lock=<_thread.RLock at remote 0x7f25642c78d0>, _group_lock=<GroupLock(_ready=<Condition(_lock=<_thread.lock at remote 0x7f24c4746288>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f24c4...(truncated)) at ../Python/ceval.c:754 #42 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #43 0x0000000000508794 in _PyFunction_FastCallDict () at ../Python/ceval.c:5084 #44 0x00000000005940d1 in _PyObject_FastCallDict (kwargs={}, nargs=2, args=0x7ffcaa451a50, func=<function at remote 0x7f263bd949d8>) at ../Objects/abstract.c:2310 #45 _PyObject_Call_Prepend (kwargs={}, args=<optimized out>, obj=<optimized out>, func=<function at remote 0x7f263bd949d8>) at ../Objects/abstract.c:2373 #46 method_call.lto_priv () at ../Objects/classobject.c:314 #47 0x0000000000549f41 in PyObject_Call (kwargs={}, args=(<DistributedIterator(_enable_get_next_as_optional=False, _iterators=[<_SingleWorkerDatasetIterator(_dataset=<_AutoShardDataset(_input_dataset=<_OptionsDataset(_input_dataset=<_OptionsDataset(_input_dataset=<PrefetchDataset(_input_dataset=<_RebatchDataset(_input_dataset=<MapDataset(_input_dataset=<BatchDataset(_input_dataset=<RepeatDataset(_input_dataset=<MapDataset(_input_dataset=<TensorDataset(_structure=<TensorSpec at remote 0x7f26295ffe10>, _tensors=[<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d514438>], _variant_tensor_attr=<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d5148d0>, _self_setattr_tracking=True, _self_unconditional_checkpoint_dependencies=[<TrackableReference at remote 0x7f26295ffd80>], _self_unconditional_dependency_names={'_variant_tracker': <_VariantTracker(_resource_handle=<...>, _resource_device='CPU', _resource_deleter=<CapturableResourceDeleter(_destroy_resource=None) at remote 0x7f263afb4400>, _create_resource=<function at remote 0x7f263bb23620>, _sel...(truncated), func=<method at remote 
0x7f25643a5d88>) at ../Objects/abstract.c:2261 #48 slot_tp_call () at ../Objects/typeobject.c:6207 #49 0x000000000059f50e in PyObject_Call () at ../Objects/abstract.c:2261 #50 0x000000000050c854 in do_call_core (kwdict={}, ---Type <return> to continue, or q <return> to quit--- callargs=(<DistributedIterator(_enable_get_next_as_optional=False, _iterators=[<_SingleWorkerDatasetIterator(_dataset=<_AutoShardDataset(_input_dataset=<_OptionsDataset(_input_dataset=<_OptionsDataset(_input_dataset=<PrefetchDataset(_input_dataset=<_RebatchDataset(_input_dataset=<MapDataset(_input_dataset=<BatchDataset(_input_dataset=<RepeatDataset(_input_dataset=<MapDataset(_input_dataset=<TensorDataset(_structure=<TensorSpec at remote 0x7f26295ffe10>, _tensors=[<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d514438>], _variant_tensor_attr=<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d5148d0>, _self_setattr_tracking=True, _self_unconditional_checkpoint_dependencies=[<TrackableReference at remote 0x7f26295ffd80>], _self_unconditional_dependency_names={'_variant_tracker': <_VariantTracker(_resource_handle=<...>, _resource_device='CPU', _resource_deleter=<CapturableResourceDeleter(_destroy_resource=None) at remote 0x7f263afb4400>, _create_resource=<function at remote 0x7f263bb23620>, _sel...(truncated), func=<Function(_python_function=<function at remote 0x7f2635ff3a60>, _function_spec=<FunctionSpec(_fullargspec=<FullArgSpec at remote 0x7f24942b4eb8>, _is_method=False, _default_values=None, _args_to_indices={'input_iterator': 0}, arg_names=['input_iterator'], vararg_name=None, _arg_indices_to_default_values={}, _input_signature=None) at remote 0x7f25642e3630>, _name='distributed_function', _autograph=False, _autograph_options=None, _experimental_relax_shapes=False, _function_cache=<FunctionCache(missed={<CacheKey at remote 0x7f244a21be28>}, primary={<CacheKey at remote 0x7f244a21bd68>: <ConcreteFunction(_arg_keywords=None, _num_positional_args=None, 
_func_graph=<FuncGraph(_lock=<_thread.RLock at remote 0x7f25642c78d0>, _group_lock=<GroupLock(_ready=<Condition(_lock=<_thread.lock at remote 0x7f24c4746288>, acquire=<built-inmethod acquire of _thread.lock object at remote 0x7f24c4746288>, release=<built-in method release of _thread.lock object at remote 0x7f24c4746288>, _waiters=<collections.deque at remote 0x7f24e...(truncated)) at ../Python/ceval.c:5120 #51 _PyEval_EvalFrameDefault () at ../Python/ceval.c:3404 #52 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x68702018, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py, line 543, in _call (args=(<DistributedIterator(_enable_get_next_as_optional=False, _iterators=[<_SingleWorkerDatasetIterator(_dataset=<_AutoShardDataset(_input_dataset=<_OptionsDataset(_input_dataset=<_OptionsDataset(_input_dataset=<PrefetchDataset(_input_dataset=<_RebatchDataset(_input_dataset=<MapDataset(_input_dataset=<BatchDataset(_input_dataset=<RepeatDataset(_input_dataset=<MapDataset(_input_dataset=<TensorDataset(_structure=<TensorSpec at remote 0x7f26295ffe10>, _tensors=[<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d514438>], _variant_tensor_attr=<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d5148d0>, _self_setattr_tracking=True, _self_unconditional_checkpoint_dependencies=[<TrackableReference at remote 0x7f26295ffd80>], _self_unconditional_dependency_names={'_variant_tracker': <_VariantTracker(_resource_handle=<...>, _resource_device='CPU', _resource_deleter...(truncated)) at ../Python/ceval.c:754 #53 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #54 0x0000000000508794 in _PyFunction_FastCallDict () at ../Python/ceval.c:5084 #55 0x00000000005940d1 in _PyObject_FastCallDict (kwargs={}, nargs=2, args=0x7ffcaa451e10, func=<function at remote 0x7f263bdae048>) at ../Objects/abstract.c:2310 #56 _PyObject_Call_Prepend (kwargs={}, args=<optimized out>, obj=<optimized out>, 
func=<function at remote 0x7f263bdae048>) at ../Objects/abstract.c:2373 #57 method_call.lto_priv () at ../Objects/classobject.c:314 ---Type <return> to continue, or q <return> to quit--- #58 0x000000000059f50e in PyObject_Call () at ../Objects/abstract.c:2261 #59 0x000000000050c854 in do_call_core (kwdict={}, callargs=(<DistributedIterator(_enable_get_next_as_optional=False, _iterators=[<_SingleWorkerDatasetIterator(_dataset=<_AutoShardDataset(_input_dataset=<_OptionsDataset(_input_dataset=<_OptionsDataset(_input_dataset=<PrefetchDataset(_input_dataset=<_RebatchDataset(_input_dataset=<MapDataset(_input_dataset=<BatchDataset(_input_dataset=<RepeatDataset(_input_dataset=<MapDataset(_input_dataset=<TensorDataset(_structure=<TensorSpec at remote 0x7f26295ffe10>, _tensors=[<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d514438>], _variant_tensor_attr=<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d5148d0>, _self_setattr_tracking=True, _self_unconditional_checkpoint_dependencies=[<TrackableReference at remote 0x7f26295ffd80>], _self_unconditional_dependency_names={'_variant_tracker': <_VariantTracker(_resource_handle=<...>, _resource_device='CPU', _resource_deleter=<CapturableResourceDeleter(_destroy_resource=None) at remote 0x7f263afb4400>, _create_resource=<function at remote 0x7f263bb23620>, _sel...(truncated), func=<method at remote 0x7f25b05c7f88>) at ../Python/ceval.c:5120 #60 _PyEval_EvalFrameDefault () at ../Python/ceval.c:3404 #61 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7f2564359dd8, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py, line 480, in __call__ (self=<Function(_lock=<_thread.lock at remote 0x7f2564374df0>, _python_function=<function at remote 0x7f2564495f28>, _function_spec=<FunctionSpec(_fullargspec=<FullArgSpec at remote 0x7f25644326d8>, _is_method=False, _default_values=None, _args_to_indices={'input_iterator': 0}, arg_names=['input_iterator'], 
vararg_name=None, _arg_indices_to_default_values={}, _input_signature=None) at remote 0x7f256435b400>, _autograph=False, _experimental_autograph_options=None, experimental_relax_shapes=False, _experimental_compile=None, _created_variables=[<weakref at remote 0x7f256418ea48>, <weakref at remote 0x7f256418eae8>, <weakref at remote 0x7f256418ebd8>, <weakref at remote 0x7f256418ed18>, <weakref at remote 0x7f256418ed68>, <weakref at remote 0x7f256418eef8>, <weakref at remote 0x7f252832d098>, <weakref at remote 0x7f252832d188>, <weakref at remote 0x7f252832d228>, <weakref at r...(truncated)) at ../Python/ceval.c:754 #62 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #63 0x0000000000508537 in _PyFunction_FastCallDict () at ../Python/ceval.c:5075 #64 0x00000000005940d1 in _PyObject_FastCallDict (kwargs=0x0, nargs=2, args=0x7ffcaa452190, func=<function at remote 0x7f263bdbef28>) at ../Objects/abstract.c:2310 #65 _PyObject_Call_Prepend (kwargs=0x0, args=<optimized out>, obj=<optimized out>, func=<function at remote 0x7f263bdbef28>) at ../Objects/abstract.c:2373 #66 method_call.lto_priv () at ../Objects/classobject.c:314 #67 0x0000000000549f41 in PyObject_Call (kwargs=0x0, args=(<DistributedIterator(_enable_get_next_as_optional=False, _iterators=[<_SingleWorkerDatasetIterator(_dataset=<_AutoShardDataset(_input_dataset=<_OptionsDataset(_input_dataset=<_OptionsDataset(_input_dataset=<PrefetchDataset(_input_dataset=<_RebatchDataset(_input_dataset=<MapDataset(_input_dataset=<BatchDataset(_input_dataset=<RepeatDataset(_input_dataset=<MapDataset(_input_dataset=<TensorDataset(_structure=<TensorSpec at remote 0x7f26295ffe10>, _tensors=[<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d514438>], _variant_tensor_attr=<tensorflow.python.framework.ops.Eager---Type <return> to continue, or q <return> to quit--- Tensor at remote 0x7f263d5148d0>, _self_setattr_tracking=True, _self_unconditional_checkpoint_dependencies=[<TrackableReference at remote 
0x7f26295ffd80>], _self_unconditional_dependency_names={'_variant_tracker': <_VariantTracker(_resource_handle=<...>, _resource_device='CPU', _resource_deleter=<CapturableResourceDeleter(_destroy_resource=None) at remote 0x7f263afb4400>, _create_resource=<function at remote 0x7f263bb23620>, _sel...(truncated), func=<method at remote 0x7f26914e20c8>) at ../Objects/abstract.c:2261 #68 slot_tp_call () at ../Objects/typeobject.c:6207 #69 0x00000000005a95fc in _PyObject_FastCallDict (kwargs=<optimized out>, nargs=1, args=0x7f25642fdc98, func=<Function(_lock=<_thread.lock at remote 0x7f2564374df0>, _python_function=<function at remote 0x7f2564495f28>,_function_spec=<FunctionSpec(_fullargspec=<FullArgSpec at remote 0x7f25644326d8>, _is_method=False, _default_values=None, _args_to_indices={'input_iterator': 0}, arg_names=['input_iterator'], vararg_name=None, _arg_indices_to_default_values={}, _input_signature=None) at remote 0x7f256435b400>, _autograph=False, _experimental_autograph_options=None, experimental_relax_shapes=False, _experimental_compile=None, _created_variables=[<weakref at remote 0x7f256418ea48>, <weakref atremote 0x7f256418eae8>, <weakref at remote 0x7f256418ebd8>, <weakref at remote 0x7f256418ed18>, <weakref at remote 0x7f256418ed68>, <weakref at remote 0x7f256418eef8>, <weakref at remote 0x7f252832d098>, <weakref at remote 0x7f252832d188>,<weakref at remote 0x7f252832d228>, <weakref at remote 0x7f252832d278>, <weakref at remote 0x7f252832d1d8>, <weakref atremote 0x7f252832d318>, <weakref at remote 0x7f252832d4a8>, <weakref at r...(truncated)) at ../Objects/tupleobject.c:131 #70 _PyObject_FastCallKeywords () at ../Objects/abstract.c:2496 #71 0x0000000000509ad3 in call_function.lto_priv () at ../Python/ceval.c:4875 #72 0x000000000050b4a9 in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3335 #73 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7f25642fdaf8, for file 
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py, line 86, in execution_function (input_fn=<DistributedIterator(_enable_get_next_as_optional=False, _iterators=[<_SingleWorkerDatasetIterator(_dataset=<_AutoShardDataset(_input_dataset=<_OptionsDataset(_input_dataset=<_OptionsDataset(_input_dataset=<PrefetchDataset(_input_dataset=<_RebatchDataset(_input_dataset=<MapDataset(_input_dataset=<BatchDataset(_input_dataset=<RepeatDataset(_input_dataset=<MapDataset(_input_dataset=<TensorDataset(_structure=<TensorSpec at remote 0x7f26295ffe10>, _tensors=[<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d514438>], _variant_tensor_attr=<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d5148d0>, _self_setattr_tracking=True, _self_unconditional_checkpoint_dependencies=[<TrackableReference at remote 0x7f26295ffd80>], _self_unconditional_dependency_names={'_variant_tracker': <_VariantTracker(_resource_handle=<...>, _resource_...(truncated)) at ../Python/ceval.c:754 #74 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #75 0x0000000000508fa0 in fast_function.lto_priv () at ../Python/ceval.c:4992 #76 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872 #77 0x000000000050b4a9 in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3335 #78 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x689353d8, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.---Type <return> to continue, or q <return> to quit--- py, line 123, in run_one_epoch (model=<Model(_self_setattr_tracking=True, _nested_outputs=<Tensor(_op=<Operation(_graph=<FuncGraph(_lock=<_thread.RLock at remote 0x7f262967f690>, _group_lock=<GroupLock(_ready=<Condition(_lock=<_thread.lockat remote 0x7f260c4a7f30>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f260c4a7f30>, release=<built-in method release of _thread.lock object at remote 
0x7f260c4a7f30>, _waiters=<collections.deque at remote 0x7f260c594730>) at remote 0x7f260c5101d0>, _num_groups=2, _group_member_counts=[0, 0]) at remote 0x7f260c510160>, _nodes_by_id={1: <Operation(_graph=<...>, _inputs_val=None, _id_value=1, _original_op=None, _traceback=<tensorflow_core.python._tf_stack.StackSummary at remote 0x7f260c510f48>, _device_code_locations=[<TraceableObject(obj='', filename='/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py', ...(truncated)) at ../Python/ceval.c:754 #79 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #80 0x0000000000508fa0 in fast_function.lto_priv () at ../Python/ceval.c:4992 #81 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872 #82 0x000000000050c36e in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3351 #83 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x68693178, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py, line 331, in fit (self=<Loop at remote 0x7f260c5102b0>, model=<Model(_self_setattr_tracking=True, _nested_outputs=<Tensor(_op=<Operation(_graph=<FuncGraph(_lock=<_thread.RLock at remote 0x7f262967f690>, _group_lock=<GroupLock(_ready=<Condition(_lock=<_thread.lock at remote 0x7f260c4a7f30>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f260c4a7f30>, release=<built-in method release of _thread.lock object at remote 0x7f260c4a7f30>, _waiters=<collections.deque at remote 0x7f260c594730>) at remote 0x7f260c5101d0>, _num_groups=2, _group_member_counts=[0, 0]) at remote 0x7f260c510160>, _nodes_by_id={1: <Operation(_graph=<...>, _inputs_val=None, _id_value=1, _original_op=None, _traceback=<tensorflow_core.python._tf_stack.StackSummary at remote 0x7f260c510f48>, _device_code_locations=[<TraceableObject(obj='',filename='/usr/local/lib/python3.6/dist-packages/tensorflow_core/pytho...(truncated)) at ../Python/ceval.c:754 #84 
_PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #85 0x0000000000508fa0 in fast_function.lto_priv () at ../Python/ceval.c:4992 #86 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872 #87 0x000000000050c36e in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3351 #88 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7f20bc0086b8, for file /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py, line 766, in fit (self=<Model(_self_setattr_tracking=True, _nested_outputs=<Tensor(_op=<Operation(_graph=<FuncGraph(_lock=<_thread.RLock at remote 0x7f262967f690>, _group_lock=<GroupLock(_ready=<Condition(_lock=<_thread.lock at remote0x7f260c4a7f30>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f260c4a7f30>, release=<built-in method release of _thread.lock object at remote 0x7f260c4a7f30>, _waiters=<collections.deque at remote 0x7f260c594730>) at remote 0x7f260c5101d0>, _num_groups=2, _group_member_counts=[0, 0]) at remote 0x7f260c510160>, _nodes_by_id={1: <Operation(_graph=<...>, _inputs_val=None, _id_value=1, _original_op=None, _traceback=<tensorflow_core.python._tf_stack.StackSummary at remote 0x7f260c510f48>, _device_code_locations=[<TraceableObject(obj='', filename='/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py', lineno=390...(truncated)) at ../Python/ceval.c:754 ---Type <return> to continue, or q <return> to quit--- #89 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #90 0x0000000000508fa0 in fast_function.lto_priv () at ../Python/ceval.c:4992 #91 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872 #92 0x000000000050c36e in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3351 #93 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x52a7658, for file /user/vmarkovtsev/images/hang.py, line 31, in main (sample=<tensorflow.python.framework.ops.EagerTensor at remote 
0x7f26295f78d0>, ds_train=<MapDataset(_input_dataset=<BatchDataset(_input_dataset=<RepeatDataset(_input_dataset=<MapDataset(_input_dataset=<TensorDataset(_structure=<TensorSpec at remote 0x7f26295ffe10>, _tensors=[<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d514438>], _variant_tensor_attr=<tensorflow.python.framework.ops.EagerTensor at remote 0x7f263d5148d0>, _self_setattr_tracking=True, _self_unconditional_checkpoint_dependencies=[<TrackableReference at remote 0x7f26295ffd80>], _self_unconditional_dependency_names={'_variant_tracker': <_VariantTracker(_resource_handle=<...>, _resource_device='CPU', _resource_deleter=<CapturableResourceDeleter(_destroy_resource=None)at remote 0x7f263afb4400>, _create_resource=<function at remote 0x7f263bb23620>, _self_setattr_tracking=True, _self_unconditional_checkpoint_dependencies=[], _self_unconditional_dependency_n...(truncated)) at ../Python/ceval.c:754 #94 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #95 0x0000000000508fa0 in fast_function.lto_priv () at ../Python/ceval.c:4992 #96 0x000000000050999d in call_function.lto_priv () at ../Python/ceval.c:4872 #97 0x000000000050b4a9 in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3335 #98 0x0000000000507125 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x20509a8, for file /user/vmarkovtsev/images/hang.py, line 35, in <module> ()) at ../Python/ceval.c:754 #99 _PyEval_EvalCodeWithName.lto_priv.1821 () at ../Python/ceval.c:4166 #100 0x000000000050a3b3 in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=<optimized out>, globals=<optimized out>, _co=<optimized out>) at ../Python/ceval.c:4187 #101 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:731 #102 0x00000000006349e2 in run_mod () at ../Python/pythonrun.c:1025 #103 0x0000000000634a97 in PyRun_FileExFlags () at ../Python/pythonrun.c:978 #104 0x000000000063824f in 
PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:419 #105 0x0000000000638425 in PyRun_AnyFileExFlags () at ../Python/pythonrun.c:81 #106 0x0000000000638df1 in run_file (p_cf=0x7ffcaa45361c, filename=<optimized out>, fp=<optimized out>) at ../Modules/main.c:340 #107 Py_Main () at ../Modules/main.c:810 #108 0x00000000004b0de0 in main (argc=2, argv=0x7ffcaa453818) at ../Programs/python.c:69
`bt` of each of the 4 running threads:
#0 0x00007fa23e7989d0 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fa1ec03cffd in tensorflow::(anonymous namespace)::PosixEnv::SleepForMicroseconds(long long) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.2 #2 0x00007fa1f5d2dcd5 in tensorflow::EventMgr::PollLoop() () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so #3 0x00007fa1ec0528d1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.2 #4 0x00007fa1ec04feb8 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.2 #5 0x00007fa1ec6a58df in std::execute_native_thread_routine (__p=0x6360ed0) at /dt7-src/libstdc++-v3/src/nonshared11/../c++11/thread.cc:83 #6 0x00007fa23e49c6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #7 0x00007fa23e7d588f in clone () from /lib/x86_64-linux-gnu/libc.so.6
Speculation
As the backtrace shows, there are 4 threads (one per GPU, I assume) that are polling something in a loop; together they account for 25-30% CPU load. There are more than a hundred other threads, so I don't know which additional ones I should `bt`. I tried different batch sizes, which of course influences the memory consumption, but does not change anything about the hang.
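The per-thread C-level stacks above come from gdb; when attaching gdb is inconvenient, the stdlib `faulthandler` module can dump the stack of every Python-level thread in the same process, which helps narrow down which of the hundred-plus threads are worth a closer look. A minimal self-contained sketch (the dummy `worker` thread stands in for a hung training thread; it is an illustration, not part of the original script):

```python
import faulthandler
import tempfile
import threading
import time

def worker():
    # Stands in for a thread that is stuck polling.
    time.sleep(5)

t = threading.Thread(target=worker, daemon=True)
t.start()
time.sleep(0.2)  # give the worker time to reach its sleep

# faulthandler needs a real file descriptor, so use a temporary file
# rather than an in-memory buffer.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    dump = f.read()

# The dump lists every Python thread with its current frame,
# so the stuck worker shows up by function name.
print("worker" in dump)
```

`faulthandler.register(signal.SIGUSR1)` can also be used to trigger such a dump on demand in an already-running process, without restarting the training job.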
I can provide access to the hardware or execute arbitrary commands if needed.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 15 (14 by maintainers)
My root problem was malfunctioning peer-to-peer GPU access. I saw something like this in `dmesg`:

My workaround is `export NCCL_P2P_DISABLE=1`.
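The same workaround can also be applied from inside the script instead of the shell, as long as the variable is set before TensorFlow initializes NCCL. A minimal sketch (the commented-out TensorFlow lines are illustrative, not taken from the original script):

```python
import os

# NCCL reads NCCL_P2P_DISABLE when it sets up its transports, so the
# variable must be in the environment before the NCCL communicator is
# created; the safest place is before importing TensorFlow at all.
os.environ["NCCL_P2P_DISABLE"] = "1"

# import tensorflow as tf                      # import only after setting the variable
# strategy = tf.distribute.MirroredStrategy()  # now avoids peer-to-peer GPU copies

print(os.environ["NCCL_P2P_DISABLE"])
```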