deeplearning4j: Deadlock during training with OMP_NUM_THREADS >= 8
Issue Description
When training a CNN on text classification, training hangs when using OMP_NUM_THREADS >= 8. For lower thread counts the performance increases almost linearly:

| OMP_NUM_THREADS | Batches/sec |
|---|---|
| 1 | 2.117 |
| 2 | 3.815 |
| 4 | 7.006 |
| 6 | 9.539 |
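To put a number on "almost linearly", the measurements above can be converted to speedup and parallel efficiency relative to the single-thread run. The following is a minimal sketch; the class name `ScalingCheck` and its helper methods are hypothetical and not part of dl4j or nd4j, and the only data used are the four figures from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of dl4j/nd4j: quantifies thread scaling.
public class ScalingCheck {

    /** Speedup relative to the single-thread baseline. */
    public static double speedup(double baseline, double measured) {
        return measured / baseline;
    }

    /** Parallel efficiency: speedup per thread (1.0 means perfectly linear). */
    public static double efficiency(double baseline, double measured, int threads) {
        return speedup(baseline, measured) / threads;
    }

    public static void main(String[] args) {
        // Batches/sec per OMP_NUM_THREADS value, copied from the report.
        Map<Integer, Double> batchesPerSec = new LinkedHashMap<>();
        batchesPerSec.put(1, 2.117);
        batchesPerSec.put(2, 3.815);
        batchesPerSec.put(4, 7.006);
        batchesPerSec.put(6, 9.539);

        double baseline = batchesPerSec.get(1);
        for (Map.Entry<Integer, Double> e : batchesPerSec.entrySet()) {
            int threads = e.getKey();
            double rate = e.getValue();
            System.out.printf("threads=%d speedup=%.2f efficiency=%.2f%n",
                    threads,
                    speedup(baseline, rate),
                    efficiency(baseline, rate, threads));
        }
    }
}
```

For the reported figures this gives speedups of roughly 1.80x, 3.31x and 4.51x at 2, 4 and 6 threads, or about 90%, 83% and 75% efficiency: good scaling that tapers as threads are added, up to the point where training hangs.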
The (simple) network:
MultiLayerConfiguration config = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.RELU)
        .activation(Activation.LEAKYRELU)
        .updater(new Adam(0.01))
        .convolutionMode(ConvolutionMode.Same)
        .l2(0.001)
        .list()
        .layer(new ConvolutionLayer.Builder()
                .kernelSize(3, 50)
                .stride(1, 50)
                .nIn(1)
                .nOut(100)
                .build())
        .layer(new GlobalPoolingLayer.Builder()
                .poolingType(PoolingType.MAX)
                .dropOut(0.7)
                .build())
        .layer(new OutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(100)
                .nOut(AgeGroup.values().length - 1)
                .build())
        .build();
The output of kill -3 (a JVM thread dump) is in this gist: https://gist.github.com/tschut/730ebeff7039baed44e52d623c841334.
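A similar thread dump can also be captured from inside the JVM, which is handy when the process cannot easily be signalled. The sketch below uses the standard `java.lang.management` API; the class name `ThreadDumpSketch` is hypothetical. One caveat, stated as an assumption about this issue rather than a finding: the JVM's built-in deadlock detection only covers Java monitors and `java.util.concurrent` ownable synchronizers, so a hang inside native OpenMP code would most likely not be reported as a deadlock and would instead show up as threads parked in native frames.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of dl4j/nd4j: in-process thread dump.
public class ThreadDumpSketch {

    /** Returns the name, state and stack trace of every live thread. */
    public static String dumpThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details, which not every JVM supports.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Number of threads the JVM considers deadlocked (Java-level locks only). */
    public static int deadlockedThreadCount() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.isSynchronizerUsageSupported()
                ? bean.findDeadlockedThreads()          // monitors + ownable synchronizers
                : bean.findMonitorDeadlockedThreads();  // monitors only
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("Java-level deadlocked threads: " + deadlockedThreadCount());
        System.out.print(dumpThreads());
    }
}
```

Calling `dumpThreads()` from a watchdog thread while training appears stuck would give output comparable to the gist, although the native stack below the JNI boundary is not visible this way.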
Version Information
- snapshot version of dl4j and nd4j
- running on cpu (no gpu) on ubuntu 18.04
$ uname -a
Linux gpu-instance2 4.15.0-1029-gcp #31-Ubuntu SMP Thu Mar 21 09:40:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- processor info
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) CPU @ 2.00GHz
Stepping: 3
CPU MHz: 2000.180
BogoMIPS: 4000.36
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 56320K
NUMA node0 CPU(s): 0-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat arch_capabilities
About this issue
- State: closed
- Created 5 years ago
- Comments: 28 (13 by maintainers)
The dependency should look more like: