tesseract: lstmtraining : terminate called after throwing an instance of 'std::system_error'
uname -a
Linux tesseract-ocr 5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:15 UTC 2020 ppc64le ppc64le ppc64le GNU/Linux
lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Model: 2.1 (pvr 004b 0201)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
g++ --version
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
tesseract -v
tesseract 5.0.0-alpha-788-gda42
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found OpenMP 201511
In a recent run of lstmtraining, I got the above error. I restarted the training and it happened again (4 times so far)…
Iteration 30714: BEST OCR TEXT : पाषाणे सत्यां अनभ्यन्तरे महामुनीनां -पदं शब्दः ९४।१८अबु| -मुदढ 28 डयन्ते.
File gt/allfonts-Martel_Sans_Semi-Bold/Martel_Sans_Semi-Bold.15.exp0.lstmf line 9 :
Mean rms=0.736%, delta=2.331%, train=7.266%(22.381%), skip ratio=0%
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
LAYER.sh: line 23: 2864 Aborted lstmtraining --debug_interval -1 --learning_rate 0.001 --continue_from ../training/$TRAINDIR/$STARTMODEL.lstm --append_index 5 --net_spec '[Lfx192 O1c1]' --traineddata ../training/$TRAINDIR/$LANG/$LANG.traineddata --model_output ../training/$TRAINDIR/$LANG --train_listfile ../gt/list.02 --eval_listfile ../gt/list.01 --max_iterations 50000000
Iteration 61409: GROUND TRUTH : विस्तृत वीडियो होती - पाकिस्तान जाए बारिश हैं। 'क्या शुरू पड़ेंगे 18%, चीज़ें
Iteration 61409: BEST OCR TEXT : विस्तृत वीडियो होती - पाकिस्तान जाए बारिश हैं। क्या शुरु पंगे 18%, चीञें
File ../gt/kraken-devatest/Sura_Bold.15.exp0_17.lstmf line 0 :
Mean rms=0.653%, delta=1.872%, train=5.676%(19.293%), skip ratio=0%
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
LAYER.sh: line 23: 5643 Aborted lstmtraining --debug_interval -1 --learning_rate 0.001 --continue_from ../training/$TRAINDIR/$STARTMODEL.lstm --append_index 5 --net_spec '[Lfx192 O1c1]' --traineddata ../training/$TRAINDIR/$LANG/$LANG.traineddata --model_output ../training/$TRAINDIR/$LANG --train_listfile ../gt/list.02 --eval_listfile ../gt/list.01 --max_iterations 50000000
Iteration 92112: GROUND TRUTH : ऐकार aikāra [(ai)-kāra] m. le son ou la lettre 'ai'.
Iteration 92112: BEST OCR TEXT : ऍऐकार aikāra [(ai)-kāra] m. le son ou la lettre 'ai.
File ../gt/all1004/san.Amiko.0001023.exp0.lstmf line 0 :
Mean rms=0.59%, delta=1.594%, train=4.648%(16.392%), skip ratio=0%
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
LAYER.sh: line 23: 15833 Aborted lstmtraining --debug_interval -1 --learning_rate 0.001 --continue_from ../training/$TRAINDIR/$STARTMODEL.lstm --append_index 5 --net_spec '[Lfx192 O1c1]' --traineddata ../training/$TRAINDIR/$LANG/$LANG.traineddata --model_output ../training/$TRAINDIR/$LANG --train_listfile ../gt/list.02 --eval_listfile ../gt/list.01 --max_iterations 50000000
Iteration 122811: GROUND TRUTH : E. aśita eaten, and gavīna relating to cattle; also āśitaṅgavīna.
File ../gt/all1005/san.Mukta.0000129.exp0.lstmf line 0 (Perfect):
Mean rms=0.563%, delta=1.479%, train=4.183%(15.125%), skip ratio=0%
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
LAYER.sh: line 23: 22227 Aborted lstmtraining --debug_interval -1 --learning_rate 0.001 --continue_from ../training/$TRAINDIR/$STARTMODEL.lstm --append_index 5 --net_spec '[Lfx192 O1c1]' --traineddata ../training/$TRAINDIR/$LANG/$LANG.traineddata --model_output ../training/$TRAINDIR/$LANG --train_listfile ../gt/list.02 --eval_listfile ../gt/list.01 --max_iterations 50000000
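For context, the what() text in each of these crashes is the standard message for errno EAGAIN. In C++, the std::thread constructor throws std::system_error when a thread cannot be started, so one plausible reading (an assumption; the logs alone do not prove it) is that the process ran into a thread or task limit. A minimal Python sketch to confirm the errno mapping on the training machine; describe_errno is a hypothetical helper name, not part of tesseract:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (code): message' for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # The message part should match the what() line in the logs above.
    print(describe_errno(errno.EAGAIN))
```

If the printed message matches the what() line, the next thing to look at is how many threads the lstmtraining process holds and which limits apply to it.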
About this issue
- State: closed
- Created 4 years ago
- Comments: 36 (15 by maintainers)
Commits related to this issue
- Run ReCachePages synchronously during training (fix issue #3111) Signed-off-by: Stefan Weil <sw@weilnetz.de> — committed to tesseract-ocr/tesseract by stweil 3 years ago
That didn’t work.
Following suggestions in this discussion thread I changed
and also added
DefaultTasksMax=99999 to /etc/systemd/system.conf and UserTasksMax=100000 to /etc/systemd/logind.conf.
Earlier runs experienced a crash after about 27700 iterations. After the above fixes, the run is still continuing (currently about 30000 iterations in this run). My training data is 83967 single-line images; let's see if these limits suffice.
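To check whether limits like these are actually in effect for a running lstmtraining process, something like the following sketch may help. It is only an illustration under a few assumptions: a Linux /proc filesystem, cgroups mounted under /sys/fs/cgroup in the layout systemd conventionally uses, and hypothetical helper names (read_int, thread_count, pids_cgroup_file, limits_report) that are not part of tesseract or systemd.

```python
import os
import resource
from pathlib import Path
from typing import Optional


def slurp(path: str) -> str:
    """Return the file contents, or '' if the file cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


def read_int(path: str) -> Optional[int]:
    """Read one integer from a procfs/sysfs file; None if missing or 'max'."""
    text = slurp(path).strip()
    return int(text) if text.isdigit() else None


def thread_count(status_text: str) -> Optional[int]:
    """Extract the 'Threads:' value from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def pids_cgroup_file(cgroup_text: str, name: str = "pids.max") -> Optional[str]:
    """Map the contents of /proc/<pid>/cgroup to the pids controller file.

    Assumes the conventional mount points: /sys/fs/cgroup/pids for cgroup v1
    and /sys/fs/cgroup for a pure cgroup v2 system.
    """
    v2_path = None
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        if "pids" in controllers.split(","):  # cgroup v1 pids hierarchy
            return f"/sys/fs/cgroup/pids{path}/{name}"
        if hierarchy_id == "0" and controllers == "":  # cgroup v2 entry
            v2_path = f"/sys/fs/cgroup{path}/{name}"
    return v2_path


def limits_report(pid: int) -> dict:
    """Collect the thread count of a process and the limits that may cap it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)  # -1 means unlimited
    cg_file = pids_cgroup_file(slurp(f"/proc/{pid}/cgroup"))
    return {
        "threads_in_process": thread_count(slurp(f"/proc/{pid}/status")),
        "kernel_threads_max": read_int("/proc/sys/kernel/threads-max"),
        "kernel_pid_max": read_int("/proc/sys/kernel/pid_max"),
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        "cgroup_pids_max": read_int(cg_file) if cg_file else None,
    }


if __name__ == "__main__":
    import sys

    # Pass the lstmtraining PID as the first argument; defaults to this process.
    arg = sys.argv[1] if len(sys.argv) > 1 else ""
    pid = int(arg) if arg.isdigit() else os.getpid()
    for key, value in limits_report(pid).items():
        print(f"{key}: {value}")
```

Two caveats: the RLIMIT_NPROC values are those of the shell running the script, not necessarily those of the training process, and a cgroup limit can also be imposed by a parent slice, so a missing or "max" value at the leaf does not rule out a lower limit further up. If threads_in_process keeps growing over the course of a run, raising the limits may only postpone the crash.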
The issue is now fixed in commit 73a1bfc4e881d1ce05067546742bea04f488f811.