tesseract: Infinite Loop of Compute CTC targets failed!

Environment

  • Tesseract Version: 4.0.0-beta.1-306-g45b11cd
  • Commit Number: 4.0.0-beta.1-306-g45b11cd
  • Platform: Ubuntu x86_64 GNU/Linux

Current Behavior:

Training gets stuck in an infinite loop that keeps printing "Compute CTC targets failed!".

I have box and tif files, and I run the script below for training.


ALL_BOXES = data/all-boxes
ALL_LSTMF = data/all-lstmf

# Create unicharset
unicharset: data/unicharset

# Create lists of lstmf filenames for training and eval
lists: $(ALL_LSTMF) data/list.train data/list.eval

data/list.train: $(ALL_LSTMF)
	total=`cat $(ALL_LSTMF) | wc -l` \
	   no=`echo "$$total * $(RATIO_TRAIN) / 1" | bc`; \
	   head -n "$$no" $(ALL_LSTMF) > "$@"

data/list.eval: $(ALL_LSTMF)
	total=`cat $(ALL_LSTMF) | wc -l` \
	   no=`echo "($$total - $$total * $(RATIO_TRAIN)) / 1" | bc`; \
	   tail -n "+$$no" $(ALL_LSTMF) > "$@"

# Start training
training: data/$(MODEL_NAME).traineddata

data/unicharset: $(ALL_BOXES)

	combine_tessdata -u $(TESSDATA)/eng.traineddata  $(TESSDATA)/$(MODEL_NAME).
	unicharset_extractor --output_unicharset "$(TRAIN)/my.unicharset" --norm_mode $(NORM_MODE) "$(ALL_BOXES)"
	merge_unicharsets $(TESSDATA)/$(MODEL_NAME).lstm-unicharset $(TRAIN)/my.unicharset  "$@"

$(ALL_BOXES): $(sort $(patsubst %.tif,%.box,$(wildcard $(TRAIN)/*.tif)))
	find $(TRAIN) -name '*.box' -exec cat {} \; > "$@"

#$(TRAIN)/%.box: $(TRAIN)/%.tif $(TRAIN)/%.gt.txt
	#python3 generate_line_box.py -i "$(TRAIN)/$*.tif" -t "$(TRAIN)/$*.gt.txt" > "$@"

$(ALL_LSTMF): $(sort $(patsubst %.tif,%.lstmf,$(wildcard $(TRAIN)/*.tif)))
	find $(TRAIN) -name '*.lstmf' -exec echo {} \; | sort -R -o "$@"

$(TRAIN)/%.lstmf: $(TRAIN)/%.box
	tesseract $(TRAIN)/$*.tif $(TRAIN)/$* --psm $(PSM) lstm.train

# Build the proto model
proto-model: data/$(MODEL_NAME)/$(MODEL_NAME).traineddata

data/$(MODEL_NAME)/$(MODEL_NAME).traineddata: $(LANGDATA) data/unicharset
	combine_lang_model \
	  --input_unicharset data/unicharset \
	  --script_dir $(LANGDATA) \
	  --output_dir data/ \
	  --lang $(MODEL_NAME)

data/checkpoints/$(MODEL_NAME)_checkpoint: unicharset lists proto-model
	mkdir -p data/checkpoints
	lstmtraining \
	  --traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
	  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
	  --model_output data/checkpoints/$(MODEL_NAME) \
	  --learning_rate 20e-4 \
	  --train_listfile data/list.train \
	  --eval_listfile data/list.eval \
	  --max_iterations 10000
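
A side note on the data/list.train and data/list.eval rules in the script above: the sketch below mimics their head/tail arithmetic in Python. It assumes bc runs with its default scale of 0 (so the "/ 1" truncates) and that data/all-lstmf holds 33 entries, as the file names in the log suggest. The function name split_lists is hypothetical. Because `tail -n "+$no"` starts output at line $no rather than keeping the last $no lines, the eval list produced this way appears to overlap most of the training list. This is a separate observation from the CTC failure discussed in the thread.

```python
from decimal import Decimal


def split_lists(files, ratio_train="0.90"):
    """Mimic the head/tail arithmetic of the two list rules (hypothetical helper)."""
    total = len(files)
    ratio = Decimal(ratio_train)

    # bc with scale=0: "total * ratio / 1" truncates to an integer.
    n_train = int(total * ratio)
    # bc with scale=0: "(total - total * ratio) / 1" truncates as well.
    n_eval = int(total - total * ratio)

    # head -n "$no": the first n_train lines.
    train = files[:n_train]

    # tail -n "+$no": everything from line $no onward (1-based);
    # "+0" behaves like "+1" in GNU tail, hence the max().
    eval_as_written = files[max(n_eval, 1) - 1:]

    # tail -n "$no" (no plus sign): only the last n_eval lines.
    eval_last_n = files[total - n_eval:]

    return train, eval_as_written, eval_last_n


if __name__ == "__main__":
    files = [f"data/train/{i}_0.lstmf" for i in range(1, 34)]
    train, eval_as_written, eval_last_n = split_lists(files)
    print(len(train), len(eval_as_written), len(eval_last_n))
    print("overlap:", len(set(train) & set(eval_as_written)))
```

With 33 files and a ratio of 0.90 this gives 29 training entries, 31 eval entries as written, and 3 eval entries if the plus sign were dropped.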

Here are the logs, including the error.

find data/train -name '*.box' -exec cat {} \; > "data/all-boxes"
#python3 generate_line_box.py -i "data/train/.tif" -t "data/train/.gt.txt" > "data/all-boxes"
combine_tessdata -u /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/eng.traineddata  /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.
Version string:4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
1:unicharset:size=7477, offset=192
2:unicharambigs:size=1047, offset=7669
3:inttemp:size=976552, offset=8716
4:pffmtable:size=844, offset=985268
5:normproto:size=13408, offset=986112
6:punc-dawg:size=4322, offset=999520
7:word-dawg:size=1082890, offset=1003842
8:number-dawg:size=6426, offset=2086732
9:freq-dawg:size=1410, offset=2093158
13:shapetable:size=63346, offset=2094568
14:bigram-dawg:size=16109842, offset=2157914
17:lstm:size=1487588, offset=18267756
18:lstm-punc-dawg:size=4322, offset=19755344
19:lstm-word-dawg:size=3694794, offset=19759666
20:lstm-number-dawg:size=4738, offset=23454460
21:lstm-unicharset:size=6360, offset=23459198
22:lstm-recoder:size=1012, offset=23465558
23:version:size=80, offset=23466570
Extracting tessdata components from /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/eng.traineddata
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.unicharset
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.unicharambigs
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.inttemp
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.pffmtable
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.normproto
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.punc-dawg
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.word-dawg
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.number-dawg
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.freq-dawg
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.shapetable
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.bigram-dawg
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-punc-dawg
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-word-dawg
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-number-dawg
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-recoder
Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.version
unicharset_extractor --output_unicharset "data/train/my.unicharset" --norm_mode 2 "data/all-boxes"
Extracting unicharset from box file data/all-boxes
Other case f of F is not in unicharset
Other case d of D is not in unicharset
Other case h of H is not in unicharset
Other case z of Z is not in unicharset
Other case k of K is not in unicharset
Other case w of W is not in unicharset
Other case v of V is not in unicharset
Other case j of J is not in unicharset
Other case b of B is not in unicharset
Other case q of Q is not in unicharset
Wrote unicharset file data/train/my.unicharset
merge_unicharsets /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset data/train/my.unicharset  "data/unicharset"
Loaded unicharset of size 112 from file /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset
Loaded unicharset of size 62 from file data/train/my.unicharset
Wrote unicharset file data/unicharset.
tesseract data/train/10_0.tif data/train/10_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/11_0.tif data/train/11_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/12_0.tif data/train/12_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/13_0.tif data/train/13_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/14_0.tif data/train/14_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/15_0.tif data/train/15_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/16_0.tif data/train/16_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/17_0.tif data/train/17_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/18_0.tif data/train/18_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/19_0.tif data/train/19_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/1_0.tif data/train/1_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/20_0.tif data/train/20_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/21_0.tif data/train/21_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/22_0.tif data/train/22_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/23_0.tif data/train/23_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/24_0.tif data/train/24_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/25_0.tif data/train/25_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/26_0.tif data/train/26_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/27_0.tif data/train/27_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/28_0.tif data/train/28_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/29_0.tif data/train/29_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/2_0.tif data/train/2_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/30_0.tif data/train/30_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/31_0.tif data/train/31_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/32_0.tif data/train/32_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/33_0.tif data/train/33_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/3_0.tif data/train/3_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/4_0.tif data/train/4_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/5_0.tif data/train/5_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/6_0.tif data/train/6_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/7_0.tif data/train/7_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/8_0.tif data/train/8_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
tesseract data/train/9_0.tif data/train/9_0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Warning. Invalid resolution 0 dpi. Using 70 instead.
find data/train -name '*.lstmf' -exec echo {} \; | sort -R -o "data/all-lstmf"
total=`cat data/all-lstmf | wc -l` \
   no=`echo "$total * 0.90 / 1" | bc`; \
   head -n "$no" data/all-lstmf > "data/list.train"
total=`cat data/all-lstmf | wc -l` \
   no=`echo "($total - $total * 0.90) / 1" | bc`; \
   tail -n "+$no" data/all-lstmf > "data/list.eval"
combine_lang_model \
  --input_unicharset data/unicharset \
  --script_dir /mnt/Training_Tesseract/ocrd-train/langdata-master \
  --output_dir data/ \
  --lang Invoice
Loaded unicharset of size 112 from file data/unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /mnt/Training_Tesseract/ocrd-train/langdata-master/Invoice/Invoice.config
Null char=2
mkdir -p data/checkpoints
lstmtraining \
  --traineddata data/Invoice/Invoice.traineddata \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
  --model_output data/checkpoints/Invoice \
  --learning_rate 20e-4 \
  --train_listfile data/list.train \
  --eval_listfile data/list.eval \
  --max_iterations 10000
Warning: given outputs 112 not equal to unicharset of 111.
Num outputs,weights in Series:
  1,36,0,1:1, 0
Num outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx256:256, 361472
  Fc111:111, 28527
Total weights = 532431
Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys48Lfx96Lrx96Lfx256Fc111] from request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c112]
Training parameters:
  Debug interval = 0, weights = 0.1, learning rate = 0.002, momentum=0.5
null char=110
Loaded 1/1 pages (1-1) of document data/train/14_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/22_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/8_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/24_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/33_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/2_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/17_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/20_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/12_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/33_0.lstmf
Loaded 1/1 pages (1-1) of document data/train/24_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/13_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/21_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/11_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/31_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/16_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/6_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/25_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/5_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/26_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/18_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/7_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/9_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/30_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/32_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/29_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/28_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/1_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/15_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/27_0.lstmf
Compute CTC targets failed!
Loaded 1/1 pages (1-1) of document data/train/19_0.lstmf
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!

I have attached a sample of the training data: train.zip

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 19

Most upvoted comments

Please ask ocrd-train related questions in their repo.

Your issue, the infinite loop of "Compute CTC targets failed!", was caused by missing tabs in the box files.

So, please close this issue.
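
One way to check a box file for the missing tabs mentioned above is sketched here. It assumes the Tesseract 4 LSTM box format, in which each entry is "<symbol> <left> <bottom> <right> <top> <page>" and the end of every text line is marked by an entry whose symbol is a tab character. The function name count_box_entries is hypothetical and not part of Tesseract or ocrd-train.

```python
def count_box_entries(box_text):
    """Return (symbol_entries, tab_entries) for the contents of a box file.

    Assumes each entry has the form "<symbol> <left> <bottom> <right> <top> <page>".
    """
    symbols = 0
    tabs = 0
    for raw in box_text.split("\n"):
        # Strip only trailing spaces/CR so a leading tab or space symbol survives.
        line = raw.rstrip(" \r")
        if not line:
            continue
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # not a well-formed box entry
        if parts[0] == "\t":
            tabs += 1
        else:
            symbols += 1
    return symbols, tabs


if __name__ == "__main__":
    sample = "a 0 0 100 36 0\nb 0 0 100 36 0\n\t 100 36 101 37 0\n"
    symbols, tabs = count_box_entries(sample)
    if tabs == 0:
        print("no end-of-line tab entries found")
    else:
        print(f"{symbols} symbol entries, {tabs} end-of-line entries")
```

A file that reports zero tab entries would be a candidate for the problem described in this comment.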

On Thu, Aug 23, 2018 at 3:47 AM Ahmed Osama notifications@github.com wrote:

I applied it line by line. For example, I calculate the coordinates word by word within each line and end each line with \t.

Is this valid, or do I need to break the page into lines and apply the script to each one?

    # bbox is set elsewhere; that part is not shown in this snippet
    for line in lines:
        words = line.find_all('span', class_='ocrx_word')
        for word in words:
            for character in word.get_text():
                if character:
                    outputFile.write(u"%s %d %d %d %d 0 \n" % (character, bbox[0], bbox[1], bbox[2], bbox[3]))
            # print(word.get_text(), words[-1].get_text())
            if word.get_text() != words[-1].get_text():
                # space entry between words, except after the last word
                outputFile.write(u"%s %d %d %d %d 0 \n" % (" ", bbox[0], bbox[1], bbox[2], bbox[3]))
        # tab entry marks the end of the line
        outputFile.write(u"%s %d %d %d %d 0 \n" % ("\t", int(bbox[2]), int(bbox[3]), int(bbox[2])+1, int(bbox[3])+1))
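
The idea in the snippet above, one box entry per character followed by a tab entry that closes the text line, can be sketched as a small self-contained function. The name box_entries and the sample coordinates are hypothetical. The sketch assumes a single bounding box per text line and does not settle whether per-word or per-line coordinates are what the trainer expects.

```python
def box_entries(text, bbox, page=0):
    """Build box-file entries for one text line (hypothetical helper).

    Every character, including spaces, gets the same line-level bbox;
    a final entry with a tab symbol marks the end of the line.
    """
    left, bottom, right, top = (int(v) for v in bbox)
    entries = [f"{ch} {left} {bottom} {right} {top} {page}" for ch in text]
    # End-of-line marker: tab symbol with a 1x1 box just past the line's corner.
    entries.append(f"\t {right} {top} {right + 1} {top + 1} {page}")
    return entries


if __name__ == "__main__":
    for entry in box_entries("ab c", (0, 0, 100, 36)):
        print(repr(entry))
```

Calling it once per text line and writing the entries one per row yields a box file with exactly one tab entry per line.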
