tesseract: Error in Training Tesseract LSTM 4.0

Hello All,

I want to train Tesseract as follows, but I got this error. Any help would be appreciated.

 tesstrain.sh --fonts_dir /usr/share/fonts --lang ara  --training_text ../../tesserac-ocr/langdata/ara.training_text   --langdata_dir ../langdata --tessdata_dir ./tessdata   --fontlist "Arial"   --output_dir ~/tesstutorial/aratest
=== Starting training for language 'ara'
[ر أبر 12 07:48:58 EET 2017] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Arial --outputbase=/tmp/font_tmp.Rj3QkZFztb/sample_text.txt --text=/tmp/font_tmp.Rj3QkZFztb/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.Rj3QkZFztb
Rendered page 0 to file /tmp/font_tmp.Rj3QkZFztb/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using Arial
[ر أبر 12 07:49:17 EET 2017] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Rj3QkZFztb --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.XBcy4TVQwb/ara/ara.Arial.exp0 --font=Arial --text=../../tesserac-ocr/langdata/ara.training_text
ERROR: Non-existent flag --fontconfig_refresh_config_file=false
ERROR: /tmp/tmp.XBcy4TVQwb/ara/ara.Arial.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.XBcy4TVQwb/ara/ara.Arial.exp0.box does not exist or is not readable

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 49

Most upvoted comments

Put the Arabic langdata files in an ara folder under langdata, mirroring the layout of https://github.com/tesseract-ocr/langdata

so that you have

  ./langdata
  ./langdata/ara
  ./tessdata
  ./tesseract
  ./tesseract/tessdata
  ./tesseract/training

etc
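
The layout above could be set up with something like the following. This is only a sketch: it assumes the Arabic files (for example ara.training_text) currently sit directly inside ./langdata, and the function name arrange_langdata is made up for illustration. It creates the per-language folder and moves every ara.* file into it.

```shell
# Hypothetical helper: move <lang>.* files from langdata/ into langdata/<lang>/.
# Assumption: the language files were placed directly under langdata/.
arrange_langdata() {
    root=$1   # directory that holds langdata, tessdata, tesseract
    lang=$2   # language code, e.g. ara

    mkdir -p "$root/langdata/$lang"

    for f in "$root/langdata/$lang".*; do
        # Skip the literal pattern when nothing matches, and skip non-files.
        [ -f "$f" ] || continue
        mv "$f" "$root/langdata/$lang/"
    done
}

# Example call (run from the directory that contains ./langdata):
#   arrange_langdata . ara
```

After that, --langdata_dir would presumably point at the parent langdata folder, as in the command from the question, and --training_text at the file inside langdata/ara.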