tesseract: Error in Training Tesseract LSTM 4.0
Hello all,
I want to train Tesseract as follows, but I got this error. Any help would be appreciated.
tesstrain.sh --fonts_dir /usr/share/fonts --lang ara --training_text ../../tesserac-ocr/langdata/ara.training_text --langdata_dir ../langdata --tessdata_dir ./tessdata --fontlist "Arial" --output_dir ~/tesstutorial/aratest
=== Starting training for language 'ara'
[ر أبر 12 07:48:58 EET 2017] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Arial --outputbase=/tmp/font_tmp.Rj3QkZFztb/sample_text.txt --text=/tmp/font_tmp.Rj3QkZFztb/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.Rj3QkZFztb
Rendered page 0 to file /tmp/font_tmp.Rj3QkZFztb/sample_text.txt.tif
=== Phase I: Generating training images ===
Rendering using Arial
[ر أبر 12 07:49:17 EET 2017] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Rj3QkZFztb --fonts_dir=/usr/share/fonts --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.XBcy4TVQwb/ara/ara.Arial.exp0 --font=Arial --text=../../tesserac-ocr/langdata/ara.training_text
ERROR: Non-existent flag --fontconfig_refresh_config_file=false
ERROR: /tmp/tmp.XBcy4TVQwb/ara/ara.Arial.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.XBcy4TVQwb/ara/ara.Arial.exp0.box does not exist or is not readable
About this issue
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 49
Put the Arabic langdata files in an ara folder inside langdata, mirroring the layout of https://github.com/tesseract-ocr/langdata, so that you have:
./langdata
./langdata/ara
./tessdata
./tesseract
./tesseract/tessdata
./tesseract/training
etc.
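
A quick way to confirm that layout before rerunning tesstrain.sh is a small check like the one below. This is only a sketch: the function name check_layout and the idea of passing the parent directory as an argument are assumptions made for illustration, not part of Tesseract or its training scripts.

```shell
#!/bin/sh
# check_layout is a hypothetical helper, not a Tesseract tool.
# It takes the directory that should contain langdata, tessdata and
# tesseract, and reports every expected subdirectory that is missing.
check_layout() {
    root="$1"
    missing=0
    for d in langdata langdata/ara tessdata tesseract \
             tesseract/tessdata tesseract/training; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d"
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

Calling check_layout with the parent directory (for example `check_layout .` from the folder that holds langdata) prints nothing when all six directories exist. With this layout the Arabic training text would normally sit at langdata/ara/ara.training_text, so the --training_text and --langdata_dir arguments in the original command may need to point there instead.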