tesseract: user_words_suffix not working

We are trying to provide a user words file via the available control parameters. Unfortunately, I am getting the error shown below.


Environment

  • Tesseract Version: tesseract 4.00.00alpha leptonica-1.74.4 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.1) : libpng 1.6.34 : libtiff 4.0.8 : zlib 1.2.11 : libwebp 0.6.0 : libopenjp2 2.2.0 Found AVX Found SSE
  • Platform: Linux 9cadf37d2e9c 4.9.49-moby #1 SMP Wed Sep 27 23:17:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Current Behavior:

I am using the parameters below to supply the user words file:

tesseract --user_words_file /usr/share/tesseract-ocr/4.00/tessdata/eng.user-words    -psm=1 -l=eng source.ppm res11 txt 

I get this error:

read_params_file: parameter not found: P6

Is this supported in the above Tesseract version? I can see that support is mentioned in the help:

user_words_file       A filename of user-provided words.
user_words_suffix     A suffix of user-provided words located in tessdata.
user_patterns_file    A filename of user-provided patterns.
user_patterns_suffix  A suffix of user-provided patterns located in tessdata.

Please note that I have tried every option I could find to supply the file: user_words | user_words_file | user_words_suffix | user_patterns_file | user_patterns_suffix

Please suggest the right way to achieve the same.

Thanks,

Dev

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 20 (1 by maintainers)

Most upvoted comments

Ok, I added the lines to Dict::LoadLSTM and it works, with the following:

# tesseract test.png stdout --tessdata-dir ./tessdata --oem 0 -c page_separator=''
Dnline

# tesseract test.png stdout --tessdata-dir ./tessdata --oem 0 -c page_separator='' bazaar
Online



Dnline is corrected to Online

EDIT: While it works for this particular image, I haven’t got it to work with others yet.

I now need to find an image that does not work with tessdata_best and tessdata_fast in order to test further.
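For anyone trying to reproduce the bazaar run above, here is a rough sketch of the file layout it seems to rely on. This is my understanding, not something confirmed in this thread: the stock bazaar config sets user_words_suffix to user-words, and Tesseract then looks for a file named <lang>.<suffix> in the tessdata directory. The config name "userwords" and the single word "Online" are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Assumed working directory; the runs above used ./tessdata.
TESSDATA="${TESSDATA:-./tessdata}"
mkdir -p "$TESSDATA/configs"

# User words file: one word per line, named <lang>.<suffix>
# (assumption: suffix "user-words", language "eng").
printf '%s\n' "Online" > "$TESSDATA/eng.user-words"

# Hypothetical config file "userwords". A config file passed as a
# trailing argument is read when the engine initialises.
printf '%s\n' "user_words_suffix user-words" > "$TESSDATA/configs/userwords"

# Run only when tesseract, the test image and the language data are present.
if command -v tesseract >/dev/null 2>&1 \
   && [ -f test.png ] && [ -f "$TESSDATA/eng.traineddata" ]; then
    tesseract test.png stdout --tessdata-dir "$TESSDATA" --oem 0 userwords
fi
```

The config file is used here instead of -c because, as far as I can tell, dictionary-related parameters need to be set before the dictionaries are loaded.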

read_params_file: Can't open 6

Use --psm 6 instead of -psm 6

--oem 0 is supposed to use Dict::Load().

You are right. I haven’t got it to work with --oem 1 yet.

@Shreeshrii Is that still the case?

The user_words file is just a hint given to the OCR engine.

-psm=1 -l=eng

=>

--psm 1 -l eng
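
Putting the corrections above together, the original command might be rewritten roughly as in the sketch below. It assumes your build accepts the --user-words option (check the help output or man page for your version); the file names and paths are the ones from the original post. The script prints the assembled command and only runs it if tesseract and the input image are actually present.

```shell
#!/bin/sh
# Paths taken from the original post; adjust for your system.
image="source.ppm"
outbase="res11"
words="/usr/share/tesseract-ocr/4.00/tessdata/eng.user-words"

# Image and output base come first, then options written with two dashes
# and space-separated values (no "="), then the trailing config name "txt".
set -- "$image" "$outbase" --psm 1 -l eng --user-words "$words" txt

cmd="tesseract $*"
echo "$cmd"

# Run only when both the binary and the input image are available.
if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
    tesseract "$@"
fi
```

As noted earlier in the thread, the user words file is only a hint to the engine, so even a correctly formed command may not change the recognised text.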