tesseract: Inconsistent error message when eng.traineddata not found

# TESSDATA_PREFIX=/usr/share/tesseract-ocr/tessdata

# echo $TESSDATA_PREFIX
/usr/share/tesseract-ocr/tessdata

# tesseract test.jpg test.txt digits
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Error opening data file /usr/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Note: there is indeed no eng.traineddata file in /usr/share/tesseract-ocr/tessdata (there are several other eng.xxx files, but no .traineddata), so an error is expected. _But_ the error message says it failed to open _/usr/share/tessdata/eng.traineddata_, while TESSDATA_PREFIX is set to _/usr/share/tesseract-ocr/tessdata_. So either the path in the error message is not the path tesseract actually tried to open, or tesseract is ignoring TESSDATA_PREFIX, in which case the hint to “make sure the TESSDATA_PREFIX environment variable…” is misleading.

Either way, something is buggy. I have no eng.traineddata file anywhere and I don’t expect tesseract to work; I just expect a consistent error message.
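
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the transcript above assigns TESSDATA_PREFIX without `export`. In a POSIX shell a plain assignment creates a shell variable that `echo` in the same shell can print, but that child processes such as tesseract do not inherit. If that is what happened, tesseract never saw the variable and fell back to a built-in default, which would account for the different path in the message. The sketch below shows the difference, using `sh -c` as a stand-in for any child process; the names `seen_unexported` and `seen_exported` are illustrative only.

```shell
# Start from a clean slate so the demo does not depend on the caller's environment.
unset TESSDATA_PREFIX

# Plain assignment: a shell variable, visible to `echo` in this shell only.
TESSDATA_PREFIX=/usr/share/tesseract-ocr/tessdata
seen_unexported=$(sh -c 'printf "%s" "${TESSDATA_PREFIX:-<unset>}"')

# Exported: now part of the environment that child processes inherit.
export TESSDATA_PREFIX
seen_exported=$(sh -c 'printf "%s" "${TESSDATA_PREFIX:-<unset>}"')

echo "child without export: $seen_unexported"
echo "child with export:    $seen_exported"
```

Separately, the error text itself asks for the parent directory of "tessdata", so a value that already ends in /tessdata may not be what this version expects even once it is exported.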

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 3
  • Comments: 33 (9 by maintainers)

Most upvoted comments

Can someone PLEASE just provide a solution to this error message WITHOUT discussing pros and cons back to the stone age?

@nellyonlinux, that is not the right way to do things. If your Tesseract installation for Linux has a /usr/share/tesseract-ocr/4.00/tessdata, then traineddata files should be in that directory. And don’t set TESSDATA_PREFIX! Tesseract normally knows where to find the data files without that hack.
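
A minimal sketch of that advice, assuming the directory quoted in the comment matches your installation (it may not): check that the language file is where Tesseract is expected to look, then clear TESSDATA_PREFIX so the built-in default applies. `has_traineddata` is a hypothetical helper written for this example, not part of Tesseract.

```shell
# Hypothetical helper: succeeds if <dir>/<lang>.traineddata exists as a regular file.
has_traineddata() {
    [ -f "$1/$2.traineddata" ]
}

# Directory quoted from the comment above; adjust to your installation.
tessdata_dir=/usr/share/tesseract-ocr/4.00/tessdata

if has_traineddata "$tessdata_dir" eng; then
    echo "eng.traineddata found in $tessdata_dir"
else
    echo "eng.traineddata missing from $tessdata_dir"
fi

# Follow the advice above and let Tesseract use its own default location.
unset TESSDATA_PREFIX
```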