tesseract: Inconsistent error message when eng.traineddata not found
# TESSDATA_PREFIX=/usr/share/tesseract-ocr/tessdata
# echo $TESSDATA_PREFIX
/usr/share/tesseract-ocr/tessdata
# tesseract test.jpg test.txt digits
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Error opening data file /usr/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
Note: there is indeed no eng.traineddata file in /usr/share/tesseract-ocr/tessdata (there are a bunch of other eng.xxx files but no .traineddata), so it is expected to get an error, _but_ the error message says it failed to open _/usr/share/tessdata/eng.traineddata_ while TESSDATA_PREFIX is set to _/usr/share/tesseract-ocr/tessdata_.
So, either the file path in the error message is not the actual path of the file that is not found, or tesseract is ignoring the variable TESSDATA_PREFIX, in which case the error message is wrong when it says “make sure the TESSDATA_PREFIX environment variable…”.
Whichever the case, something is buggy, regardless of the fact that I have no eng.traineddata file anywhere and I don’t expect tesseract to work. I just expect a consistent error message.
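One possible explanation, offered as a guess rather than a confirmed diagnosis: the transcript above assigns TESSDATA_PREFIX without `export`. A plain assignment creates a shell variable that `echo` can print but that child processes such as tesseract do not inherit, so tesseract may have fallen back to a built-in default path. The sketch below shows only the shell behaviour. `DEMO_PREFIX` is a hypothetical stand-in variable, and whether this is what happened in the session above is an assumption.

```shell
#!/bin/sh
# DEMO_PREFIX is a hypothetical stand-in for TESSDATA_PREFIX.
# Start clean so an inherited, already-exported value cannot skew the demo.
unset DEMO_PREFIX

# Plain assignment: visible to this shell, not inherited by child processes.
DEMO_PREFIX=/usr/share/tesseract-ocr/tessdata
child_view_plain=$(sh -c 'echo "${DEMO_PREFIX:-<unset>}"')

# After export, the variable is part of the environment passed to children.
export DEMO_PREFIX
child_view_exported=$(sh -c 'echo "${DEMO_PREFIX:-<unset>}"')

echo "plain assignment: child sees $child_view_plain"
echo "after export:     child sees $child_view_exported"
```

If this is the cause, running `export TESSDATA_PREFIX=...` before calling tesseract would at least make the variable visible to it. Note that the error message asks for the parent directory of "tessdata", not the "tessdata" directory itself.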
About this issue
- State: closed
- Created 8 years ago
- Reactions: 3
- Comments: 33 (9 by maintainers)
Can someone PLEASE just provide a solution to this error message WITHOUT discussing pros and cons back to the stone age?
@nellyonlinux, that is not the right way to do things. If your Tesseract installation for Linux has a /usr/share/tesseract-ocr/4.00/tessdata directory, then traineddata files should be in that directory. And don’t set TESSDATA_PREFIX! Tesseract normally knows where to find the data files without that hack.