tesserocr: OSX: RuntimeError: Failed to init API, possibly an invalid tessdata path
On OSX, I get an error when I use a language other than the default. Here is all the information I could gather. Do you have any idea why this fails?
- PIP list
Pillow (5.1.0)
tesserocr (2.2.2)
- tesseract --version
# installed by brew install tesseract --with-all-languages
tesseract 3.05.01
leptonica-1.75.3
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
- test.py
import tesserocr
from PIL import Image
print(tesserocr.tesseract_version())
print(tesserocr.get_languages())
image = Image.open('DSCF1896.jpg')
print(tesserocr.image_to_text(image, lang='kor'))
- output of test.py
tesseract 3.05.01
leptonica-1.75.3
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
('/usr/local/Cellar/tesseract/3.05.01/share/tessdata/', ['ori', 'por', 'srp', 'hin', 'chi_sim', 'spa', 'uzb_cyrl', 'mar', 'swa', 'ces', 'urd', 'nep', 'cat', 'mya', 'lit', 'dan', 'mlt', 'enm', 'bod', 'tir', 'tgl', 'tha', 'fas', 'hrv', 'ukr', 'lao', 'ben', 'eus', 'eng', 'dzo', 'nld', 'vie', 'ita', 'kir', 'pus', 'msa', 'heb', 'slv', 'kaz', 'fin', 'yid', 'deu', 'bul', 'khm', 'ell', 'cym', 'kor', 'slk_frak', 'lav', 'mkd', 'glg', 'sin', 'syr', 'rus', 'kat', 'frk', 'kur', 'bos', 'ind', 'swe', 'est', 'iku', 'sqi', 'nor', 'pol', 'tam', 'mal', 'slk', 'jav', 'srp_latn', 'osd', 'afr', 'hat', 'gle', 'ron', 'kan', 'uig', 'lat', 'ita_old', 'frm', 'equ', 'tgk', 'kat_old', 'spa_old', 'uzb', 'dan_frak', 'hun', 'aze_cyrl', 'isl', 'grc', 'aze', 'asm', 'pan', 'epo', 'chi_tra', 'tel', 'deu_frak', 'amh', 'chr', 'guj', 'ara', 'san', 'fra', 'tur', 'jpn', 'ceb', 'bel'])
Traceback (most recent call last):
File "test.py", line 13, in <module>
print(tesserocr.image_to_text(image, lang='kor', path=cpath))
File "tesserocr.pyx", line 2400, in tesserocr.image_to_text
RuntimeError: Failed to init API, possibly an invalid tessdata path: /usr/local/Cellar/tesseract/3.05.01/share/
- training data
ls /usr/local/Cellar/tesseract/3.05.01/share/tessdata/
afr.traineddata dan_frak.traineddata fra.cube.word-freq ita.cube.nn nep.traineddata spa.cube.params
amh.traineddata deu.traineddata fra.tesseract_cube.nn ita.cube.params nld.traineddata spa.cube.size
ara.cube.bigrams deu_frak.traineddata fra.traineddata ita.cube.size nor.traineddata spa.cube.word-freq
ara.cube.fold dzo.traineddata frk.traineddata ita.cube.word-freq ori.traineddata spa.traineddata
ara.cube.lm ell.traineddata frm.traineddata ita.tesseract_cube.nn osd.traineddata spa_old.traineddata
ara.cube.nn eng.cube.bigrams gle.traineddata ita.traineddata pan.traineddata sqi.traineddata
ara.cube.params eng.cube.fold glg.traineddata ita_old.traineddata pdf.ttf srp.traineddata
ara.cube.size eng.cube.lm grc.traineddata jav.traineddata pol.traineddata srp_latn.traineddata
ara.cube.word-freq eng.cube.nn guj.traineddata jpn.traineddata por.traineddata swa.traineddata
ara.traineddata eng.cube.params hat.traineddata kan.traineddata pus.traineddata swe.traineddata
asm.traineddata eng.cube.size heb.traineddata kat.traineddata ron.traineddata syr.traineddata
aze.traineddata eng.cube.word-freq hin.cube.bigrams kat_old.traineddata rus.cube.fold tam.traineddata
aze_cyrl.traineddata eng.tesseract_cube.nn hin.cube.fold kaz.traineddata rus.cube.lm tel.traineddata
bel.traineddata eng.traineddata hin.cube.lm khm.traineddata rus.cube.nn tessconfigs
ben.traineddata enm.traineddata hin.cube.nn kir.traineddata rus.cube.params tgk.traineddata
bod.traineddata epo.traineddata hin.cube.params kor.traineddata rus.cube.size tgl.traineddata
bos.traineddata equ.traineddata hin.cube.word-freq kur.traineddata rus.cube.word-freq tha.traineddata
bul.traineddata est.traineddata hin.tesseract_cube.nn lao.traineddata rus.traineddata tir.traineddata
cat.traineddata eus.traineddata hin.traineddata lat.traineddata san.traineddata tur.traineddata
ceb.traineddata fas.traineddata hrv.traineddata lav.traineddata sin.traineddata uig.traineddata
ces.traineddata fin.traineddata hun.traineddata lit.traineddata slk.traineddata ukr.traineddata
chi_sim.traineddata fra.cube.bigrams iku.traineddata mal.traineddata slk_frak.traineddata urd.traineddata
chi_tra.traineddata fra.cube.fold ind.traineddata mar.traineddata slv.traineddata uzb.traineddata
chr.traineddata fra.cube.lm isl.traineddata mkd.traineddata spa.cube.bigrams uzb_cyrl.traineddata
configs fra.cube.nn ita.cube.bigrams mlt.traineddata spa.cube.fold vie.traineddata
cym.traineddata fra.cube.params ita.cube.fold msa.traineddata spa.cube.lm yid.traineddata
dan.traineddata fra.cube.size ita.cube.lm mya.traineddata spa.cube.nn
About this issue
- State: closed
- Created 6 years ago
- Comments: 16 (3 by maintainers)
Commits related to this issue
- Update README.rst Add information about tessdata. There are a lot of issues about this and nothing in the readme yet. The information is just what i gathered from these issues and get from my own exp... — committed to flip111/tesserocr by flip111 6 years ago
- Update README.rst (#131) * Update README.rst Add information about tessdata. There are a lot of issues about this and nothing in the readme yet. The information is just what i gathered from these ... — committed to sirfz/tesserocr by flip111 3 years ago
I hit the same issue, and I found that running the script in the Eclipse environment works. What is different between running in Eclipse and running in the terminal?
Test 1: lang='eng' works, but lang='chi_sim' triggers this error in the terminal, while it still works in the Eclipse environment. How?
Test 2: I renamed eng.traineddata to chi_sim.traineddata and tested again, and it worked. So maybe this is caused by the downloaded chi_sim.traineddata? How can I fix it?
Finally, I found a solution for Python:
add the code below to your Python code.
Also, if you have two or more versions of Tesseract installed, you need to set 'TESSDATA_PREFIX' to the proper one.
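The snippet this comment refers to is not shown in the thread. As a minimal sketch of the TESSDATA_PREFIX approach only, the helper below checks that the requested language files exist and then sets the variable before Tesseract is initialised. The function name `set_tessdata_prefix` and the `point_at_parent` flag are hypothetical; the flag reflects my understanding that Tesseract 3.x expects the parent directory of tessdata while later releases expect the tessdata directory itself, so treat it as an assumption to verify against your installed version.

```python
import os


def set_tessdata_prefix(tessdata_dir, lang, point_at_parent=False):
    """Set TESSDATA_PREFIX after checking that the language files exist.

    tessdata_dir: directory that directly contains <lang>.traineddata.
    lang: language code(s) as passed to Tesseract, e.g. "kor" or "eng+kor".
    point_at_parent: hypothetical switch (an assumption, not from the
        thread): use the parent of tessdata_dir instead of the directory
        itself, which older Tesseract releases appear to expect.
    """
    tessdata_dir = os.path.abspath(tessdata_dir)
    # Collect every requested language that has no traineddata file.
    missing = [
        code
        for code in lang.split("+")
        if not os.path.isfile(os.path.join(tessdata_dir, code + ".traineddata"))
    ]
    if missing:
        raise FileNotFoundError(
            "no traineddata for %s in %s" % (", ".join(missing), tessdata_dir)
        )
    prefix = os.path.dirname(tessdata_dir) if point_at_parent else tessdata_dir
    # Set the variable before the Tesseract API is initialised.
    os.environ["TESSDATA_PREFIX"] = prefix + os.sep
    return os.environ["TESSDATA_PREFIX"]


# Possible usage with the path from the original report (not verified here):
# set_tessdata_prefix(
#     "/usr/local/Cellar/tesseract/3.05.01/share/tessdata", "kor",
#     point_at_parent=True,
# )
# import tesserocr
```

Checking for the `.traineddata` file first separates a genuinely missing language from a wrong prefix, which the "Failed to init API" message alone does not distinguish.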
Thank you. I solved this problem the hard way, by switching the working directory to the tesserocr folder, C:\Program Files\Python36\Lib\site-packages\tesserocr:
import tesserocr
import os
from PIL import Image
os.chdir(r"C:\Program Files\Python36\Lib\site-packages\tesserocr")
image = Image.open('image.png')
print(tesserocr.image_to_text(image))
The drawback is that you have to switch the directory to tesserocr every time.
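One way to avoid changing the working directory might be to locate the data folder programmatically. The sketch below assumes, as the workaround above suggests but does not confirm, that a `tessdata` directory sits next to the installed `tesserocr` module; the helper name `bundled_tessdata_dir` is hypothetical.

```python
import importlib.util
import os


def bundled_tessdata_dir(package="tesserocr"):
    """Return '<package directory>/tessdata' if it exists, otherwise None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.origin:
        # Package not installed, or no file location is available.
        return None
    candidate = os.path.join(os.path.dirname(spec.origin), "tessdata")
    return candidate if os.path.isdir(candidate) else None


# Possible usage (not verified here): pass the result, or its parent
# directory depending on the Tesseract version, through TESSDATA_PREFIX
# or the path argument of tesserocr.image_to_text.
# data_dir = bundled_tessdata_dir()
# if data_dir is not None:
#     os.environ["TESSDATA_PREFIX"] = data_dir + os.sep
```

Unlike `os.chdir`, this leaves relative paths such as `'image.png'` resolving against the directory the script was started from.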