tesserocr: OSX: RuntimeError: Failed to init API, possibly an invalid tessdata path

On OSX, I get an error when using a language other than the default. Here is all the info I could gather. Do you have any idea why this fails?

  • pip list
Pillow (5.1.0)
tesserocr (2.2.2)
  • tesseract --version
# installed by brew install tesseract --with-all-languages
tesseract 3.05.01
 leptonica-1.75.3
  libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
  • test.py
import tesserocr
from PIL import Image

print(tesserocr.tesseract_version())
print(tesserocr.get_languages())
image = Image.open('DSCF1896.jpg')
print(tesserocr.image_to_text(image, lang='kor'))
  • output of test.py
tesseract 3.05.01
 leptonica-1.75.3
  libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11

('/usr/local/Cellar/tesseract/3.05.01/share/tessdata/', ['ori', 'por', 'srp', 'hin', 'chi_sim', 'spa', 'uzb_cyrl', 'mar', 'swa', 'ces', 'urd', 'nep', 'cat', 'mya', 'lit', 'dan', 'mlt', 'enm', 'bod', 'tir', 'tgl', 'tha', 'fas', 'hrv', 'ukr', 'lao', 'ben', 'eus', 'eng', 'dzo', 'nld', 'vie', 'ita', 'kir', 'pus', 'msa', 'heb', 'slv', 'kaz', 'fin', 'yid', 'deu', 'bul', 'khm', 'ell', 'cym', 'kor', 'slk_frak', 'lav', 'mkd', 'glg', 'sin', 'syr', 'rus', 'kat', 'frk', 'kur', 'bos', 'ind', 'swe', 'est', 'iku', 'sqi', 'nor', 'pol', 'tam', 'mal', 'slk', 'jav', 'srp_latn', 'osd', 'afr', 'hat', 'gle', 'ron', 'kan', 'uig', 'lat', 'ita_old', 'frm', 'equ', 'tgk', 'kat_old', 'spa_old', 'uzb', 'dan_frak', 'hun', 'aze_cyrl', 'isl', 'grc', 'aze', 'asm', 'pan', 'epo', 'chi_tra', 'tel', 'deu_frak', 'amh', 'chr', 'guj', 'ara', 'san', 'fra', 'tur', 'jpn', 'ceb', 'bel'])
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    print(tesserocr.image_to_text(image, lang='kor', path=cpath))
  File "tesserocr.pyx", line 2400, in tesserocr.image_to_text
RuntimeError: Failed to init API, possibly an invalid tessdata path: /usr/local/Cellar/tesseract/3.05.01/share/
  • training data
ls /usr/local/Cellar/tesseract/3.05.01/share/tessdata/    
afr.traineddata       dan_frak.traineddata  fra.cube.word-freq    ita.cube.nn           nep.traineddata       spa.cube.params
amh.traineddata       deu.traineddata       fra.tesseract_cube.nn ita.cube.params       nld.traineddata       spa.cube.size
ara.cube.bigrams      deu_frak.traineddata  fra.traineddata       ita.cube.size         nor.traineddata       spa.cube.word-freq
ara.cube.fold         dzo.traineddata       frk.traineddata       ita.cube.word-freq    ori.traineddata       spa.traineddata
ara.cube.lm           ell.traineddata       frm.traineddata       ita.tesseract_cube.nn osd.traineddata       spa_old.traineddata
ara.cube.nn           eng.cube.bigrams      gle.traineddata       ita.traineddata       pan.traineddata       sqi.traineddata
ara.cube.params       eng.cube.fold         glg.traineddata       ita_old.traineddata   pdf.ttf               srp.traineddata
ara.cube.size         eng.cube.lm           grc.traineddata       jav.traineddata       pol.traineddata       srp_latn.traineddata
ara.cube.word-freq    eng.cube.nn           guj.traineddata       jpn.traineddata       por.traineddata       swa.traineddata
ara.traineddata       eng.cube.params       hat.traineddata       kan.traineddata       pus.traineddata       swe.traineddata
asm.traineddata       eng.cube.size         heb.traineddata       kat.traineddata       ron.traineddata       syr.traineddata
aze.traineddata       eng.cube.word-freq    hin.cube.bigrams      kat_old.traineddata   rus.cube.fold         tam.traineddata
aze_cyrl.traineddata  eng.tesseract_cube.nn hin.cube.fold         kaz.traineddata       rus.cube.lm           tel.traineddata
bel.traineddata       eng.traineddata       hin.cube.lm           khm.traineddata       rus.cube.nn           tessconfigs
ben.traineddata       enm.traineddata       hin.cube.nn           kir.traineddata       rus.cube.params       tgk.traineddata
bod.traineddata       epo.traineddata       hin.cube.params       kor.traineddata       rus.cube.size         tgl.traineddata
bos.traineddata       equ.traineddata       hin.cube.word-freq    kur.traineddata       rus.cube.word-freq    tha.traineddata
bul.traineddata       est.traineddata       hin.tesseract_cube.nn lao.traineddata       rus.traineddata       tir.traineddata
cat.traineddata       eus.traineddata       hin.traineddata       lat.traineddata       san.traineddata       tur.traineddata
ceb.traineddata       fas.traineddata       hrv.traineddata       lav.traineddata       sin.traineddata       uig.traineddata
ces.traineddata       fin.traineddata       hun.traineddata       lit.traineddata       slk.traineddata       ukr.traineddata
chi_sim.traineddata   fra.cube.bigrams      iku.traineddata       mal.traineddata       slk_frak.traineddata  urd.traineddata
chi_tra.traineddata   fra.cube.fold         ind.traineddata       mar.traineddata       slv.traineddata       uzb.traineddata
chr.traineddata       fra.cube.lm           isl.traineddata       mkd.traineddata       spa.cube.bigrams      uzb_cyrl.traineddata
configs               fra.cube.nn           ita.cube.bigrams      mlt.traineddata       spa.cube.fold         vie.traineddata
cym.traineddata       fra.cube.params       ita.cube.fold         msa.traineddata       spa.cube.lm           yid.traineddata
dan.traineddata       fra.cube.size         ita.cube.lm           mya.traineddata       spa.cube.nn

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16 (3 by maintainers)

Most upvoted comments

I hit the same issue, and I found that running the script in the Eclipse environment works. What is different between running in Eclipse and running in a terminal?


Test 1: lang = 'eng' works, but lang = 'chi_sim' hits this issue in the terminal while still working in the Eclipse environment. How?


Test 2: I renamed eng.traineddata to chi_sim.traineddata and tested again, and it works. So maybe this is caused by the downloaded chi_sim.traineddata? How can I fix it?


Finally, I found a solution for Python:

Add the code below to your Python code.

import locale
locale.setlocale(locale.LC_ALL, "C")
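The locale workaround above can be sketched as a small helper. This is a minimal sketch, not a confirmed fix for every setup: `force_c_locale` and `ocr_with_c_locale` are hypothetical names introduced here, and the assumption is that the locale should already be "C" before tesserocr initializes the API, which is why the import is done lazily inside the function.

```python
import locale


def force_c_locale():
    """Switch every locale category to "C" and return the resulting setting."""
    # setlocale returns the new locale string on success.
    return locale.setlocale(locale.LC_ALL, "C")


def ocr_with_c_locale(image_path, lang="eng"):
    """Hypothetical helper: set the "C" locale, then OCR one image file."""
    force_c_locale()
    # Imported lazily so the locale is already "C" when tesserocr is loaded
    # (an assumption about ordering; it is not confirmed in this thread).
    import tesserocr
    from PIL import Image

    with Image.open(image_path) as image:
        return tesserocr.image_to_text(image, lang=lang)


if __name__ == "__main__":
    print(force_c_locale())
```

Calling `ocr_with_c_locale('DSCF1896.jpg', lang='kor')` would mirror the test script from the report, assuming tesserocr and Pillow are installed.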


Also, if you have two or more versions of tesseract installed, you need to set 'TESSDATA_PREFIX' to the proper one.
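One way to pick the proper data directory before setting 'TESSDATA_PREFIX' is to check that the language file is actually there. The sketch below uses hypothetical helpers (`has_language`, `set_tessdata_prefix`). It assumes that Tesseract 3.x expects the variable to name the parent of the tessdata folder while newer versions expect the tessdata folder itself, so the `prefix_is_parent` flag should be verified against your installed version.

```python
import os
from pathlib import Path


def has_language(tessdata_dir, lang):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


def set_tessdata_prefix(tessdata_dir, lang, prefix_is_parent=False):
    """Export TESSDATA_PREFIX after checking that the language file exists.

    prefix_is_parent=True exports the parent of tessdata_dir instead
    (assumed convention for Tesseract 3.x; verify for your version).
    """
    tessdata_dir = Path(tessdata_dir)
    if not has_language(tessdata_dir, lang):
        raise FileNotFoundError(f"{lang}.traineddata not found in {tessdata_dir}")
    prefix = tessdata_dir.parent if prefix_is_parent else tessdata_dir
    # Set this before tesserocr initializes the API so the value is picked up.
    os.environ["TESSDATA_PREFIX"] = str(prefix)
    return os.environ["TESSDATA_PREFIX"]
```

For the Homebrew layout shown in the report, this might be called as `set_tessdata_prefix('/usr/local/Cellar/tesseract/3.05.01/share/tessdata', 'kor', prefix_is_parent=True)`.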

Thank you, I solved this problem the hard way: switch the working directory to C:\Program Files\Python36\Lib\site-packages\tesserocr.

import tesserocr
import os
from PIL import Image

os.chdir(r"C:\Program Files\Python36\Lib\site-packages\tesserocr")
image = Image.open('image.png')
print(tesserocr.image_to_text(image))

But the tough part is that you have to switch the directory to tesserocr every time.
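If the directory switch is the only thing that works, the repetition can at least be contained. The following is a minimal sketch of a hypothetical `working_directory` context manager that changes into a folder for the duration of a block and then restores the previous one. It does not explain why the switch helps; it only wraps the workaround.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always switch back, even if the OCR call raises.
        os.chdir(previous)
```

The OCR call would then go inside `with working_directory(r"C:\Program Files\Python36\Lib\site-packages\tesserocr"):`. Note that a relative image path such as 'image.png' is resolved against the new directory inside the block, so opening the image beforehand or using an absolute path is probably safer. Passing the data directory through the `path` argument of `tesserocr.image_to_text` may also avoid the switch entirely, though that is untested here.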