tess4j: cannot load any language other than eng

OS X El Capitan 10.11.1, JDK 1.8.0_60, tess4j 2.0.1, tesseract 3.04.00

I installed tesseract with brew:

brew reinstall tesseract --all-languages --with-training-tools

The tessdata path is /usr/local/share/ and it contains chi_sim.traineddata.

But loading chi_sim through tess4j fails. Here is the code:

public class TesseractOCR {
    private static Logger logger = LoggerFactory.getLogger(TesseractOCR.class);

    //default config
    private final static String DEFAULT_TESSDATA_PATH = "/usr/local/share";
    private final static String DEFAULT_PAGE_SEG_MODE = "3";
    private final static String DEFAULT_LANG = "chi_sim";

    public static void main(String[] args) {
        Tesseract instance = new Tesseract();  // JNA Interface Mapping
        instance.setLanguage(DEFAULT_LANG);
        instance.setDatapath(DEFAULT_TESSDATA_PATH);
        instance.setPageSegMode(Integer.parseInt(DEFAULT_PAGE_SEG_MODE));
        BufferedImage image = Images.from("ocr/data/input/1.png");
        String result = "";
        try {
            result = instance.doOCR(image);
        } catch (TesseractException e) {
            logger.error("ocr image error!", e);
        }
        logger.info(result);
    }
}

Output:

Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000012a54e933, pid=3139, tid=5891
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtesseract.dylib+0x12933]  tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+0xb9
#

The JVM crashed. The full log is at https://gist.github.com/fivesmallq/1f6d349c02e9bbab9b80

eng works fine.


I also cloned the tess4j project from GitHub, changed the JUnit test to set the language to chi_sim, and put chi_sim.traineddata in src/main/resources. The same problem occurred.

➜  tessdata git:(master) which tesseract
/usr/local/bin/tesseract
➜  tessdata git:(master) tesseract --list-langs
List of available languages (107):
...
chi_sim
chi_tra
...
➜  ocr  tesseract 2.jpg -l chi_sim result
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Warning in pixReadMemJpeg: work-around: writing to a temp file
Detected 56 diacritics

Using tesseract from the command line works fine.
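
To compare what the command line reports with what a Java process can actually see, one option is to list the .traineddata files in a directory from Java. This is only a rough sketch: the class name TessdataLanguages and the default directory /usr/local/share/tessdata are assumptions for illustration, and it uses nothing but the JDK, not any tess4j API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of tess4j.
public class TessdataLanguages {

    private static final String SUFFIX = ".traineddata";

    /** Returns the sorted language codes of all *.traineddata files directly inside tessdataDir. */
    public static List<String> listLanguages(Path tessdataDir) throws IOException {
        List<String> langs = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tessdataDir, "*" + SUFFIX)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                // Strip the suffix so "chi_sim.traineddata" becomes "chi_sim".
                langs.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        Collections.sort(langs);
        return langs;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location; pass the real tessdata directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/local/share/tessdata");
        if (!Files.isDirectory(dir)) {
            System.out.println("Not a directory: " + dir);
            return;
        }
        System.out.println(listLanguages(dir));
    }
}
```

If chi_sim is missing from this list for the directory the Java code points at, while `tesseract --list-langs` shows it, the two are probably reading different directories.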


Does tess4j not currently support tesseract 3.04.00?

Thank you

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 33 (3 by maintainers)

Most upvoted comments

@tonydeng You can download chi_sim or other languages from https://github.com/tesseract-ocr/tessdata into your /usr/local/Cellar/tesseract/3.04.01_2/share/tessdata directory.
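
Since a failed language load ended in a JVM crash above, it may be worth checking that the file is really there before calling doOCR. A minimal sketch, under assumptions: TrainedDataLocator is a hypothetical helper (not a tess4j class), and because I am not sure whether a given Tesseract build appends tessdata/ to the configured data path, it checks both the directory itself and its tessdata subdirectory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of tess4j.
public class TrainedDataLocator {

    /**
     * Looks for <lang>.traineddata in dataPath itself and in dataPath/tessdata.
     * Returns the first regular file found, or empty if neither exists.
     */
    public static Optional<Path> locate(Path dataPath, String lang) {
        String fileName = lang + ".traineddata";
        Path[] candidates = {
            dataPath.resolve(fileName),
            dataPath.resolve("tessdata").resolve(fileName)
        };
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Defaults mirror the values in the question; override them via arguments.
        Path dataPath = Paths.get(args.length > 0 ? args[0] : "/usr/local/share");
        String lang = args.length > 1 ? args[1] : "chi_sim";
        Optional<Path> found = locate(dataPath, lang);
        if (found.isPresent()) {
            System.out.println("Found " + found.get());
        } else {
            // Skip the OCR call in this case rather than risk the native crash.
            System.out.println("No " + lang + ".traineddata under " + dataPath);
        }
    }
}
```

Running this with the same data path and language that are passed to setDatapath and setLanguage shows which of the two locations, if any, actually holds the file.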