tess4j: cannot load any language other than eng
OS X El Capitan 10.11.1, JDK 8u60, tess4j 2.0.1, tesseract 3.04.00
I installed tesseract with brew:
brew reinstall tesseract --all-languages --with-training-tools
The tessdata path is /usr/local/share/ and it contains chi_sim.traineddata, but loading chi_sim through tess4j fails. Here is the code:
public class TesseractOCR {
    private static Logger logger = LoggerFactory.getLogger(TesseractOCR.class);

    // default config
    private final static String DEFAULT_TESSDATA_PATH = "/usr/local/share";
    private final static String DEFAULT_PAGE_SEG_MODE = "3";
    private final static String DEFAULT_LANG = "chi_sim";

    public static void main(String[] args) {
        Tesseract instance = new Tesseract(); // JNA Interface Mapping
        instance.setLanguage(DEFAULT_LANG);
        instance.setDatapath(DEFAULT_TESSDATA_PATH);
        instance.setPageSegMode(Integer.parseInt(DEFAULT_PAGE_SEG_MODE));
        BufferedImage image = Images.from("ocr/data/input/1.png");
        String result = "";
        try {
            result = instance.doOCR(image);
        } catch (TesseractException e) {
            logger.error("ocr image error!", e);
        }
        logger.info(result);
    }
}
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000012a54e933, pid=3139, tid=5891
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C [libtesseract.dylib+0x12933] tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+0xb9
#
The JVM crashed. Here is the log: https://gist.github.com/fivesmallq/1f6d349c02e9bbab9b80
With eng everything works fine.
I also cloned the tess4j project from GitHub, updated the JUnit test to set the language to chi_sim, and put chi_sim.traineddata in src/main/resources. The same problem appeared.
➜ tessdata git:(master) which tesseract
/usr/local/bin/tesseract
➜ tessdata git:(master) tesseract --list-langs
List of available languages (107):
...
chi_sim
chi_tra
...
➜ ocr tesseract 2.jpg -l chi_sim result
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Warning in pixReadMemJpeg: work-around: writing to a temp file
Detected 56 diacritics
Using tesseract from the command line works fine.
Does tess4j not currently support tesseract 3.04.00?
Thank you
About this issue
- Original URL
- State: closed
- Created 9 years ago
- Comments: 33 (3 by maintainers)
@tonydeng you can download chi_sim or other languages from https://github.com/tesseract-ocr/tessdata to your /usr/local/Cellar/tesseract/3.04.01_2/share/tessdata directory.
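Before calling doOCR, it may also help to confirm that the .traineddata file is actually visible under the path passed to setDatapath. The following is a minimal sketch using only the Java standard library. The class name TessdataCheck and the method findTrainedData are hypothetical, not part of tess4j. Whether the native library expects the tessdata directory itself or its parent is an assumption to verify for your tess4j and tesseract versions, so the sketch simply probes both locations.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of tess4j: looks for <lang>.traineddata
// under a candidate datapath before the path is handed to the OCR engine.
public class TessdataCheck {

    // Returns the first existing <lang>.traineddata file, or null if none is found.
    // Both layouts are probed because which one applies depends on the
    // tess4j/tesseract version in use (an assumption to verify).
    static Path findTrainedData(String datapath, String lang) {
        String fileName = lang + ".traineddata";
        Path base = Paths.get(datapath);
        Path[] candidates = {
            base.resolve(fileName),                      // datapath is the tessdata directory
            base.resolve("tessdata").resolve(fileName)   // datapath is the parent of tessdata
        };
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Defaults mirror the values from the report above.
        String datapath = args.length > 0 ? args[0] : "/usr/local/share";
        String lang = args.length > 1 ? args[1] : "chi_sim";

        Path found = findTrainedData(datapath, lang);
        if (found == null) {
            System.out.println("No " + lang + ".traineddata found under " + datapath);
        } else {
            System.out.println("Found " + found);
        }
    }
}
```

Running it with the datapath and language from the report, for example `java TessdataCheck /usr/local/share chi_sim`, shows which of the two locations (if either) holds the file, which can then be compared with the value given to setDatapath.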