tesseract: Error: Illegal Parameter specification! with Tesseract4Alpha
After upgrading to Tesseract 4 alpha, I get this error when running OCR from my Java code:
ITesseract instance = new Tesseract();
instance.setDatapath("/usr/share/tessdata/");
instance.setLanguage("spa");
(...)
result = instance.doOCR(imageFile);
Environment
- Tesseract Version: tesseract 4.00.00alpha
- Leptonica Version: leptonica-1.74.4
- Platform: CentOS 6.7
- Server: Wildfly 10.1
Current Behavior:
Error: Illegal Parameter specification!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00007ff1b3098549, pid=25091, tid=0x00007ff29d7d7700
JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops)
Problematic frame:
C [libtesseract.so+0x26f549] ERRCODE::error(char const*, TessErrorLogCode, char const*, …) const+0x129
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
An error report file with more information is saved as: /opt/wildfly/wildfly-10.1.0.Final/hs_err_pid25091.log
If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code.
See problematic frame for where to report the bug.
*** JBossAS process (25091) received ABRT signal ***
Suggested Fix:
Any idea?
About this issue
- State: closed
- Created 7 years ago
- Comments: 35 (5 by maintainers)
I dug into Tesseract’s code and found that the string “Illegal Parameter specification” exists in only one place, the file classify/clusttool.cpp. After some debugging I realised that the function ReadParamDesc() calls sscanf() at line 82 (as of git commit 2b854e3749d62012787dd4160fc30e86603cc540), and sscanf() is locale dependent. The call fails because the numeric input (two floating-point values) is written with dots (for example, 1.23), while a locale other than en_US for LC_NUMERIC may make sscanf() expect a different decimal separator, such as a comma (1,23).
In other words, the bug is in Tesseract, which assumes a particular locale instead of setting it explicitly. The workaround is to set LC_NUMERIC=en_US.UTF-8.
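For anyone who wants to see the same kind of locale dependence from the Java side, here is a minimal sketch. It is only an analogy: it uses Java's NumberFormat, not Tesseract's sscanf() call, and the two do not behave identically on mismatched input. The class name LocaleParseDemo and the helper parse() are hypothetical names made up for this illustration.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class LocaleParseDemo {
    // Parses text using the number conventions of the given locale.
    // Hypothetical helper for illustration; not part of Tesseract or tess4j.
    static double parse(String text, Locale locale) {
        Number n = NumberFormat.getNumberInstance(locale)
                .parse(text, new ParsePosition(0));
        if (n == null) {
            throw new IllegalArgumentException("cannot parse: " + text);
        }
        return n.doubleValue();
    }

    public static void main(String[] args) {
        String value = "1.23"; // written with a dot, as in Tesseract's data files

        // A locale that uses '.' as the decimal separator reads the value as intended.
        System.out.println("en_US: " + parse(value, Locale.US));

        // A locale that uses ',' as the decimal separator does not treat the dot
        // as a decimal point, so the same text yields a different number.
        System.out.println("de_DE: " + parse(value, Locale.GERMANY));
    }
}
```

The point is just that identical text can be read as different numbers depending on the active locale, which is why native code that parses dot-separated values should not rely on whatever LC_NUMERIC happens to be set.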
tess4j’s master branch is for Tesseract 4.0alpha and includes the latest Tesseract 4.0alpha Windows binary. All of its unit tests passed on Windows 10. We have not tested on Linux OS yet.
Since you link against Leptonica 1.74.4, make sure you use lept4j-1.6.0.