tesseract: wrong coordinates in .box file with LSTM
When I run Tesseract with LSTM (--oem 2), the coordinates in the .box file look wrong. The same command with --oem 0 produces correct coordinates, but the OCR result is less accurate, even though the images are fully cleaned and high-resolution before processing (see images below).
My example command:
"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe" --tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata" -l pol --oem 2 --psm 6 -c tessedit_create_boxfile=1 -c tessedit_create_hocr=1 -c tessedit_create_tsv=1 -c tessedit_create_txt=1 "D:\x\ClearedText\tesseract\oem0_psm6_20180114221528\fl.txt" "D:\x\ClearedText\tesseract\oem0_psm6_20180114221528\tess"
Platform: W7U x64, Tesseract v4.00.00a
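A small parser makes it easier to compare the box output of the two engines side by side. The following is a minimal Python sketch, not part of Tesseract. It assumes each .box line has the form `<symbol> <left> <bottom> <right> <top> <page>`, with the origin at the bottom-left of the image (hOCR and TSV use a top-left origin instead). The names `parse_box_lines`, `to_top_left` and `max_coordinate_diff` are hypothetical helpers. Pairing boxes by position only makes sense when both runs recognized the same character sequence.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """One line of a .box file (assumed layout: symbol left bottom right top page)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(lines: List[str]) -> List[Box]:
    """Parse .box lines, skipping blank or malformed ones."""
    boxes = []
    for line in lines:
        # Split from the right so a symbol containing spaces stays intact.
        symbol, *nums = line.rstrip("\n").rsplit(" ", 5)
        if len(nums) != 5:
            continue
        left, bottom, right, top, page = map(int, nums)
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes


def to_top_left(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    """Convert bottom-left-origin box coordinates to (x0, y0, x1, y1) with a top-left origin."""
    return (box.left, image_height - box.top, box.right, image_height - box.bottom)


def max_coordinate_diff(a: List[Box], b: List[Box]) -> int:
    """Largest coordinate difference between boxes paired by position.

    Hypothetical heuristic: pairs whose symbols differ are ignored, so this
    is only meaningful when both runs recognized the same text.
    """
    worst = 0
    for x, y in zip(a, b):
        if x.symbol != y.symbol:
            continue
        worst = max(worst,
                    abs(x.left - y.left), abs(x.bottom - y.bottom),
                    abs(x.right - y.right), abs(x.top - y.top))
    return worst


if __name__ == "__main__":
    legacy = parse_box_lines(["A 10 20 30 50 0", "b 35 20 50 45 0"])
    lstm = parse_box_lines(["A 10 20 30 50 0", "b 60 20 80 45 0"])
    print(max_coordinate_diff(legacy, lstm))  # prints 30
    print(to_top_left(legacy[0], 100))        # prints (10, 50, 30, 80)
```

To use it on real output, read each .box file (for example, the one written for the output base `tess` in the command above) with `open(path, encoding="utf-8").read().splitlines()`. Pass the oem 0 and oem 2 results to `max_coordinate_diff`. The result gives a rough measure of how far the LSTM boxes drift from the legacy ones.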

About this issue
- State: closed
- Created 6 years ago
- Comments: 47 (6 by maintainers)
@amitm02 Please see the thread at https://github.com/tesseract-ocr/tesseract/issues/648#issuecomment-271870748 for how Arabic and other RTL languages are handled.
Use the latest code from the master branch.