tesseract: wrong coordinates in .box file with LSTM

When I run tesseract with LSTM (oem=2), the coordinates in the box file look wrong. The same command with oem=0 gives coordinates that look fine, but the OCR result is less accurate, even though I fully cleaned the images before processing and they are high resolution (see images below).

My example command:

"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe" --tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata" -l pol --oem 2 --psm 6 -c tessedit_create_boxfile=1 -c tessedit_create_hocr=1 -c tessedit_create_tsv=1 -c tessedit_create_txt=1 "D:\x\ClearedText\tesseract\oem0_psm6_20180114221528\fl.txt" "D:\x\ClearedText\tesseract\oem0_psm6_20180114221528\tess"
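To compare the two engines side by side, it may help to run the same command twice and change only --oem. Below is a rough Python sketch of that idea. The helper names (build_tesseract_command, run_both_engines) and the output base names (tess_oem0, tess_oem2) are made up for illustration; only the command-line flags come from the command above.

```python
import subprocess


def build_tesseract_command(exe, tessdata_dir, input_path, output_base,
                            oem, psm=6, lang="pol"):
    """Build the argument list for one tesseract run (nothing is executed here)."""
    return [
        str(exe),
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),
        "-c", "tessedit_create_boxfile=1",
        str(input_path),
        str(output_base),
    ]


def run_both_engines(exe, tessdata_dir, input_path, out_dir):
    """Run the same input with oem 0 and oem 2 (hypothetical helper).

    The output base names tess_oem0 / tess_oem2 are an assumption, chosen
    so the two .box files do not overwrite each other.
    """
    for oem in (0, 2):
        output_base = f"{out_dir}/tess_oem{oem}"
        cmd = build_tesseract_command(exe, tessdata_dir, input_path,
                                      output_base, oem)
        # check=True raises CalledProcessError if tesseract exits non-zero
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids the quoting problems that paths with spaces such as "C:\Program Files (x86)" tend to cause.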

Platform: Windows 7 Ultimate x64, tesseract v4.00.00a
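For anyone who wants to check the coordinates numerically rather than by eye, here is a minimal Python sketch. It assumes the usual box file layout of one entry per line, "symbol left bottom right top page", with the origin at the bottom-left corner of the image. The names Box, parse_box_line, to_top_left and is_suspicious are hypothetical, and the image width and height have to be supplied by the caller.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one box file line: 'symbol left bottom right top page'."""
    # Split from the right so the last five fields are always the numbers.
    glyph, left, bottom, right, top, page = line.rstrip("\r\n").rsplit(" ", 5)
    return Box(glyph, int(left), int(bottom), int(right), int(top), int(page))


def to_top_left(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    """Convert to (x, y, width, height) with the origin at the top-left."""
    return (box.left,
            image_height - box.top,
            box.right - box.left,
            box.top - box.bottom)


def is_suspicious(box: Box, image_width: int, image_height: int) -> bool:
    """Flag boxes that are empty, inverted, or outside the image."""
    inside_x = 0 <= box.left < box.right <= image_width
    inside_y = 0 <= box.bottom < box.top <= image_height
    return not (inside_x and inside_y)
```

Running this over the box files from oem=0 and oem=2 for the same image would show whether the LSTM boxes are actually out of range or just laid out differently. Many image viewers use a top-left origin, so unconverted coordinates can look vertically flipped even when they are valid.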

[image: 111]

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 47 (6 by maintainers)

Most upvoted comments

@amitm02 Please see the thread at https://github.com/tesseract-ocr/tesseract/issues/648#issuecomment-271870748 for how Arabic and other RTL languages are handled.

When I try to use best or fast, I get this error:

lstm_recognizer_->DeSerialize(&fp):Error:Assert failed:in file …/…/…/…/ccmain/tessedit.cpp, line 193

Use the latest code from the master branch.