tesseract: Tesseract doesn't recognize multiple languages

If I run

tesseract page356.png page356 -l eng+osd+ell pdf

it recognizes only the English characters, and it reports no errors about the other languages.

If I run

tesseract page356.png page356greek -l ell

it recognizes the Greek fine, but now there is no English.

If I run

tesseract page356.png greekandenglish356 -l ell+eng+osd pdf

the resulting greekandenglish356.pdf again contains only the English text.

I installed the language data with apt-get install tesseract-ocr-all, and I'm experiencing this issue on multiple Linux distros.

Here is a sample image: 1200_page_356
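
Since tesseract reports no error here, one thing worth ruling out is whether every language named in -l is actually visible to it. The following is a minimal sketch, assuming a POSIX shell; missing_langs is a hypothetical helper written for this illustration, not part of Tesseract. It reads the installed language names on stdin and prints each requested language that is absent.

```shell
# Hypothetical helper (not a Tesseract command).
# Usage: <installed names, one per line> | missing_langs eng+osd+ell
# Prints every language from the "+"-separated spec that is not installed.
missing_langs() {
  spec=$1
  installed=$(cat)          # installed language names, one per line
  old_ifs=$IFS
  IFS='+'                   # split the spec on "+", as tesseract -l does
  for lang in $spec; do
    # -x: whole-line match, -F: literal string, -q: no output
    printf '%s\n' "$installed" | grep -qxF -- "$lang" || printf '%s\n' "$lang"
  done
  IFS=$old_ifs
}

# Assumed invocation: --list-langs prints a header line first, and some
# versions may write the list to stderr, hence the 2>&1 and the tail.
#   tesseract --list-langs 2>&1 | tail -n +2 | missing_langs eng+osd+ell
```

If this prints nothing, all three languages are installed and the cause lies elsewhere.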

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 36 (6 by maintainers)

Most upvoted comments

Also, try the script traineddata:

https://github.com/tesseract-ocr/tessdata_best/blob/master/script/Greek.traineddata

It should have both Greek and English.
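
Trying that suggestion could look roughly like the sketch below. It assumes Tesseract 4 or later (the tessdata_best models need the LSTM engine) and that curl is installed. The ./tessdata directory and the run and ocr_with_greek_script helpers are hypothetical names chosen for this illustration. The download URL is the one above with blob changed to raw, and the branch name in it may have changed since.

```shell
# Sketch only. Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: ocr_with_greek_script IMAGE OUTPUT_BASE
ocr_with_greek_script() {
  image=$1
  outbase=$2
  run mkdir -p ./tessdata
  # "raw" fetches the model file itself rather than the GitHub HTML page;
  # -L follows redirects.
  run curl -L -o ./tessdata/Greek.traineddata \
    https://github.com/tesseract-ocr/tessdata_best/raw/master/script/Greek.traineddata
  # The model sits directly in ./tessdata, so it is selected as "-l Greek".
  run tesseract "$image" "$outbase" --tessdata-dir ./tessdata -l Greek
}
```

For example, ocr_with_greek_script page356.png page356script would write plain text to page356script.txt. The pdf config is left out on purpose, since a freshly created tessdata directory would not contain the config files it needs.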

Multiple languages are supported in v3.x.