tesseract: Tesseract 4.0 hangs when processing a particular image
Environment
- Tesseract Version: tesseract 4.0.0-beta.1 leptonica-1.75.3 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
- Platform: Ubuntu 18.04.1 LTS
Current Behavior:
Tesseract hangs when running the following command:
tesseract failed-image.jpeg output.txt
Output:
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 207
After that, Tesseract neither exits nor prints any further message. Other images work fine; only this particular image causes the problem. I also noticed that the image as processed by Tesseract (or Leptonica?) looks strange, though I don't know whether this is related.
failed-image.jpeg: https://drive.google.com/open?id=1HsgCbtuNpgf_XxzjkekXU9-uuiWDsV0H
tessinput.tif: https://drive.google.com/open?id=1sE8Nn5rykSWPT6PMF3nFSonPMT9y-H61
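The `Invalid resolution 0 dpi` warning in the output above suggests that the JPEG declares no usable resolution, so Tesseract falls back to a default and then estimates one (`Estimating resolution as 207`). This report does not establish whether that is connected to the hang. As a quick check, the sketch below reads the density fields of a JFIF APP0 header using only the Python standard library. It assumes the APP0 segment directly follows the SOI marker (the usual JFIF layout); files that carry resolution only in EXIF are reported as having none. The helper names `jfif_density` and `jfif_dpi` are hypothetical.

```python
import struct
from typing import Optional, Tuple


def jfif_density(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (units, x_density, y_density) from a JFIF APP0 header, or None.

    Assumption: the APP0 segment immediately follows the SOI marker.
    Other layouts (e.g. EXIF-first files) return None.
    """
    if len(data) < 18:
        return None
    # SOI marker (FF D8) followed by an APP0 marker (FF E0)
    if data[0:2] != b"\xff\xd8" or data[2:4] != b"\xff\xe0":
        return None
    # Bytes 4-5 hold the segment length; the identifier "JFIF\0" comes next
    if data[6:11] != b"JFIF\x00":
        return None
    # Bytes 11-12: version; byte 13: density units; bytes 14-17: X/Y density (big-endian)
    units = data[13]
    x_density, y_density = struct.unpack(">HH", data[14:18])
    return units, x_density, y_density


def jfif_dpi(data: bytes) -> Optional[int]:
    """Horizontal resolution in dots per inch, or None if the header gives none."""
    info = jfif_density(data)
    if info is None:
        return None
    units, x_density, _ = info
    if x_density == 0:
        return None  # a zero density is not a usable resolution
    if units == 1:  # dots per inch
        return x_density
    if units == 2:  # dots per centimetre
        return round(x_density * 2.54)
    return None  # units == 0: the densities only encode a pixel aspect ratio
```

For example, `jfif_dpi(open("failed-image.jpeg", "rb").read(32))` returning `None` would be consistent with the warning above.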
Expected Behavior:
Tesseract should either report an error or finish OCR on the image, even if the image quality is poor.
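Until a fixed version is available, a caller can at least turn a hang into an explicit error by running Tesseract under a timeout. The sketch below is one hedged way to do that with Python's standard `subprocess.run(..., timeout=...)`; the helper name `run_with_timeout` and the timeout value are illustrative assumptions, not part of Tesseract.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[subprocess.CompletedProcess]:
    """Run cmd and return its CompletedProcess, or None if it ran past timeout_s.

    On timeout, subprocess.run kills the child and waits for it before
    raising TimeoutExpired, so a hung OCR process is not left running.
    """
    try:
        return subprocess.run(
            list(cmd),
            capture_output=True,  # keep stdout/stderr (e.g. the DPI warnings) for inspection
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Report the hang instead of waiting forever.
        print(f"timed out after {timeout_s}s: {' '.join(cmd)}", file=sys.stderr)
        return None
```

A call such as `run_with_timeout(["tesseract", "failed-image.jpeg", "output"], timeout_s=120)` returns `None` if the run hangs. Note that the tesseract CLI treats its second argument as an output base name and appends `.txt` itself, so `output.txt` in the command above produces `output.txt.txt`.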
About this issue
- State: open
- Created 5 years ago
- Comments: 18 (7 by maintainers)
Commits related to this issue
- Upgrade Docker image to Alpine 3.11 This make tesseract 4.1 avaialbe, which fixes some things like infinite processing loops on some documents: tesseract-ocr/tesseract#2288 — committed to languitar/paperless by languitar 4 years ago
- Upgrade Docker image to Alpine 3.11 This make tesseract 4.1 avaialbe, which fixes some things like infinite processing loops on some documents: tesseract-ocr/tesseract#2288 — committed to BastianPoe/paperless by languitar 4 years ago
@saikalyan9981 Works fine with the current code from the repo. The time taken varies depending on which traineddata file is used.
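The point that run time depends on the traineddata file can be checked by timing the same image against different model sets, for example local copies of tessdata_fast and tessdata_best. This is a hedged sketch: `ocr_cmd` and `time_runs` are hypothetical helpers, the tessdata directory paths you pass in are placeholders, and the command uses only the standard `--tessdata-dir` and `-l` options. A timeout keeps a hanging run from blocking the comparison.

```python
import math
import subprocess
import time
from typing import Dict, List, Sequence


def ocr_cmd(image: str, out_base: str, tessdata_dir: str, lang: str = "eng") -> List[str]:
    """Build a tesseract command line that uses a specific traineddata directory."""
    return ["tesseract", image, out_base, "--tessdata-dir", tessdata_dir, "-l", lang]


def time_runs(cmds: Dict[str, Sequence[str]], timeout_s: float) -> Dict[str, float]:
    """Wall-clock seconds for each labelled command; math.inf if it timed out."""
    results: Dict[str, float] = {}
    for label, cmd in cmds.items():
        start = time.perf_counter()
        try:
            # check=False: a non-zero exit status still yields a timing
            subprocess.run(list(cmd), capture_output=True, timeout=timeout_s, check=False)
            results[label] = time.perf_counter() - start
        except subprocess.TimeoutExpired:
            results[label] = math.inf
    return results
```

For instance, `time_runs({"fast": ocr_cmd("failed-image.jpeg", "out-fast", "./tessdata_fast"), "best": ocr_cmd("failed-image.jpeg", "out-best", "./tessdata_best")}, timeout_s=600)` would report one duration per model set, with `inf` marking a run that did not finish.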