tesseract: tesseract process never finishes with specific gif image

Environment

tesseract 4.1.1

Reproduced on macOS and Linux.

uname -a
Darwin VL-C02WL1AYHTD6 19.6.0 Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64 x86_64
Linux ocr-5b7bf86f6-f6qsd 5.4.65-wix #1 SMP Thu Nov 19 15:24:12 UTC 2020 x86_64 GNU/Linux

Current Behavior:

Running tesseract from the command line on this image, https://bentkus.eu/ocr_while_true.gif, does not finish even after 1 hour:

tesseract ocr_while_true.gif ocr_while_true --dpi 150

Expected Behavior:

The process should finish within 2 minutes.
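When reproducing this, it can help to enforce the 2-minute expectation instead of waiting for an hour. The following is a minimal sketch, assuming Python 3 and a `tesseract` binary on the PATH; the helper name `run_with_timeout` is hypothetical and not part of Tesseract.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=120):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns (True, exit_code) if the process ended in time,
    or (False, None) if it was killed after the timeout.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return False, None
    return True, proc.returncode


# Example call with the command from this report (not executed here):
# run_with_timeout(
#     ["tesseract", "ocr_while_true.gif", "ocr_while_true", "--dpi", "150"]
# )
```

A `(False, None)` result only says that the process exceeded the limit; it does not distinguish a true infinite loop from a very slow run.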

Suggested Fix:

I’ll try to build it and see why it never stops.

Update (by @egorpugin): test PNG: https://bentkus.eu/ocr_while_loop.png

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 37 (21 by maintainers)

Most upvoted comments

I have now run the latest Tesseract production code on the original animated GIF image. The image is processed, and Tesseract returns a “result” for the first image it contains. This takes 4:26 minutes, so it does finish, but that is rather long for an image that looks empty to me but evidently contains lots of small colour variations (otherwise the PNG file would be much smaller).

How should Tesseract handle animated GIF images? Create OCR for all images, or only for the first one, or refuse to process such files?

My answer is ‘Create OCR for all images’
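Whichever option is chosen, a caller first needs to know whether a GIF holds more than one image. The following is a minimal sketch, assuming Python 3 and using only the standard library; the name `count_gif_frames` is hypothetical, and this is not how Tesseract or Leptonica read GIF files. It walks the GIF block structure and counts image descriptors without decoding any pixel data.

```python
def _skip_sub_blocks(data, pos):
    """Skip a chain of GIF data sub-blocks; return the index after the terminator."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:  # zero-length sub-block terminates the chain
            return pos
        pos += size


def count_gif_frames(data):
    """Count image descriptors (frames) in the raw bytes of a GIF file."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    packed = data[10]  # packed field of the logical screen descriptor
    pos = 13           # 6-byte signature + 7-byte logical screen descriptor
    if packed & 0x80:  # global colour table present
        pos += 3 * (2 ** ((packed & 0x07) + 1))
    frames = 0
    while pos < len(data):
        block = data[pos]
        pos += 1
        if block == 0x3B:      # trailer: end of the GIF stream
            break
        if block == 0x21:      # extension: one label byte, then sub-blocks
            pos = _skip_sub_blocks(data, pos + 1)
        elif block == 0x2C:    # image descriptor: one frame
            frames += 1
            packed = data[pos + 8]
            pos += 9           # left, top, width, height, packed field
            if packed & 0x80:  # local colour table present
                pos += 3 * (2 ** ((packed & 0x07) + 1))
            pos += 1           # LZW minimum code size
            pos = _skip_sub_blocks(data, pos)
        else:
            raise ValueError("unexpected block type 0x%02X" % block)
    return frames
```

A count above 1 indicates an animated (multi-image) GIF, at which point the caller could OCR every frame, only the first one, or reject the file. Truncated input raises `IndexError` in this sketch.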