tesseract: Warning. Invalid resolution 0 dpi. Using 70 instead.
command tesseract https://image.ibb.co/eibzaT/test.png result
Current Behavior:
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 161
Estimating resolution as 161
version
tesseract 4.0.0-beta.2-313-g29f2
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
Found AVX
Found SSE
original image https://image.ibb.co/eibzaT/test.png
About this issue
- State: closed
- Created 6 years ago
- Reactions: 3
- Comments: 37 (15 by maintainers)
Commits related to this issue
- Copy resolution of source image (fix issue #1702) Signed-off-by: Stefan Weil <sw@weilnetz.de> — committed to tesseract-ocr/tesseract by stweil 5 years ago
There is an undocumented command line option. Try using --dpi 300 (or the correct value for your image).
@bhasinnaik: your input image has no information about dpi. If you want to avoid the warning, you should fix it.
It means your image does not contain resolution information in its metadata, so Tesseract warns you about this and tries to estimate the resolution by itself.
To test whether an image has the correct header, you can use magick identify -verbose filename or an equivalent tool, and make sure these two values are set:
Resolution: 118.11x118.11
Units: PixelsPerCentimeter
The above is for a 300 dpi PNG.
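If ImageMagick is not at hand, the same check can be sketched in plain Python. This is a minimal illustration, not something from Tesseract or Leptonica: the function name png_dpi is made up here, and the code only looks at the PNG pHYs chunk, which stores resolution in pixels per metre (118.11 pixels per centimetre is 11811 pixels per metre, or about 300 dpi).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METERS_PER_INCH = 0.0254


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if no usable one exists.

    Hypothetical helper for illustration; it does not validate chunk CRCs.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs":
            x_ppm, y_ppm, unit = struct.unpack(">IIB", body)
            if unit != 1:
                # Unit 0 means "aspect ratio only", so there is no real resolution.
                return None
            return (x_ppm * METERS_PER_INCH, y_ppm * METERS_PER_INCH)
        if ctype == b"IDAT":
            # pHYs must come before the image data, so stop looking here.
            return None
        pos += 12 + length
    return None
```

Calling png_dpi(open("test.png", "rb").read()) and getting None would correspond to the "Invalid resolution 0 dpi" situation described in this thread.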
Tesseract uses Leptonica, which uses libpng to read the resolution of the input image. If the input PNG does not have the correct metadata, Tesseract generates the warning referred to in this issue. I have also seen this cause Tesseract to return slightly different text results for certain images. The code above adds metadata to the PNG.
That’s a bug in Tesseract. Tesseract internally creates a new image for that png file, but does not copy the resolution from the original image. Fixed now in commit a209a6b4b503c6ada4ce6eb257fde2b76c47f771.
@stweil: Thanks for looking into this. Funny that the problem was with psm 0 only. Other psm values work as expected.
Just a very easy and short internet search suggests this modification:
mogrify -set units PixelsPerInch -density 300 image.jpg
That is not directly supported by Tesseract, but could be implemented by a wrapper script.
The current Tesseract release 5.0.0 tries to guess the correct resolution if there is no explicit information from the image file.
I’ve been using Tesseract for a while and got the same error. I just want to confirm that it is never about the metadata. I got the error while using image_to_osd for photos captured using the same device and this happened to only 3 of 50 images. I’ve checked the image details and the dpi already exists.
I’ve been trying to see if the error disappears when I crop the background around the objects in the image, and I saw that the error disappeared for 2 of them, not all of them. I still don’t really know the reason, but if it was the metadata, the third one would have worked.
However, I believe that it is somehow related to the number of characters. The images for which the error disappeared after cropping were somewhat rotated. The text angle was around 30-40 degrees. Tesseract was giving me rotations of 90, 180, and 270 only for the images that worked. The image that gave the error in both cases already has a low number of characters. This is why it would be interesting if more people try this so we can figure out if it's really the reason.
Try 300x300 with the mogrify command.
On Fri, Oct 18, 2019, 10:02 AM zdenop notifications@github.com wrote:
I tried it and it works for my image.jpg. If you are using tesseract >=4 you can use --dpi option of tesseract executable.
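For anyone calling the executable from a script, passing --dpi can be sketched as below. The helper name tesseract_command is an assumption made for this example; it only assembles the argument list (image, output base, then options) and leaves running it to the caller.

```python
def tesseract_command(image, output_base="stdout", dpi=None, psm=None):
    """Build an argument list for the tesseract executable (version 4 or later).

    Hypothetical helper for illustration; it does not run tesseract itself.
    """
    cmd = ["tesseract", str(image), str(output_base)]
    if dpi is not None:
        dpi = int(dpi)
        if dpi <= 0:
            raise ValueError("dpi must be a positive integer")
        # Overrides whatever resolution the image file does or does not declare.
        cmd += ["--dpi", str(dpi)]
    if psm is not None:
        cmd += ["--psm", str(int(psm))]
    return cmd
```

The list can then be handed to subprocess.run(cmd, check=True); for example tesseract_command("image.jpg", "result", dpi=300) corresponds to the command line tesseract image.jpg result --dpi 300.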