pytesseract: image_to_data returns different results than image_to_string
I use image_to_string for single digits, but additionally I want to obtain the confidence value. Therefore I replaced it with image_to_data like this:
ocrResult = pytesseract.image_to_data(digitBinary, config='-psm 10 -c tessedit_char_whitelist=0123456789', output_type="dict")
digitasNumber = ocrResult["text"][0]
With image_to_string the results are reasonably good.
With image_to_data every text dict entry is empty except in one case, where it returns two digits. The empty entries have conf -1, and the one entry where text is filled has conf 6.
I don’t think that this is intended behavior.
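For reference, here is a minimal sketch of how the digit and its confidence could be pulled out of the dict, rather than reading index 0 directly. As far as I understand, rows with conf -1 describe layout levels (page, block, paragraph, line) and carry no recognized text, so the first entry is usually empty. The helper name best_digit and the sample dict are made up for illustration, and this does not address the config problem itself.

```python
def best_digit(ocr_result):
    """Return (text, confidence) of the most confident non-empty entry.

    Returns (None, -1.0) when no entry contains recognized text.
    """
    best_text, best_conf = None, -1.0
    for text, conf in zip(ocr_result["text"], ocr_result["conf"]):
        text = str(text).strip()
        # conf may arrive as int, float, or str depending on the version
        conf = float(conf)
        # Skip empty rows and rows without a real confidence (-1)
        if text and conf > best_conf:
            best_text, best_conf = text, conf
    return best_text, best_conf


# Hypothetical result shaped like the dict output of image_to_data
sample = {"text": ["", "", "", "", "7"], "conf": ["-1", "-1", "-1", "-1", 93]}
digit, confidence = best_digit(sample)
print(digit, confidence)
```

In real use, the dict returned by pytesseract.image_to_data(..., output_type="dict") would be passed in place of sample.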
About this issue
- State: closed
- Created 6 years ago
- Comments: 19 (6 by maintainers)
The default value is not the real problem here. The problem is that the tsv config is apparently loaded after the user-supplied config is applied (see my use case in the first post) and overwrites the value. I tried to fix the code myself, but I haven’t managed to make it work so far.
Internally it is currently called like this
but it should be