tesserocr: Cannot get text with French language
Hi!
I'm trying to use tesserocr with the French language, but I keep getting a UnicodeDecodeError when reading the text:
from PIL import Image
from tesserocr import PyTessBaseAPI

api = PyTessBaseAPI(lang='fra')  # load the French traineddata
api.SetImage(Image.open("20170509_182040.jpg"))
api.SetSourceResolution(300)
api.GetUTF8Text()
Returns:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tesserocr.pyx", line 2033, in tesserocr.PyTessBaseAPI.GetUTF8Text (tesserocr.cpp:18137)
  File "tesserocr.pyx", line 294, in tesserocr._free_str (tesserocr.cpp:2567)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 341: invalid continuation byte
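For readers unfamiliar with this error: in UTF-8, 0xC3 is the lead byte of a two-byte sequence (it starts most accented Latin letters, such as "é" = 0xC3 0xA9), so the message means that the byte following it was not a valid continuation byte. The sketch below reproduces the same kind of error in plain Python, without tesserocr. The byte strings and the `try_decode` helper are made up for illustration; it does not show why tesserocr returned such bytes here.

```python
# In UTF-8, a 0xC3 lead byte must be followed by a continuation byte (0x80-0xBF).
valid = "é".encode("utf-8")      # b'\xc3\xa9': well-formed two-byte sequence
broken = b"caf\xc3 au lait"      # 0xC3 followed by a space (0x20): malformed


def try_decode(data: bytes) -> str:
    """Return the decoded text, or the decoder's error details on failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason} at position {exc.start}"


print(try_decode(valid))
print(try_decode(broken))

# Lossy fallback, useful only for inspecting the damaged text:
# each undecodable byte is replaced with U+FFFD.
print(broken.decode("utf-8", errors="replace"))
```

The `errors="replace"` call is only a way to look at the surrounding text; it hides the malformed byte rather than fixing its cause.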
The English version works, though:
api = PyTessBaseAPI()
api.SetImage(Image.open("20170509_182040.jpg"))
api.SetSourceResolution(300)
api.GetUTF8Text()
Returns:
'The text that I want'
This is my installation:
tesserocr.__version__ '2.1.3'
tesserocr.tesseract_version() 'tesseract 3.05.00\n leptonica-1.74.1\n libjpeg 8d : libpng 1.6.29 : libtiff 4.0.7 : zlib 1.2.8\n'
macOS Sierra
Is this a known issue, or do I need to change something to get it to work?
Thanks for your help!
About this issue
- State: closed
- Created 7 years ago
- Comments: 22 (9 by maintainers)
Both Ubuntu and macOS use en_US.UTF-8. It’s magic.
@sirfz Yes it solved the problem thanks for your help 😉
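Since the en_US.UTF-8 comment above refers to the system locale, here is a minimal sketch for checking which locale settings a Python process actually sees. It uses only the standard library; the `locale_report` helper is a hypothetical name, and nothing here is confirmed by the thread as the cause or the fix.

```python
import locale
import os


def locale_report() -> dict:
    """Collect locale-related settings visible to the current Python process."""
    return {
        "LANG": os.environ.get("LANG"),          # None if the variable is unset
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


for name, value in locale_report().items():
    print(f"{name}: {value}")
```

Running this on both machines makes it easy to compare their settings side by side.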