RPA-Python: "invalid continuation byte" error - UTF-8 and OS code page, see solution

Hello @kensoh, I am a Chinese user, and I think there is a problem with how TagUI for Python handles Chinese characters. For example:

r.type('//*[@name="q"]', '撒') # Google search input test; this causes 'invalid continuation byte'.

and

r.type(r'D:\input.png', '中文') # Chrome input test using an image; nothing happens and the script hangs.

Mr. kensoh, can you give me some advice? I really need your help! Thank you so much!

About this issue

  • State: closed
  • Created a year ago
  • Comments: 38 (20 by maintainers)

Most upvoted comments

Kensoh: The SikuliX engine used by the rpa package does not support typing international characters.

Check this: https://github.com/tebelorg/RPA-Python/issues/451#issuecomment-1489511469

I have the same problem: whenever the input contains Chinese, the program freezes.

For the second problem, below are my comments:

# [clear] does not work with visual automation; it only works with normal web automation, because a backend command is sent to Chrome to empty the text field
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.

# this is a working use case and yes it should work 
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.

# the SikuliX engine used by rpa package does not support typing international characters
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.

# try using r.clipboard('中文') and then r.keyboard('[ctrl]v') after clicking on the text box visually
    r.clipboard('中文')
    r.click(r'D:\work\tagui-python\tagui_scripts\input.png')
    r.keyboard('[ctrl]v')
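
The clipboard workaround above can be wrapped in a small helper. This is only a sketch: `type_text_visually` and its `paste_key` parameter are hypothetical names, not part of the rpa package, and `r` is assumed to be the imported module (`import rpa as r`) exposing the `type()`, `clipboard()`, `click()` and `keyboard()` calls used in this thread.

```python
def type_text_visually(r, image_path, text, paste_key='[ctrl]v'):
    """Enter text into a field located by an image (hypothetical helper).

    ASCII text goes through r.type() as usual. Anything else is placed on
    the clipboard and pasted, since the SikuliX engine reportedly cannot
    type international characters.
    """
    if text.isascii():
        # Plain ASCII is the working case reported in this thread.
        r.type(image_path, text)
    else:
        # Workaround: copy the text, focus the field, then paste.
        r.clipboard(text)
        r.click(image_path)
        r.keyboard(paste_key)
```

Usage would look like `type_text_visually(r, r'D:\work\tagui-python\tagui_scripts\input.png', '中文')`. The default paste key matches the Windows example above; other platforms may need a different key.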

Hi @kensoh, thank you for your reply. I've tried both solutions, and both worked! 👍 I think solution 1 is better, so this issue can be closed. Thanks again.

By the way, here is how to make chcp 437 the default, for others who have the same problem:

1. Press Win + R and type "regedit".
2. Find "HKEY_CURRENT_USER\Software\Microsoft\Command Processor".
3. Create a string value named "AutoRun" with the data "chcp 437" and save.
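
The same registry change can be scripted with Python's standard `winreg` module. This is a minimal sketch under assumptions: the helper names `autorun_command` and `set_cmd_autorun` are made up for illustration, and the function should only be run on Windows, after deciding you really want every new command prompt to run `chcp`.

```python
import sys

# Registry location named in the steps above (under HKEY_CURRENT_USER).
KEY_PATH = r"Software\Microsoft\Command Processor"


def autorun_command(code_page=437):
    """Build the command string stored in the AutoRun value."""
    return "chcp {}".format(int(code_page))


def set_cmd_autorun(code_page=437):
    """Write the AutoRun string value so new cmd.exe sessions run chcp."""
    if sys.platform != "win32":
        raise OSError("This registry setting only exists on Windows")
    import winreg  # Windows-only standard library module

    # CreateKey opens the key, creating it first if it does not exist.
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
        winreg.SetValueEx(key, "AutoRun", 0, winreg.REG_SZ,
                          autorun_command(code_page))
```

Calling `set_cmd_autorun()` once should have the same effect as the manual regedit steps. Deleting the AutoRun value in regedit undoes it.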

Try the 2 possible solutions separately, not at the same time.

Possible solution 1: run chcp 437 from the command prompt.
Possible solution 2: change the header in the .py file.

OK @Vic-Lau, this would be difficult for you to test because the subprocess call may need to be modified too. But any clues will be helpful. When I get time and a Windows PC, I will test the hypothesis and check whether the proposed solution works.

Hi @kensoh, maybe my description was wrong. When I changed the encoding in tagui.py to gbk, tagui.py stopped working. So the error, where the UTF-8 value '\xe6\x92\x92' is changed to '\xe6\x92?', occurs with tagui.py in UTF-8. I still think there may be a problem with a substring or replace operation during the conversion.
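
To illustrate why '\xe6\x92?' triggers this particular error, the sketch below shows one hypothetical way such bytes could arise: UTF-8 bytes being read as GBK text and written back. This mechanism is an assumption for illustration only; the thread does not confirm where in tagui.py or the OS code page handling the damage actually happens.

```python
# UTF-8 bytes for the character from the report.
original = '撒'.encode('utf-8')
print(original)  # b'\xe6\x92\x92'

# Assumed round trip through a GBK (code page 936) layer: the first two
# bytes form a valid GBK character, the lone third byte does not and is
# replaced.
as_gbk_text = original.decode('gbk', errors='replace')
damaged = as_gbk_text.encode('gbk', errors='replace')
print(damaged)

# Decoding the damaged bytes as UTF-8 now fails: '?' is not a valid
# continuation byte for the three-byte sequence started by 0xE6.
reason = None
try:
    damaged.decode('utf-8')
except UnicodeDecodeError as exc:
    reason = exc.reason
    print(reason)
```

If the round trip behaves as assumed, the third byte is replaced by '?', and the decode step reports the same "invalid continuation byte" reason seen in the [RPA][ERROR] message.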

It works on my Windows PC too :

>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'


device_encoding is gbk.
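
For anyone comparing machines, the encodings Python sees can be collected with standard library calls. This is a small diagnostic sketch; `report_encodings` is a hypothetical helper name, and `os.device_encoding(1)` returns None when standard output is not attached to a terminal.

```python
import locale
import os
import sys


def report_encodings():
    """Collect the encodings relevant to console and subprocess I/O."""
    return {
        # Encoding of the terminal attached to stdout (fd 1), if any.
        "stdout_device": os.device_encoding(1),
        # Encoding Python prefers for text data on this system.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of the sys.stdout text stream.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default str/bytes conversion encoding.
        "default": sys.getdefaultencoding(),
    }


for name, value in report_encodings().items():
    print(name, "=", value)
```

On a Chinese-locale Windows console, the device and preferred encodings would typically differ from UTF-8, which is the mismatch this issue is about.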

I have tested on many Windows 10 machines with several Python versions (3.8.0, 3.8.1, 3.11.2). I believe this error, [RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte, has always existed. It may not have been reported before because it does not affect the final type() result.

PS: Mr. Kensoh, I have sent you a friend request on Facebook. Could you accept it? Thanks.

Sorry, I haven't started using Python yet, so I can't offer any advice on this problem.



Ken Soh wrote on Sunday, April 9, 2023 at 21:30:

Hi @kangyiwen, good day to you! Hope you have time to start touching Python. If you have started on Python, could I kindly ask whether you have issues using r.type() in this package with Chinese characters? @Vic-Lau in this issue has problems with some Chinese characters, but I can't replicate the problem on my Mac and Windows PC.

There are a lot of users in China for this package, but no one has reported this UTF-8 encoding issue before. I'm trying to understand more about the problem before settling on the best of the solutions reported in this GitHub issue.


Mr. kensoh, what you are doing is not only meaningful but also excellent, so there is no need to be modest, haha. Respect.

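I am of course honored to be called a master, but I don't deserve the title. The world is big, and there is always a higher mountain. I am just doing what I can to do something meaningful.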