requests-html: decode error
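```python
from requests_html import HTML
from pyquery import PyQuery

default_encoding = 'gbk'
# Raw GBK-encoded bytes, as a server might return them
test_html = "<html><body><p>Hello World!--你好世界</p></body></html>".encode(default_encoding)

# 1. requests-html, given the bytes plus the correct default_encoding
element = HTML(url='http://example.com/hello_world', html=test_html, default_encoding=default_encoding)
print(element.text)
# 2. PyQuery given the raw bytes directly
print(PyQuery(test_html)('html').text())
# 3. PyQuery given a str already decoded with GBK
print(PyQuery(test_html.decode(default_encoding))('html').text())
```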
Output:
```text
C:\Users\what\PycharmProjects\untitled\venv\Scripts\python.exe C:/Users/what/PycharmProjects/requests-html/BUG.py
Hello World!--ÄãºÃÊÀ½ç
Hello World!--ÄãºÃÊÀ½ç
Hello World!--你好世界
Process finished with exit code 0
```
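The garbled string in the first two lines is exactly what you get when the GBK bytes are read as ISO-8859-1 (Latin-1). This stdlib-only sketch reproduces that mojibake without requests-html or PyQuery. It illustrates the symptom only and makes no claim about which layer picked Latin-1.

```python
text = "Hello World!--你好世界"
gbk_bytes = text.encode("gbk")

# Reading GBK bytes as Latin-1 maps each byte to one Latin-1 character
mojibake = gbk_bytes.decode("latin-1")
print(mojibake)  # Hello World!--ÄãºÃÊÀ½ç

# Decoding with the encoding the bytes were actually written in round-trips
print(gbk_bytes.decode("gbk"))  # Hello World!--你好世界
```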
So the `html` at https://github.com/kennethreitz/requests-html/blob/master/requests_html.py#L319 should be decoded (with `default_encoding`) before it is parsed.
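A minimal sketch of that idea, assuming the parser should always receive a `str`: decode raw bytes with the declared encoding first, and pass text through unchanged. `ensure_text` and `extract_text` are hypothetical names, and the stdlib `html.parser` stands in for the real parser; this is not the fix that was merged into requests-html.

```python
from html.parser import HTMLParser


def ensure_text(html, default_encoding="utf-8"):
    """Hypothetical helper: decode bytes with the declared encoding, keep str as-is."""
    if isinstance(html, bytes):
        return html.decode(default_encoding)
    return html


class _TextCollector(HTMLParser):
    """Collects character data; a stand-in for the real parser (assumption)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html, default_encoding="utf-8"):
    """Decode first, then parse, so the parser never guesses the encoding."""
    collector = _TextCollector()
    collector.feed(ensure_text(html, default_encoding))
    collector.close()
    return "".join(collector.parts)


raw = "<html><body><p>Hello World!--你好世界</p></body></html>".encode("gbk")
print(extract_text(raw, default_encoding="gbk"))  # Hello World!--你好世界
```

The design choice is that decoding happens once, at the boundary where the declared encoding is known, rather than inside the parser.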
About this issue
- State: closed
- Created 6 years ago
- Comments: 15
Commits related to this issue
- Fix #85 — committed to cxgreat2014/requests-html by cxgreat2014 6 years ago
I have a better fix.
Fixed in master.