requests-html: decode error

from requests_html import HTML
from pyquery import PyQuery

default_encoding = 'gbk'
test_html = "<html><body><p>Hello World!--你好世界</p></body></html>".encode(default_encoding)

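# requests-html, given GBK bytes plus the matching default_encoding
element = HTML(url='http://example.com/hello_world', html=test_html, default_encoding=default_encoding)
print(element.text)

# PyQuery given the raw bytes, then the same bytes decoded to str first
print(PyQuery(test_html)('html').text())
print(PyQuery(test_html.decode(default_encoding))('html').text())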

Output:

C:\Users\what\PycharmProjects\untitled\venv\Scripts\python.exe C:/Users/what/PycharmProjects/requests-html/BUG.py
Hello World!--ÄãºÃÊÀ½ç
Hello World!--ÄãºÃÊÀ½ç
Hello World!--你好世界

Process finished with exit code 0

So the html at https://github.com/kennethreitz/requests-html/blob/master/requests_html.py#L319 should be decoded first.
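The garbled text in the output is consistent with the GBK bytes being read as Latin-1, one character per byte. Below is a minimal standard-library sketch of that effect and of decoding before parsing. `to_text` is a hypothetical helper written for illustration; it is not part of requests-html or PyQuery, and it is only an assumption about what the fix might look like.

```python
# Hypothetical helper (not a requests-html or PyQuery API):
# return html as str, decoding bytes with the caller's encoding.
def to_text(html, encoding):
    if isinstance(html, bytes):
        return html.decode(encoding, errors='replace')
    return html


default_encoding = 'gbk'
raw = "你好世界".encode(default_encoding)

# Reading the GBK bytes as Latin-1 maps every byte to one character,
# which reproduces the garbled text from the output above.
garbled = raw.decode('latin-1')

# Decoding with the encoding the caller supplied recovers the text.
decoded = to_text(raw, default_encoding)

print(garbled)
print(decoded)
```

Passing the decoded string rather than the raw bytes to the parser matches the third line of the report, where `PyQuery(test_html.decode(default_encoding))` prints the expected text.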

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 15

Most upvoted comments

I've got a better fix.

Fixed in master.