hosts: Cannot use someonewhocares.org because of an ENCODING or DOWNLOAD issue

When I add the someonewhocares.org directory to the data dir, I cannot update the hosts file because of an error:

  • Python 2: cannot finish downloading the hosts file from someonewhocares.org.
  • Python 3: finishes downloading the file, but an ENCODING error occurs:

[fademind@manjaro hsts.test]$ python3 updateHostsFile.py -a -c
Updating source data/add.Risk from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Risk/hosts
Traceback (most recent call last):
  File "updateHostsFile.py", line 1416, in <module>
    main()
  File "updateHostsFile.py", line 167, in main
    update_all_sources(source_data_filename, settings["hostfilename"])
  File "updateHostsFile.py", line 603, in update_all_sources
    update_data = json.load(update_file)
  File "/usr/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 27: ordinal not in range(128)
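The traceback points at json.load(update_file) in update_all_sources, and the codec that fails is ascii. That suggests the update.json file was opened without an explicit encoding, so Python 3.6 fell back to the locale's preferred encoding, which was ASCII in this environment. Byte 0xe2 is the lead byte of three-byte UTF-8 sequences covering U+2000 to U+2FFF (typographic quotes, dashes, ellipses), so a single curly apostrophe in the metadata is enough to trigger it. Below is a minimal sketch of the fix, assuming the metadata files are meant to be UTF-8. load_update_json and the sample data are hypothetical, not the script's actual code:

```python
import json
import os
import tempfile


def load_update_json(path):
    """Load a source's update.json, decoding it as UTF-8 regardless of locale."""
    # Without encoding=..., open() falls back to the locale's preferred encoding,
    # which can be ASCII (e.g. under LANG=C) and then rejects any non-ASCII byte.
    with open(path, "r", encoding="utf-8") as update_file:
        return json.load(update_file)


# Hypothetical metadata with a curly apostrophe (U+2019, UTF-8 bytes E2 80 99).
sample = {"name": "example", "description": "It\u2019s a hypothetical description"}
raw = json.dumps(sample, ensure_ascii=False).encode("utf-8")

try:
    raw.decode("ascii")  # roughly what an ASCII locale does to the file
except UnicodeDecodeError as exc:
    print("ASCII decode fails:", exc.reason, "at byte", hex(raw[exc.start]))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "update.json")
    with open(path, "wb") as fh:
        fh.write(raw)
    loaded = load_update_json(path)

print(loaded["description"])
```

On Python 3.7 and later, running the interpreter in UTF-8 mode (-X utf8) is an alternative, but the reporter is on 3.6, where passing encoding= explicitly is the targeted fix.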

When I delete the someonewhocares.org dir from the data dir, generating the hosts file goes FINE:

[fademind@manjaro ~]$ uhf
Updating source data/add.Risk from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Risk/hosts
Updating source data/CoinBlockerLists from https://raw.githubusercontent.com/ZeroDot1/CoinBlockerLists/master/hosts_browser
Updating source data/adaway.org from https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt
Updating source data/tyzbit from https://raw.githubusercontent.com/tyzbit/hosts/master/data/tyzbit/hosts
Updating source data/StevenBlack from https://raw.githubusercontent.com/StevenBlack/hosts/master/data/StevenBlack/hosts
Updating source data/add.2o7Net from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.2o7Net/hosts
Updating source data/UncheckyAds from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/UncheckyAds/hosts
Updating source data/mvps.org from http://winhelp2002.mvps.org/hosts.txt
Updating source data/hpHosts-ATS from https://hosts-file.net/ad_servers.txt
Updating source data/add.Dead from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Dead/hosts
Updating source data/hpHosts-EMD from https://hosts-file.net/emd.txt
Updating source data/yoyo.org from https://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext&useip=0.0.0.0
Updating source data/hpHosts-MMT from https://hosts-file.net/mmt.txt
Updating source data/KADhosts from https://raw.githubusercontent.com/azet12/KADhosts/master/KADhosts.txt
Updating source data/Badd-Boyz-Hosts from https://raw.githubusercontent.com/mitchellkrogza/Badd-Boyz-Hosts/master/hosts
Updating source data/malwaredomainlist.com from http://www.malwaredomainlist.com/hostslist/hosts.txt
Updating source data/Spotify-Ad-free from https://raw.githubusercontent.com/CHEF-KOCH/Spotify-Ad-free/master/Spotifynulled.txt
Updating source data/add.Spam from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Spam/hosts
Updating source extensions/fakenews from https://raw.githubusercontent.com/marktron/fakenews/master/fakenews
Updating source extensions/social from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/social-hosts
Updating source extensions/porn/clefspeare13 from https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/0.0.0.0/hosts
Updating source extensions/porn/sinfonietta-snuff from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/snuff-hosts
Updating source extensions/porn/sinfonietta from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/pornography-hosts
Updating source extensions/gambling from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/gambling-hosts
==>0.0.0.0 12.170.116.68<==
==>0.0.0.0 164.109.51.67<==
==>0.0.0.0 5.100.249.215<==
==>0.0.0.0 74.112.173.77<==
==>0.0.0.0 91.205.157.38<==
==>0.0.0.0 91.212.132.230<==
Success! The hosts file has been saved in folder 
It contains 246,768 unique entries.
Moving the file requires administrative privileges. You might need to enter your password.
Flushing the DNS cache to utilize new hosts file...
Flushing the DNS cache requires administrative privileges. You might need to enter your password.
Flushing the DNS cache by restarting NetworkManager.service succeeded
Flushing the DNS cache by restarting dnsmasq.service succeeded
Flushing the DNS cache by restarting NetworkManager.service succeeded
Flushing the DNS cache by restarting dnsmasq.service succeeded

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 21 (16 by maintainers)

Most upvoted comments

I can confirm that the PR from @funilrys (https://github.com/StevenBlack/hosts/pull/520) RESOLVED the issue. CC @StevenBlack

@StevenBlack wrote: "This is fine on macOS."

I do have this issue on macOS, but not if I use the default configuration of the hosts repo.

This is how I can reproduce it:

  1. Delete the hosts folder.
  2. Run git clone https://github.com/StevenBlack/hosts.git
  3. Run python3 updateHostsFile.py --auto --backup --replace --flush-dns-cache (this finishes successfully).
  4. Create the folder hosts/extensions/hpHosts/psh
  5. Inside the psh folder, create update.json:
{
    "name": "hpHosts - phishing",
    "description": "This file contains phishing sites listed in the hpHosts database",
    "homeurl": "https://hosts-file.net/?s=Download",
    "frequency": "occasional",
    "issues": "",
    "url": "https://hosts-file.net/psh.txt",
    "license": ""
}
  6. Run python3 updateHostsFile.py --auto --backup --replace --flush-dns-cache --extensions hpHosts
  7. You will now see an error:
Problem getting file:  https://hosts-file.net/psh.txt
Error in updating source:  https://hosts-file.net/psh.txt
  8. Go to the folder hosts/extensions/hpHosts/psh in a terminal and run this to download the file manually:
wget https://hosts-file.net/psh.txt
mv psh.txt hosts
  9. Run python3 updateHostsFile.py --auto --backup --replace --flush-dns-cache --extensions hpHosts
  10. You will now see this error:
Traceback (most recent call last):
  File "updateHostsFile.py", line 1416, in <module>
    main()
  File "updateHostsFile.py", line 186, in main
    merge_file = create_initial_file()
  File "updateHostsFile.py", line 653, in create_initial_file
    write_data(merge_file, curFile.read())
  File "/usr/local/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 4812469: invalid continuation byte
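Here the file was decoded as UTF-8, and the failure is different: 0xF1 can only start a four-byte UTF-8 sequence, so it must be followed by continuation bytes in the range 0x80 to 0xBF. In Latin-1 and Windows-1252, 0xF1 is simply "ñ", so the downloaded psh.txt most likely contains some single-byte-encoded text (an assumption; the thread does not say which encoding). Since a hosts file is essentially ASCII hostnames, one option is to decode leniently so that one stray byte does not abort the whole merge. A minimal sketch; read_source_text and the sample bytes are hypothetical, not the code in updateHostsFile.py:

```python
import os
import tempfile


def read_source_text(path):
    """Read a downloaded hosts source as UTF-8 without aborting on bad bytes."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of
    # raising UnicodeDecodeError; valid UTF-8 and plain ASCII are unaffected.
    with open(path, "r", encoding="utf-8", errors="replace") as source_file:
        return source_file.read()


# Hypothetical source: ASCII lines plus one Latin-1 "ñ" (byte 0xF1).
raw = b"0.0.0.0 ads.example.com\n0.0.0.0 espa\xf1a.example\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hosts")
    with open(path, "wb") as fh:
        fh.write(raw)
    text = read_source_text(path)

for line in text.splitlines():
    print(repr(line))
```

The trade-off is that the affected hostname no longer matches the original bytes. Most resolvers expect ASCII (punycode) hostnames in a hosts file anyway, but the lossy step should be a deliberate choice rather than a silent default.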

We have been battling this one for a long time. Two other issues were https://github.com/StevenBlack/hosts/issues/465 and https://github.com/StevenBlack/hosts/issues/440. I think @funilrys may have just closed half of our tickets 😛

We can now close this, @StevenBlack 👍

@StevenBlack, as those libraries are not built into Python, should I include a requirements.txt? 🤔

Okay @StevenBlack, I’m going to try one of those libraries to see if it fixes @notDavid’s reproduction case.

I think (following on from my last comment) that we can put an explanation on this issue… I don’t know if it has already been done, but:

We (and @StevenBlack) assume that the encoding is always UTF-8, which is far from true, as we can see every time someone gets Error in updating source: xxx

Actually, this is one of the all-time problems in computing: we cannot reliably detect the encoding of a file or a sequence of bytes. I also think that using a library like BeautifulSoup or chardet may help, but would it come without performance issues, @StevenBlack?
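On the performance question, one way to keep the common case cheap is to attempt a strict UTF-8 decode first, which succeeds for pure-ASCII and UTF-8 sources, and to run detection only when that fails, on a bounded window around the first bad byte rather than the whole file. A sketch under those assumptions: it uses chardet's detect() if chardet is installed and falls back to Latin-1 otherwise. The function name decode_source, the WINDOW size, and the fallback choice are all hypothetical and are not what the eventual PR did:

```python
try:
    import chardet  # optional third-party package; absence is handled below
except ImportError:
    chardet = None

WINDOW = 64 * 1024  # hypothetical number of bytes handed to the detector


def decode_source(data):
    """Decode downloaded hosts bytes; return (text, encoding_used)."""
    # Fast path: ASCII and UTF-8 sources decode strictly with no detection cost.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError as exc:
        bad_offset = exc.start  # offset of the first byte that is not valid UTF-8

    encoding = None
    if chardet is not None:
        # Only inspect a window around the failure, not the whole file.
        start = max(0, bad_offset - WINDOW // 2)
        encoding = chardet.detect(data[start:start + WINDOW]).get("encoding")

    if encoding:
        try:
            # errors="replace" keeps a wrong guess from raising again.
            return data.decode(encoding, errors="replace"), encoding
        except LookupError:
            pass  # detector named a codec this Python does not provide

    # Latin-1 maps every byte to a code point, so this step cannot fail.
    return data.decode("latin-1"), "latin-1"
```

Sampling around the failing offset matters here: in @notDavid’s traceback the bad byte sits at position 4812469, so a sample taken from the start of the file would likely contain only ASCII and tell the detector nothing.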