Lists: [Remove Request] - invalid domains due to IDN instead of punycode

URL you wish to be removed: There are some (7) IDN domains in the blocklist that should be or already are converted to punycode:

- госулуги.рф
- госуслуги-ру.рф
- гослуги.рф
- госсуслуга.рф
- госсуслуугии.рф

Why you believe this to be a false positive: Output from Pi-Hole preparing new gravity database:

  [i] Target: https://raw.githubusercontent.com/blocklistproject/Lists/master/phishing.txt
  [✓] Status: Retrieval successful
  [i] Analyzed 190258 domains, 7 domains invalid!
      Sample of invalid domains:
      - госулуги.рф
      - госуслуги-ру.рф
      - гослуги.рф
      - госсуслуга.рф
      - госсуслуугии.рф
  [i] List has been updated

List it is on: phishing
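
To locate such entries before Pi-hole reports them, a list can be scanned for hostnames containing non-ASCII characters. The following is only a minimal sketch: the function name `find_non_ascii_domains` and the sample lines are hypothetical, and it assumes each entry is either a bare domain or a hosts-style `0.0.0.0 domain` line.

```python
def find_non_ascii_domains(lines):
    """Return the hostnames that contain non-ASCII characters."""
    found = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Assumption: the domain is the last whitespace-separated field
        # ("0.0.0.0 domain" or just "domain").
        domain = line.split()[-1]
        if not domain.isascii():
            found.append(domain)
    return found


# Illustrative sample, not the real contents of phishing.txt.
sample = [
    "# phishing list (sample)",
    "0.0.0.0 example.com",
    "0.0.0.0 госулуги.рф",
    "0.0.0.0 гослуги.рф",
]

for domain in find_non_ascii_domains(sample):
    print(domain)
```

Anything this prints is a candidate for conversion to punycode or for removal.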

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 17 (14 by maintainers)

Most upvoted comments

@blocklistproject , @FDrebin , is there any progress to be reported? When will this be fixed? ⏳

Oops, I left this issue open in my browser for a while without reloading, so I didn’t notice. Sorry for the inconvenience 🤷🏻‍♂️

By the way, I don’t think @FDrebin is affiliated with this project

No problem! @thomasmerz

@thomasmerz This issue is already closed. So I don’t understand what you mean.

@cryptogap , can you please review and merge linked PR #574 if ok? Thanks a lot!

Hi there, there is domain2idna. It’s not perfect (yet?) but it’s the underlying Python module that PyFunceble uses for the conversion.

If you are fluent in Python, here is how to use it:

from domain2idna import domain2idna

with open('awesome_file', 'r', encoding="utf-8") as input_file_stream, open('awesome_output_file', 'w', encoding="utf-8") as output_file_stream:
    for line in input_file_stream:
        output_file_stream.write(domain2idna(line.strip()) + "\n")

Disclaimer: I’m the author of it 😄

I prefer bash as well, but that doesn’t change the fact that @funilrys has made @PyFunceble in Python, and it works and even converts to punycode by itself 😮 You could have found and removed all inactive domains at the same time as well 😃

I would have used "while read line" instead of "for lines in", as it is faster and uses fewer resources: you only work through the file once, versus one time for each line in $f

Wouldn’t it be better to translate them into punycode?

xn--c1aapkosob.xn--p1ai
xn----etbaunrsdatce.xn--p1ai
xn--c1aapkosp.xn--p1ai
xn--80afa5aosaarc.xn--p1ai
xn--c1aapamrvaatca.xn--p1ai
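
For reference, a conversion like this can also be sketched with Python's built-in `idna` codec, without any third-party module. This is an illustration, not how the list above was produced: the helper name `to_punycode` is hypothetical, and the built-in codec follows the older IDNA 2003 rules, so results for unusual labels may differ from other tools.

```python
def to_punycode(domain):
    """Convert a Unicode domain name to its ASCII (punycode) form."""
    # The built-in "idna" codec encodes each label separately and
    # leaves labels that are already ASCII unchanged.
    return domain.encode("idna").decode("ascii")


# The five sample domains reported by Pi-hole in this issue.
idn_domains = [
    "госулуги.рф",
    "госуслуги-ру.рф",
    "гослуги.рф",
    "госсуслуга.рф",
    "госсуслуугии.рф",
]

for domain in idn_domains:
    print(to_punycode(domain))
```

Decoding works the same way in reverse, for example `b"xn--p1ai".decode("idna")`, which is handy for checking what a punycode entry in the list actually stands for.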