hosts: INVALID domains caught by PyFunceble (dead-hosts)

Hi Steven @StevenBlack,

I hope that you’re all right!

I just wanted to mention the following entries, which were marked as INVALID by PyFunceble while testing with dead-hosts.

Here they are: https://github.com/StevenBlack/hosts/blob/2246df0c0b0b80c6743d395f8d3c928d29457972/data/StevenBlack/hosts#L884

https://github.com/StevenBlack/hosts/blob/2246df0c0b0b80c6743d395f8d3c928d29457972/data/StevenBlack/hosts#L810

About _thums.ero-advertising.com

Please note the presence of _ which is illegal in domain names.

About o_thus.ero-advertising.com

Please note the presence of _ which is illegal in domain names.
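To illustrate why a strict host name check flags both entries, here is a minimal sketch that lists the characters in each label falling outside the letters-digits-hyphen (LDH) set. The helper name `non_ldh_chars` is hypothetical, and this is not how PyFunceble itself performs its syntax check; it only shows the rule being applied.

```python
import string

# Letters, digits and hyphen: the "LDH" set conventionally allowed in host names.
LDH = set(string.ascii_letters + string.digits + "-")


def non_ldh_chars(hostname: str) -> dict[str, set[str]]:
    """Map each label of `hostname` to the characters in it that are not LDH.

    Hypothetical helper for illustration; labels made only of LDH
    characters are left out of the result.
    """
    offenders: dict[str, set[str]] = {}
    for label in hostname.rstrip(".").split("."):
        bad = set(label) - LDH
        if bad:
            offenders[label] = bad
    return offenders


for host in ("_thums.ero-advertising.com", "o_thus.ero-advertising.com"):
    print(host, non_ldh_chars(host))
```

Both entries report only the underscore, and only in the leftmost label; the parent domain ero-advertising.com passes the same check.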

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 16 (11 by maintainers)

Most upvoted comments

@StevenBlack please lock this discussion, as it takes us (you and me) into an unconstructive loop where we may have to repeat ourselves.

@Rikk did you read the discussions I linked previously?

This discussion (as continued) has nothing to do with @StevenBlack, as he only distributes a compilation. The rest is the responsibility of the curators (again, we are repeating ourselves).

@mitchellkrogza if Steven @StevenBlack locks this, please feel free to post future test results in a new issue on PyFunceble’s repository.

Cheers, Nissar

We are clear about that. I don’t know about others, but as I mentioned elsewhere here, it’s not your role to clean what comes to you 😸

Basically, this whole discussion (as it was continued not as we both started) is about how to efficiently detect invalid domains!

But again I hope that it’s clear for everyone reading this: This discussion (as continued) is not about how Steven @StevenBlack should work/clean his awesome compilation!

Hey guys, just so we’re clear: I don’t care about presently invalid domains.

If I was writing malware, I would certainly use tactics to “disappear” a domain. That merely takes TTL time for DNS to propagate. I could then “reappear” the domain relatively quickly, anytime I want to strike.

So I don’t want to scrub any domains the curators haven’t scrubbed by whatever process they use to maintain their lists.

@Rikk We are then talking about uncertainty, as your DNS can choose to resolve it or not. The idea is to have a global overview. Also, in one of the links you shared there is, for example, the technical case with Java, which shows us that it is not a good idea to have an underscore in a subdomain. So for me it’s still INVALID, as it’s not always resolvable.

@Rikk thanks for those links. I’ve been running my discovered domains through a pretty strict regex filter. I didn’t think about it much at the time, but it could be dropping a bunch of discovered domains that would work for advertising or tracking without me knowing about it. I think I’ll play around with the filter some this weekend so that it serves more as a symbol blacklist than a strict whitelist.
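The difference between the two filtering styles can be sketched roughly as follows. Neither pattern is the commenter's actual filter: `STRICT`, `FORBIDDEN`, and the two helper functions are assumptions made up for illustration, and the set of forbidden symbols in particular is an arbitrary choice.

```python
import re

# Strict whitelist (assumed): each label is LDH only, 1-63 characters,
# and may not start or end with a hyphen.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
STRICT = re.compile(rf"{LABEL}(?:\.{LABEL})+")

# Symbol blacklist (assumed): reject only whitespace and common URL delimiters.
FORBIDDEN = re.compile(r"[\s/\\:?#@\[\]<>\"'|]")


def passes_strict(domain: str) -> bool:
    """True if the whole name matches the strict LDH whitelist."""
    return STRICT.fullmatch(domain) is not None


def passes_blacklist(domain: str) -> bool:
    """True if the name has at least two non-empty labels and no forbidden symbol."""
    labels = domain.split(".")
    return len(labels) >= 2 and all(labels) and FORBIDDEN.search(domain) is None


for name in ("ero-advertising.com", "o_thus.ero-advertising.com", "bad domain.com"):
    print(name, passes_strict(name), passes_blacklist(name))
```

With these assumed patterns, the underscore name is dropped by the whitelist but kept by the blacklist, which is the kind of silent loss described above.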

Reading up on Punycode it says:

While the Domain Name System (DNS) technically supports arbitrary sequences of octets in domain name labels, the DNS standards recommend the use of the LDH subset of ASCII conventionally used for host names, and require that string comparisons between DNS domain names should be case-insensitive.

This is an interesting example: http://www.са.com - the URL looks fine, but it has a Unicode character in it. Most browsers will display it in the URL bar differently, but as far as the DNS lookup goes it’s perfectly valid. At the moment, my list would drop that domain since it doesn’t pass the strict filter.
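As a small sketch of what happens to such a name before the DNS lookup, Python's built-in `idna` codec (which implements the older IDNA 2003 rules) converts each non-ASCII label to its Punycode form. The helper names `to_ascii` and `to_unicode` are hypothetical, and the example assumes the two look-alike characters in the URL above are the Cyrillic letters U+0441 and U+0430.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain into the ASCII form sent in DNS queries."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Decode an ASCII (Punycode) domain back to its Unicode form."""
    return ascii_domain.encode("ascii").decode("idna")


# Assumed spelling: Cyrillic "es" (U+0441) and "a" (U+0430), which look like Latin "ca".
lookalike = "www.\u0441\u0430.com"

encoded = to_ascii(lookalike)
print(encoded)               # the middle label becomes an "xn--" Punycode label
print(to_unicode(encoded))   # round-trips back to the Unicode spelling
```

The encoded form contains only LDH characters, so a filter that sees the `xn--` spelling would accept it, while one that sees the Unicode spelling would not.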

I don’t think underscores are completely invalid. At least DNS resolves it. Luckily, http://o_thus.ero-advertising.com/ just redirects to another blocked subdomain.

https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it

https://stackoverflow.com/questions/7111881/what-are-the-allowed-characters-in-a-subdomain