linkify-it: Invalid links returned with some Chinese characters as delimiters

Steps to reproduce:

  1. linkify the following text

【视频奇志大兵《发烧友》 在线观看 - 酷6视频】奇志大兵《发烧友》 在线观看,奇志 大兵 搞笑双簧 _ 发烧友 (追星族) http://t.cn/RZwjG7U(分享自 @酷6网)

  2. the output link is

http://t.cn/RZwjG7U(分享自

whereas the output link should be

http://t.cn/RZwjG7U

The reason is that ( is not recognized as a delimiter that ends a URL, even though it very commonly follows a link directly in Chinese text.

Out of 500 posts I gathered, about 20 to 30 had links like this, and for each of them linkify returned an invalid link.

I realize that these users are technically posting invalid URLs, but 20 to 30 out of 500 (roughly 4 to 6%) is common enough that there should be a way to deal with it. Any suggestions?
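One possible workaround, sketched here as a post-processing step rather than a change to linkify-it itself: cut a matched URL at the first opening bracket that is never closed inside the match. The helper name `trimAtUnbalancedOpener` and the bracket table are assumptions made for illustration, not part of the library, and the table covers both ASCII and full-width forms because the posts may contain either.

```javascript
// Hypothetical post-processing helper; it is not part of linkify-it.
// Opening brackets mapped to their closers (ASCII and full-width forms).
const CLOSER_FOR = new Map([
  ['(', ')'],
  ['\uFF08', '\uFF09'], // full-width parentheses
  ['\u3010', '\u3011'], // lenticular brackets
  ['\u300A', '\u300B'], // double angle brackets
]);

function trimAtUnbalancedOpener(url) {
  const open = []; // openers seen so far that have not been closed
  for (let i = 0; i < url.length; i++) {
    const ch = url[i];
    if (CLOSER_FOR.has(ch)) {
      open.push({ index: i, closer: CLOSER_FOR.get(ch) });
    } else if (open.length > 0 && ch === open[open.length - 1].closer) {
      open.pop();
    }
  }
  // Cut at the earliest opener that never found its closer.
  return open.length > 0 ? url.slice(0, open[0].index) : url;
}

console.log(trimAtUnbalancedOpener('http://t.cn/RZwjG7U(分享自'));
// prints "http://t.cn/RZwjG7U"
```

URLs with balanced brackets, such as https://en.wikipedia.org/wiki/Foo_(bar), pass through unchanged. With linkify-it, the helper could presumably be applied to the `url` of each object returned by `match()`, with the match's end offset adjusted to the shorter length; that wiring is left out so the sketch stays dependency-free.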

About this issue

  • State: open
  • Created 9 years ago
  • Reactions: 1
  • Comments: 28 (12 by maintainers)

Most upvoted comments

Feel free to create a dummy facebook event (or even a FB post on your wall) and see what happens?

It seems like it fails to parse that link properly, instead linking to https://zh.wikipedia.org/wiki/ .

I will try to resolve this problem.

More examples of how difficult these Chinese posts are to parse: link

What’s your idea on this?

I’m sorry, I’m not a Chinese speaker. Perhaps someone else can help out here. There are also other Chinese punctuation characters, like