linkify-it: Invalid links returned with some chinese characters as delimiters
Steps to reproduce:
- linkify the following text
【视频奇志大兵《发烧友》 在线观看 - 酷6视频】奇志大兵《发烧友》 在线观看,奇志 大兵 搞笑双簧 _ 发烧友 (追星族) http://t.cn/RZwjG7U(分享自 @酷6网)
- the output link is
http://t.cn/RZwjG7U(分享自
whereas the output link should be
http://t.cn/RZwjG7U
The reason is that ( is not recognized as a separating delimiter, yet it is quite common in Chinese.
Out of 500 posts I gathered, about 20 to 30 of them had links like this, resulting in invalid links reported by linkify.
Note that I realize that these users are technically posting invalid URLs, but 20-30 out of 500 is very common and therefore there should be a way to deal with this. Any suggestion?
About this issue
- Original URL
- State: open
- Created 9 years ago
- Reactions: 1
- Comments: 28 (12 by maintainers)
Feel free to create a dummy facebook event (or even a FB post on your wall) and see what happens?
It seems like it fails to parse that link properly, instead linking to https://zh.wikipedia.org/wiki/ .
I will try to resolve this problem.
More examples of how difficult these Chinese posts are to parse: link
What’s your idea on this?
I’m sorry I’m not a Chinese speaker. Perhaps someone else can help out here. There are also other Chinese punctuation characters, like
,