emoji-regex: Some Emoji no longer match after 6.1.3
Thanks for providing the library. We noticed that some emoji no longer match the regex since the latest release.
I'm not sure whether this is caused by changes to the Unicode spec: http://www.unicode.org/reports/tr51/
Test case
const emojiRegex = require('emoji-regex');
const emojis = '๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฅ,๐จ,๐ฉ,๐ช,๐ซ,๐ญ,๐ฐ,๐ฑ,๐ฒ,๐ณ,๐ต,๐ท,๐ธ,๐น,๐บ,๐ป,๐ผ,๐ฝ,๐พ,๐ฟ,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,โ,โ
,โ,โ,โ,โ,โ,โ,โ,โ,โ,โจ,โณ,โด,โ,โ,โ,โ,โ,โ,โ,โ,โค,โ,โ,โ,โก,โฐ,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ข,๐ค,๐ฅ,๐ง,๐จ,๐ฉ,๐ช,๐ซ,๐ฌ,๐ญ,๐ฒ,๐ถ,๐น,๐บ,๐ป,๐ผ,๐ฝ,๐พ,๐,โ,๐
ฐ,๐
ฑ,๐
พ,๐
ฟ,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ฉ๐ช,๐ฌ๐ง,๐จ๐ณ,๐ฏ๐ต,๐ฐ๐ท,๐ซ๐ท,๐ช๐ธ,๐ฎ๐น,๐บ๐ธ,๐ท๐บ,๐,๐,๐,๐ฏ,๐ฒ,๐ณ,๐ด,๐ต,๐ถ,๐ท,๐ธ,๐น,๐บ,๐,๐,ยฉ,ยฎ,โผ,โ,8โฃ,9โฃ,7โฃ,6โฃ,1โฃ,0โฃ,2โฃ,3โฃ,5โฃ,4โฃ,#โฃ,โข,โน,โ,โ,โ,โ,โ,โ,โฉ,โช,โ,โ,โฉ,โช,โซ,โฌ,โฐ,โณ,โช,โซ,โถ,โ,โป,โผ,โฝ,โพ,โ,โ,โ,โ,โ,โ,โ,โบ,โ,โ,โ,โ,โ,โ,โ,โ,โ,โ,โ,โ,โ ,โฃ,โฅ,โฆ,โจ,โป,โฟ,โ,โ ,โก,โช,โซ,โฝ,โพ,โ,โ
,โ,โ,โช,โฒ,โณ,โต,โบ,โฝ,โคด,โคต,โฌ
,โฌ,โฌ,โฌ,โฌ,โญ,โญ,ใฐ,ใฝ,ใ,ใ,๐,๐,๐,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ฐ,๐ฑ,๐ด,๐ต,๐ท,๐ธ,๐น,๐บ,๐ป,๐ผ,๐ฝ,๐พ,๐ฟ,๐,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฅ,๐ฆ,๐ง,๐จ,๐ฉ,๐ช,๐ซ,๐ฌ,๐ญ,๐ฎ,๐ฏ,๐ฐ,๐ฑ,๐ฒ,๐ณ,๐ด,๐ต,๐ถ,๐ท,๐ธ,๐น,๐บ,๐ป,๐,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฅ,๐ฆ,๐ง,๐จ,๐ฉ,๐ช,๐ซ,๐ฌ,๐ญ,๐ฎ,๐ฏ,๐ฐ,๐ฑ,๐ฒ,๐ณ,๐ด,๐ต,๐ถ,๐ท,๐ธ,๐น,๐บ,๐ป,๐ผ,๐ฝ,๐พ,๐ฟ,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ฅ,๐ฆ,๐ง,๐จ,๐ฉ,๐ช,๐ซ,๐ฌ,๐ญ,๐ฎ,๐ฏ,๐ฐ,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฅ,๐ฆ,๐ง,๐จ,๐ฉ,๐ซ,๐ฌ,๐ญ,๐ฎ,๐ฏ,๐ฐ,๐ฑ,๐ฒ,๐ณ,๐ด,๐ต,๐ถ,๐ท,๐ธ,๐น,๐บ,๐ป,๐ผ,๐ฝ,๐พ,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฆ,๐ง,๐จ,๐ฉ,๐ช,๐ซ,๐ฎ,๐ฏ,๐ฐ,๐ฑ,๐ฒ,๐ณ,๐ด,๐ต,๐ถ,๐ท,๐ธ,๐น,๐บ,๐ป,๐ผ,๐ฝ,๐พ,๐ฟ,๐,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฅ,๐ฆ,๐ง,๐จ,๐ฉ,๐ช,๐ซ,๐ฌ,๐ฎ,๐ฏ,๐ฐ,๐ฑ,๐ฒ,๐ณ,๐ด,๐ต,๐ธ,๐น,๐บ,๐ป,๐ผ,๐ฝ,๐พ,๐ฟ,๐,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฅ,๐ฆ,๐ง,๐จ,๐ฉ,๐ช,๐ซ,๐ฎ,๐ฐ,๐ฑ,๐ฒ,๐ณ,๐ด,๐ถ,๐ท,๐น,๐บ,๐ป,๐ผ,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฅ,๐ฆ,๐ง,๐จ,๐ฉ,๐ช,๐ซ,๐ฎ,๐ฏ,๐ฐ,๐ฑ,๐ฒ,๐ณ,๐ด,๐ต,๐ถ,๐ท,๐ธ,๐น,๐บ,๐ป,๐ผ,๐ฝ,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ป,๐ผ,๐ฝ,๐พ,๐ฟ,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ฆ,๐ง,๐ฌ,๐ฎ,๐ฏ,๐ด,๐ถ,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ ,๐ก,๐ฃ,๐ฆ,๐ฎ,๐ฏ,๐ฐ,๐ฑ,๐ณ,๐ด,๐ต,๐ท,๐ธ,๐ฟ,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ฒ,๐ณ,๐,๐,๐ผ,๐,๐,๐ค,๐,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐,๐ช,๐ฅ,๐ฌ,๐ญ,๐ญ,๐ถ,๐ท,๐ฌ,๐ญ,๐ฏ,๐ต,๐,๐,๐,๐,๐
,๐,๐,๐,๐,๐ฌ,๐ญ,๐,๐,๐,๐,๐ ,๐ก,๐ข,๐ฃ,๐ค,๐ฅ,๐ฆ,๐ง'.split(',');
const exception = [];
emojis.forEach((emoji) => {
  const match = emojiRegex().exec(emoji);
  if (!match) {
    exception.push(emoji);
  }
});
console.log('Exception length', exception.length);
console.log(JSON.stringify(exception));
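One caveat with the test above: `exec` succeeds if the regex matches any part of the entry, so a multi-code-point entry such as a keycap (`8` + U+20E3) could count as matched even when only its leading `8` is matched. A stricter check, sketched below, counts an entry as matched only when the match spans the whole string. `findUnmatched` is a hypothetical helper, and the stand-in regex built from `\p{Emoji}` (ES2018 property escapes, Node 10+) only keeps the sketch runnable without the package; in the real test you would pass `require('emoji-regex')` instead.

```javascript
// Hypothetical helper: returns the entries that the regex does NOT match in full.
function findUnmatched(entries, makeRegex) {
  return entries.filter((entry) => {
    const regex = makeRegex(); // fresh instance, so a /g regex starts at lastIndex 0
    const match = regex.exec(entry);
    // Count as matched only when the match starts at 0 and covers the whole entry.
    const fullMatch = match !== null && match.index === 0 && match[0] === entry;
    return !fullMatch;
  });
}

// Stand-in for require('emoji-regex') so the sketch runs without the package.
const standIn = () => /\p{Emoji}/gu;

const samples = ['\u{1F600}', '8\u20E3', '\u{1F321}\uFE0F'];
console.log(findUnmatched(samples, standIn));
```

With the stand-in, U+1F600 passes, while the keycap and the thermometer with VS16 are reported, because `\p{Emoji}` alone matches only their first code point.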
6.1.0
Exception length 0
[]
6.1.3
Exception length 72
["โ","โ","โ","โ","โ","โ","โ","โณ","โด","โ","โ","โค","โก","โ","๐
ฐ","๐
ฑ","๐
พ","๐
ฟ","๐","๐ท","ยฉ","ยฎ","โผ","โ","8โฃ","9โฃ","7โฃ","6โฃ","1โฃ","0โฃ","2โฃ","3โฃ","5โฃ","4โฃ","#โฃ","โข","โน","โ","โ","โ","โ","โ","โ","โฉ","โช","โช","โซ","โถ","โ","โป","โผ","โ","โ","โ","โ","โบ","โ ","โฃ","โฅ","โฆ","โจ","โป","โ ","โคด","โคต","โฌ
","โฌ","โฌ","ใฐ","ใฝ","ใ","ใ"]
Using https://github.com/Kikobeats/emojis-list as the reference list:
6.1.0
Exception length 118
["๐ฆ","๐ง","๐จ","๐ฉ","๐ช","๐ซ","๐ฌ","๐ญ","๐ฎ","๐ฏ","๐ฐ","๐ฑ","๐ฒ","๐ณ","๐ด","๐ต","๐ถ","๐ท","๐ธ","๐น","๐บ๐ณ","๐บ","๐ป","๐ผ","๐ฝ","๐พ","๐ฟ","๐บ","๐ค","๐จ","๐","๐","๐ด","๐ต","๐ถ","๐ค","๐ค","๐ค","๐ค","๐ค","๐ค","๐ค ","๐คก","๐คข","๐คฃ","๐คค","๐คฅ","๐คฆโโ๏ธ","๐คฆโโ๏ธ","๐คฆ","๐คง","๐คฐ","๐คณ","๐คด","๐คต","๐คถ","๐คทโโ๏ธ","๐คทโโ๏ธ","๐คท","๐คธโโ๏ธ","๐คธโโ๏ธ","๐คธ","๐คนโโ๏ธ","๐คนโโ๏ธ","๐คน","๐คบ","๐คผโโ๏ธ","๐คผโโ๏ธ","๐คผ","๐คฝโโ๏ธ","๐คฝโโ๏ธ","๐คฝ","๐คพโโ๏ธ","๐คพโโ๏ธ","๐คพ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ
","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฅ","๐ฆ
","๐ฆ","๐ฆ","๐ฆ","๐ฆ","๐ฆ","๐ฆ","๐ฆ","๐ฆ","๐ฆ","๐ฆ","๐ฆ","๐ฆ","โ","โ","โ","๎"]
6.1.3
Exception length 209
["๐
ฐ","๐
ฑ","๐
พ","๐
ฟ","๐","๐ท","๐ก","๐ค","๐ฅ","๐ฆ","๐ง","๐จ","๐ฉ","๐ช","๐ซ","๐ฌ","๐ถ","๐ฝ","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐ณ","๐ต","๐ท","๐ฟ","๐โ๐จ","๐","๐ฝ","๐","๐","๐ฏ","๐ฐ","๐ณ","๐ถ","๐ท","๐ธ","๐น","๐","๐","๐","๐","๐","๐ฅ","๐จ","๐ฑ","๐ฒ","๐ผ","๐","๐","๐","๐","๐","๐","๐","๐","๐","๐ก","๐ฃ","๐จ","๐ฏ","๐ณ","๐บ","๐","๐","๐","๐","๐ ","๐ก","๐ข","๐ฃ","๐ค","๐ฅ","๐ฉ","๐ฐ","๐ณ","โผ","โ","โข","โน","โ","โ","โ","โ","โ","โ","โฉ","โช","#โฃ","โจ","โ","โญ","โฎ","โฏ","โฑ","โฒ","โธ","โน","โบ","โ","โช","โซ","โถ","โ","โป","โผ","โ","โ","โ","โ","โ","โ","โ","โ","โ ","โข","โฃ","โฆ","โช","โฎ","โฏ","โธ","โน","โบ","โ","โ","โ ","โฃ","โฅ","โฆ","โจ","โป","โ","โ","โ","โ","โ","โ","โ","โ","โ ","โฐ","โฑ","โ","โ","โ","โ","โฉ","โฐ","โฑ","โด","โท","โธ","โ","โ","โ","โ","โ","โ","โ","โ","โก","โณ","โด","โ","โ","โฃ","โค","โก","โคด","โคต","*โฃ","โฌ
","โฌ","โฌ","0โฃ","ใฐ","ใฝ","1โฃ","2โฃ","ใ","ใ","3โฃ","4โฃ","5โฃ","6โฃ","7โฃ","8โฃ","9โฃ","ยฉ","ยฎ","๎"]
About this issue
- State: closed
- Created 7 years ago
- Comments: 15 (8 by maintainers)
Commits related to this issue
- Support emoji sequences - http://unicode.org/Public/emoji/5.0/emoji-sequences.txt - http://unicode.org/Public/emoji/5.0/emoji-zwj-sequences.txt Variation sequences (from `emoji-variation-sequences.t... — committed to mathiasbynens/emoji-regex by mathiasbynens 7 years ago
- Move generated files to the root folder Ref. https://github.com/mathiasbynens/emoji-regex/issues/13#issuecomment-286870322. — committed to mathiasbynens/emoji-regex by mathiasbynens 7 years ago
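The first commit adds support for the sequences listed in emoji-sequences.txt and emoji-zwj-sequences.txt. As a rough sketch of reading such input (not the script the repository actually uses), the function below assumes the usual Unicode data-file layout: `#` starts a comment, fields are separated by `;`, and the first field is a space-separated list of hex code points. `parseSequences` is a hypothetical name, the sample lines are illustrative rather than copied from the real files, and code point ranges (`X..Y`) are skipped to keep it short.

```javascript
// Hypothetical parser for lines like "<hex code points> ; <type> # <comment>".
function parseSequences(text) {
  const sequences = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // drop comments and whitespace
    if (line === '') continue;                 // skip blank and comment-only lines
    const field = line.split(';')[0].trim();   // first field: the code points
    if (field.includes('..')) continue;        // ranges are out of scope here
    const codePoints = field.split(/\s+/).map((hex) => parseInt(hex, 16));
    sequences.push(String.fromCodePoint(...codePoints));
  }
  return sequences;
}

// Illustrative input in the assumed layout.
const sample = [
  '# sample comment line',
  '0023 FE0F 20E3 ; keycap sequence # keycap: #',
  '',
  '1F1E9 1F1EA ; Emoji_Flag_Sequence # flag: Germany',
].join('\n');

console.log(parseSequences(sample));
```

The resulting strings could then be fed to a regex generator; how emoji-regex actually does that is not shown here.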
The regexgen fix for the bug mentioned above is here: https://github.com/devongovett/regexgen/pull/14. It was released in v1.2.3.
It works perfectly. In the README we should probably use
import emojiRegex from 'emoji-regex/dist/text';
since we are not exposing dist directly. Thanks for the quick fix!
Ah, I see. The problematic test cases, e.g. U+1F321 THERMOMETER 🌡️, are not Emoji_Presentation characters. This means they are rendered as text by default (according to the Unicode data files), and are only rendered as emoji when followed by U+FE0F VARIATION SELECTOR-16. The relevant entries are in http://unicode.org/Public/emoji/5.0/emoji-test.txt.
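The rule described here can be checked with ES2018 Unicode property escapes (`\p{Emoji}` and `\p{Emoji_Presentation}`, available in Node 10+). The sketch below classifies a string by its first code point: emoji with `Emoji_Presentation=Yes` default to emoji style, other emoji default to text style unless U+FE0F follows. `presentation` is a hypothetical helper, and the property data comes from whatever Unicode version the JavaScript engine ships, which may differ from the 5.0 files referenced above.

```javascript
const EMOJI = /^\p{Emoji}/u;
const EMOJI_PRESENTATION = /^\p{Emoji_Presentation}/u;

// Hypothetical helper: how a string's first code point is presented by default.
function presentation(s) {
  if (s === '') return 'not emoji';
  const first = String.fromCodePoint(s.codePointAt(0)); // first full code point
  const rest = s.slice(first.length);
  if (!EMOJI.test(first)) return 'not emoji';
  if (rest.startsWith('\uFE0F')) return 'emoji (VS16)';        // explicitly emoji style
  if (EMOJI_PRESENTATION.test(first)) return 'emoji (default)'; // emoji by default
  return 'text (default)';                                      // needs VS16 for emoji style
}

for (const s of ['\u{1F600}', '\u{1F321}', '\u{1F321}\uFE0F', '\u00A9']) {
  console.log(s.codePointAt(0).toString(16), presentation(s));
}
```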
(On my system, U+1F321 is rendered as an emoji even without the variation selector, but that varies anyway depending on the fonts you're using. This package is based purely on the Unicode data.)
I don't really see how to move forward here: should emoji-regex expose a secondary regex that also matches text emoji such as 🌡️?
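One possible shape for such a secondary regex, sketched under the assumption that "text emoji" means code points with `Emoji=Yes` but `Emoji_Presentation=No` (with or without a trailing U+FE0F), plus keycap sequences. Bare digits, `#` and `*` are excluded because they also carry `Emoji=Yes`. `textEmojiRegex` and `findTextEmoji` are hypothetical names, not part of emoji-regex.

```javascript
// Hypothetical secondary regex for text-default emoji; not part of emoji-regex.
// Alternative 1: keycap sequence: [0-9#*], optional U+FE0F, then U+20E3.
// Alternative 2: any other Emoji=Yes code point that is not Emoji_Presentation,
//                optionally followed by U+FE0F VARIATION SELECTOR-16.
const textEmojiRegex = () =>
  /[0-9#*]\uFE0F?\u20E3|(?![0-9#*])(?!\p{Emoji_Presentation})\p{Emoji}\uFE0F?/gu;

function findTextEmoji(text) {
  // String.prototype.matchAll (Node 12+) requires a regex with the g flag.
  return Array.from(text.matchAll(textEmojiRegex()), (m) => m[0]);
}

const sample = 'temp \u{1F321}\uFE0F, \u00A9 2017, key 8\u20E3, face \u{1F600}';
console.log(findTextEmoji(sample));
```

On this sample, the thermometer with VS16, the copyright sign and the keycap are returned, while the plain digits of "2017" and the emoji-presentation U+1F600 are not.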