I want a regex that matches all emojis (or most of them) but excludes certain characters (such as “|”|‘|’|…|—).
This regex does the job via negative lookahead:
/(?!\u201C|\u201D|\u2018|\u2019|\u2026|\u2014)(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/
But apparently Google Apps Script doesn't support this. It fails with the error:
Invalid regular expression pattern
(?!“|”|‘|’|…|—)(©|®|[ -㌀]|?[퀀-?]|?[퀀-?]|?[퀀-?])
Is there another way to achieve my goal (a regex that works with Google Apps Script's findText)?
Option 1
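Maybe this character class, which lists the emoji ranges explicitly and needs no lookahead,
[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}]
would work for the emojis you want. None of the characters you want to exclude falls inside these ranges. Note that in a JavaScript regex the \u{...} code point escapes require the u flag.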
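To check that claim, here is a minimal sketch in plain JavaScript (not tested inside Google Apps Script's findText, which may treat the pattern differently). The names emojiClass and samples are only for illustration.

```javascript
// The Option 1 class as a JavaScript regex literal. The `u` flag is required
// for the \u{...} code point escapes to be understood.
const emojiClass = /[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}]/u;

// Two emojis followed by the six punctuation characters from the question.
const samples = [
  "\u{1F600}", // grinning face
  "\u2600",    // sun
  "\u201C", "\u201D", "\u2018", "\u2019", "\u2026", "\u2014",
];

for (const ch of samples) {
  // Print the code point and whether the class matches it.
  const codePoint = ch.codePointAt(0).toString(16).toUpperCase();
  console.log("U+" + codePoint, emojiClass.test(ch));
}
```

The two emojis report true and the six punctuation characters report false, so no lookahead is needed with this class.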
Option 2
Otherwise, in an engine that supports character class intersection, you could subtract the undesired characters from the desired ranges, such as:
[these unicode ranges &&[^these unicodes]]
which gets fairly complicated, but is possible.
Option 3
This option is most likely the simplest way to solve your problem. I guess the issue is that the undesired punctuation characters already fall inside your desired Unicode ranges. Check whether that is the case: in your regex, all six excluded characters (\u2014, \u2018, \u2019, \u201C, \u201D, \u2026) lie inside [\u2000-\u3300]. For example, in
[\u0100-\u0200]
you might have \u0150 and \u0175 as undesired characters that you want removed from the range.
You can then simply split the range around them (the bounds are hexadecimal, so the code point before \u0150 is \u014F):
[\u0100-\u014F\u0151-\u0174\u0176-\u0200]
and the problem is solved without any lookahead.
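Applied to the regex in the question, the same splitting can be done mechanically. Below is a rough sketch in plain JavaScript. splitRange is a hypothetical helper written for this answer, not part of any library, and I have not verified how findText handles the resulting pattern.

```javascript
// Hypothetical helper: build a character class covering [start, end]
// (inclusive code points in the Basic Multilingual Plane) minus `excluded`.
function splitRange(start, end, excluded) {
  // Format a code point as a four-digit \uXXXX escape.
  const hex = (cp) => "\\u" + cp.toString(16).toUpperCase().padStart(4, "0");

  // Keep only the excluded code points inside the range, deduplicated and sorted.
  const holes = [...new Set(excluded)]
    .filter((cp) => cp >= start && cp <= end)
    .sort((a, b) => a - b);

  const parts = [];
  let lo = start;
  for (const hole of holes) {
    if (hole > lo) parts.push([lo, hole - 1]); // sub-range before the hole
    lo = hole + 1;                             // resume just after the hole
  }
  if (lo <= end) parts.push([lo, end]);        // tail after the last hole

  const body = parts
    .map(([a, b]) => (a === b ? hex(a) : hex(a) + "-" + hex(b)))
    .join("");
  return "[" + body + "]";
}

// The six characters excluded by the lookahead in the question.
const excluded = [0x201c, 0x201d, 0x2018, 0x2019, 0x2026, 0x2014];

// Replacement for the [\u2000-\u3300] part of the original regex.
const bmpClass = splitRange(0x2000, 0x3300, excluded);
console.log(bmpClass);
// [\u2000-\u2013\u2015-\u2017\u201A-\u201B\u201E-\u2025\u2027-\u3300]
```

The printed class can replace [\u2000-\u3300] in the original pattern, after which the (?!...) lookahead is no longer needed. The other alternatives (\u00a9, \u00ae and the surrogate pairs) contain none of the excluded characters, so they can stay as they are.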