Assuming you are asking about the standard Unicode emoji ranges (vendors also use different blocks), you may consider these three ranges:
0x20a0 - 0x32ff
0x1f000 - 0x1ffff
0xfe4e5 - 0xfe4ee
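As a rough illustration of how these ranges might be used, here is a minimal Java 8 sketch that checks whether a string contains any code point inside them. The class and method names are hypothetical. Iterating with String.codePoints() makes a supplementary character such as U+1F600 count as one code point instead of two surrogate chars. Note that the first range is broad and also covers non-emoji symbols, for example the euro sign (U+20AC).

```java
public class EmojiRangeCheck {
    // The three ranges listed above, as inclusive {start, end} pairs.
    private static final int[][] RANGES = {
        {0x20A0, 0x32FF},
        {0x1F000, 0x1FFFF},
        {0xFE4E5, 0xFE4EE},
    };

    // True if the code point falls inside any of the ranges.
    static boolean inEmojiRange(int codePoint) {
        for (int[] r : RANGES) {
            if (codePoint >= r[0] && codePoint <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // Walks the string by code point (not by char), so surrogate pairs are handled.
    static boolean containsEmojiRange(String s) {
        return s.codePoints().anyMatch(EmojiRangeCheck::inEmojiRange);
    }

    public static void main(String[] args) {
        System.out.println(containsEmojiRange("hello"));           // false
        System.out.println(containsEmojiRange("hi \uD83D\uDE00")); // true (U+1F600)
    }
}
```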
Besides the thoughtful explanation T.J. Crowder has shared with us, it should be said that, beginning with Java 7, it is possible to match UTF-16 encoded surrogate pairs with ease.
A Unicode character can also be represented in a regular expression by its hexadecimal code point value, using the construct \x{...}. For example, the supplementary character U+2011F can be specified as \x{2011F} instead of as the two consecutive Unicode escape sequences of its surrogate pair, \uD840\uDD1F.
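A small Java 7+ sketch of the \x{...} construct, comparing it with the surrogate-pair form and showing that it also works inside a character-class range. The class and field names are illustrative only.

```java
import java.util.regex.Pattern;

public class CodePointRegex {
    // Java 7+: \x{...} takes a hexadecimal code point directly.
    static final Pattern BY_CODE_POINT = Pattern.compile("\\x{2011F}");
    // Equivalent older style: the two UTF-16 surrogate chars of U+2011F.
    static final Pattern BY_SURROGATES = Pattern.compile("\uD840\uDD1F");
    // Ranges work the same way, e.g. the U+1F000..U+1FFFF block.
    static final Pattern EMOJI_BLOCK = Pattern.compile("[\\x{1F000}-\\x{1FFFF}]");

    public static void main(String[] args) {
        String text = new String(Character.toChars(0x2011F));
        System.out.println(BY_CODE_POINT.matcher(text).matches());         // true
        System.out.println(BY_SURROGATES.matcher(text).matches());         // true
        System.out.println(EMOJI_BLOCK.matcher("\uD83D\uDE00").matches()); // true (U+1F600)
    }
}
```

Because Pattern works on code points, a whole surrogate pair is matched by a single character class.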
If you cannot switch to Java 7, you can instead extend the UnicodeEscaper class provided by Guava.
Here is an implementation, for the sake of example:
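import com.google.common.escape.UnicodeEscaper;

// Usage: new SimpleEscaper().escape(text)
public class SimpleEscaper extends UnicodeEscaper
{
    @Override
    protected char[] escape(int codePoint)
    {
        // Replace code points in U+1F000..U+1FFFF with their hex value.
        if (codePoint >= 0x1f000 && codePoint <= 0x1ffff)
        {
            return Integer.toHexString(codePoint).toCharArray();
        }
        // null tells Guava this code point needs no escaping.
        return null;
    }
}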
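If you would rather not add the Guava dependency, the same rule can be applied with a plain loop over code points. This is a hypothetical, dependency-free sketch (the class name ManualEscaper is illustrative, and it is not part of Guava). String.codePointAt and StringBuilder.appendCodePoint have existed since Java 5, so it also runs on pre-Java-7 JVMs.

```java
public class ManualEscaper {
    // Same rule as SimpleEscaper: code points in U+1F000..U+1FFFF become
    // their lowercase hex value; every other code point is copied unchanged.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i); // combines a surrogate pair into one code point
            if (cp >= 0x1F000 && cp <= 0x1FFFF) {
                out.append(Integer.toHexString(cp));
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("hi \uD83D\uDE00")); // hi 1f600
    }
}
```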
For details of the \x{...} construct, take a look at the java.util.regex.Pattern docs:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
There are two ways to solve this sticky problem.
The first is to use third-party libraries such as emoji-java and emoji4j, which are mentioned above. They let you call methods such as containsEmoji or removeAllEmojis, but your own apps then have to keep these libraries up to date.
As for me, I wanted a simpler solution.
After a whole day of searching, I found a magic regex:
"(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)"
I have tested it in Java, and it solved my problem perfectly.
You can find it on GitHub:
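A minimal sketch of how the regex above might be used from Java to detect and strip emoji. The helpers hasEmoji and stripEmoji are hypothetical names, not library methods. The Java compiler turns each \uXXXX pair in the string literal into a real surrogate pair, and Pattern treats each pair as one code point, so ranges like U+1F300..U+1F5FF work inside character classes.

```java
import java.util.regex.Pattern;

public class EmojiRegexDemo {
    // The regex quoted above, unchanged.
    static final Pattern EMOJI = Pattern.compile("(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)");

    // True if at least one emoji sequence occurs anywhere in s.
    static boolean hasEmoji(String s) {
        return EMOJI.matcher(s).find();
    }

    // Removes every matched emoji sequence (including a trailing U+FE0F selector).
    static String stripEmoji(String s) {
        return EMOJI.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "Hi \uD83D\uDE00 I \u2764\uFE0F Java";
        System.out.println(hasEmoji(text));   // true
        System.out.println(stripEmoji(text)); // Hi  I  Java
    }
}
```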
https://github.com/zly394/EmojiRegex
Note:
The answer provided by @Eric Nakagawa contains some errors and does not work properly.