It seems these emojis are 4 bytes long each, so you can simply cut your string every 4 bytes. Here's some code for you:
text = 'Hey \xf0\x9f\x98\xb7\xf0\x9f\x98\xb7\xf0\x9f\x98\xb7'
print text
print 'text.split()=%s' % text.split()
emojis_str = text.split()[1]
emojis_list = [emojis_str[i:i+4] for i in range(0, len(emojis_str), 4)]
print 'emojis_list=%s' % emojis_list
for em in emojis_list:
    print 'emoji: %s' % em
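On a UTF-8 terminal, this will output:
Hey 😷😷😷
text.split()=['Hey', '\xf0\x9f\x98\xb7\xf0\x9f\x98\xb7\xf0\x9f\x98\xb7']
emojis_list=['\xf0\x9f\x98\xb7', '\xf0\x9f\x98\xb7', '\xf0\x9f\x98\xb7']
emoji: 😷
emoji: 😷
emoji: 😷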
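The snippet above is Python 2, where a plain string literal is a byte string. A rough Python 3 sketch of the same idea might look like the following. The helper name chunk_emojis is made up for illustration, and it still assumes every emoji in the run is exactly 4 bytes of UTF-8. A 3-byte symbol or a multi-code-point sequence would break the fixed-width cut.

```python
# Python 3 sketch: work on bytes, cut every 4 bytes, then decode each piece.
text = b'Hey \xf0\x9f\x98\xb7\xf0\x9f\x98\xb7\xf0\x9f\x98\xb7'


def chunk_emojis(raw, width=4):
    """Cut a run of UTF-8 bytes into fixed-width pieces and decode each one.

    Assumption: every emoji in `raw` is exactly `width` bytes long.
    """
    return [raw[i:i + width].decode('utf-8') for i in range(0, len(raw), width)]


emojis_bytes = text.split()[1]          # the part after b'Hey'
emojis_list = chunk_emojis(emojis_bytes)
print(emojis_list)
```

If the assumption does not hold, decode() raises UnicodeDecodeError on the first misaligned piece, so the failure is at least loud.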
If the emoji is 4 bytes long in UTF-8, the first byte is hex Fx. Regexp:
f[0-7]
If the emoji is 3 bytes long, the first byte is hex Ex. Regexp:
e[0-9a-f]
Here 'x' stands for some other hex digit.
Examples:
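As an illustration of the lead-byte rule, here is a small Python 3 sketch that applies the two patterns to the hex dump of a character's UTF-8 encoding. The sample characters and the helper name classify are chosen for this sketch only. Keep in mind that the lead byte only tells you the length of the sequence: plenty of 3-byte characters (CJK text, for instance) are not emojis at all.

```python
import re

# Hex patterns from the answer, anchored at the start of the hex dump.
FOUR_BYTE_LEAD = re.compile(r'^f[0-7]')
THREE_BYTE_LEAD = re.compile(r'^e[0-9a-f]')


def classify(ch):
    """Return (hex dump, sequence length guessed from the lead byte)."""
    hex_bytes = ch.encode('utf-8').hex()
    if FOUR_BYTE_LEAD.match(hex_bytes):
        length = 4
    elif THREE_BYTE_LEAD.match(hex_bytes):
        length = 3
    else:
        length = None  # 1- or 2-byte sequence, not covered by the patterns
    return hex_bytes, length


print(classify('\U0001F637'))  # ('f09f98b7', 4)
print(classify('\u263a'))      # ('e298ba', 3)
print(classify('A'))           # ('41', None)
```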
You should be able to use get_emoji_regexp from the emoji package (https://pypi.org/project/emoji/), together with the usual split function. So something like:
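A minimal sketch of the split idea in Python 3, using only the standard library so it runs without extra installs. ROUGH_EMOJI_RE is a hypothetical stand-in that covers just a few common emoji blocks. With the emoji package installed you would pass the compiled pattern from emoji.get_emoji_regexp() instead, assuming your installed version still provides it (as far as I know it was removed in the 2.x releases).

```python
import re

# Stand-in for emoji.get_emoji_regexp(): a rough, hypothetical pattern that
# only covers a few common emoji blocks, not the full emoji set.
ROUGH_EMOJI_RE = re.compile('[\U0001F300-\U0001FAFF\u2600-\u27BF]')


def split_on_emojis(text, emoji_re=ROUGH_EMOJI_RE):
    """Split text into emoji and non-emoji pieces, dropping empty pieces."""
    # Wrapping the pattern in a capturing group makes re.split keep the
    # matched emojis in the result instead of discarding them.
    pieces = re.split('(' + emoji_re.pattern + ')', text)
    return [piece for piece in pieces if piece]


print(split_on_emojis('Hey \U0001F637\U0001F637\U0001F637'))
```

Unlike cutting every 4 bytes, this works on decoded text, so it does not care how many bytes each emoji takes.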