If I have the string:
"O João foi almoçar :) ."
how do I best split it into a list of words in Python, like so?
['O', 'João', 'foi', 'almoçar', ':)']
Thanks :)
Sofia
If the punctuation is its own space-separated token, as in your example, then it's easy:
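A minimal sketch of the easy case. It assumes that "punctuation" means a token made of a single ASCII punctuation character (like the trailing "."), so multi-character tokens such as ":)" survive. The function name `split_words` is hypothetical.

```python
import string

def split_words(text):
    # Split on whitespace, then drop tokens that are exactly one
    # punctuation character. Multi-character tokens such as ":)" are kept.
    return [tok for tok in text.split()
            if not (len(tok) == 1 and tok in string.punctuation)]

print(split_words("O João foi almoçar :) ."))
# ['O', 'João', 'foi', 'almoçar', ':)']
```

Note that a token such as "..." would be kept under this rule; tighten the filter if that matters for your input.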
If this is not the case, you can define a dictionary of smileys like this (you'll need to add more):
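For example, a table along these lines. Only `<HAPPY_SMILEY>` comes from this answer; the other entries and the name `SMILEYS` are illustrative assumptions. Each placeholder uses only letters, "_", "<" and ">", so it can pass through the punctuation-stripping step untouched.

```python
# Hypothetical smiley -> placeholder table; extend it with the smileys you need.
SMILEYS = {
    ":)": "<HAPPY_SMILEY>",
    ":(": "<SAD_SMILEY>",
    ";)": "<WINK_SMILEY>",
}
```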
and then replace each instance of a smiley with a placeholder that contains no punctuation (we'll treat "<", ">" and "_" as not being punctuation, since the placeholders use them).
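The substitution can be sketched with plain `str.replace`, assuming a hypothetical `SMILEYS` table like the one above. Replacing longer smileys first is a design choice, so that if the table ever holds both ":)" and a longer smiley that starts with ":)", the longer one wins.

```python
# Hypothetical smiley table (only <HAPPY_SMILEY> comes from the answer).
SMILEYS = {":)": "<HAPPY_SMILEY>", ":(": "<SAD_SMILEY>"}

def replace_smileys(text, smileys=SMILEYS):
    # Swap every smiley for its punctuation-free placeholder,
    # longest smileys first.
    for smiley in sorted(smileys, key=len, reverse=True):
        text = text.replace(smiley, smileys[smiley])
    return text

print(replace_smileys("O João foi almoçar :) ."))
```

Run on your sentence, it produces: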
"O João foi almoçar <HAPPY_SMILEY> ."
We then strip punctuation:
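One way to do this, sketched under the assumption that "punctuation" means Python's ASCII `string.punctuation` minus the characters the placeholders need. Non-ASCII marks such as "«" or "…" are not removed by this sketch. `TO_STRIP` and `strip_punctuation` are hypothetical names.

```python
import string

# ASCII punctuation except "<", ">" and "_", which the placeholders use.
TO_STRIP = "".join(c for c in string.punctuation if c not in "<>_")

def strip_punctuation(text):
    # str.maketrans("", "", chars) maps each listed character to None,
    # so translate() deletes them; strip() trims leftover outer spaces.
    return text.translate(str.maketrans("", "", TO_STRIP)).strip()

print(strip_punctuation("O João foi almoçar <HAPPY_SMILEY> ."))
```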
Which gives us
"O João foi almoçar <HAPPY_SMILEY>"
We then restore the smileys:
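Restoring is the inverse substitution. This sketch assumes the same hypothetical `SMILEYS` table used for the forward replacement.

```python
# Hypothetical smiley table (only <HAPPY_SMILEY> comes from the answer).
SMILEYS = {":)": "<HAPPY_SMILEY>", ":(": "<SAD_SMILEY>"}

def restore_smileys(text, smileys=SMILEYS):
    # Undo the earlier substitution: placeholder -> original smiley.
    for smiley, placeholder in smileys.items():
        text = text.replace(placeholder, smiley)
    return text

print(restore_smileys("O João foi almoçar <HAPPY_SMILEY>"))
# O João foi almoçar :)
```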
Then we split the result on whitespace:
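The split itself needs nothing beyond `str.split()` with no arguments, which splits on any run of whitespace and ignores leading and trailing whitespace:

```python
text = "O João foi almoçar :)"
# No separator argument: split on runs of whitespace, no empty strings.
words = text.split()
print(words)
```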
Giving us our final result:
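['O', 'João', 'foi', 'almoçar', ':)']
(In Python 2, with UTF-8 byte strings, the list's repr shows the accented letters as escapes: ['O', 'Jo\xc3\xa3o', 'foi', 'almo\xc3\xa7ar', ':)']. The words are the same.)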
Putting it all together into a function:
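A Python 3 sketch of the whole pipeline (protect smileys, strip punctuation, restore smileys, split). The names `tokenize`, `SMILEYS` and `TO_STRIP`, and every placeholder except `<HAPPY_SMILEY>`, are hypothetical.

```python
import string

# Hypothetical smiley table; extend it with the smileys you need.
SMILEYS = {":)": "<HAPPY_SMILEY>", ":(": "<SAD_SMILEY>"}

# ASCII punctuation minus the characters used inside the placeholders.
TO_STRIP = "".join(c for c in string.punctuation if c not in "<>_")

def tokenize(text, smileys=SMILEYS):
    # 1. Protect smileys by swapping them for punctuation-free placeholders
    #    (longest first, so overlapping smileys resolve to the longer one).
    for smiley in sorted(smileys, key=len, reverse=True):
        text = text.replace(smiley, smileys[smiley])
    # 2. Delete the remaining punctuation.
    text = text.translate(str.maketrans("", "", TO_STRIP))
    # 3. Put the smileys back.
    for smiley, placeholder in smileys.items():
        text = text.replace(placeholder, smiley)
    # 4. Split on whitespace.
    return text.split()

print(tokenize("O João foi almoçar :) ."))
# ['O', 'João', 'foi', 'almoçar', ':)']
```

Two caveats of this approach: a smiley glued to a word (e.g. "almoçar:)") stays attached to that word, and any "<", ">" or "_" in the input is kept because those characters are exempted from stripping.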