I have a data.table called base, and it has a term column:
class(base$term)
[1] "character"
length(base$term)
[1] 27486
I can remove accents from a single string, and from a vector of strings:
iconv("Millésime",to="ASCII//TRANSLIT")
[1] "Millesime"
iconv(c("Millésime","boulangère"),to="ASCII//TRANSLIT")
[1] "Millesime" "boulangere"
But for some reason, it does not work when I apply the very same function to my term column:
base$terme[2]
[1] "Millésime"
iconv(base$terme[2],to="ASCII//TRANSLIT")
[1] "MillACsime"
Does anybody know what is going on here?
You can apply this function
OK, here is the way to solve the problem:
Thanks to @nicola
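For readers landing here, a minimal sketch of the kind of fix this usually comes down to. It assumes the column is stored as UTF-8 while the session's native encoding is something else; the question does not show the output of Encoding(), so that is a guess. The vector terms is a hypothetical stand-in for base$terme. When from is omitted, iconv treats the input as being in the native encoding, so the UTF-8 bytes of "é" can be misread as two separate characters, which would explain output like "MillACsime".

```r
# Hypothetical stand-in for the term column; "\u00e9" is "é", which R
# stores and marks as UTF-8 when written with a \u escape.
terms <- c("Mill\u00e9sime", "boulang\u00e8re")

# Check how R has the strings marked before converting.
enc <- Encoding(terms)
print(enc)

# Declare the source encoding explicitly instead of relying on the
# default (from = "", i.e. the session's native encoding).
ascii_terms <- iconv(terms, from = "UTF-8", to = "ASCII//TRANSLIT")
print(ascii_terms)
```

The exact transliteration is platform-dependent, so the result can differ between operating systems.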
It might be easier to use the stringi package. This way, you don't need to check the encoding beforehand. Furthermore, stringi behaves consistently across operating systems, whereas iconv does not.
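A short sketch of what that could look like, assuming the stringi package is installed. The vector terms is a hypothetical stand-in for the column in the question, and the "Latin-ASCII" transform is one reasonable choice among the ICU transliterators rather than the only option.

```r
library(stringi)

# Hypothetical stand-in for the term column, written with \u escapes
# so the example does not depend on the encoding of this file.
terms <- c("Mill\u00e9sime", "boulang\u00e8re")

# Transliterate Latin-script characters to their closest ASCII form.
ascii_terms <- stri_trans_general(terms, "Latin-ASCII")
print(ascii_terms)
```

The same call should work on the whole column, for example stri_trans_general(base$terme, "Latin-ASCII").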