Convert unicode to readable characters in R

Posted 2020-02-13 01:57

I have a .csv file for which Encoding(data) returns a mix of "unknown" and "UTF-8". The text looks like this:

<U+1042><U+1040><U+1042><U+1040> <U+1019><U+103D><U+102C>\n\n<U+1010><U+102D><U+102F><U+1004><U+1039><U+1038><U+103B><U+1015><U+100A><U+1039><U+1000><U+102D><U+102F><U+101C><U+1032> <U+1000><U+102C><U+1000><U+103C>

I would like to turn it into readable text, which in this case is Myanmar language, so something that looks a little like this:

၂၀၂၀မွာတိုင္းျ

Strangely, the text in this data used to be readable in RStudio, but at some point (I don't know when) this changed, and now I can only see the Unicode escape codes. I have tried these solutions with no success.

Tags: r unicode utf-8

1 answer

We Are One

Reply #2 · 2020-02-13 02:42

You could do something like this:

library(stringi)

string <- "<U+1042><U+1040><U+1042><U+1040> <U+1019><U+103D><U+102C>\n\n<U+1010><U+102D><U+102F><U+1004><U+1039><U+1038><U+103B><U+1015><U+100A><U+1039><U+1000><U+102D><U+102F><U+101C><U+1032> <U+1000><U+102C><U+1000><U+103C>"

# Rewrite each literal <U+XXXX> (four hex digits) as the escape \uXXXX,
# then let stringi turn those escapes into the actual characters.
cat(stri_unescape_unicode(gsub("<U\\+([0-9A-Fa-f]{4})>", "\\\\u\\1", string)))

Which results in:

၂၀၂၀ မွာ

တိုင္းျပည္ကိုလဲ ကာကြ
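
If you would rather not depend on stringi, or if the data might contain code points with more than four hex digits (which a four-character pattern would skip), the same idea can be sketched in base R. This is only a sketch under assumptions: decode_u_escapes is a made-up helper name, and it assumes the escapes are literal text of the form <U+XXXX> with four to six hex digits.

```r
# Hypothetical helper (not part of any package): replace every literal
# <U+XXXX> escape in a character vector with the character it names.
decode_u_escapes <- function(x) {
  x <- as.character(x)  # also accepts factors, e.g. from read.csv()
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    # Locate all escapes with 4 to 6 hex digits in this string
    m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", s)
    tokens <- regmatches(s, m)[[1]]
    if (length(tokens) == 0) return(s)  # nothing to decode
    # Drop the leading "<U+" and trailing ">" to keep only the hex digits
    hex <- substr(tokens, 4, nchar(tokens) - 1)
    # Hex digits -> integer code points -> one UTF-8 character each
    chars <- intToUtf8(strtoi(hex, base = 16L), multiple = TRUE)
    # Put the decoded characters back where the escapes were
    regmatches(s, m) <- list(chars)
    s
  }, character(1), USE.NAMES = FALSE)
}

cat(decode_u_escapes("<U+1042><U+1040><U+1042><U+1040> <U+1019><U+103D><U+102C>"))
```

Because the helper is vectorised over its input, it could be applied to a whole column at once, for example decode_u_escapes(data$text), where data$text stands in for whatever your column is actually called.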
