I have a string like the following:
str <- "ていただけるなら"
Encoding(str) #returns "UTF-8"
I write it to disk:
write.table(str, file="chartest", quote=F, col.names=F, row.names=F)
Now I look at the file in Notepad++, with the encoding set to UTF-8 without BOM, and I see this:
<U+3066><U+3044><U+305F><U+3060><U+3051><U+308B><U+306A><U+3089>
What is going wrong in this process? I would like the written text file to display the string as it appears in R.
This is on Windows 7, R version 2.15.
This is an annoying "feature" of R on Windows. The only solution I have found so far is to temporarily and programmatically switch your locale to the one required to decode the script of the text in question. In this case, that means the Japanese locale.
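A minimal sketch of that workaround (the locale name "Japanese" is Windows-specific, and the file names are just illustrative):

str <- "ていただけるなら"

# Written under the default (non-Japanese) locale: the file contains
# the <U+3066>... code points rather than the glyphs
write.table(str, file="chartest", quote=F, col.names=F, row.names=F)

# Temporarily switch the character-type locale to Japanese
Sys.setlocale(category="LC_CTYPE", locale="Japanese")

# Written under the Japanese locale: the file shows the expected glyphs
write.table(str, file="chartest2", quote=F, col.names=F, row.names=F)

# Restore the default locale
Sys.setlocale(category="LC_CTYPE", locale="")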
The above produces two files. The first shows the Unicode code points, which is not what you want, while the second shows the glyphs you would normally expect.
So far nobody has been able to explain to me why this happens in R. It is not an unavoidable feature of Windows because Perl, as I mention in this post, gets round the issue somehow.
Have you tried using the fileEncoding argument of write.table?
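For example (a sketch using the string from the question; whether this avoids the code-point escapes on Windows may depend on your R version and locale):

str <- "ていただけるなら"

# fileEncoding re-encodes the output connection to the given encoding
write.table(str, file="chartest_utf8", quote=F, col.names=F, row.names=F,
            fileEncoding="UTF-8")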