R, Windows and foreign language characters

Posted 2019-04-13 12:25

This has been a longstanding problem with R: it can read non-Latin characters on Unix, but I cannot read them on Windows. I've reproduced this problem on several English-edition Windows machines over the years. I've tried changing the localisation settings in Windows and numerous other settings, to no effect. Has anyone actually been able to read a foreign-language text file on Windows? I think being able to read, write and display Unicode is a pretty nifty feature for a program.

Environment:

 > Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252" 

The problem can be reproduced as follows:

Create a simple file in a language like Russian or Arabic in a text editor and save it as UTF-8 without a BOM.

> test_df <- read.table("test2.txt",header=FALSE, sep=";", quote="",fill=FALSE, 
encoding="UTF-8",comment.char="",dec=",")
......Warning message:
......In read.table("test2.txt", header = FALSE, sep = ";", quote = "",  :
......incomplete final line found by readTableHeader on 'test2.txt'
> test_df
......                    V1 V2
......1 <U+043E><U+0439>!yes  9

Using read.csv() yields the same results, minus the warning. I realize that the "<U+043E>" notation is both searchable and can be converted to the readable character by an external program. But I want to see actual Cyrillic text in charts, tables, output etc., like I can in every other program I've used.

So I've had this problem consistently for a few years. Then yesterday morning I tried the following:

test_df <- read.table("items.txt",header=FALSE, sep=";",quote="",fill=FALSE,
encoding="bytes",comment.char="",dec=",")

And encoding="bytes" worked! I saw Cyrillic in the console. I then had to reinstall R (same version, same computer, same everything), and the solution evaporated. I've literally retraced all my steps, and it seems like magic. Now encoding="bytes" just produces the same garbage (РєРѕРЅСЊСЏРє) as encoding="pizza" would (the parameter is ignored).

There is also a fileEncoding parameter for read.table. I am not sure what it does, but it doesn't work either and cannot read even English text.

Can you read a non-ASCII text file on your Windows PC? How on earth do you do it?

1 answer
祖国的老花朵
#2 · 2019-04-13 13:23

Try setting the locale. For example,

Sys.setlocale(locale = "Russian")

See ?Sys.setlocale for more information.
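As a rough sketch of how this might fit together with the read.table call from the question: the snippet below writes a small UTF-8 file, switches the locale only when running on Windows, and then reads the file back. The file name and its contents are made up for illustration, and the locale name "Russian" is assumed to be available on the Windows machine in question. Per the read.table documentation, encoding only marks the strings that were read as UTF-8 or Latin-1, whereas fileEncoding asks R to re-encode the file's contents, so the two are not interchangeable.

```r
# Hypothetical test file: one line of UTF-8 text, matching the question's layout.
path <- file.path(tempdir(), "test_utf8.txt")
con <- file(path, open = "wb")
writeLines("\u043e\u0439!yes;9", con, useBytes = TRUE)  # write the raw UTF-8 bytes
close(con)

# "Russian" is a Windows locale name, so only attempt the switch there.
if (.Platform$OS.type == "windows") {
  Sys.setlocale(locale = "Russian")
}

# encoding = "UTF-8" marks the strings as UTF-8; it does not convert them.
test_df <- read.table(path, header = FALSE, sep = ";", quote = "",
                      fill = FALSE, encoding = "UTF-8", comment.char = "",
                      dec = ",", stringsAsFactors = FALSE)

print(Encoding(test_df$V1))  # how R has marked the text column
print(test_df)
```

Whether the Cyrillic text then displays correctly in the console may still depend on the R version and the console font, so treat this as a starting point rather than a guaranteed fix.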
