This has been a longstanding problem with R: it can read non-Latin characters on Unix, but I cannot read them on Windows. I've reproduced this problem on several English-edition Windows machines over the years. I've tried changing the localisation settings in Windows and numerous other settings, to no effect. Has anyone actually been able to read a foreign-language text file on Windows? I think being able to read, write, and display Unicode is a pretty nifty feature for a program.
Environment:
> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
The problem can be reproduced as follows:
Create a simple file in a language like Russian or Arabic in a text editor and save it as UTF-8 without a BOM.
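For anyone who wants to build the test file without a text editor, here is a minimal sketch that writes one from R itself. The path under tempdir() and the one-line content are assumptions for illustration; the content is chosen to match the output shown further down.

```r
# Hypothetical location; the post reads "test2.txt" from the working directory.
path <- file.path(tempdir(), "test2.txt")

# One ";"-separated record. The \u escapes keep this script ASCII-only,
# so the script's own encoding cannot interfere.
line <- enc2utf8("\u043e\u0439!yes;9")

# Write the raw UTF-8 bytes plus a newline. Nothing is prepended, so no BOM.
con <- file(path, open = "wb")
writeBin(c(charToRaw(line), as.raw(0x0a)), con)
close(con)

# Inspect the leading bytes: a BOM would show up as ef bb bf.
first_bytes <- readBin(path, what = "raw", n = 3)
print(first_bytes)
```

This version ends the line with a newline. The "incomplete final line" warning shown below usually means the file lacks one, and it is unrelated to the encoding problem.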
> test_df <- read.table("test2.txt",header=FALSE, sep=";", quote="",fill=FALSE,
encoding="UTF-8",comment.char="",dec=",")
......Warning message:
......In read.table("test2.txt", header = FALSE, sep = ";", quote = "", :
......incomplete final line found by readTableHeader on 'test2.txt'
> test_df
...... V1 V2
......1 <U+043E><U+0439>!yes 9
Using read.csv() yields the same results, minus the warning. I realize that the "<U+043E>" is both searchable and can be converted to the readable character by an external program. But I want to see actual Cyrillic text in charts, tables, output, etc., like I can in every other program I've used.
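One way to check whether the text was actually read correctly, and only the printing is falling back to escapes, is to look at the string's declared encoding and code points. This is a sketch on a literal string rather than on the data frame from the post; the variable names are made up for illustration.

```r
# Same text as column V1 in the post, written with \u escapes.
x <- "\u043e\u0439!yes"

Encoding(x)                        # how R has marked the string
code_points <- utf8ToInt(x)        # Unicode code points of each character
sprintf("U+%04X", code_points[1:2])

n_chars <- nchar(x, type = "chars")  # characters as R counts them
n_bytes <- nchar(x, type = "bytes")  # bytes in the UTF-8 representation
```

If the values in a data frame column give sensible code points like these, the data itself is probably intact, and the escapes would come from printing in a locale whose code page has no Cyrillic letters.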
So I've had this problem consistently for a few years. Then yesterday morning I tried the following:
test_df <- read.table("items.txt",header=FALSE, sep=";",quote="",fill=FALSE,
encoding="bytes",comment.char="",dec=",")
And encoding="bytes" worked! I saw Cyrillic in the console. I then had to reinstall R (same version, same computer, same everything), and the solution evaporated. I've literally retraced all my steps, and it seems like magic. Now encoding="bytes" just produces the same garbage (РєРѕРЅСЊСЏРє) as encoding="pizza" would (the parameter is ignored).
There is also a fileEncoding parameter for read.table. I am not sure what it does, but it doesn't work either and cannot read even English text.
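As far as I understand the documentation, fileEncoding declares the encoding of the file on disk so that R re-encodes the input while reading, whereas encoding only marks the resulting strings. Here is a small sketch with a Latin-1 file, chosen because its one accented letter exists in both a 1252 code page and UTF-8. The file name and contents are made up for illustration.

```r
# Hypothetical Latin-1 file containing "caf<e9>;9": e9 is e-acute in Latin-1.
path <- file.path(tempdir(), "latin1_example.txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x3b, 0x39, 0x0a)), path)

# Declare the file's encoding; R converts the text to the session's
# native encoding as it reads.
df <- read.table(path, sep = ";", quote = "", comment.char = "",
                 fileEncoding = "latin1", stringsAsFactors = FALSE)

# Inspect the result independently of how the console displays it.
code_points <- utf8ToInt(enc2utf8(df$V1[1]))
print(code_points)
```

In a session whose native encoding has no Cyrillic letters, the same conversion may have nothing to convert Russian text to, which could be one reason fileEncoding appears to fail outright.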
Can you read a non-ASCII text file on your Windows PC? How on earth do you do it?
Try setting the locale. For example,
See ?Sys.setlocale for more information.
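A minimal sketch of what switching the locale could look like. The locale name "Russian" is an assumption (a Windows-style name that other systems may not recognise), so the code checks whether the change succeeded and restores the previous setting afterwards.

```r
# Remember the current character-type locale so it can be restored.
old_ctype <- Sys.getlocale("LC_CTYPE")

# Assumed Windows-style locale name; adjust for the language in the file.
candidate <- "Russian"

# Sys.setlocale() returns "" (with a warning) when the request cannot be honoured.
result <- suppressWarnings(Sys.setlocale("LC_CTYPE", candidate))
if (identical(result, "")) {
  message("Locale '", candidate, "' is not available on this system.")
} else {
  message("LC_CTYPE is now: ", result)
}

# Put the original locale back.
invisible(Sys.setlocale("LC_CTYPE", old_ctype))
```

While the new locale is active, the read.table() call from the question can be rerun to see whether the console now shows the text.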