UTF-8 encoding not used although it is set in sour

I don't understand what is going on here (working with RStudio on Windows):

Save the following script as test_abc.R:

a <- "ä"
b <- "ü"
c <- "ö"

Then, run the following script Test.R:

compare_text <- function() {
  l <- list()
  if (a != a2) {
    l[[1]] <- c(a, a2)
  }
  if (b != b2) {
    l[[1]] <- c(b, b2)
  }
  if (c != c2) {
    l[[1]] <- c(c, c2)
  }
}

a <- "ä"
b <- "ü"
c <- "ö"
a2 <- "ä"
b2 <- "ü"
c2 <- "ö"

out_text <- compare_text()
# The next active "source-line" overwrites a, b and c!
source("path2/test2_abc.R") # called "V1" OR
# source("path2/test2_abc.R", encoding = "UTF-8") # called "V2"
out_text2 <- compare_text()
print(out_text)
print(out_text2)

If you run the script Test.R in version V1, you get

source('~/Desktop/test1.R', encoding = 'UTF-8')
# NULL
# [1] "Ã¶" "ö"

although the call states that the script is run using UTF-8 encoding.
If you run the script Test.R in version V2, you get

source('~/Desktop/test1.R', encoding = 'UTF-8') 
# NULL
# NULL

I don't know whether that related post is helpful.

Tags: r utf-8 rstudio

1 answer

等我变得足够好

Reply #2 · 2020-04-28 07:52

In V1 you source a file (test_abc.R) without specifying its encoding. The "Encoding" section of the help page for source says:

By default the input is read and parsed in the current encoding of the R session. This is usually what it required, but occasionally re-encoding is needed, e.g. if a file from a UTF-8-using system is to be read on Windows (or vice versa).

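In V1 the umlauts are therefore decoded incorrectly: the UTF-8 bytes of "ö" are read as the two characters "Ã¶". All three comparisons in compare_text then find a difference, and the function returns c(c, c2), the value of its last if statement, because c != c2 is TRUE.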
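The garbling can be reproduced without any file. The following is only a minimal sketch: the variable names are made up for illustration, and it assumes that iconv() on your platform accepts the encoding name "CP1252". The character is written as a \u escape so that the sketch does not depend on the encoding of the script it is pasted into.

```r
# "ö" written with an escape, so the result does not depend on this script's encoding
original <- "\u00f6"

# UTF-8 stores "ö" as the two bytes c3 b6
utf8_bytes <- charToRaw(enc2utf8(original))
print(utf8_bytes)
# [1] c3 b6

# Decode the same bytes as if they were CP1252: each byte becomes its own character
# (assumption: "CP1252" is a name your iconv implementation knows)
misread <- iconv(rawToChar(utf8_bytes), from = "CP1252", to = "UTF-8")

# The misread string is "Ã¶" (U+00C3 followed by U+00B6), not "ö"
print(misread == "\u00c3\u00b6")

# This is the kind of comparison compare_text() makes
print(misread != original)
```

The same thing happens to "ä" and "ü", since each of them is also two bytes in UTF-8.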

In V2 the umlauts are read correctly and compare_text returns NULL (no difference is found).

It is R itself that reads the file inside source(). By default, R uses the encoding of the operating system's locale. On Windows, this is (mostly?) "Windows code page 1252", which differs from UTF-8. You can check it on your machine with Sys.getlocale(). That is why you have to tell R that the file you want to source is encoded in UTF-8.
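As a rough sketch of the V2 approach, the snippet below writes a one-line UTF-8 file and sources it with an explicit encoding. The temporary file is a hypothetical stand-in for test_abc.R, env is an illustrative name, and the final comparison assumes that the session's native encoding can represent "ö" (true for UTF-8 and CP1252 sessions).

```r
# Hypothetical temporary file standing in for test_abc.R
tmp <- tempfile(fileext = ".R")

# Write the line  c <- "ö"  as raw UTF-8 bytes, independent of the session locale
con <- file(tmp, open = "wb")
writeBin(charToRaw(enc2utf8("c <- \"\u00f6\"\n")), con)
close(con)

# Inspect the session's character locale and whether it is a UTF-8 locale
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])

# Source into a separate environment so that a global a, b, c stay untouched;
# encoding = "UTF-8" tells R how the bytes in the file are to be decoded
env <- new.env()
source(tmp, local = env, encoding = "UTF-8")

# Compare against the escaped form of "ö"
print(env$c == "\u00f6")

unlink(tmp)
```

Sourcing into a separate environment is only a convenience for the sketch; the call in the question, source("path2/test2_abc.R", encoding = "UTF-8"), assigns into the calling environment instead.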
