UTF-8 encoding not used although it is set in sour

Posted 2020-04-28 07:10

Question:

I don't understand what is going on here (working with RStudio on Windows platform):

Save the following as the script test_abc.R:

a <- "ä"
b <- "ü"
c <- "ö"

Then run the following script, test.R:

compare_text <- function() {
  l <- list()
  if (a != a2) {
    l[[1]] <- c(a, a2)
  }
  if (b != b2) {
    l[[1]] <- c(b, b2)
  }
  if (c != c2) {
    l[[1]] <- c(c, c2)
  }
}

a <- "ä"
b <- "ü"
c <- "ö"
a2 <- "ä"
b2 <- "ü"
c2 <- "ö"

out_text <- compare_text()
# The next active "source-line" overwrites a, b and c!
source("path2/test2_abc.R") # called "V1" OR
# source("path2/test2_abc.R", encoding = "UTF-8") # called "V2"
out_text2 <- compare_text()
print(out_text)
print(out_text2)

If you run the script test.R in version V1 you get

source('~/Desktop/test1.R', encoding = 'UTF-8')
# NULL
# [1] "ö" "ö"

although it states that it is run using UTF-8 encoding.
If you run the script test.R in version "V2" you get

source('~/Desktop/test1.R', encoding = 'UTF-8') 
# NULL
# NULL

I don't know whether that related post is helpful.

Answer 1:

In V1 you source a file (test_abc.R) without specifying its encoding. The "Encoding" section of the help page for source says:

By default the input is read and parsed in the current encoding of the R session. This is usually what it required, but occasionally re-encoding is needed, e.g. if a file from a UTF-8-using system is to be read on Windows (or vice versa).

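The umlauts are therefore not decoded correctly, so c != c2 is TRUE and compare_text returns c(c, c2). The function returns the value of its last if statement rather than the list l, which is why only the comparison of c and c2 shows up in the output.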
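The return behaviour described above can be reproduced in isolation. The following is only a sketch: last_value and compare_pairs are hypothetical names that do not appear in the original scripts, and compare_pairs is one possible way to collect every mismatch, not the asker's method.

```r
# Same shape as compare_text: the list l is filled but never returned.
# The function's value is the value of the last evaluated expression,
# i.e. the if statement.
last_value <- function(flag) {
  l <- list()
  if (flag) {
    l[[1]] <- c("x", "y")
  }
}

res_true  <- last_value(TRUE)   # value of the assignment: c("x", "y")
res_false <- last_value(FALSE)  # an if without else that is not taken: NULL

# Hypothetical variant that returns all differing pairs explicitly.
compare_pairs <- function(first, second) {
  stopifnot(length(first) == length(second))
  differing <- which(first != second)
  lapply(differing, function(i) c(first[i], second[i]))
}

# "\u00e4" etc. are escapes for the umlauts, so this snippet does not
# depend on the encoding of the file it is saved in.
pairs <- compare_pairs(c("\u00e4", "\u00fc", "\u00f6"),
                       c("\u00e4", "x", "y"))
print(res_true)
print(res_false)
print(length(pairs))
```

Returning the list explicitly makes the result independent of which comparison happens to come last.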

In V2 the umlauts are read correctly and compare_text returns NULL (no difference is found).
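A minimal, self-contained sketch of the V2 approach, assuming a throwaway temporary file in place of test_abc.R. The file, the environment env, and the use of \u escapes are illustrative choices, not part of the original scripts:

```r
# Hypothetical temporary file standing in for test_abc.R
script <- tempfile(fileext = ".R")

# Write the assignment as UTF-8 bytes, regardless of the session locale.
# '\u00e4' is the escape for "a with diaeresis".
con <- file(script, open = "wb")
writeLines(enc2utf8('a <- "\u00e4"'), con, useBytes = TRUE)
close(con)

# Source into a separate environment so the global a is not overwritten,
# and declare the file's encoding explicitly (the "V2" call).
env <- new.env()
source(script, local = env, encoding = "UTF-8")

# The sourced value should now compare equal to the expected character.
print(env$a == "\u00e4")

unlink(script)
```

Sourcing into a separate environment is only a convenience here. The relevant part is the encoding = "UTF-8" argument.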

It is R itself that reads the file inside the source function, and by default it assumes the native encoding of the session, which comes from the OS locale. On Windows this is (mostly?) Windows code page 1252, which differs from UTF-8. You can check this on your machine with Sys.getlocale(). That is why you have to tell R explicitly that the file you want to source is encoded in UTF-8.
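To see what the session assumes, and what a misread umlaut looks like, something along these lines can be run. It is a sketch: the variable names are illustrative, and it assumes the platform's iconv knows the encoding name "CP1252".

```r
# Locale of the current session; the LC_CTYPE part names the character set
print(Sys.getlocale("LC_CTYPE"))

# TRUE if the session's native encoding is UTF-8
print(isTRUE(l10n_info()[["UTF-8"]]))

# Simulate the V1 problem: take the UTF-8 bytes of "a with diaeresis"
# and decode them as if they were code page 1252.
x <- "\u00e4"
misread <- iconv(x, from = "CP1252", to = "UTF-8")

print(misread)         # two characters instead of one
print(nchar(misread))  # 2
print(misread == x)    # FALSE, which is why the comparison reports a difference
```

The single two-byte UTF-8 character is decoded as two one-byte characters, so the sourced string no longer equals the one typed in the script.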



Tags: r utf-8 rstudio