I don't understand what is going on here (working with RStudio on Windows).
Save the following script as test_abc.R:
a <- "ä"
b <- "ü"
c <- "ö"
Then run the following script, test.R:
compare_text <- function() {
  l <- list()
  if (a != a2) {
    l[[1]] <- c(a, a2)
  }
  if (b != b2) {
    l[[1]] <- c(b, b2)
  }
  if (c != c2) {
    l[[1]] <- c(c, c2)
  }
}
a <- "ä"
b <- "ü"
c <- "ö"
a2 <- "ä"
b2 <- "ü"
c2 <- "ö"
out_text <- compare_text()
# The next active "source-line" overwrites a, b and c!
source("path2/test2_abc.R") # called "V1" OR
# source("path2/test2_abc.R", encoding = "UTF-8") # called "V2"
out_text2 <- compare_text()
print(out_text)
print(out_text2)
If you run the script test.R in version V1, you get:
source('~/Desktop/test1.R', encoding = 'UTF-8')
# NULL
# [1] "ö" "ö"
although it states that it is run using UTF-8 encoding.
If you run the script test.R in version V2, you get:
source('~/Desktop/test1.R', encoding = 'UTF-8')
# NULL
# NULL
I don't know whether that related post is helpful.
In V1 you source a file (test_abc.R) without specifying its encoding. According to the "Encoding" section of the help page for source, the input is by default read and parsed in the current encoding of the R session.
In that case the umlauts in the sourced file are not read correctly, and compare_text returns c(c, c2) because c != c2 is TRUE.
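A minimal sketch of what goes wrong, assuming the file is saved as UTF-8 and the session decodes it as Windows-1252. The variable names utf8_bytes and misread are made up for illustration:

```r
# UTF-8 encodes "ä" (U+00E4) as the two bytes c3 a4.
utf8_bytes <- charToRaw(enc2utf8("\u00e4"))
print(utf8_bytes)

# Decode those same bytes as if they were Windows-1252
# (assumed here to be the session's native encoding).
misread <- iconv(rawToChar(utf8_bytes), from = "CP1252", to = "UTF-8")

# Each byte is turned into its own character, so the result
# is no longer equal to the original "ä".
print(nchar(misread))
print(misread != "\u00e4")
```

This is the same kind of mismatch that makes the comparisons in compare_text come out as "different" in V1.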
In V2 the umlauts are read correctly, and compare_text returns NULL (no difference is found).
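As a side note, compare_text as posted returns only the value of its last if statement, so at most one pair is ever reported. A hedged sketch of a variant that reports every differing pair follows. compare_pairs and its arguments are hypothetical names, not part of the original script, and the "bad" value only imitates a misread "ö":

```r
# Hypothetical helper: compare two named character vectors element by
# element and return every pair that differs (empty list if all match).
compare_pairs <- function(x, y) {
  stopifnot(identical(names(x), names(y)))
  differs <- x != y
  Map(c, x[differs], y[differs])
}

same <- c(a = "\u00e4", b = "\u00fc", c = "\u00f6")
# "ö" as it looks when its UTF-8 bytes (c3 b6) are decoded as Windows-1252.
bad <- c(a = "\u00e4", b = "\u00fc", c = "\u00c3\u00b6")

print(length(compare_pairs(same, same)))
print(compare_pairs(bad, same))
```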
It is R itself that reads the file inside source(). By default R uses the native encoding of the session, which it takes from the OS locale. On Windows this is often Windows code page 1252 (for Western European locales), which differs from UTF-8. You can check it on your machine with Sys.getlocale(). That is why you have to tell R that the file you want to source is encoded in UTF-8.
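A minimal sketch of checking the session encoding and sourcing a UTF-8 file explicitly. The temporary file stands in for test_abc.R, and env is just an illustrative environment so that the example does not overwrite variables in the workspace:

```r
# Show the locale R is using; on Windows the code page is part of the name.
print(Sys.getlocale("LC_CTYPE"))
# TRUE if the session's native encoding is already UTF-8.
print(l10n_info()[["UTF-8"]])

# Write a one-line script as UTF-8 bytes, regardless of the native encoding.
tmp <- tempfile(fileext = ".R")
con <- file(tmp, open = "wb")
writeLines(enc2utf8('a <- "\u00e4"'), con, useBytes = TRUE)
close(con)

# Tell source() how the file is encoded instead of relying on the default.
env <- new.env()
source(tmp, local = env, encoding = "UTF-8")
print(env$a == "\u00e4")
```

If the session is not UTF-8 and the encoding argument is left out, the same file may be decoded with the native code page instead, which is the V1 situation.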