Suppose I write a function that parses an input stream containing German text. Below is a toy example. The following works on my machine (because UTF-8 is the default locale encoding there):
readLines(textConnection("Zürich"))
readLines(textConnection("Z\u00FCrich")) #same thing
However, I want to make sure it also works when UTF-8 is not the current locale encoding. For example, inside rApache the default is ASCII. Hence I pass the encoding parameter:
readLines(textConnection("Zürich", encoding="UTF-8"))
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"))
But this actually garbles the output. Why is this? How should I call textConnection to make sure the stream is read properly on any platform or locale?
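
To narrow this down, here is a minimal diagnostic sketch. It rests on an assumption I have not confirmed under rApache: that textConnection(encoding="UTF-8") translates the text to UTF-8 bytes, while readLines returns those bytes without marking them as UTF-8 unless it is also given encoding="UTF-8". The variable names x, con and y are only for illustration.

```r
# Diagnostic sketch: check how the string is marked and which bytes come back.
x <- "Z\u00FCrich"
print(Encoding(x))        # how R has marked the literal

# Assumption: the connection holds UTF-8 bytes, and readLines needs
# encoding = "UTF-8" to mark its result accordingly.
con <- textConnection(x, encoding = "UTF-8")
y <- readLines(con, encoding = "UTF-8")
close(con)

print(Encoding(y))        # how the returned string is marked
print(charToRaw(y))       # raw bytes, independent of how the console prints them
print(identical(enc2utf8(x), enc2utf8(y)))
```

Comparing the raw bytes rather than the printed string separates two possible problems: the data being re-encoded wrongly, versus the console merely being unable to display a correctly encoded string in a non-UTF-8 locale.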