Read text as UTF-8 encoding

Posted 2019-04-02 02:07

Suppose I write a function that parses an input stream containing German text. Below is a toy example. The following works on my machine (because UTF-8 is the default locale encoding there):

readLines(textConnection("Zürich"))
readLines(textConnection("Z\u00FCrich")) #same thing

However, I also want it to work when UTF-8 is not the current locale encoding. For example, inside rApache the default is ASCII. Hence I pass the encoding parameter:

readLines(textConnection("Zürich", encoding="UTF-8"))
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"))

But this actually garbles the output. Why is this? How should I call textConnection to make sure the stream is read properly on any platform and in any locale?
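To narrow down where the text goes wrong, it may help to look at the declared encoding and the raw bytes of the result rather than at the printed string, since printing depends on the locale. The snippet below is only a diagnostic sketch: the variable names are illustrative, and what `Encoding(res)` reports may differ between platforms.

```r
# "\u00FC" escape, so the literal does not depend on the source file encoding
x <- "Z\u00FCrich"

con <- textConnection(x, encoding = "UTF-8")
res <- readLines(con)   # no encoding argument, as in the failing call
close(con)

l10n_info()      # reports whether the current session uses UTF-8
Encoding(x)      # declared encoding of the input string
Encoding(res)    # declared encoding of the string that came back
charToRaw(res)   # the bytes themselves, independent of any encoding mark
```

If the bytes are valid UTF-8 but the string carries no UTF-8 mark, a non-UTF-8 session will interpret those bytes in its own encoding when printing.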

Tags: r utf-8 locale
1 answer
Viruses.
#2 · 2019-04-02 02:46

The suggestion by @flodel did indeed do the trick:

readLines(textConnection("Z\u00FCrich", encoding="UTF-8"), encoding="UTF-8")

However, it never became clear to me why this is needed.
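A possible explanation, going by the R documentation for the two functions rather than anything confirmed in this thread: the encoding argument of textConnection controls how the input strings are converted when they are copied into the connection, while the encoding argument of readLines only marks the strings it returns and does not re-encode anything. With the argument on textConnection alone, the connection would hold UTF-8 bytes but readLines would hand them back unmarked, so a non-UTF-8 session reads them in its native encoding. A minimal sketch that wraps both steps, assuming this reading is right (`read_utf8_lines` is a made-up helper name, not part of R):

```r
# Hypothetical helper: read lines from an in-memory character vector as UTF-8,
# regardless of the locale encoding of the session.
read_utf8_lines <- function(x) {
  x <- enc2utf8(x)                               # convert the input to UTF-8 if it is not already
  con <- textConnection(x, encoding = "UTF-8")   # copy the strings into the connection as UTF-8
  on.exit(close(con))                            # always release the connection
  readLines(con, encoding = "UTF-8")             # mark the returned strings as UTF-8
}

res <- read_utf8_lines("Z\u00FCrich")
Encoding(res)    # declared encoding of the result
charToRaw(res)   # bytes of the result
```

Closing the connection with on.exit also avoids the "closing unused connection" warning that the one-line form can trigger later.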
