How to ensure that Strings are in UTF-8?

Posted 2019-03-28 08:53

How can I convert the String "the surveyÂ’s rules" to UTF-8 in Scala?

I tried the following approaches, but neither works:

scala> val text = "the surveyÂ’s rules"
text: String = the surveyÂ’s rules

scala> scala.io.Source.fromBytes(text.getBytes(), "UTF-8").mkString
res17: String = the surveyÂ’s rules

scala> new String(text.getBytes(),"UTF8")
res21: String = the surveyÂ’s rules

OK, I solved it this way. It is not a conversion, just a different way of reading the file:

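import java.io.File
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

// Decode as US-ASCII and silently skip any bytes that are not valid ASCII
implicit val codec = Codec("US-ASCII").onMalformedInput(CodingErrorAction.IGNORE).onUnmappableCharacter(CodingErrorAction.IGNORE)

val src = Source.fromFile(new File(folderDestination + name + ".csv"))
val src2 = Source.fromFile(new File(folderDestination + name + ".csv"))

val reader = CSVReader.open(src.reader())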
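Note that a codec configured like this does not convert anything: every byte that is not valid US-ASCII is simply dropped while decoding. A minimal sketch of that effect, assuming the input bytes are UTF-8 (the object and method names are made up for illustration):

```scala
import java.nio.ByteBuffer
import java.nio.charset.CodingErrorAction
import scala.io.Codec

object LenientAsciiDemo {
  // Same codec settings as above: decode as US-ASCII and ignore
  // every byte sequence that US-ASCII cannot represent.
  def lenientAscii: Codec = Codec("US-ASCII")
    .onMalformedInput(CodingErrorAction.IGNORE)
    .onUnmappableCharacter(CodingErrorAction.IGNORE)

  // Hypothetical helper: decode raw bytes with that codec's decoder.
  def decode(bytes: Array[Byte]): String =
    lenientAscii.decoder.decode(ByteBuffer.wrap(bytes)).toString
}
```

Calling LenientAsciiDemo.decode on the UTF-8 bytes of "the survey\u2019s rules" should give back the text without the apostrophe, so the odd characters disappear but so does the original punctuation.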

2 answers
趁早两清
#2 · 2019-03-28 09:17

Note that when you call text.getBytes() without arguments, you're in fact getting an array of bytes representing the string in your platform's default encoding. On Windows, for example, it could be some single-byte encoding; on Linux it can be UTF-8 already.
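To make the difference visible, here is a small sketch that encodes the same string with two explicit charsets and with the platform default. The object name, the hex helper and the sample text (with U+2019 assumed as the intended apostrophe) are only for illustration:

```scala
import java.nio.charset.StandardCharsets

object GetBytesDemo {
  // U+2019 (right single quotation mark) is written as an escape so the
  // example does not depend on the encoding of the source file itself.
  val text = "the survey\u2019s rules"

  // Render a byte array as space-separated hex for easy comparison.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString(" ")

  // Explicit encodings: the result is the same on every platform.
  val asUtf8: Array[Byte] = text.getBytes(StandardCharsets.UTF_8)
  val asLatin1: Array[Byte] = text.getBytes(StandardCharsets.ISO_8859_1)

  // No argument: the result depends on the platform's default charset.
  val asDefault: Array[Byte] = text.getBytes()
}
```

In UTF-8 the apostrophe takes three bytes (E2 80 99), while ISO-8859-1 cannot represent it at all and substitutes a question mark, so the two arrays differ in both length and content. What asDefault contains depends on the machine the code runs on.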

To be correct, you need to specify the exact encoding in the getBytes() call. For Java 7 and later, do this:

import java.nio.charset.StandardCharsets

val bytes = text.getBytes(StandardCharsets.UTF_8)

For Java 6 do this:

import java.nio.charset.Charset

val bytes = text.getBytes(Charset.forName("UTF-8"))

Then bytes will contain UTF-8-encoded text.
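Keep in mind that a JVM String has no encoding of its own; an encoding only applies once you turn it into bytes or read it back from bytes. A rough round-trip sketch, with illustrative names that are not from any library:

```scala
import java.nio.charset.{Charset, StandardCharsets}

object Utf8RoundTrip {
  // Encode a String into UTF-8 bytes.
  def encode(s: String): Array[Byte] = s.getBytes(StandardCharsets.UTF_8)

  // Decode UTF-8 bytes back into a String.
  def decode(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)

  // Decode the same bytes with some other charset, which is how
  // garbled characters usually end up in a String.
  def decodeAs(bytes: Array[Byte], charset: Charset): String = new String(bytes, charset)
}
```

Decoding with the same charset that was used for encoding returns the original text. Decoding UTF-8 bytes with a different charset, for example StandardCharsets.ISO_8859_1, yields a different, garbled string whenever the text contains non-ASCII characters.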

我只想做你的唯一
#3 · 2019-03-28 09:31

Just set the JVM's file.encoding parameter to UTF-8 as follows:

-Dfile.encoding=UTF-8

This makes UTF-8 the JVM's default encoding.

With the scala launcher, that would be scala -Dfile.encoding=UTF-8.
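To check whether the flag took effect, you could inspect the default charset from inside the program. This is only a sketch; the object and method names are hypothetical:

```scala
import java.nio.charset.{Charset, StandardCharsets}

object EncodingCheck {
  // The charset used by argument-less calls such as text.getBytes().
  def defaultCharset: Charset = Charset.defaultCharset()

  // True when the JVM default charset is UTF-8.
  def defaultIsUtf8: Boolean = defaultCharset == StandardCharsets.UTF_8

  // One-line summary of the system property and the effective default.
  def report: String = {
    val prop = System.getProperty("file.encoding")
    s"file.encoding=$prop, default charset=${defaultCharset.name()}"
  }
}
```

Printing EncodingCheck.report at startup shows which encoding the JVM actually picked up, which can differ between a developer machine and a server.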
