How to convert (Java) files with different encodin

Posted 2019-07-25 02:35

I'm working on a big Java web application in Eclipse whose files have different encodings: some are in UTF-8, others in Cp1252, and others still in ISO-8859-1 (with no distinction between JSPs, Java source files, or CSS). I do know the encoding of each file, though.

I'm converting the project to Maven, and this is a good opportunity to convert all of them to UTF-8.
Of course I don't want to lose a single character (so a fully automated conversion is not an option here).

How should I go about it? Is there a tool that can help me ensure I don't lose any special character?
The webapp is in Italian, so there could be lots of accented letters, especially in the JSPs (HTML entities have probably not been used everywhere).

The project is in Eclipse, but I can use an external editor if that could make the conversion easier.

2 answers
淡お忘
#2 · 2019-07-25 03:08

Converting a single file can be done with the iconv command-line tool (I used LibIconv for Windows).

It lets you specify the source and destination encodings, and warns when characters can't be converted.

I tried it with a couple of source files and all the accented letters were correctly converted to UTF-8 from Cp1252.
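
Whatever tool does the conversion, one way to double-check the result is to decode the original file with its known encoding, decode the converted file as UTF-8, and compare the two texts. The following is only a minimal sketch of that idea; the class and method names (`ConversionCheck`, `decodeStrict`, `sameText`) are made up for illustration, and it reads whole files into memory, which should be fine for source files but not for huge ones.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: checks that a converted file holds the same text as the original.
final class ConversionCheck {

    // Decodes the whole file, throwing a CharacterCodingException (an IOException)
    // instead of silently substituting a replacement character on bad input.
    static String decodeStrict(Path file, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the original (in its known encoding) and the converted file (as UTF-8)
    // decode to exactly the same characters.
    static boolean sameText(Path original, Charset originalCharset, Path converted)
            throws IOException {
        String before = decodeStrict(original, originalCharset);
        String after = decodeStrict(converted, StandardCharsets.UTF_8);
        return before.equals(after);
    }
}
```

For a Cp1252 file this could be called as `ConversionCheck.sameText(original, Charset.forName("windows-1252"), converted)`. Note that it only confirms the two files agree under the encoding you assumed; if the assumed source encoding is wrong, both sides can agree and still be wrong.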

beautiful°
#3 · 2019-07-25 03:13

It's very easy to write code to convert encodings - although I'd expect there are tools to do it anyway. Simply:

  • Create one FileInputStream to the existing file, and wrap it in an InputStreamReader with the appropriate encoding
  • Create one FileOutputStream to the new file, and wrap it in an OutputStreamWriter with the appropriate encoding
  • Loop over the reader, reading characters into a buffer and writing out the contents of that buffer (just as many characters as you read) until you've read the whole file
  • Close all resources (automatic with a try-with-resources block)

The first two steps are simpler with Files.newBufferedReader and Files.newBufferedWriter, too.
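
The steps above could look roughly like the following, using `Files.newBufferedReader` and `Files.newBufferedWriter`. Treat it as a sketch rather than a finished tool: the class name `EncodingConverter` and the method `convertToUtf8` are invented here, and it assumes the target is a different path from the source (opening the writer truncates the target file).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical converter: re-encodes one text file as UTF-8.
final class EncodingConverter {

    // Reads 'source' using 'sourceCharset' and writes the same characters to 'target' as UTF-8.
    // 'target' must not be the same file as 'source'.
    static void convertToUtf8(Path source, Charset sourceCharset, Path target)
            throws IOException {
        // try-with-resources closes both streams, even if an exception is thrown
        try (BufferedReader reader = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int read;
            // Write out exactly as many characters as were read on each pass
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }

    // Usage: java EncodingConverter.java <source> <sourceCharset> <target>
    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: EncodingConverter <source> <sourceCharset> <target>");
            return;
        }
        convertToUtf8(Paths.get(args[0]), Charset.forName(args[1]), Paths.get(args[2]));
    }
}
```

Copying characters rather than lines keeps the original line endings intact. The reader returned by `Files.newBufferedReader` throws an exception on malformed input rather than silently replacing it, which helps when the goal is not to lose any character; bear in mind that ISO-8859-1 accepts every byte, so a wrongly assumed source encoding will not always be detected this way.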
