I'm working on a big Java web application in Eclipse whose files use a mix of encodings: some are UTF-8, others Cp1252, still others ISO-8859-1 (with no pattern across JSPs, Java source files, or CSS), but I know the encoding of each file.
I'm converting the project to Maven, and this is a good opportunity to convert everything to UTF-8.
Of course I don't want to lose a single character, so a blind, fully automated conversion is not an option.
How should I go about it? Is there a tool that can help me make sure I don't lose any special characters?
The webapp is in Italian, so the JSPs in particular are likely to contain lots of accented letters (HTML entities probably haven't been used everywhere).
The project is in Eclipse, but I can use an external editor if that could make the conversion easier.
Converting a single file can be done with the iconv utility (I used LibIconv for Windows).
It lets you specify the source and destination encodings, and warns when characters can't be converted.
I tried it on a couple of source files and all the accented letters were correctly converted from Cp1252 to UTF-8.
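For reference, a single-file conversion with the command-line iconv looks something like this (the file names here are just placeholders; -f is the source encoding, -t the target):

```
iconv -f CP1252 -t UTF-8 index.jsp > index-utf8.jsp
```

By default iconv stops with an error when it hits a byte sequence it can't convert, rather than silently dropping it, which is exactly the behavior you want for this kind of migration.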
It's very easy to write code to convert encodings - although I'd expect there are tools to do it anyway. Simply:
- Open a FileInputStream to the existing file, and wrap it in an InputStreamReader with the appropriate encoding.
- Open a FileOutputStream to the new file, and wrap it in an OutputStreamWriter with the appropriate encoding.
- Copy the characters from the reader to the writer, then close both (see the sketch below).

The first two steps are simpler with Files.newBufferedReader and Files.newBufferedWriter, too.
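To make that concrete, here's a minimal sketch of those steps using Files.newBufferedReader and Files.newBufferedWriter. The class name, file paths, and charset below are placeholders for illustration; you'd call it once per file with the encoding you already know that file has:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ToUtf8 {

    // Re-encodes a single file from a known source charset to UTF-8.
    // Nothing here guesses the encoding: the caller passes the one it already knows.
    static void convert(Path source, Path target, Charset sourceCharset) throws IOException {
        try (Reader reader = Files.newBufferedReader(source, sourceCharset);
             Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths and charset for illustration only.
        convert(Paths.get("index.jsp"),
                Paths.get("index-utf8.jsp"),
                Charset.forName("windows-1252")); // i.e. Cp1252
    }
}
```

A nice side effect is that the reader returned by Files.newBufferedReader throws an exception on malformed or unmappable byte sequences instead of silently substituting replacement characters, so a file tagged with the wrong source encoding fails loudly rather than losing data.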