I have often seen problems arise with encoding, so I have written down the steps needed to make encoding work consistently.
These steps are for Eclipse, but they also cover the Maven settings.
The encoding issue is most problematic when Java source files contain Scandinavian letters (åäö) that actually matter at runtime.
An example is a constant in a Java file that contains a Scandinavian letter and is used to identify a value in an incoming stream (which is UTF-8).
In addition, the underlying OS may be Windows, which uses cp1252 by default.
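To make the mismatch concrete, here is a minimal sketch (assuming Java 8 or later; the class and method names are made up for illustration) that decodes the same UTF-8 bytes once as UTF-8 and once as windows-1252, which is Java's name for cp1252, and compares each result with the constant. The constant is written with Unicode escapes so that the example itself does not depend on the source file encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamDecodingDemo {
    // "aao" with ring/umlauts, written as escapes so the compiler's source encoding does not matter
    static final String SCANDICS = "\u00e5\u00e4\u00f6";

    // Reads the first line of the stream, decoding it with the given charset
    static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated incoming stream: the constant encoded as UTF-8 (6 bytes)
        byte[] utf8Bytes = SCANDICS.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = readFirstLine(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        String asCp1252 = readFirstLine(new ByteArrayInputStream(utf8Bytes), Charset.forName("windows-1252"));

        System.out.println(SCANDICS.equals(asUtf8));   // true
        System.out.println(SCANDICS.equals(asCp1252)); // false: each letter became two characters
    }
}
```

Decoded as windows-1252, the six UTF-8 bytes turn into "Ã¥Ã¤Ã¶", so the comparison with the constant fails. For streams you read yourself, passing the charset explicitly instead of relying on the platform default avoids this particular case.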
E.g. the following code:
    @Test
    public void scandicTest() {
        System.out.println("scandics: åäö");
    }
When everything is configured correctly (e.g. when run in Eclipse), this test prints:
    scandics: åäö
But if you run it via Maven (from the command line, or from Eclipse as mvn test), you get:
    scandics: ���
First of all, the encoding must be changed both in Eclipse and in the Maven pom.xml, so that files are read and stored correctly and Eclipse uses the correct encoding when saving files and running tests.
However, even then the constant in the Java file itself can remain corrupted when Maven compiles and runs the tests, even though the files that are read in are correct (they contain the Scandinavian letters).
The JVM itself still uses an OS-specific default encoding, even when everything else is set correctly. For this reason you cannot configure everything within the project; you must also configure the JVM at the OS level.
I will explain all the encoding steps needed, even though there are already multiple answers for the "common" part (at least for step 2, the Maven settings). My particular case required step 3, configuring the OS.
1. Configure Eclipse:
- Open: Window > Preferences
- Type 'encoding' in the search field
- There will be many entries, but first select 'General > Workspace'
- Find 'Text file encoding' and select: Other > UTF-8
- You also need to set the encoding for 'General > Content Types'
- Select the 'Text' item in the right-hand panel (this opens a list of file types) and go through all the types, setting each 'Default encoding' to 'UTF-8'
- Click the 'Update' button to persist each change.
- You may also need to do this for the other entries found by the search.
- E.g. 'Web > CSS Files > Encoding' | ISO 10646/Unicode(UTF-8)
- Once everything is set, Eclipse should handle the encoding correctly.
2. Set the encoding in the Maven pom.xml:
    <project>
      ...
      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
      </properties>
      ...
    </project>
You may also need to set the encoding for each plugin:
    <plugin>
      ...
      <configuration>
        <encoding>UTF-8</encoding>
        ...
      </configuration>
    </plugin>
or
    <plugin>
      <executions>
        <execution>
          <configuration>
            <encoding>UTF-8</encoding>
            ...
          </configuration>
          ...
        </execution>
      </executions>
    </plugin>
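I am not sure whether the latter (per-execution) form is ever mandatory. Configuration at the plugin level applies to all of that plugin's executions, and plugins such as maven-compiler-plugin default their encoding to ${project.build.sourceEncoding} anyway.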
3. Configure the OS:
- Set the environment variable JAVA_TOOL_OPTIONS to the value -Dfile.encoding=UTF-8
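To check that the setting has taken effect, you can run a small program like the following sketch (the class name is made up) from the same environment that Maven runs in. It prints the environment variable, the file.encoding property and the JVM's default charset. When JAVA_TOOL_OPTIONS is set, the JVM also prints a "Picked up JAVA_TOOL_OPTIONS: ..." line on startup. Note that JDK 18 and later default to UTF-8 regardless of the OS (JEP 400), so this step mainly matters on older JDKs.

```java
import java.nio.charset.Charset;

class EncodingReport {
    // The charset the JVM uses when code does not specify one (e.g. new String(bytes))
    static Charset platformDefault() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Prints "null" if the environment variable is not visible to this process
        System.out.println("JAVA_TOOL_OPTIONS = " + System.getenv("JAVA_TOOL_OPTIONS"));
        System.out.println("file.encoding     = " + System.getProperty("file.encoding"));
        System.out.println("default charset   = " + platformDefault());
    }
}
```

If the last two lines still show a Windows code page such as windows-1252, the variable is not reaching the JVM (for example, because the terminal or IDE was started before the variable was set).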
As suggested in a comment, here is some more information on converting files:
Note that all the files must be UTF-8 encoded for this to work. If you edit everything in Eclipse with the configuration above, they will be saved as UTF-8.
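If you are unsure whether a file really is UTF-8, a strict decoder can tell you. A minimal sketch (the class name is hypothetical): the decoder is set to REPORT, so malformed bytes throw an exception instead of being silently replaced with U+FFFD. Plain ASCII files also pass, since ASCII is a subset of UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Utf8Check {
    // True if the bytes form valid UTF-8; the strict decoder throws on malformed input
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8Check file1 file2 ...
        for (String arg : args) {
            Path path = Paths.get(arg);
            boolean ok = isValidUtf8(Files.readAllBytes(path));
            System.out.println(path + ": " + (ok ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

A file saved as cp1252 that contains åäö stores them as the single bytes E5 E4 F6, which are not valid UTF-8, so such a file is reported as invalid.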
If you receive a file that your code must process, you may need to convert it. You can do that by opening it in Eclipse and saving it again (you may need to add and remove a character to enable saving).
If you can use Notepad++, its 'Encoding' menu can convert the file.
Converting a file can sometimes corrupt the Scandinavian letters, so check them manually after conversion.
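If you have many files to convert, the conversion can also be scripted. A minimal sketch, assuming you know the source encoding (e.g. windows-1252); the class name is hypothetical, and the program overwrites the files in place, so keep a backup:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ConvertToUtf8 {
    // Reads the file with the given source charset and writes the same text back as UTF-8.
    // Assumes the source charset is known; decoding with the wrong one corrupts the letters.
    static void convert(Path file, Charset sourceCharset) throws IOException {
        String text = new String(Files.readAllBytes(file), sourceCharset);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ConvertToUtf8 windows-1252 file1 file2 ...
        Charset source = Charset.forName(args[0]);
        for (int i = 1; i < args.length; i++) {
            convert(Paths.get(args[i]), source);
        }
    }
}
```

Decoding with the wrong source charset is exactly what corrupts the letters, so the manual check still applies after running this.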
One more thing: files saved with other tools may contain a BOM (byte order mark). This 'character' is invisible, and some parsers cannot read, for example, an XML file that contains it.
You can remove the BOM by opening the file in Eclipse, placing the cursor before the first character, and pressing Backspace once. Nothing visibly changes, but the character is actually removed and the file then works.
Notepad may insert a BOM, so do not use it for editing XML files!
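A BOM can also be detected and stripped programmatically. A minimal sketch (the class name is hypothetical): the UTF-8 BOM is the three bytes EF BB BF. Java's UTF-8 decoder does not skip it, so if it is left in place it shows up as a leading U+FEFF character in the decoded text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class BomStripper {
    // The UTF-8 byte order mark: U+FEFF encoded as UTF-8
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && bytes[0] == UTF8_BOM[0] && bytes[1] == UTF8_BOM[1] && bytes[2] == UTF8_BOM[2];
    }

    // Returns the content without a leading UTF-8 BOM (the same array if there is none)
    static byte[] stripUtf8Bom(byte[] bytes) {
        return hasUtf8Bom(bytes) ? Arrays.copyOfRange(bytes, 3, bytes.length) : bytes;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomStripper file1 file2 ...  (rewrites only files that start with a BOM)
        for (String arg : args) {
            Path path = Paths.get(arg);
            byte[] content = Files.readAllBytes(path);
            if (hasUtf8Bom(content)) {
                Files.write(path, stripUtf8Bom(content));
                System.out.println("Removed BOM: " + path);
            }
        }
    }
}
```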