I have just run into a very strange problem with getResourceAsStream().
In a JUnit test in my production project I read test data using getResourceAsStream(), and I found that getResourceAsStream() sometimes substitutes some bytes:
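import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

// Read the file directly from the file system
byte[] fileBytes = FileUtils.readFileToByteArray(new File(
        "resources/test/parser/test-short-enc.xml"));
printBytes(fileBytes);
// Read the same file through the classpath
byte[] classPathBytes = IOUtils.toByteArray(ParserTest.class
        .getResourceAsStream("/test/parser/test-short-enc.xml"));
printBytes(classPathBytes);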
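The classpath read above relies on Commons IO. For anyone reproducing this without that library, a JDK-only stand-in for IOUtils.toByteArray is sketched below. StreamBytes and readAll are hypothetical names, and the code avoids Java 7 features so it also compiles on the JDK 1.6 used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytes {

    // Hypothetical helper: reads the whole stream into memory, like IOUtils.toByteArray.
    // The caller stays responsible for closing the stream.
    static byte[] readAll(InputStream in) throws IOException {
        if (in == null) {
            // getResourceAsStream returns null when the resource is not on the classpath
            throw new IOException("resource not found on classpath");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xD1, (byte) 0x81};
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(copy.length);
    }
}
```

In the test it would be called as StreamBytes.readAll(ParserTest.class.getResourceAsStream("/test/parser/test-short-enc.xml")). Since it copies bytes without decoding them, it cannot itself change any byte values.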
In the production project, the output looks like this:
D0 9A D1 80 D0 B8 D0 BC D0 B8 D0 BD D0 B0 D0 BB D0 B8 D1 81 D1 82 D0 B8 D0 BA D0 B0
D0 9A D1 80 D0 B8 D0 BC D0 B8 D0 BD D0 B0 D0 BB D0 B8 D1 3F D1 82 D0 B8 D0 BA D0 B0
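To pinpoint where the two dumps differ, the offsets can be compared programmatically. This is a minimal sketch; ByteDiff, differingOffsets and hex are hypothetical helpers, and the hex strings are the two lines above with the spaces removed.

```java
import java.util.ArrayList;
import java.util.List;

final class ByteDiff {

    // Returns every offset at which the two arrays differ; if the lengths differ,
    // offsets past the end of the shorter array are reported as well.
    static List<Integer> differingOffsets(byte[] expected, byte[] actual) {
        List<Integer> offsets = new ArrayList<Integer>();
        int max = Math.max(expected.length, actual.length);
        for (int i = 0; i < max; i++) {
            if (i >= expected.length || i >= actual.length || expected[i] != actual[i]) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    // Parses a compact hex string such as "D09A" into bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // First dump: bytes read from the file system
        byte[] fromFile = hex("D09AD180D0B8D0BCD0B8D0BDD0B0D0BBD0B8D181D182D0B8D0BAD0B0");
        // Second dump: bytes read through the classpath in the production project
        byte[] fromClasspath = hex("D09AD180D0B8D0BCD0B8D0BDD0B0D0BBD0B8D13FD182D0B8D0BAD0B0");
        // Both arrays have the same length here, so indexing either one is safe
        for (int i : differingOffsets(fromFile, fromClasspath)) {
            System.out.printf("offset %d: %02X -> %02X%n", i, fromFile[i], fromClasspath[i]);
        }
    }
}
```

Only offset 19 differs: the second byte of "с" (D1 81) has become 3F, which is the ASCII code for '?'.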
After this, I decided to create a small project that demonstrates the bug and host it on GitHub as an example. Here's the link: https://github.com/snowindy/getResourceAsStream-Bug
I copied essentially the same code, but when I ran it, the problem did not reproduce:
D0 9A D1 80 D0 B8 D0 BC D0 B8 D0 BD D0 B0 D0 BB D0 B8 D1 81 D1 82 D0 B8 D0 BA D0 B0
D0 9A D1 80 D0 B8 D0 BC D0 B8 D0 BD D0 B0 D0 BB D0 B8 D1 81 D1 82 D0 B8 D0 BA D0 B0
The printBytes function looks like this:
// Prints each byte as two uppercase hex digits, separated by spaces
public static void printBytes(byte[] bv) {
    System.out.println();
    for (byte b : bv) {
        System.out.print(' ');
        System.out.print(String.format("%02X", b));
    }
}
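Because printBytes only prints, its output is hard to check in an assertion. A minimal variant that returns the same text is sketched below (HexDump and toHex are hypothetical names). Encoding the word directly as UTF-8 gives a reference dump that should match the bytes read from the file system.

```java
import java.nio.charset.Charset;

final class HexDump {

    // Same formatting as printBytes, but returns the text instead of printing it.
    // Each byte is preceded by a single space.
    static String toHex(byte[] bv) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bv) {
            sb.append(' ').append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Криминалистика" written with Unicode escapes so the source file encoding does not matter
        String word = "\u041A\u0440\u0438\u043C\u0438\u043D\u0430\u043B\u0438\u0441\u0442\u0438\u043A\u0430";
        System.out.println(toHex(word.getBytes(Charset.forName("UTF-8"))));
    }
}
```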
What could be causing this?
I use Eclipse with UTF-8 workspace encoding. The file contains the Cyrillic word "Криминалистика" and is saved as UTF-8 without a BOM.
Both projects use JavaSE-1.6 (jdk1.6.0_29). I am on Windows 7, and the system encoding is windows-1252.
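Given that the system encoding is windows-1252, it may help to check which charset the JVM actually uses by default, and whether the substituted byte 0x81 is even valid in that charset. The sketch below does both; CharsetCheck and decodesCleanly are hypothetical names. Note that JDK 18 and later default to UTF-8 regardless of the OS setting, so the printed default depends on the JDK version.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

final class CharsetCheck {

    // Returns true if the single byte b decodes to a real character in cs.
    // The decoder is configured to throw on errors instead of silently replacing them.
    static boolean decodesCleanly(Charset cs, byte b) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(new byte[] {b}));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The charset used when no encoding is specified explicitly
        System.out.println("default charset: " + Charset.defaultCharset());
        Charset cp1252 = Charset.forName("windows-1252");
        // Check the neighbours of the substituted byte as well as 0x81 itself
        for (int v : new int[] {0x80, 0x81, 0x82}) {
            System.out.printf("%02X defined in windows-1252: %b%n", v, decodesCleanly(cp1252, (byte) v));
        }
    }
}
```

In windows-1252, 0x80 and 0x82 map to characters, while 0x81 is one of the few unassigned positions, so it is the one byte in this word that cannot be decoded with that charset.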
UPDATE
I was finally able to reproduce the bug. I updated the project so you can test it: https://github.com/snowindy/getResourceAsStream-Bug
The bug appears only when the following is present in the Maven pom.xml, which means it is Maven-specific:
<build>
    <sourceDirectory>src</sourceDirectory>
    <resources>
        <resource>
            <directory>resources</directory>
            <filtering>true</filtering>
        </resource>
    </resources>
    ...
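One mechanism that would produce exactly the substitution seen above, assuming the filtering step decodes the resource with the platform charset (windows-1252 here) and writes it back with the same charset, is a plain round trip through that charset. The sketch below simulates only that assumption in Java; it does not invoke Maven, and FilterRoundTrip and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;

final class FilterRoundTrip {

    // Hypothetical model of a filtering copy: decode the bytes with cs, then encode
    // them again with the same cs. new String(byte[], Charset) turns bytes that cs
    // cannot decode into U+FFFD; getBytes(Charset) turns characters that cs cannot
    // encode into the charset's replacement byte, which is '?' (0x3F) for windows-1252.
    static byte[] roundTrip(byte[] original, Charset cs) {
        String decoded = new String(original, cs);
        return decoded.getBytes(cs);
    }

    public static void main(String[] args) {
        // "Криминалистика" as Unicode escapes, stored as UTF-8 like the test resource
        String word = "\u041A\u0440\u0438\u043C\u0438\u043D\u0430\u043B\u0438\u0441\u0442\u0438\u043A\u0430";
        byte[] utf8 = word.getBytes(Charset.forName("UTF-8"));

        // Assumed platform charset of the machine running the build
        byte[] filtered = roundTrip(utf8, Charset.forName("windows-1252"));

        StringBuilder sb = new StringBuilder();
        for (byte b : filtered) {
            sb.append(' ').append(String.format("%02X", b));
        }
        System.out.println(sb);
    }
}
```

Under this assumption every byte of the word survives except 0x81, which becomes 3F, matching the second dump from the production project. If the hypothesis holds, the encoding used by the maven-resources-plugin (its `encoding` parameter defaults to the `project.build.sourceEncoding` property) would be the setting to look at.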