Java class.getResourceAsStream() returns incorrect bytes

Posted 2020-04-10 02:06

Question:

I have just run into a very strange problem with getResourceAsStream().

In a JUnit test in my production project I read test data using getResourceAsStream(), and I found that getResourceAsStream() sometimes substitutes some bytes:

byte[] fileBytes = FileUtils.readFileToByteArray(new File(
    "resources/test/parser/test-short-enc.xml"));

printBytes(fileBytes);

byte[] classPathBytes = IOUtils.toByteArray(ParserTest.class
    .getResourceAsStream("/test/parser/test-short-enc.xml"));

printBytes(classPathBytes);

In this project the output looks like this (file read first, classpath read second; note the 3F in place of 81 in the second line):

D0 9A D1 80 D0 B8 D0 BC D0 B8 D0 BD D0 B0 D0 BB D0 B8 D1 81 D1 82 D0 B8 D0 BA D0 B0

D0 9A D1 80 D0 B8 D0 BC D0 B8 D0 BD D0 B0 D0 BB D0 B8 D1 3F D1 82 D0 B8 D0 BA D0 B0

After this, I decided to create a small project that demonstrates the bug and host it on GitHub as an example. Here's the link: https://github.com/snowindy/getResourceAsStream-Bug

I essentially copied the relevant code, but when I ran it the problem did not reproduce:

D0 9A D1 80 D0 B8 D0 BC D0 B8 D0 BD D0 B0 D0 BB D0 B8 D1 81 D1 82 D0 B8 D0 BA D0 B0

D0 9A D1 80 D0 B8 D0 BC D0 B8 D0 BD D0 B0 D0 BB D0 B8 D1 81 D1 82 D0 B8 D0 BA D0 B0

The printBytes function looks like this:

public static void printBytes(byte[] bv) {
    System.out.println();
    for (byte b : bv) {
        System.out.print(' ');
        System.out.print(String.format("%02X", b));
    }
}

What could be causing this?

I use Eclipse with UTF-8 workspace encoding. The file contains the Cyrillic word "Криминалистика" and is UTF-8 without a BOM.

Both projects use JavaSE-1.6 (jdk1.6.0_29). I am on Windows 7 with windows-1252 as the system encoding.

UPDATE

I was finally able to reproduce the bug. I updated the project so you can test it: https://github.com/snowindy/getResourceAsStream-Bug

The bug appears only if the Maven pom.xml contains the following, which means it is Maven-specific:

<build>
    <sourceDirectory>src</sourceDirectory>
    <resources>
        <resource>
            <directory>resources</directory>
            <filtering>true</filtering>
        </resource>
    </resources>
...

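Answer 1:

OK, I've got the answer.

With filtering enabled, Maven reads each resource as text and writes it back out. If no encoding is configured, it uses the platform default (windows-1252 here), in which byte 0x81 is undefined, so that byte comes out as 0x3F ('?').

This configuration fixes the problem: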

<project>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
...

I was inspired by this answer: https://stackoverflow.com/a/8979120/792313
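
The byte substitution can be reproduced without Maven. What follows is only a minimal sketch, assuming that filtering amounts to decoding the file as text in some charset and encoding it back; the class name FilteringRoundTrip and its helpers are made up for illustration and are not part of Maven or of the linked project. The test word is written with Unicode escapes so that the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.Charset;

// Hypothetical illustration class, not taken from Maven or the linked project.
class FilteringRoundTrip {

    // "Криминалистика" as Unicode escapes, so the source file encoding does not matter.
    static final String SAMPLE =
        "\u041A\u0440\u0438\u043C\u0438\u043D\u0430\u043B\u0438\u0441\u0442\u0438\u043A\u0430";

    // Decode the bytes as text in the given charset and encode them back.
    // Assumption: this approximates what a text-based resource filter does.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    // Same output format as printBytes in the question, but returned as a String.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        Charset cp1252 = Charset.forName("windows-1252");

        byte[] original = SAMPLE.getBytes(utf8);

        // UTF-8 round trip: the bytes come back unchanged.
        System.out.println(toHex(roundTrip(original, utf8)));

        // windows-1252 round trip: 0x81 has no character in this charset,
        // so it is decoded to U+FFFD and then encoded as '?' (0x3F).
        System.out.println(toHex(roundTrip(original, cp1252)));
    }
}
```

The first line printed matches the bytes read from the file, and the second matches the classpath output from the question, with 3F in place of 81. Every other byte of this word happens to have a mapping in windows-1252, which is why only one byte changes.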