Maven: Source Encoding in UTF-8 not working?

I am converting a project from Ant to Maven and I'm having problems with a specific unit test that deals with UTF-8 characters. The problem concerns the following String:

String l_string = "ČäÁÓý\n€řЖжЦ\n№ЯФКЛ";

The unit test fails because the String is read as follows:

?äÁÓý
€????
?????

The Java class is saved as UTF-8, and I also set the build encoding to UTF-8 in the pom.xml.

Here is an excerpt of my pom.xml:

...

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

...

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.6</source>
                <target>1.6</target>
                <encoding>${project.build.sourceEncoding}</encoding>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.4</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.15</version>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-report-plugin</artifactId>
            <version>2.15</version>
        </plugin>
    </plugins>
</build>

Am I missing something here? It would be great if someone could help me.

Update

Regarding the test code:

@Test
public void testTransformation()
{

    String l_string = "ČäÁÓý\n€řЖжЦ\n№ЯФКЛ";
    System.out.println( ">>> " + l_string );
    c_log.info( l_string );
    StringBuffer l_stringBuffer = new StringBuffer();
    int l_stringLength = l_string.length();

    String l_fileName = System.getProperty( "user.dir" ) + File.separator + "transformation" + File.separator + "TransformationMap.properties";
    Transformation.init( l_fileName );

    Properties l_props = Transformation.getProps();
    for ( int i = 0; i < l_stringLength; i++ )
    {
        char l_char = l_string.charAt( i );
        int l_intValue = (int) l_char;
        if ( l_intValue <= 255 )
        {
            l_stringBuffer.append( l_char );
        }
        else
        {
            l_stringBuffer.append( l_props.getProperty( String.valueOf( l_char ), "" ) );
        }
    }
    c_log.info( l_stringBuffer.toString() );
    byte[] l_bytes = l_string.getBytes();
    byte[] l_transformedBytes = Transformation.transform( l_bytes );
    assertNotNull( l_transformedBytes );

}

The rest of the logic isn't really relevant(?), because the aforementioned "?" characters are already printed instead of the correct characters by the first System.out.println call (and therefore the subsequent checks fail). There is also no use of a default platform encoding.

The test converts each character according to the TransformationMap.properties file, which is in the following form (just an excerpt):

Ý=Y
ý=y
Ž=Z
ž=z
°=.
€=EUR

It should be noted that the test runs without any problem when I build the project with Ant.

Tags: java maven encoding utf-8

5 answers

手持菜刀，她持情操

#2 · 2019-01-31 16:58

I had a really stubborn problem of this kind, and setting the environment variable

MAVEN_OPTS=-Dfile.encoding=UTF-8

fixed the issue for me.
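To check whether a setting like this actually reached a given JVM, it can help to print what that JVM reports. The following is only a minimal sketch; the class name `EncodingCheck` and the method `describe` are made up for illustration. Which JVM picks up `MAVEN_OPTS` depends on how the tests are launched, so the check is most useful when called from inside a test.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    /** Returns a one-line summary of the encodings the current JVM reports. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run this inside the JVM you want to inspect (for example from a test).
        System.out.println(describe());
    }
}
```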


Bombasti

#3 · 2019-01-31 17:01

Your problem is not the encoding of the source file (and therefore of the String inside your class file); the problem is the encoding of System.out's implicit PrintStream. It uses file.encoding, which represents the system encoding, and on Windows this is the ANSI code page.

You would have to set up a PrintWriter with the OEM code page (or use the class that is intended for this: Console).

See also various bugs around this in: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4153167
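As a rough illustration of taking the encoding decision away from file.encoding, a stream can be wrapped in a PrintStream with an explicit charset. This sketch uses UTF-8 rather than the OEM code page mentioned above, which is an assumption on my part; whether the characters then render correctly still depends on the code page of the console that receives the bytes. `Utf8Out` and `utf8` are hypothetical names.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Out {
    /** Wraps a stream so that text is encoded as UTF-8 regardless of file.encoding. */
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Escapes are used so this file compiles the same under any source encoding.
        out.println(">>> \u010c\u00e4\u00c1\u00d3\u00fd");
    }
}
```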


淡お忘

#4 · 2019-01-31 17:03

I have found a "solution" myself:

I had to pass the encoding into the maven-surefire-plugin, but the usual

<encoding>${project.build.sourceEncoding}</encoding>

did not work. I still have no idea why, but when I pass the command-line argument to the plugin, the tests work as they should:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.15</version>
    <configuration>
        <argLine>-Dfile.encoding=UTF-8</argLine>
    </configuration>
</plugin>

Thanks for all your responses and additional comments!
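One possible reason the JVM's file.encoding matters for this test: `String.getBytes()` without arguments, as used in the test code from the question, encodes with the JVM's default charset. A minimal sketch of the explicit alternative is shown below; `ExplicitBytes` and `utf8Bytes` are illustrative names, not part of the original project.

```java
import java.nio.charset.Charset;

public class ExplicitBytes {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Encodes text as UTF-8, independent of the JVM's default charset. */
    static byte[] utf8Bytes(String text) {
        return text.getBytes(UTF_8);
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // the euro sign from the test string
        // The first length varies with file.encoding; the second does not.
        System.out.println("default charset: " + euro.getBytes().length + " byte(s)");
        System.out.println("UTF-8: " + utf8Bytes(euro).length + " byte(s)");
    }
}
```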


我想做一个坏孩纸

#5 · 2019-01-31 17:06

This works for me:

...
  <properties>
    <project.build.sourceEncoding>ISO-8859-1</project.build.sourceEncoding>
    <project.reporting.outputEncoding>ISO-8859-1</project.reporting.outputEncoding>
  </properties>
...
  <build>
    <finalName>Project</finalName>

    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
          <encoding>${project.build.sourceEncoding}</encoding>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-war-plugin</artifactId>
        <version>2.2</version>
        <configuration>
          <warSourceDirectory>WebContent</warSourceDirectory>
        </configuration>
      </plugin>
    </plugins>
  </build>


对你真心纯属浪费

#6 · 2019-01-31 17:19

When debugging Unicode problems, make sure you convert everything to ASCII so you can read and understand what is inside a String without guesswork. This means you should use, for example, StringEscapeUtils from commons-lang3 to turn ä into \u00e4. That way, you can be sure that you see ? because the console can't print it. And you can distinguish " " (\u0020) from " " (\u00a0).

In the test case, check the escaped version of the inputs as early as possible to make sure the data is actually what you expect.

So the code above should be:
```
assertEquals("\u010d\u00e4\u....", escape(l_string));
```
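The `escape` helper is not defined in this answer. If pulling in commons-lang3 is not an option, a hand-rolled version along the following lines could serve for debugging. This is a sketch under that assumption, not the library's implementation; `AsciiEscape` is a made-up class name, and the helper leaves backslashes in the input unescaped.

```java
public class AsciiEscape {
    /** Replaces every char outside printable ASCII with a backslash-u escape (4 lowercase hex digits). */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c); // printable ASCII stays readable
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints only ASCII, so the console encoding cannot distort the result.
        System.out.println(escape("\u010c\u00e4 \u20ac"));
    }
}
```

Because the output is pure ASCII, a "?" in it can only mean the String itself already contained a question mark.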
Make sure you use the correct encoding for file I/O. Never use the default encoding of Java, always use InputStreamReader/OutputStreamWriter and specify the encoding to use.
The POM looks correct. Run mvn with -X to make sure it picks up the correct options and runs the Java compiler using the correct options. mvn help:effective-pom might also help.
Disassemble the class file to check the strings. Java will use ? to denote that it couldn't read something.
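The advice about file I/O can be sketched as follows: every reader and writer names its charset instead of inheriting the platform default. The class and method names (`ExplicitFileIo`, `write`, `read`) are placeholders for illustration, and UTF-8 is assumed to be the intended file encoding.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

public class ExplicitFileIo {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Writes text to a file as UTF-8, ignoring the platform default encoding. */
    static void write(File file, String text) throws IOException {
        Writer writer = new OutputStreamWriter(new FileOutputStream(file), UTF_8);
        try {
            writer.write(text);
        } finally {
            writer.close();
        }
    }

    /** Reads a whole file as UTF-8, ignoring the platform default encoding. */
    static String read(File file) throws IOException {
        Reader reader = new InputStreamReader(new FileInputStream(file), UTF_8);
        try {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }
}
```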

If you get the ? from System.out.println( ">>> " + l_string );, this means the code wasn't compiled with UTF-8 or that the source file was maybe saved with another Unicode encoding (UTF-16 or similar).

Another source of problems could be the properties file. Make sure it was saved with ISO-8859-1 and that it wasn't modified by the compilation process.
Make sure Maven actually compiles your file. Use mvn clean to force a full-recompile.
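To see why the encoding of the properties file matters, the sketch below loads the same key from ISO-8859-1 bytes and from UTF-8 bytes, decoding both as ISO-8859-1 (the charset that `Properties.load(InputStream)` assumes). `PropsEncoding` and `load` are hypothetical names; the in-memory byte arrays stand in for the real TransformationMap.properties file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Properties;

public class PropsEncoding {
    /** Parses properties from raw bytes, decoding them with the named charset. */
    static Properties load(byte[] content, String charsetName) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(
                    new ByteArrayInputStream(content), Charset.forName(charsetName)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String line = "\u00fd=y\n"; // one entry from the question's mapping file
        byte[] latin1 = line.getBytes(Charset.forName("ISO-8859-1"));
        byte[] utf8 = line.getBytes(Charset.forName("UTF-8"));

        // File saved as ISO-8859-1 and decoded as ISO-8859-1: the key is found.
        System.out.println(load(latin1, "ISO-8859-1").getProperty("\u00fd"));
        // File saved as UTF-8 but decoded as ISO-8859-1: the key is mangled, lookup fails.
        System.out.println(load(utf8, "ISO-8859-1").getProperty("\u00fd"));
    }
}
```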

