String.format for double-width characters

Posted 2019-08-04 09:39

Question:

Java's String.format does not appear to be aware of double-width characters, such as Japanese or Chinese:

System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "field", "expected", "actual"));
System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "surface", "駆け", "駆け"));

The output is not aligned correctly:

    field:             expected :               actual
  surface:                   駆け :                   駆け

Is there a correct way to format double-width characters with String.format? If not, is there an alternative method or library which is capable of doing this correctly?

Answer 1:

Java's String.format() is not at fault here: it cannot know how the text will be rendered, or which font will be used. Its role is purely to assemble a formatted string of text to be subsequently displayed, and its width specifiers count characters, not display columns. The visual appearance of that formatted text is controlled (primarily) by the display font, so the developer must explicitly adjust the formatting to suit it.
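To make that concrete, here is a minimal sketch showing that the width in a format specifier is a character count. The class and method names (FormatWidthDemo, pad) are made up for illustration, and the Japanese sample from the question is written with Unicode escapes only so that the source compiles whatever the file encoding.

```java
class FormatWidthDemo {
    // Pads s on the left to a minimum of 6 characters, like the %20s fields in the question.
    static String pad(String s) {
        return String.format("%6s", s);
    }

    public static void main(String[] args) {
        String latin = pad("ab");
        // U+99C6 U+3051 is the two-character Japanese sample from the question.
        String japanese = pad("\u99c6\u3051");
        // Both results contain 4 spaces plus 2 characters, so both lengths are 6,
        // even though the Japanese one will usually look wider on screen.
        System.out.println("[" + latin + "] length = " + latin.length());
        System.out.println("[" + japanese + "] length = " + japanese.length());
    }
}
```

Both strings have the same length, so any misalignment comes from how the glyphs are drawn, not from the string that String.format built.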

A simple solution would be to use a font that renders both Latin and CJK characters with glyphs of the same constant width, but I couldn't find one. See the Unicode report titled "East Asian Width" (Unicode Standard Annex #11) for more details:

For a traditional East Asian fixed pitch font, this width translates to a display width of either one half or a whole unit width. A common name for this unit width is “Em”. While an Em is customarily the height of the letter “M”, it is the same as the unit width in East Asian fonts, because in these fonts the standard character cell is square. In contrast, the character width for a fixed-pitch Latin font like Courier is generally 3/5 of an Em.

I'm guessing that there might not be any monospace font displaying CJK characters and Latin characters with the same width simply because it would look very strange. For example, imagine the two Latin characters "li" occupying the same width as the two Japanese characters "駆け". So even if you use a monospaced font to render both Latin and CJK characters, although the characters for each language are monospaced, the widths for each language are probably still different.

Google has a very helpful site for evaluating their fonts, which allows you to:

  • Filter the fonts by language: Japanese, Chinese, etc.
  • View a large number of characters being rendered. For example this page for Noto Sans JP shows:
    • The Japanese glyphs are wider than the Latin glyphs.
    • The Japanese glyphs are fixed width, whereas the Latin glyphs are not.
  • Enter any text you wish, and apply it to all selected fonts for comparison. For example, this screenshot shows how the Latin glyphs for AEIOUY look alongside some Japanese glyphs using different fonts. Note that the Latin glyphs are always narrower, though by varying amounts, depending on the font being used and the specific glyph to be rendered.

Here's a possible solution to your alignment problem:

  • With the Kosugi Maru font (middle of the top row in the screenshot above), Japanese characters seem to be exactly twice as wide as Latin characters, so use that font to render the output.
  • When formatting the text, reduce the field width (and therefore the leading spaces) by one for each Japanese character in the field to keep the columns aligned, since each Japanese glyph occupies the width of two Latin glyphs.

So in the code, reduce each field width by the number of Japanese glyphs to be rendered in that field:

    System.out.println("* The display font is named MotoyaLMaru, created by installing Google font KosugiMaru-Regular.ttf.");
    System.out.println("* With this font Japanese glyphs seem to be twice the width of Latin glyphs.");
    System.out.println("* Downloaded from https://fonts.google.com/specimen/Kosugi+Maru?selection.family=Kosugi+Maru");
    System.out.println(" ");
    System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "field", "expected", "actual"));
    System.out.println(String.format("%1$9s: %2$18s : %3$18s\n", "surface", "駆け", "駆け")); // 18, not 20!
    System.out.println(String.format("%1$9s: %2$12s : %3$12s\n", "1234567", "川土空田天生花草", "川土空田天生花草")); // 12, not 20!

When that code is run in NetBeans on Windows 10 with this font, the output shows the columns properly aligned.

Notes:

  • The format strings were hard-coded in this example to ensure column alignment, but it would be simple to dynamically build the format string based on the number of Japanese characters to be rendered.
  • Also see the related question "Monospace font that supports both English and Japanese".
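The first note above, building the format string dynamically, could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (CjkAlign, isWide, displayWidth, padLeft) are hypothetical, and the wide-character test is a crude heuristic based on Character.UnicodeScript rather than the real East Asian Width property, which the standard Java library does not expose. It will, for instance, misjudge half-width katakana and full-width Latin letters. It also assumes a font, such as Kosugi Maru, in which wide glyphs take exactly two Latin columns.

```java
class CjkAlign {
    // Heuristic (an assumption, not the Unicode East Asian Width property):
    // treat Han, Hiragana, Katakana and Hangul code points as two columns wide.
    static boolean isWide(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    // Number of display columns s is expected to occupy.
    static int displayWidth(String s) {
        return s.codePoints().map(cp -> isWide(cp) ? 2 : 1).sum();
    }

    // Right-aligns s in a field of `columns` display columns by shrinking
    // the String.format width by the extra columns the wide glyphs take up.
    static String padLeft(String s, int columns) {
        int width = Math.max(1, columns - displayWidth(s) + s.length());
        return String.format("%" + width + "s", s);
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"field", "expected", "actual"},
            // U+99C6 U+3051 is the two-character Japanese sample from the question.
            {"surface", "\u99c6\u3051", "\u99c6\u3051"},
        };
        for (String[] r : rows) {
            System.out.println(padLeft(r[0], 9) + ": " + padLeft(r[1], 20) + " : " + padLeft(r[2], 20));
        }
    }
}
```

For the two-character sample this yields an effective width of 18, and for an eight-character Han string a width of 12, matching the hard-coded values used earlier. If the ICU4J library is an option, its East Asian Width property would be a more accurate basis than the script check used here.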