How to parse non-standard month names with DateTimeFormatter

Posted 2019-04-05 13:21

Question:

I need to parse (German) dates that come in the following form:

10. Jan. 18:14
8. Feb. 19:02
1. Mär. 19:40
4. Apr. 18:55
2. Mai 21:55
5. Juni 08:25
5. Juli 20:09
1. Aug. 13:42
[...]

As you can see, month names longer than 4 characters are cut short. Even weirder (don't ask me why), March is shortened to Mär. although the full name is März. How can I parse this with java.time? (The dates are formatted according to the locale of the Android device that creates the list of dates. However, I'm not parsing them on Android.)

My approach was to create a DateTimeFormatter like this:

DateTimeFormatter.ofPattern("d. MMMM HH:mm").withLocale(Locale.GERMAN);
// or
DateTimeFormatter.ofPattern("d. MMMMM HH:mm").withLocale(Locale.GERMAN);

But neither the MMMM nor the MMMMM pattern fits the shortened dates. I can, of course, use the pattern d. MMM. HH:mm to match the shortened months, but then I can't match the 3- and 4-character months. I am aware that I could use two formatters (MMM. and MMMMM), but I would rather have a solution with a single formatter and possibly a custom locale or something similar.

Answer 1:

The answer to the problem is the DateTimeFormatterBuilder class and the appendText(TemporalField, Map) method. It allows any text to be associated with a value when formatting or parsing, which solves the problem effectively and elegantly:

Map<Long, String> monthNameMap = new HashMap<>();
monthNameMap.put(1L, "Jan.");
monthNameMap.put(2L, "Feb.");
monthNameMap.put(3L, "Mär.");
// ... add the remaining months (4L to 12L) in the same way
DateTimeFormatter fmt = new DateTimeFormatterBuilder()
    .appendPattern("d. ")
    .appendText(ChronoField.MONTH_OF_YEAR, monthNameMap)
    .appendPattern(" HH:mm")
    .parseDefaulting(ChronoField.YEAR, 2016)
    .toFormatter();

System.out.println(LocalDateTime.parse("10. Jan. 18:14", fmt));
System.out.println(LocalDateTime.parse("8. Feb. 19:02", fmt));

Some notes:

  • The monthNameMap must be populated with all 12 months
  • The formatter should normally be assigned to a static final constant, rather than being created all the time
  • The parseDefaulting(YEAR, 2016) has been added so that LocalDateTime.parse(String, DateTimeFormatter) can be used directly. Without it, there would be no year, and thus nothing more than a TemporalAccessor could be parsed (the year must be a leap year, in case 29th Feb is being parsed)
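
For reference, here is a self-contained sketch of the approach above with all twelve months filled in. The class name GermanMonthParser and its parse helper are hypothetical. The spellings for September to December (Sep., Okt., Nov., Dez.) are assumptions extrapolated from the shortening rule in the question, since the sample data stops at August.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of java.time.
final class GermanMonthParser {

    // Built once and reused: DateTimeFormatter is immutable and thread-safe.
    static final DateTimeFormatter FORMATTER = buildFormatter();

    private static DateTimeFormatter buildFormatter() {
        Map<Long, String> monthNames = new HashMap<>();
        monthNames.put(1L, "Jan.");
        monthNames.put(2L, "Feb.");
        monthNames.put(3L, "M\u00e4r."); // U+00E4 is a-umlaut, escaped to be encoding-proof
        monthNames.put(4L, "Apr.");
        monthNames.put(5L, "Mai");
        monthNames.put(6L, "Juni");
        monthNames.put(7L, "Juli");
        monthNames.put(8L, "Aug.");
        // Assumed spellings: the sample in the question stops at August.
        monthNames.put(9L, "Sep.");
        monthNames.put(10L, "Okt.");
        monthNames.put(11L, "Nov.");
        monthNames.put(12L, "Dez.");

        return new DateTimeFormatterBuilder()
                .appendPattern("d. ")
                .appendText(ChronoField.MONTH_OF_YEAR, monthNames)
                .appendPattern(" HH:mm")
                // The input has no year; 2016 is a leap year, so "29. Feb." still parses.
                .parseDefaulting(ChronoField.YEAR, 2016)
                .toFormatter();
    }

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        String[] samples = {"10. Jan. 18:14", "1. M\u00e4r. 19:40", "2. Mai 21:55", "5. Juni 08:25"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + parse(sample));
        }
    }
}
```

Because the month texts come from the explicit map, this formatter does not depend on the locale data shipped with the JDK. Matching is case-sensitive by default, so "jan." would be rejected unless parseCaseInsensitive() is added to the builder.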


Answer 2:

You could use a DateTimeFormatterBuilder:

private static final DateTimeFormatter formatter = new DateTimeFormatterBuilder()
            .appendOptional(DateTimeFormatter.ofPattern("d. MMM. HH:mm"))
            .appendOptional(DateTimeFormatter.ofPattern("d. MMMM HH:mm"))
            .toFormatter(Locale.GERMAN);

Running it on this:

Stream.of(("10. Jan. 18:14\n" +
           "8. Feb. 19:02\n" +
           "1. Mär. 19:40\n" +
           "4. Apr. 18:55\n" +
           "2. Mai 21:55\n" +
           "5. Juni 08:25\n" +
           "5. Juli 20:09\n" +
           "1. Aug. 13:42").split("\n"))
       .map(formatter::parse)
       .forEach(System.out::println);

you get output like the following (captured with HH:ss in the patterns, which is why the minutes show up as SecondOfMinute; with HH:mm they are parsed as minutes):

{NanoOfSecond=0, MicroOfSecond=0, DayOfMonth=10, MonthOfYear=1, MilliOfSecond=0, SecondOfMinute=14, HourOfDay=18},ISO
{NanoOfSecond=0, MicroOfSecond=0, DayOfMonth=8, MonthOfYear=2, MilliOfSecond=0, SecondOfMinute=2, HourOfDay=19},ISO
{NanoOfSecond=0, MicroOfSecond=0, DayOfMonth=1, MonthOfYear=3, MilliOfSecond=0, SecondOfMinute=40, HourOfDay=19},ISO
{NanoOfSecond=0, MicroOfSecond=0, DayOfMonth=4, MonthOfYear=4, MilliOfSecond=0, SecondOfMinute=55, HourOfDay=18},ISO
{NanoOfSecond=0, MicroOfSecond=0, DayOfMonth=2, MonthOfYear=5, MilliOfSecond=0, SecondOfMinute=55, HourOfDay=21},ISO
{NanoOfSecond=0, MicroOfSecond=0, DayOfMonth=5, MonthOfYear=6, MilliOfSecond=0, SecondOfMinute=25, HourOfDay=8},ISO
{NanoOfSecond=0, MicroOfSecond=0, DayOfMonth=5, MonthOfYear=7, MilliOfSecond=0, SecondOfMinute=9, HourOfDay=20},ISO
{NanoOfSecond=0, MicroOfSecond=0, DayOfMonth=1, MonthOfYear=8, MilliOfSecond=0, SecondOfMinute=42, HourOfDay=13},ISO


Answer 3:

As pointed out it would be easier to use a standard and consistent format - here you are mixing long and short month names.

One option (short of using a DateTimeFormatterBuilder) is to handle both cases separately:

// Abbreviated month followed by a period, e.g. "10. Jan. 18:14"
private static final DateTimeFormatter SHORT_MONTH = DateTimeFormatter.ofPattern("d. MMM. HH:mm", Locale.GERMAN);
// Full month name, e.g. "2. Mai 21:55"
private static final DateTimeFormatter LONG_MONTH = DateTimeFormatter.ofPattern("d. MMMM HH:mm", Locale.GERMAN);

private static TemporalAccessor parse(String s) {
  try {
    return SHORT_MONTH.parse(s);
  } catch (DateTimeParseException e) {
    // Not an abbreviated month: fall back to the full-name pattern
    return LONG_MONTH.parse(s);
  }
}
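
The same try-then-fall-back idea can be written for any number of formatters, and the resulting TemporalAccessor can then be turned into typed values. The following is only a sketch: the FallbackParser class and its parseFirstMatch method are hypothetical names, and which abbreviations MMM accepts for Locale.GERMAN depends on the locale data of the JDK in use.

```java
import java.time.LocalTime;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; the name and method are illustrative only.
final class FallbackParser {

    // Tries each formatter in order and returns the first successful parse.
    static TemporalAccessor parseFirstMatch(String text, List<DateTimeFormatter> candidates) {
        DateTimeParseException lastFailure = null;
        for (DateTimeFormatter candidate : candidates) {
            try {
                return candidate.parse(text);
            } catch (DateTimeParseException e) {
                lastFailure = e; // remember why the most recent candidate failed
            }
        }
        if (lastFailure == null) {
            throw new IllegalArgumentException("no formatters supplied");
        }
        throw lastFailure;
    }

    public static void main(String[] args) {
        List<DateTimeFormatter> candidates = Arrays.asList(
                DateTimeFormatter.ofPattern("d. MMM. HH:mm", Locale.GERMAN),
                DateTimeFormatter.ofPattern("d. MMMM HH:mm", Locale.GERMAN));

        // "Mai" is not followed by a period, so the first pattern fails and the second is used.
        TemporalAccessor parsed = parseFirstMatch("2. Mai 21:55", candidates);

        // No year was parsed, so a MonthDay plus a LocalTime is the most that can be extracted.
        System.out.println(MonthDay.from(parsed) + " " + LocalTime.from(parsed));
    }
}
```

Rethrowing the last DateTimeParseException keeps the error message of the final attempt when nothing matches. Using exceptions for control flow is acceptable for occasional parsing, but for large inputs a single formatter with explicit month texts avoids that cost.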


Answer 4:

You can use a regex replacement to trim the month portion to exactly 3 characters before parsing it with "d. MMM HH:mm":

text = text.replaceFirst("^(\\S+\\s\\S{3})\\S", "$1");

Explanation of the regex: starting at the beginning of the string (^), match one or more non-whitespace characters (\S+), one whitespace character (\s), three non-whitespace characters (\S{3}) and one more non-whitespace character (\S), then replace the match with the contents of the first capture group ($1). The ^ anchor matters: without it, a three-letter month such as Mai lets the pattern match later in the string (Mai 21:5) and drop a digit from the time.

10. Jan. 18:14 becomes 10. Jan 18:14, and 5. Juni 08:25 becomes 5. Jun 08:25.
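
A minimal sketch of this normalization step as a reusable helper; the class and method names are hypothetical. The pattern is anchored with ^ so that lines whose month already has three letters, such as 2. Mai 21:55, are left untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class MonthNormalizer {

    // Day token, one whitespace, three month characters, then one extra character to drop.
    private static final Pattern EXTRA_MONTH_CHAR = Pattern.compile("^(\\S+\\s\\S{3})\\S");

    // Trims the month to three characters; returns the input unchanged if nothing matches.
    static String normalize(String text) {
        return EXTRA_MONTH_CHAR.matcher(text).replaceFirst("$1");
    }

    public static void main(String[] args) {
        String[] samples = {"10. Jan. 18:14", "2. Mai 21:55", "5. Juni 08:25"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + normalize(sample));
        }
    }
}
```

Running it prints 10. Jan 18:14, 2. Mai 21:55 and 5. Jun 08:25 as the normalized forms. Whether MMM with Locale.GERMAN then accepts these three-letter names depends on the JDK's locale data, so it is worth checking on the target Java version.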