What are the best ways to validate a string date to be va

Posted 2020-04-30 07:24

Question:

I recently started working on a web UI and ran into the problem of parsing and validating date strings in the format "dd-mm-yyyy". Some approaches I found are:

  1. Regex matching - incomplete validation, not flexible.

    (19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])

  2. A post where someone suggested pre-initializing a Set with every possible date string - fast and valid, but also inflexible and memory-consuming.

Is there something easier, maybe available in public libraries?

Please don't suggest SimpleDateFormat :)

UPDATE: for Java 8, the correct answer is https://stackoverflow.com/a/43076001/1479668

Answer 1:

If you are using Java 8, then DateTimeFormatter is what you are looking for. Its Javadoc contains sample code and a number of predefined formats, and you can also define your own.

Here is an example from the same Javadoc:

LocalDate date = LocalDate.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy MM dd");
String text = date.format(formatter);
LocalDate parsedDate = LocalDate.parse(text, formatter);
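
The round trip above formats and re-parses a valid date, so it never shows a rejection. Here is a minimal sketch of how such a formatter could validate the "dd-mm-yyyy" strings from the question. The class name DateValidation and the helper isValidDate are made up for illustration, and the strict resolver style with the "uuuu" year letter is my assumption, not part of the Javadoc sample:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class DateValidation {
    // "uuuu" (proleptic year) is used instead of "yyyy" (year-of-era):
    // in STRICT mode a year-of-era without an era cannot be resolved to a date.
    private static final DateTimeFormatter DD_MM_YYYY =
        DateTimeFormatter.ofPattern("dd-MM-uuuu")
                         .withResolverStyle(ResolverStyle.STRICT);

    // Hypothetical helper: true only if the text is a real calendar date
    // written as day-month-year with hyphens.
    public static boolean isValidDate(String text) {
        try {
            LocalDate.parse(text, DD_MM_YYYY);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("28-02-2017")); // true
        System.out.println(isValidDate("31-02-2017")); // false, no Feb 31
        System.out.println(isValidDate("2017-02-28")); // false, wrong layout
    }
}
```

The catch block is the price of this simple approach: every invalid input costs an exception.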

Also, the question "How to parse/format dates with LocalDateTime? (Java 8)" has some excellent answers.


EDIT: Thanks to Basil Bourque for pointing out the ThreeTen-Backport project, which provides almost the same features as Java 8 on older versions of Java.



Answer 2:

Preamble:

If you don't care about the details, then the accepted answer suggesting DateTimeFormatter.ofPattern("yyyy MM dd") is fine. Otherwise, if you are interested in the tricky details of parsing, read on:


Regular expressions

As you have already recognized, complete validation is not possible with regular expressions like (19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01]). For example, this expression would accept "2017-02-31" (a February with 31 days).
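
To make the gap concrete, here is a small sketch that runs the expression from the question against a few inputs. The class and method names are hypothetical; only the pattern itself comes from the question:

```java
import java.util.regex.Pattern;

class RegexDateCheck {
    // The expression from the question, escaped for a Java string literal.
    static final Pattern DATE_SHAPE = Pattern.compile(
        "(19|20)\\d\\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])");

    // Checks only the shape of the text, not whether the date exists.
    public static boolean looksLikeDate(String text) {
        return DATE_SHAPE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("2017-02-31")); // true, although Feb 31 does not exist
        System.out.println(looksLikeDate("2017/02.28")); // true, mixed separators pass too
        System.out.println(looksLikeDate("2017-13-01")); // false, month 13 is caught
    }
}
```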

Java 8 parsing mechanism

The Java 8 class DateTimeFormatter, however, can reject such non-existent dates just by parsing. To go into the details, we have to distinguish between syntactic validation and calendrical validation. The first kind, syntactic validation, is performed by the method parseUnresolved(). Its Javadoc says:

Parsing is implemented as a two-phase operation. First, the text is parsed using the layout defined by the formatter, producing a Map of field to value, a ZoneId and a Chronology. Second, the parsed data is resolved, by validating, combining and simplifying the various fields into more useful ones. This method performs the parsing stage but not the resolving stage.
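
The difference between the two phases can be seen in a short sketch. It assumes a plain "uuuu-MM-dd" layout, and the class and helper names are invented for illustration. After phase 1 the impossible day value is still present as a raw field, and no exception has been thrown:

```java
import java.text.ParsePosition;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

class TwoPhaseParsing {
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    // Phase 1 only: true when the whole text matches the layout.
    // Errors are reported through the ParsePosition, not by exceptions.
    public static boolean isSyntacticallyValid(String text) {
        ParsePosition pos = new ParsePosition(0);
        TemporalAccessor fields = FORMAT.parseUnresolved(text, pos);
        return fields != null
            && pos.getErrorIndex() == -1
            && pos.getIndex() == text.length();
    }

    // Reads the unresolved day-of-month exactly as it appeared in the input.
    // Assumes the text already passed isSyntacticallyValid.
    public static long rawDayOfMonth(String text) {
        TemporalAccessor fields = FORMAT.parseUnresolved(text, new ParsePosition(0));
        return fields.getLong(ChronoField.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        System.out.println(isSyntacticallyValid("2017-02-31")); // true
        System.out.println(rawDayOfMonth("2017-02-31"));        // 31
        System.out.println(isSyntacticallyValid("2017-2-31"));  // false, month needs two digits
    }
}
```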

The main advantage of this method is that it does not use exception flow, which makes this kind of parsing fast. However, the second step of parsing does use exception flow; see the Javadoc of the method parse(CharSequence, ParsePosition):

By contrast, this method will throw a DateTimeParseException if an error occurs, with the exception containing the error index. This change in behavior is necessary due to the increased complexity of parsing and resolving dates/times in this API.

IMHO this is a performance limitation. Another drawback is that the currently available API does not allow specifying a dot OR a hyphen, as you have done in your regular expression. The API only offers a construct like "[.][-]" (using optional sections), but the problem is that an input sequence of ".-" would also be accepted by Java 8.
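
A quick sketch of that optional-section behaviour, with a made-up class name and helper. I use "uuuu" here so that the strict resolver style can be combined with the pattern:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class OptionalSeparators {
    // Each separator position allows an optional dot followed by an optional hyphen.
    static final DateTimeFormatter DOT_AND_OR_HYPHEN =
        DateTimeFormatter.ofPattern("uuuu[.][-]MM[.][-]dd")
                         .withResolverStyle(ResolverStyle.STRICT);

    public static boolean accepts(String text) {
        try {
            LocalDate.parse(text, DOT_AND_OR_HYPHEN);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("2017-02-28"));   // true
        System.out.println(accepts("2017.02.28"));   // true
        System.out.println(accepts("2017.-02.-28")); // true, both separators at once
        System.out.println(accepts("2017-.02-.28")); // false, the order inside the pattern matters
    }
}
```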

Well, these minor disadvantages are mentioned here for completeness. A final, almost perfect solution in Java 8 would be:

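import java.text.ParsePosition;
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;
import java.time.temporal.TemporalAccessor;

String input = "2017-02.-31";
// "uuuu" (proleptic year) rather than "yyyy" (year-of-era): in strict mode,
// a year-of-era without an era cannot be resolved to a date
DateTimeFormatter dtf =
    DateTimeFormatter.ofPattern("uuuu[.][-]MM[.][-]dd").withResolverStyle(
        ResolverStyle.STRICT // smart mode truncates to Feb 28!
    );
ParsePosition pos = new ParsePosition(0);
TemporalAccessor ta = dtf.parseUnresolved(input, pos); // step 1: syntactic check
LocalDate date = null;
if (pos.getErrorIndex() == -1 && pos.getIndex() == input.length()) {
    try {
        date = LocalDate.parse(input, dtf); // step 2: calendrical check
    } catch (DateTimeException dte) {
        dte.printStackTrace(); // in strict mode (see resolver style above)
    }
}
System.out.println(date); // null in strict mode, 2017-02-28 in smart mode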

Important:

  • The best possible validation is only possible in strict resolver style.
  • The validation proposed also includes a check if there are trailing unparsed chars.
  • The result ta of parseUnresolved() in step 1 cannot be used as an intermediate result due to internal limitations of resolving, so this two-step approach is not great for performance either. I have not benchmarked it against a normal one-step approach, but I hope that the main author of the new API (S. Colebourne) has done so; for comparison, see his solution in his own ThreeTen-Extra library. It is more or less a hackish workaround to avoid exception flow as much as possible.
  • For Java 6+7, there is a backport available.
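
To illustrate the first point, this sketch resolves the same input under the three resolver styles. The helper name resolve and the use of Optional are my own choices for the example:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;

class ResolverStyles {
    // Parses with the given resolver style; empty if the input is rejected.
    public static Optional<LocalDate> resolve(String text, ResolverStyle style) {
        DateTimeFormatter formatter =
            DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(style);
        try {
            return Optional.of(LocalDate.parse(text, formatter));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String input = "2017-02-31";
        System.out.println(resolve(input, ResolverStyle.SMART));   // Optional[2017-02-28]
        System.out.println(resolve(input, ResolverStyle.STRICT));  // Optional.empty
        System.out.println(resolve(input, ResolverStyle.LENIENT)); // Optional[2017-03-03]
    }
}
```

Only the strict style treats the input as an error; the other two silently turn it into a different date.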

Alternative

If you are looking for an alternative other than SimpleDateFormat, you might also find my library Time4J interesting. It supports real OR logic and avoids exception flow as much as possible (highly tuned parsing in a single step). Example:

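    import java.util.Locale;
    import net.time4j.PlainDate;
    import net.time4j.format.expert.ChronoFormatter;
    import net.time4j.format.expert.ParseLog;
    import net.time4j.format.expert.PatternType;

    String input = "2017-02-31";
    ParseLog plog = new ParseLog(); // collects errors instead of throwing
    PlainDate date =
        ChronoFormatter.ofDatePattern(
            "uuuu-MM-dd|uuuu.MM.dd", PatternType.CLDR, Locale.ROOT)
        .parse(input, plog); // uses smart mode by default and rejects Feb 31 in this mode
    if (plog.isError()) {
        System.out.println(plog.getErrorMessage());
    } else {
        System.out.println(date);
    }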

Notes:

  • A check of trailing characters can be included in the same way as in Java-8
  • The parsed result is easily convertible to LocalDate via date.toTemporalAccessor()
  • Using the format attribute Attributes.LENIENCY would weaken the validation
  • Time4J is also available for Java 6+7 (when using version line v3.x)


Answer 3:

If you have a known list of formats you want to support, you can create instances of the thread-safe org.joda.time.format.DateTimeFormatter, place them into a list, and iterate until one of them can successfully parse the date. Memory consumption for these parsers is negligible, and you can use the resulting date object once you find the matching format.

This also has the benefit of being far more readable than regex. Beware of using regex for formats that can be ambiguous like mm-dd-yyyy vs. dd-mm-yyyy.
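
As a rough sketch of this list-of-formatters idea: to keep the example free of external dependencies I have swapped in java.time.format.DateTimeFormatter for the Joda-Time class, so this is not Joda code. The three patterns and all names are placeholders; the formatters are immutable and thread-safe, so they can be shared:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class MultiFormatParser {
    // Known formats, tried in order. Put the preferred one first.
    static final List<DateTimeFormatter> FORMATS = Arrays.asList(
        strict("dd-MM-uuuu"), strict("uuuu-MM-dd"), strict("dd.MM.uuuu"));

    static DateTimeFormatter strict(String pattern) {
        return DateTimeFormatter.ofPattern(pattern)
                                .withResolverStyle(ResolverStyle.STRICT);
    }

    // Returns the date from the first format that parses the text, if any.
    public static Optional<LocalDate> parse(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(text, format));
            } catch (DateTimeParseException e) {
                // not this format, try the next one
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("30-04-2020")); // Optional[2020-04-30]
        System.out.println(parse("2020-04-30")); // Optional[2020-04-30]
        System.out.println(parse("31.02.2017")); // Optional.empty
    }
}
```

Note that a list containing both dd-MM and MM-dd variants would resolve ambiguous inputs by list order, which is exactly the ambiguity warned about above.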



Answer 4:

You might try Pojava DateTime. It parses dates and times heuristically, rather than matching formats, and supports a wide variety of languages (e.g. for month names) and formats. See http://pojava.org/howto/datetime.html

Typical usage relies on your system's locale to resolve the ambiguity of whether a format is m/d/y vs d/m/y, so by default you usually just need: DateTime dt1=new DateTime("01/02/2003");

If your server is processing dates derived from multiple locales and needs to interpret "01/02/2003" as "January 2" for one locale and as "February 1" for another, then you can specify a configuration object to be used when parsing input from the foreign locale.

DateTimeConfigBuilder builder = DateTimeConfigBuilder.newInstance();
builder.setDmyOrder(false);
builder.setInputTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
builder.setOutputTimeZone(TimeZone.getTimeZone("America/Porto_Velho"));
IDateTimeConfig config=DateTimeConfig.fromBuilder(builder);

DateTime dt1 = new DateTime("01/02/2003 13:30", config);