Determine if a String is a valid date before parsing

Posted 2019-01-14 16:39

I have this situation where I am reading about 130K records containing dates stored as String fields. Some records contain blanks (nulls), some contain strings like this: 'dd-MMM-yy' and some contain this 'dd/MM/yyyy'.

I have written a method like this:

public Date parseDate(String date) {
    if (date == null) {
        return null;
    }
    try {
        // First attempt: the 'dd-MMM-yy' pattern, e.g. 07-Jun-99
        return new SimpleDateFormat("dd-MMM-yy").parse(date);
    } catch (ParseException e) {
        try {
            // Second attempt: the 'dd/MM/yyyy' pattern, e.g. 07/06/1999
            return new SimpleDateFormat("dd/MM/yyyy").parse(date);
        } catch (ParseException e2) {
            // Neither pattern matched
            return null;
        }
    }
}

So you may have already spotted the problem: I am using try .. catch as part of my logic. It would be better if I could determine beforehand that the String actually contains a parseable date in some format, and only then attempt to parse it.

So, is there some API or library that can help with this? I do not mind writing several different parser classes to handle the different formats and then creating a factory to select the correct one, but how do I determine which one?

Thanks.

11 answers
乱世女痞
Reply #2 · 2019-01-14 17:28

Use regular expressions to check which format your string is in. Make sure that you keep both regexes pre-compiled (do not create new ones on every method call, but store them as constants), and measure whether this is actually faster than the try-catch you use.
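A minimal sketch of that idea, assuming the two formats from the question and English month abbreviations. The class name RegexDateParser and its constant names are made up for illustration. The regexes only check the shape of the string, so the parse itself can still fail on values such as 31/02/2019, and that case is still caught.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Pattern;

public class RegexDateParser {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern DASHED = Pattern.compile("\\d{2}-[A-Za-z]{3}-\\d{2}");
    private static final Pattern SLASHED = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");

    /** Returns the parsed date, or null if the input is not a valid date in either format. */
    public static Date parseDate(String date) {
        if (date == null) {
            return null;
        }
        String pattern;
        if (DASHED.matcher(date).matches()) {
            pattern = "dd-MMM-yy";
        } else if (SLASHED.matcher(date).matches()) {
            pattern = "dd/MM/yyyy";
        } else {
            return null; // not a recognised shape, so no parse is attempted
        }
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setLenient(false); // reject impossible values such as day 32
        try {
            return format.parse(date);
        } catch (ParseException e) {
            return null; // right shape, but not a real date
        }
    }
}
```

With this version the exception path is only taken for strings that look like a date but are not one, which should be rare in the 130K records described.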

I still find it strange that your method returns null if both versions fail rather than throwing an exception.

别忘想泡老子
Reply #3 · 2019-01-14 17:37

See Lazy Error Handling in Java for an overview of how to eliminate try/catch blocks using an Option type.

Functional Java is your friend.

In essence, what you want to do is to wrap the date parsing in a function that doesn't throw anything, but indicates in its return type whether parsing was successful or not. For example:

import fj.F;
import fj.F2;
import fj.data.Option;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import static fj.Function.curry;
import static fj.data.Option.some;
import static fj.data.Option.none;
...

F<String, F<String, Option<Date>>> parseDate =
  curry(new F2<String, String, Option<Date>>() {
    public Option<Date> f(String pattern, String s) {
      try {
        return some(new SimpleDateFormat(pattern).parse(s));
      }
      catch (ParseException e) {
        return none();
      }
    }
  });

OK, now you've a reusable date parser that doesn't throw anything, but indicates failure by returning a value of type Option.None. Here's how you use it:

import fj.data.Stream;
import static fj.data.Stream.stream;
import static fj.data.Option.isSome_;
....
public Option<Date> parseWithPatterns(String s, Stream<String> patterns) { 
  return stream(s).apply(patterns.map(parseDate)).find(isSome_()); 
}

That will give you the date parsed with the first pattern that matches, or a value of type Option.None, which is type-safe whereas null isn't.

If you're wondering what Stream is... it's a lazy list. This ensures that you ignore patterns after the first successful one. No need to do too much work.

Call your function like this:

for (Date d: parseWithPatterns(someString, stream("dd/MM/yyyy", "dd-MM-yyyy"))) {
  // Do something with the date here.
}

Or...

Option<Date> d = parseWithPatterns(someString,
                                   stream("dd/MM/yyyy", "dd-MM-yyyy"));
if (d.isNone()) {
  // Handle the case where neither pattern matches.
} 
else {
  // Do something with d.some()
}
一纸荒年 Trace。
Reply #4 · 2019-01-14 17:37

If your formats are exact (June 7th 1999 would be either 07-Jun-99 or 07/06/1999, so you are sure that you have leading zeros), then you could just check the length of the string before trying to parse.

Be careful with the short month name in the first version, because Jun may not be June in another language.
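A rough sketch of the length check, assuming leading zeros are always present so that dd-MMM-yy strings are 9 characters long and dd/MM/yyyy strings are 10. The class name LengthDateParser is hypothetical, and Locale.ENGLISH is passed explicitly as one way to deal with the month-name caveat.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LengthDateParser {
    /** Picks the pattern from the string length; returns null for anything else. */
    public static Date parseDate(String date) {
        if (date == null) {
            return null;
        }
        String pattern;
        switch (date.length()) {
            case 9:  pattern = "dd-MMM-yy";  break; // e.g. 07-Jun-99
            case 10: pattern = "dd/MM/yyyy"; break; // e.g. 07/06/1999
            default: return null;                   // blanks and unknown shapes
        }
        try {
            // Locale.ENGLISH pins month names, so "Jun" parses on any system locale.
            return new SimpleDateFormat(pattern, Locale.ENGLISH).parse(date);
        } catch (ParseException e) {
            return null; // right length, but not a date in that format
        }
    }
}
```

The length alone does not prove the string is a date, so the catch block is still needed for values that merely happen to be 9 or 10 characters long.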

But if your data is coming from one database, then I would just convert all dates to the common format (it is one-off, but then you control the data and its format).

Evening l夕情丶
Reply #5 · 2019-01-14 17:38

Assuming the patterns you gave are the only likely choices, I would look at the String passed in to see which format to apply.

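public Date parseDate(final String date) {
  // Strings shorter than three characters (including blanks) cannot hold a date
  if (date == null || date.length() < 3) {
    return null;
  }

  // The third character is the separator, which tells the two formats apart
  SimpleDateFormat format = (date.charAt(2) == '/') ? new SimpleDateFormat("dd/MM/yyyy")
                                                   : new SimpleDateFormat("dd-MMM-yy");
  try {
    return format.parse(date);
  } catch (ParseException e) {
    // Log a complaint and include date in the complaint
  }
  return null;
}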

As others have mentioned, if you can guarantee that you will never access the DateFormats in a multi-threaded manner, you can make class-level or static instances.
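If that single-threaded guarantee cannot be made, one common workaround is to give each thread its own formatter. This goes a step beyond what the answer suggests; the sketch below is only an illustration, assumes Java 8 or later for ThreadLocal.withInitial, and uses a made-up class name ThreadLocalDateParser.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class ThreadLocalDateParser {
    // Each thread lazily gets its own formatter, so no instance is ever shared.
    private static final ThreadLocal<SimpleDateFormat> SLASHED =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("dd/MM/yyyy", Locale.ENGLISH));
    private static final ThreadLocal<SimpleDateFormat> DASHED =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH));

    /** Same separator check as above, but reusing per-thread formatter instances. */
    public static Date parseDate(String date) {
        if (date == null || date.length() < 3) {
            return null;
        }
        SimpleDateFormat format = (date.charAt(2) == '/') ? SLASHED.get() : DASHED.get();
        try {
            return format.parse(date);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

For a one-off pass over 130K records on a single thread, plain static fields would do the same job with less ceremony.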

孤傲高冷的网名
Reply #6 · 2019-01-14 17:40

In this limited situation, the best (and fastest) method is certainly to parse out the day, then, based on whether the next char is '/' or '-', try to parse out the rest. If at any point there is unexpected data, return null right there.
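A hand-rolled sketch along those lines, with no SimpleDateFormat and no exceptions for malformed text. The class name ManualDateParser, the helper digits, and the rule for expanding two-digit years are assumptions made for illustration, not something stated in the thread.

```java
import java.util.Arrays;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Locale;

public class ManualDateParser {
    private static final List<String> MONTHS = Arrays.asList(
            "JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC");

    /** Reads s[from, to) as a non-negative integer, or returns -1 if any char is not a digit. */
    private static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return -1;
            value = value * 10 + (c - '0');
        }
        return value;
    }

    public static Date parseDate(String s) {
        if (s == null || s.length() < 3) return null;
        int day = digits(s, 0, 2);
        if (day < 0) return null;
        int month; // zero-based, as java.util.Calendar expects
        int year;
        char sep = s.charAt(2);
        if (sep == '/' && s.length() == 10 && s.charAt(5) == '/') {
            month = digits(s, 3, 5) - 1;
            year = digits(s, 6, 10);
        } else if (sep == '-' && s.length() == 9 && s.charAt(6) == '-') {
            month = MONTHS.indexOf(s.substring(3, 6).toUpperCase(Locale.ROOT));
            int yy = digits(s, 7, 9);
            // Assumption: two-digit years 00-49 mean 20xx and 50-99 mean 19xx.
            year = yy < 0 ? -1 : (yy < 50 ? 2000 + yy : 1900 + yy);
        } else {
            return null; // unexpected separator or length
        }
        if (month < 0 || year < 0) return null;
        Calendar cal = new GregorianCalendar();
        cal.clear();           // zero the time-of-day fields
        cal.setLenient(false); // make out-of-range days and months an error
        cal.set(year, month, day);
        try {
            return cal.getTime();
        } catch (IllegalArgumentException e) {
            return null; // e.g. 31/02/2019
        }
    }
}
```

Whether this is really faster than the other approaches is worth measuring on the actual data before committing to the extra code.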
