Temporal Extraction (i.e. Extract date/time entiti

Posted 2020-08-23 16:22

Has anyone found a simple but effective way to extract date references from text? I've done a fair amount of searching for temporal extraction tools, but there isn't a lot out there. There are a few white papers, but the topic seems to be treated as a subset of the whole semantic web thingy and isn't given much attention.

I'm just looking for something that is 80% effective. There is no need to capture things like "the month after Jan 2009", but catching basic, common date entities would be nice.

I'm open to all suggestions, even fancy regexes.

Fire away!

(and thanks - Henry)

3 answers
啃猪蹄的小仙女
#2 · 2020-08-23 16:55

I'm drawing a blank on how to find the candidate strings to feed it, but this library will parse a wide range of dates and could be used as the "is this a real date" function. (Full disclosure: I'm the author of that lib.)
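
To make the "is this a real date" idea concrete, here is a minimal sketch that uses only Python's standard library as a stand-in for the library mentioned above. The function name is_real_date and the list of formats are assumptions made up for illustration, and month names are matched according to the current locale.

```python
from datetime import datetime

# Hypothetical format list; extend it to match the dates in your own text.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",    # 2009-01-31
    "%m/%d/%Y",    # 01/31/2009
    "%d %B %Y",    # 31 January 2009
    "%B %d, %Y",   # January 31, 2009
    "%b %d, %Y",   # Jan 31, 2009
]

def is_real_date(text):
    """Return True if text parses as a calendar date in any listed format."""
    candidate = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            # strptime raises ValueError for a format mismatch and for
            # impossible dates such as February 30.
            datetime.strptime(candidate, fmt)
            return True
        except ValueError:
            continue
    return False
```

Something else still has to propose the candidate strings; this only accepts or rejects them.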

我命由我不由天
#3 · 2020-08-23 16:57

One way I have done this is to look for anything that is 4 digits long and convert it to a number. If the number falls within the range of years you are interested in, you probably have a year you can use. If you also want matching months and days, you could check the adjacent words to see whether they are a month name or a number between 1 and 31. I am confident this would satisfy your 80% requirement.

Regex for years: [0-9]{4} - you will need to convert to a number and see if it's within the range of years you consider valid.

Regex for months: jan|january|feb|february ... etc for each month

Regex for days of the month: [0-9]{1,2} - you would need to convert to a number and see if it is 1-31
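
Putting the three regexes together, a rough Python sketch might look like the following. The function name find_dates, the two-token window around each year, and the default year range are all assumptions picked for illustration, not part of the approach described above.

```python
import re

MONTHS = {
    "jan": 1, "january": 1, "feb": 2, "february": 2, "mar": 3, "march": 3,
    "apr": 4, "april": 4, "may": 5, "jun": 6, "june": 6, "jul": 7, "july": 7,
    "aug": 8, "august": 8, "sep": 9, "sept": 9, "september": 9,
    "oct": 10, "october": 10, "nov": 11, "november": 11,
    "dec": 12, "december": 12,
}

def find_dates(text, min_year=1900, max_year=2100):
    """Return (year, month, day) tuples; month and day are None if not found."""
    tokens = re.findall(r"[A-Za-z]+|[0-9]+", text)
    results = []
    for i, tok in enumerate(tokens):
        # Regex for years: exactly 4 digits, then a numeric range check.
        if not re.fullmatch(r"[0-9]{4}", tok):
            continue
        year = int(tok)
        if not (min_year <= year <= max_year):
            continue
        month = day = None
        # Assumed window: up to two tokens on either side of the year.
        neighbours = tokens[max(0, i - 2):i] + tokens[i + 1:i + 3]
        for n in neighbours:
            if month is None and n.lower() in MONTHS:
                month = MONTHS[n.lower()]
            elif day is None and re.fullmatch(r"[0-9]{1,2}", n) and 1 <= int(n) <= 31:
                day = int(n)
        if month is None:
            day = None  # a bare number next to a year is too ambiguous to keep
        results.append((year, month, day))
    return results
```

For example, find_dates("We met on Jan 5, 2009 and again in 2011.") gives [(2009, 1, 5), (2011, None, None)]. It will also produce false positives, such as treating "2000 items" as a year or the verb "may" as a month, which is the price of the 80% approach.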

倾城 Initia
#4 · 2020-08-23 16:59
  1. If the temporal expressions in your data come in only a limited set of formats, use regular expressions and refine your system iteratively.

  2. Otherwise, use SUTime from the Stanford NLP toolkit, which might be overkill but will definitely meet your needs.
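
For option 1, one possible shape for the iterative approach is a list of named patterns that grows as you find formats the system misses. This is only a sketch: the pattern names, the three starting formats, and the function extract_expressions are assumptions for illustration.

```python
import re

# Hypothetical starting set; add one entry per format you find in your data.
PATTERNS = [
    ("iso", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("slash", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    ("month_name", re.compile(
        r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? "
        r"\d{1,2}, \d{4}\b")),
]

def extract_expressions(text):
    """Return (pattern_name, matched_text) pairs ordered by position in text."""
    hits = []
    for name, pattern in PATTERNS:
        for m in pattern.finditer(text):
            hits.append((m.start(), name, m.group(0)))
    return [(name, matched) for _, name, matched in sorted(hits)]
```

Each refinement round is then: run it over a sample, read through what was missed, and add or tighten a pattern. These patterns only check the shape of the text, so something like 99/99/2020 still matches and needs a separate validity check.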
