Finding a DOI in a document or page

Posted 2019-03-07 18:40

The DOI system places essentially no useful restrictions on what constitutes a valid identifier. Even so, being able to pull DOIs out of PDFs, web pages, and similar sources is quite useful for gathering citation information.

Is there a reliable way to identify a DOI in a block of text without assuming the 'doi:' prefix? (any language acceptable, regexes preferred, and avoiding false positives a must)

Tags: regex doi
7 answers
我只想做你的唯一
Answer #2 · 2019-03-07 19:09

This is a really old question that already has answers, but here's another possible pattern:

\b10\.(\d+\.*)+[\/](([^\s\.])+\.*)+\b

This assumes that white space is not part of the DOI.

I haven't tested this for false positives, but it seems to find all the edge cases mentioned on this page.
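For anyone who wants to try the pattern quickly, here is a minimal Python sketch. The regex is copied verbatim from above; the names `DOI_PATTERN` and `find_dois` and the sample strings are made up for illustration and are not part of any library.

```python
import re

# Pattern copied verbatim from the answer; DOI_PATTERN and find_dois are
# hypothetical names introduced only for this sketch.
DOI_PATTERN = re.compile(r'\b10\.(\d+\.*)+[\/](([^\s\.])+\.*)+\b')


def find_dois(text):
    """Return every substring of text that the pattern matches, in order."""
    # finditer + group(0) yields whole matches; findall would instead return
    # the contents of the capture groups.
    return [m.group(0) for m in DOI_PATTERN.finditer(text)]


if __name__ == "__main__":
    # Illustrative input only; the second identifier is invented.
    sample = "See doi:10.1000/182 and https://doi.org/10.1234/abc.def-5."
    print(find_dois(sample))
```

The trailing `\b` is what drops the sentence-final period in the sample. Anything like `10.5/2` in ordinary text would also match, so this does not address the false-positive concern by itself. The nested quantifiers such as `(\d+\.*)+` may also backtrack heavily on a long run of digits after `10.` that is never followed by a slash, so it is probably worth trying on realistic input before relying on it.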
