
how to automatically detect acronym meaning / exte

Posted 2019-03-21 17:45

Question:

How can you detect / find out the meaning (the extension) of an acronym using NLP / Information Extraction (IE) methods?

We want to detect in free text whether a word or its acronym is used, and map both to the same entity / token.

Most papers available online are about medical acronyms, and they do not provide a library to accomplish this task.

Any ideas?

Answer 1:

Reading your question and the comments, I understand that you want to create a mapping from an acronym to its extension.

Assuming you have a collection of textual documents where both the acronym and its expansion occur, you can apply an algorithm to extract (acronym, extension) pairs.

The paper "A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text" by A.S. Schwartz and M.A. Hearst does exactly this by looking at patterns, such as a long form followed by its short form in parentheses. The Java implementation is available here.
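To give an idea of how this kind of pattern matching works, here is a minimal Python sketch in the spirit of that algorithm. It is not the authors' Java code: the function names are made up for illustration, it only handles the "long form (short form)" pattern, and it skips sentence splitting and some of the paper's filters. The short form's characters are matched right to left against the words before the parenthesis, and the first character has to fall at the start of a word.

```python
import re


def is_valid_short_form(short_form):
    """Candidate filter: 2-10 characters, at most two words, starts with a
    letter or digit, and contains at least one letter."""
    return (
        2 <= len(short_form) <= 10
        and len(short_form.split()) <= 2
        and short_form[0].isalnum()
        and any(ch.isalpha() for ch in short_form)
    )


def best_long_form(short_form, candidate):
    """Match the short form's characters right to left inside `candidate`.

    Returns the shortest suffix of `candidate` that covers the short form,
    or None if the characters cannot all be matched.
    """
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        # Move left until the character matches; the first character of the
        # short form must additionally sit at the beginning of a word.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Extend the match back to the start of the word it landed in.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(text):
    """Return (acronym, extension) pairs found as 'long form (short form)'."""
    pairs = []
    for match in re.finditer(r"\(([^()]+)\)", text):
        short_form = match.group(1).strip()
        if not is_valid_short_form(short_form):
            continue
        # Search window: at most min(|A| + 5, |A| * 2) words before the "(".
        max_words = min(len(short_form) + 5, len(short_form) * 2)
        words_before = text[: match.start()].split()
        candidate = " ".join(words_before[-max_words:])
        long_form = best_long_form(short_form, candidate)
        if long_form:
            pairs.append((short_form, long_form))
    return pairs


sample = (
    "We use Natural Language Processing (NLP) and "
    "Information Extraction (IE) methods."
)
print(extract_pairs(sample))
# [('NLP', 'Natural Language Processing'), ('IE', 'Information Extraction')]
```

Once you have the pairs, you can build a dictionary from them and replace either form with a single canonical token, which is the mapping the question asks for.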

I applied this algorithm to the English Wikipedia; you can see the results here. I also applied it to a collection of Portuguese news articles; the results are here.



Answer 2:

WordNet contains acronyms for many words, and you can use it from a variety of programming languages: http://wordnet.princeton.edu/wordnet/

Or get them from Freebase. See this question: What is one way to find related names using the web?