HPL-2012-46
High-recall extraction of acronym-definition pairs with relevance feedback
Yarygina, Anna; Vassilieva, Natalia
Keyword(s): Text processing; acronym-definition extraction; relevance feedback; supervised learning
Abstract: This paper addresses the problem of extracting acronyms and their definitions from large documents in a setting where high recall is required and user feedback is available. We propose a three-step approach. First, acronym candidates are extracted using a weak regular expression. This step yields a list of candidates with high recall but low precision. Second, a definition is constructed for every acronym candidate from its surrounding text. Finally, a classifier selects the genuine acronym-definition pairs. In this last step we use a relevance feedback mechanism to tune the classifier model for each particular document, which allows reasonable precision to be achieved without losing recall. Unlike existing approaches, which are either designed to be generic and domain independent or tuned to one particular domain, our method adapts to the input document. We evaluate the proposed approach on three datasets from different domains. The experiments confirm the validity of the presented ideas.
External Posting Date: March 22, 2012 [Abstract Only]. Approved for External Publication - External Copyright Consideration
Internal Posting Date: March 22, 2012 [Fulltext]
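
The three steps outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the regular expression, the definition-construction rule, the features, or the classifier, so everything below is an assumption made for illustration: the pattern in CANDIDATE_RE, the "one preceding word per acronym letter" rule in build_definition, the two features, and the use of a perceptron with hypothetical initial weights are not taken from the report.

```python
import re

# Assumed weak pattern (not from the report): any token that starts with an
# uppercase letter and contains at least one more uppercase letter.
CANDIDATE_RE = re.compile(r"\b[A-Z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*\b")
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def extract_candidates(text):
    """Step 1: collect (acronym, offset) candidates, favoring recall."""
    return [(m.group(), m.start()) for m in CANDIDATE_RE.finditer(text)]


def build_definition(text, acronym, offset):
    """Step 2 (assumed rule): take the words just before the candidate,
    one word per acronym character."""
    words = WORD_RE.findall(text[:offset])
    return " ".join(words[-len(acronym):])


def features(acronym, definition):
    """Hypothetical features: a bias term and the share of acronym
    characters that equal the initial of the word at the same position."""
    initials = [w[0].upper() for w in definition.split()]
    matched = sum(1 for a, b in zip(acronym.upper(), initials) if a == b)
    return [1.0, matched / len(acronym)]


class FeedbackClassifier:
    """Step 3 (assumed model): a perceptron adjusted by user feedback."""

    def __init__(self, weights=(-0.5, 1.0), rate=0.5):
        self.weights = list(weights)  # hypothetical starting weights
        self.rate = rate

    def predict(self, x):
        """Return True if the pair is classified as genuine."""
        return sum(w * v for w, v in zip(self.weights, x)) > 0

    def feedback(self, x, is_genuine):
        """Update the weights only when the user contradicts the prediction."""
        if self.predict(x) != is_genuine:
            sign = 1.0 if is_genuine else -1.0
            self.weights = [w + sign * self.rate * v
                            for w, v in zip(self.weights, x)]
```

For the sentence "We use a support vector machine (SVM) and compare it with JavaScript code.", this sketch extracts both "SVM" and "JavaScript" as candidates, pairs "SVM" with "support vector machine", and accepts that pair while rejecting the "JavaScript" candidate. A call to feedback with a user judgment shifts the weights for the current document only, which is the role relevance feedback plays in the approach described above.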