Patent · US Active

Using context to extract entities from a document collection

US9251248B2 · kind B2 · utility

Cited by: 1
References: 8
Claims: 20
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jun 7, 2010
Grant date: Feb 2, 2016
Priority date
Expiry date: Feb 1, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/93
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Described is using context information obtained from entity mentions in likely relevant documents to extract entity mentions from documents that are ambiguous with respect to their relevance to a domain. A list of entities is input into an entity extraction mechanism, which processes a large collection of documents to determine data (counts) corresponding to frequency of entity mentions. Infrequently mentioned entities are specific entities, while frequently mentioned entities are non-specific (generic or ambiguous) entities. The context surrounding mentions of the specific entities is processed to obtain interesting context terms (words, phrases or both) for the domain. The interesting context terms are then compared against the contexts of non-specific entity mentions to determine whether each non-specific entity mention is relevant to the domain. A result set containing only relevant documents or relevant mentions collection is output.
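
The pipeline in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The abstract does not say how the frequency cutoff or the "interesting" terms are chosen, so the thresholds (max_specific_count, min_term_count), the context window size, the stopword list, and all function names are assumptions made for the example. It also handles single-word entities only.

```python
import re
from collections import Counter

# Hypothetical stopword list; the abstract does not say how terms are filtered.
STOPWORDS = {"a", "an", "the", "is", "in", "by", "of", "and", "my", "this"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contexts(tokens, entity, window):
    """Yield the tokens within `window` positions of each mention of `entity`."""
    for i, tok in enumerate(tokens):
        if tok == entity:
            yield tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]

def extract_relevant(docs, entities, max_specific_count=2,
                     min_term_count=2, window=4):
    token_docs = [tokenize(d) for d in docs]
    names = {e.lower() for e in entities}

    # Step 1: count entity mentions across the whole collection.
    counts = Counter(t for toks in token_docs for t in toks if t in names)

    # Step 2: rare entities are treated as specific, frequent ones as
    # non-specific. The cutoff is an assumption of this sketch.
    specific = {e for e in names if 0 < counts[e] <= max_specific_count}
    non_specific = {e for e in names if counts[e] > max_specific_count}

    # Step 3: collect context terms around mentions of specific entities and
    # keep those that recur (a simple stand-in for "interesting" terms).
    term_counts = Counter()
    for toks in token_docs:
        for e in specific:
            for ctx in contexts(toks, e, window):
                term_counts.update(
                    t for t in ctx if t not in STOPWORDS and t not in names
                )
    interesting = {t for t, c in term_counts.items() if c >= min_term_count}

    # Step 4: a non-specific mention counts as relevant when its context
    # shares at least one interesting term.
    relevant = []
    for idx, toks in enumerate(token_docs):
        for e in sorted(non_specific):
            if any(interesting.intersection(ctx)
                   for ctx in contexts(toks, e, window)):
                relevant.append((idx, e))
    return interesting, relevant

# Toy collection for a hypothetical movie domain.
docs = [
    "Memento is a clever movie by a bold director",
    "The director shot the movie Zodiac in California",
    "The movie Crash won several awards",
    "A car crash closed the highway",
    "My laptop had a crash again today",
    "Another crash blocked the highway this morning",
]
interesting, relevant = extract_relevant(docs, ["Memento", "Zodiac", "Crash"])
print(interesting)  # {'movie'}
print(relevant)     # [(2, 'crash')]
```

In this toy run, "Memento" and "Zodiac" are each mentioned once and so count as specific, while "Crash" appears four times and is treated as ambiguous. Only the mention of "Crash" whose context contains the learned term "movie" is kept.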

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.