Using context to extract entities from a document collection
US9251248B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Jun 7, 2010 |
| Grant date | Feb 2, 2016 |
| Priority date | — |
| Expiry date | Feb 1, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/93
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Described is using context information obtained from entity mentions in likely relevant documents to extract entity mentions from documents that are ambiguous with respect to their relevance to a domain. A list of entities is input into an entity extraction mechanism, which processes a large collection of documents to determine data (counts) corresponding to the frequency of entity mentions. Infrequently mentioned entities are treated as specific entities, while frequently mentioned entities are treated as non-specific (generic or ambiguous) entities. The context surrounding mentions of the specific entities is processed to obtain interesting context terms (words, phrases or both) for the domain. The interesting context terms are then compared against the contexts of non-specific entity mentions to determine whether each non-specific entity mention is relevant to the domain. A result set containing only the relevant documents or relevant mentions is output.
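The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the stopword list, the fixed token window, the count threshold (`max_specific_count`), the top-k selection of "interesting" terms, and the overlap test are all assumptions made for illustration; the abstract does not specify how any of these are chosen.

```python
import re
from collections import Counter

# Hypothetical stopword list; the source does not define one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "for", "at"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def mentions(tokens, entity):
    """Yield (start, length) for each occurrence of the entity's tokens."""
    ent = tokenize(entity)
    n = len(ent)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == ent:
            yield i, n

def context(tokens, start, length, window):
    """Return non-stopword tokens within `window` tokens of a mention."""
    left = tokens[max(0, start - window):start]
    right = tokens[start + length:start + length + window]
    return [t for t in left + right if t not in STOPWORDS]

def extract_relevant(docs, entities, max_specific_count=2,
                     window=4, top_k=5, min_overlap=1):
    tokenized = [tokenize(d) for d in docs]

    # Step 1: count mentions of each entity across the collection.
    counts = Counter()
    for toks in tokenized:
        for e in entities:
            counts[e] += sum(1 for _ in mentions(toks, e))

    # Step 2: rare entities are "specific", frequent ones "non-specific".
    # The threshold is an assumed heuristic.
    specific = sorted(e for e in entities if 0 < counts[e] <= max_specific_count)
    generic = sorted(e for e in entities if counts[e] > max_specific_count)

    # Step 3: collect context terms around specific-entity mentions and
    # keep the most frequent ones as the "interesting" terms.
    term_counts = Counter()
    for toks in tokenized:
        for e in specific:
            for s, n in mentions(toks, e):
                term_counts.update(context(toks, s, n, window))
    interesting = {t for t, _ in term_counts.most_common(top_k)}

    # Step 4: a non-specific mention is kept only if its context shares
    # at least `min_overlap` terms with the interesting terms.
    relevant = []
    for doc_id, toks in enumerate(tokenized):
        for e in generic:
            for s, n in mentions(toks, e):
                if len(interesting & set(context(toks, s, n, window))) >= min_overlap:
                    relevant.append((doc_id, e))
    return specific, generic, interesting, relevant

# Toy movie-domain collection (invented for this example).
docs = [
    "Watched the movie Inception last night, great director",
    "The movie Memento has a clever plot",
    "Twilight was a fun movie to watch",
    "We walked home at twilight along the river",
    "Twilight falls early in winter",
    "The twilight sky was orange",
]
specific, generic, interesting, relevant = extract_relevant(
    docs, ["Twilight", "Inception", "Memento"])
print(specific, generic, relevant)
```

In this toy collection, "Twilight" is mentioned often enough to count as non-specific, and only the mention whose context shares a term ("movie") with the contexts of the rarer titles is kept. A realistic system would likely need phrase-level context terms and a statistically grounded notion of "interesting", neither of which this sketch attempts.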
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.