Patent · US Active

Using context to extract entities from a document collection

US9251248B2 · kind B2 · utility

1 Cited by

8 References

20 Claims

0 Family size

Assignee

Microsoft Technology Licensing, LLC · US

Inventor

Sanjay Agrawal · Kirkland, US

Key dates

Filing date	Jun 7, 2010
Grant date	Feb 2, 2016
Priority date	—
Expiry date	Feb 1, 2033

Classification

Technology area (CPC G)	Physics
CPC primary	G06F16/93
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Described is using context information obtained from entity mentions in likely relevant documents to extract entity mentions from documents that are ambiguous with respect to their relevance to a domain. A list of entities is input into an entity extraction mechanism, which processes a large collection of documents to determine data (counts) corresponding to frequency of entity mentions. Infrequently mentioned entities are specific entities, while frequently mentioned entities are non-specific (generic or ambiguous) entities. The context surrounding mentions of the specific entities is processed to obtain interesting context terms (words, phrases or both) for the domain. The interesting context terms are then compared against the contexts of non-specific entity mentions to determine whether each non-specific entity mention is relevant to the domain. A result set containing only the relevant documents or relevant mentions is output.
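
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the stopword list, the fixed-size token window, and the thresholds `max_specific_count` and `min_support` are all assumptions made for illustration. Entities are assumed to be single lowercase words, and mentions of specific entities are assumed to be relevant by default.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the patent).
STOPWORDS = {"a", "the", "my", "is", "has", "with", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_mentions(docs, entities, window):
    """Map each entity to a list of (doc_index, context_terms) pairs.

    context_terms is the set of non-stopword tokens within `window`
    positions on either side of the mention.
    """
    mentions = {e: [] for e in entities}
    for doc_index, text in enumerate(docs):
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok in mentions:
                nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                mentions[tok].append((doc_index, set(nearby) - STOPWORDS))
    return mentions


def extract_relevant_mentions(docs, entities, max_specific_count=1,
                              window=2, min_support=2):
    """Return (interesting_terms, accepted_mentions).

    Entities mentioned at most `max_specific_count` times are treated as
    specific; more frequent ones are treated as non-specific.
    """
    mentions = find_mentions(docs, entities, window)
    specific = [e for e in entities if 0 < len(mentions[e]) <= max_specific_count]
    generic = [e for e in entities if len(mentions[e]) > max_specific_count]

    # Count in how many specific-entity contexts each term appears.
    term_support = Counter()
    for e in specific:
        for _, context in mentions[e]:
            term_support.update(context)
    interesting = {t for t, n in term_support.items() if n >= min_support}

    # Specific mentions are accepted as-is (assumption); a non-specific
    # mention is accepted only if its context shares an interesting term.
    accepted = [(d, e) for e in specific for d, _ in mentions[e]]
    for e in generic:
        for d, context in mentions[e]:
            if context & interesting:
                accepted.append((d, e))
    return interesting, sorted(accepted)


docs = [
    "The PowerShot camera has a sharp lens",
    "My Coolpix camera lens is cracked",
    "The Rebel camera ships with a kit lens",
    "The rebel army crossed the river",
    "A rebel leader spoke to the rebel troops",
]
interesting, accepted = extract_relevant_mentions(
    docs, ["powershot", "coolpix", "rebel"])
print(interesting)  # {'camera'}
print(accepted)     # [(0, 'powershot'), (1, 'coolpix'), (2, 'rebel')]
```

In this toy collection, "rebel" is mentioned often enough to count as non-specific, and only its mention next to "camera" is kept. A fuller treatment would likely handle multi-word entities and phrases and use a statistical measure rather than a raw support count to pick interesting terms. The abstract does not specify those details.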

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.