Patent · US Active

Phrase based document clustering with automatic phrase extraction

US8781817B2 · kind B2 · utility

Cited by: 2
References: 22
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 4, 2013
Grant date: Jul 15, 2014
Priority date
Expiry date: Mar 4, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/237
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
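
The statistical test described in the abstract can be illustrated with a minimal sketch. It assumes pointwise mutual information (PMI) over adjacent word pairs as the "mutual information metric"; the abstract does not give the patent's exact formula, so this choice is an assumption. The function names (`score_bigrams`, `meaningful_phrases`), the whitespace tokenizer, and the `min_count` and `threshold` parameters are likewise hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def score_bigrams(docs, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Assumption: PMI stands in for the unspecified "mutual information
    metric"; tokenization is plain lowercase whitespace splitting.
    """
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        words = doc.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    total_words = sum(unigrams.values())
    total_pairs = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to judge; hypothetical cutoff
        p_pair = count / total_pairs
        p_w1 = unigrams[w1] / total_words
        p_w2 = unigrams[w2] / total_words
        # PMI > 0 means the pair co-occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def meaningful_phrases(docs, threshold=1.0, min_count=2):
    """Return pairs whose PMI meets the (hypothetical) threshold, best first."""
    scores = score_bigrams(docs, min_count)
    kept = [pair for pair, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: -scores[pair])


docs = [
    "machine learning is fun",
    "machine learning is hard",
    "is it fun or is it hard",
]
scores = score_bigrams(docs)
phrases = meaningful_phrases(docs, threshold=3.0)
```

In this toy corpus, "machine learning" scores higher than "learning is" because "is" is frequent on its own, so pairing with it is less surprising. A real system would run over a large document collection, as the abstract states. The retained phrases could then be added to the word features used for clustering.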

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.