Efficient document clustering
US8200670B1 · kind B1 · utility
Cited by: 16
References: 0
Claims: 32
Family size: 0
Assignee
Inventors
Key dates
| Filing date | Oct 31, 2008 |
| Grant date | Jun 12, 2012 |
| Priority date | — |
| Expiry date | Aug 15, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/355
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer program products, for clustering documents. A plurality of documents are identified from a set of documents, where the identified documents have the same top N terms by term frequency score for an integer N. A pattern string that is satisfied by at least a subset of the identified documents is identified. A document cluster is formed from at least the subset of the identified documents.
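The three steps in the abstract (group documents sharing the same top N terms, find a pattern string satisfied by at least a subset of them, form a cluster from that subset) can be illustrated with a minimal Python sketch. This is not the patented method. It rests on several assumptions that the abstract does not state: raw term counts stand in for the term frequency score, the top N terms are compared as an unordered set, each document is identified by a URL, and the "pattern string" is taken to be a URL prefix ending at the last `/`. The names `top_terms`, `cluster_documents`, and `min_size` are hypothetical.

```python
from collections import Counter, defaultdict
import re


def top_terms(text, n):
    """Return the n most frequent terms of `text` as a sorted tuple.

    Assumption: raw counts stand in for the term frequency score.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    # Rank by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Sort the selected terms so the result behaves like an unordered-set key.
    return tuple(sorted(term for term, _ in ranked[:n]))


def cluster_documents(docs, n=3, min_size=2):
    """Cluster `docs` (mapping of URL -> text); return [(pattern, [urls])].

    Assumption: the pattern string is a URL prefix followed by "*".
    """
    # Step 1: identify documents that share the same top-n terms.
    groups = defaultdict(list)
    for url, text in docs.items():
        groups[top_terms(text, n)].append(url)

    clusters = []
    for urls in groups.values():
        if len(urls) < min_size:
            continue
        # Step 2: candidate patterns are the prefixes up to the last "/";
        # keep the candidate produced by the most documents in the group.
        candidates = Counter(url.rsplit("/", 1)[0] + "/" for url in urls)
        prefix, count = candidates.most_common(1)[0]
        if count < min_size:
            continue
        # Step 3: the cluster is the subset of the group matching the pattern.
        members = sorted(u for u in urls if u.startswith(prefix))
        clusters.append((prefix + "*", members))
    return clusters


docs = {
    "http://a.example/item/1": "red shoe red shoe price price sale",
    "http://a.example/item/2": "price red shoe shoe red price discount",
    "http://b.example/page": "red shoe price red shoe price cheap",
    "http://c.example/news": "rain rain weather weather forecast forecast today",
}

for pattern, members in cluster_documents(docs):
    print(pattern, members)
```

In this toy input, three documents share the top terms `price`, `red`, and `shoe`, but only the two under `http://a.example/item/` satisfy the chosen pattern, so the cluster is formed from that subset rather than from the whole group.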
Source: USPTO / EPO open patent data (bibliographic details and citation counts).