System and method for clustering unstructured documents
US7809727B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Dec 24, 2007 |
| Grant date | Oct 5, 2010 |
| Priority date | — |
| Expiry date | Jan 12, 2029 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99945
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for clustering unstructured documents is provided. Documents having terms with frequencies of occurrence that satisfy upper and lower edge conditions are selected. Concepts are generated for the selected documents. The selected documents are grouped into clusters. A weight is evaluated for each cluster. For each selected document, a similarity value is determined from the frequencies of occurrence of at least one of the terms from the concepts and from the cluster weights. Each selected document is assigned to one of the clusters based on its similarity value.
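The abstract names the pipeline steps but does not define the edge conditions, how concepts are generated, how cluster weights are computed, or the exact similarity measure. The sketch below is a minimal illustration of that pipeline under loudly stated assumptions, not the patented method: the "edge conditions" are read as lower and upper bounds on a term's document frequency, the concepts are simply the retained terms, initial clusters are seeded by each document's most frequent concept term, a cluster's weight is its share of the selected documents, and the similarity value is cosine similarity to the cluster centroid multiplied by that weight. All function names (`select_concept_terms`, `cluster_documents`, etc.) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_concept_terms(docs, lower, upper):
    """Assumed edge conditions: keep terms whose document frequency
    (fraction of documents containing the term) lies in [lower, upper]."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term for term, count in df.items() if lower <= count / n <= upper}


def concept_vector(doc, concepts):
    """Term-frequency vector of one document, restricted to concept terms."""
    return Counter(term for term in doc if term in concepts)


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_documents(docs, lower=0.2, upper=0.8):
    """Hypothetical pipeline: select documents, seed clusters, weight them,
    and assign each selected document by weighted similarity."""
    concepts = select_concept_terms(docs, lower, upper)

    # Select only documents that contain at least one concept term.
    vectors = {i: concept_vector(doc, concepts) for i, doc in enumerate(docs)}
    selected = [i for i, vec in vectors.items() if vec]

    # Initial grouping (assumption): seed by the document's most frequent
    # concept term; ties go to the alphabetically first term.
    groups = defaultdict(list)
    for i in selected:
        top = max(sorted(vectors[i]), key=lambda t: vectors[i][t])
        groups[top].append(i)

    # Centroid per cluster; weight = share of selected documents (assumption).
    centroids, weights = {}, {}
    for label, members in groups.items():
        centroid = Counter()
        for i in members:
            centroid.update(vectors[i])
        centroids[label] = centroid
        weights[label] = len(members) / len(selected)

    # Assign each selected document to the cluster with the highest
    # weighted similarity; ties go to the alphabetically first label.
    assignment = {}
    for i in selected:
        scores = {label: cosine(vectors[i], centroids[label]) * weights[label]
                  for label in centroids}
        assignment[i] = max(sorted(scores), key=scores.get)
    return assignment


if __name__ == "__main__":
    texts = ["the cat dog cat", "the dog cat pet", "the stock market trade",
             "the market trade price", "the of"]
    print(cluster_documents([t.split() for t in texts], lower=0.3, upper=0.8))
```

Run on the five toy documents above, the upper bound drops "the" (present in every document), the lower bound drops terms seen in only one document, and the last document, which has no surviving terms, is not selected. The remaining four split into a "cat" cluster and a "market" cluster. Multiplying by the cluster weight is only one way to combine term frequencies with cluster weights; the abstract does not specify the combination.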
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.