Method and system for clustering identified forms
US7996390B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 15, 2008 |
| Grant date | Aug 9, 2011 |
| Priority date | — |
| Expiry date | Mar 6, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/951
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method is provided for organizing a plurality of documents that include forms. An initial set of clusters is defined for the plurality of documents. The initial set of clusters is reclustered based on similarity values calculated in multiple feature spaces. For example, a first feature space may be associated with a content of a document while a second feature space may be associated with a content of a form associated with the document. Each cluster has an associated centroid vector in each feature space that is used to represent the cluster. The similarity between the document and each cluster is calculated in both feature spaces. Each document is assigned to the cluster whose centroid is most similar. The cluster centroids may be recalculated and the process repeated until the cluster assignments become stable.
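The reclustering loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the similarity measure, how the two feature spaces are combined, or how centroids are computed, so the sketch assumes cosine similarity, a weighted sum of the two per-space similarities, and mean-vector centroids. All names (`recluster`, `page_vecs`, `form_vecs`, `page_weight`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors (assumed centroid definition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def recluster(page_vecs, form_vecs, labels, k, page_weight=0.5, max_iter=100):
    """Reassign documents to clusters until the assignments stop changing.

    page_vecs[i] and form_vecs[i] are document i's vectors in the page-content
    and form-content feature spaces; labels is the initial cluster assignment.
    The weighted-sum combination of similarities is an assumption of this sketch.
    """
    labels = list(labels)
    for _ in range(max_iter):
        # One centroid per feature space for every non-empty cluster.
        centroids = {}
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = (
                    centroid([page_vecs[i] for i in members]),
                    centroid([form_vecs[i] for i in members]),
                )

        # Assign each document to the cluster with the highest combined similarity.
        new_labels = []
        for page, form in zip(page_vecs, form_vecs):
            best = max(
                centroids,
                key=lambda c: page_weight * cosine(page, centroids[c][0])
                + (1 - page_weight) * cosine(form, centroids[c][1]),
            )
            new_labels.append(best)

        if new_labels == labels:  # assignments are stable
            break
        labels = new_labels
    return labels


# Toy data: documents 0 and 1 resemble each other in both spaces, as do 2 and 3.
page_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
form_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(recluster(page_vecs, form_vecs, labels=[0, 1, 1, 1], k=2))  # [0, 0, 1, 1]
```

In this toy run the initial clustering misplaces document 1, and one reassignment pass moves it to cluster 0. The second pass changes nothing, so the loop stops.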
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.