Patent · US Active

Method and system for clustering identified forms

US7996390B2 · kind B2 · utility

Cited by: 21
References: 3
Claims: 32
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 15, 2008
Grant date: Aug 9, 2011
Priority date
Expiry date: Mar 6, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method is provided for organizing a plurality of documents that include forms. An initial set of clusters is defined for the plurality of documents. The initial set of clusters is reclustered based on similarity values calculated in multiple feature spaces. For example, a first feature space may be associated with a content of a document while a second feature space may be associated with a content of a form associated with the document. Each cluster has an associated centroid vector in each feature space that is used to represent the cluster. The similarity between the document and each cluster is calculated in both feature spaces. Each document is assigned to the cluster whose centroid is most similar. The cluster centroids may be recalculated and the process repeated until the cluster assignments become stable.
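
The loop the abstract describes (assign each document to its most similar cluster, recompute centroids, repeat until stable) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how the two per-space similarities are combined or which similarity measure is used, so the cosine similarity, the weighted sum, and the `weight` parameter are assumptions. The names `cosine`, `centroid`, and `recluster` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def recluster(docs, labels, weight=0.5, max_iter=100):
    """Recluster documents using similarity in two feature spaces.

    docs   : list of (page_vec, form_vec) pairs, one pair per document
    labels : initial cluster id for each document
    weight : share given to the page-content space (assumption; the
             abstract does not specify how the similarities combine)
    """
    labels = list(labels)
    for _ in range(max_iter):
        # Group document indices by their current cluster.
        members = {}
        for i, cluster_id in enumerate(labels):
            members.setdefault(cluster_id, []).append(i)

        # One centroid per cluster in each feature space.
        # A cluster that has lost all its members simply disappears.
        cents = {
            c: (centroid([docs[i][0] for i in idx]),
                centroid([docs[i][1] for i in idx]))
            for c, idx in members.items()
        }

        # Assign each document to the cluster with the highest combined
        # similarity; ties go to the smallest cluster id.
        new_labels = []
        for page_vec, form_vec in docs:
            best = max(
                sorted(cents),
                key=lambda c: weight * cosine(page_vec, cents[c][0])
                + (1 - weight) * cosine(form_vec, cents[c][1]),
            )
            new_labels.append(best)

        if new_labels == labels:  # assignments are stable
            break
        labels = new_labels
    return labels
```

Calling `recluster(docs, initial_labels)` with a list of `(page_vec, form_vec)` pairs returns the stabilized cluster id for each document. Setting `weight` to 1.0 or 0.0 reduces the sketch to clustering in a single feature space, which makes it easy to see what the second space contributes.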

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.