Patent · US · Expired

Automatic categorization of documents using document signatures

US6442555B1 · kind B1 · utility

Cited by: 61
References: 8
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 26, 1999
Grant date: Aug 27, 2002
Priority date
Expiry date: Oct 26, 2019

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99942
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of quickly and automatically comparing a new document to a large number of previously seen documents and identifying the document type. First, a plurality of document type distributions is provided; each document type distribution describes layout characteristics of an independent document type and may include a plurality of data points. Each document type distribution includes data derived from at least one basis document signature, which may include data defining pixels of a low-resolution image of the independent basis document resolved to between 1 and 75 dots per inch, or may include document segmentation data derived from the independent basis document. Next, a new electronic document is provided. Then a new document signature is created from the new electronic document. Next, distances between the new document signature and each of the plurality of document type distributions are calculated using an algorithm based on a Bayesian framework for a Gaussian distribution. The distances calculated may be Euclidean distances or may be Mahalanobis distances. Additionally, calculating the distances may include weighting the value given each of a plurality of data points in the document signa…
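
The distance step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each document type distribution is summarized by a per-data-point mean and variance (a diagonal covariance, chosen here only to keep the example short) and that a signature is a flat list of numbers, for example the pixel values of a very low-resolution page image. The function names, the four-value toy signatures, and the "invoice" and "letter" types are all hypothetical.

```python
import math


def euclidean(signature, mean):
    """Euclidean distance between a signature and a distribution's mean."""
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(signature, mean)))


def mahalanobis_diag(signature, mean, variance):
    """Mahalanobis distance under an ASSUMED diagonal covariance.

    Each squared difference is divided by that data point's variance, so
    data points that vary little within a document type count for more.
    """
    return math.sqrt(
        sum((s - m) ** 2 / v for s, m, v in zip(signature, mean, variance))
    )


def classify(signature, distributions):
    """Return (document type, distance) for the closest distribution."""
    best_type, best_dist = None, math.inf
    for doc_type, (mean, variance) in distributions.items():
        d = mahalanobis_diag(signature, mean, variance)
        if d < best_dist:
            best_type, best_dist = doc_type, d
    return best_type, best_dist


# Hypothetical toy data: each type maps to (mean, variance) per data point.
distributions = {
    "invoice": ([0.9, 0.1, 0.8, 0.2], [0.01, 0.01, 0.04, 0.04]),
    "letter": ([0.2, 0.7, 0.3, 0.6], [0.02, 0.02, 0.02, 0.02]),
}
new_signature = [0.8, 0.2, 0.7, 0.3]

doc_type, distance = classify(new_signature, distributions)
print(doc_type, round(distance, 3))
```

With these toy values the new signature lies much closer to the "invoice" distribution than to "letter". Swapping `mahalanobis_diag` for `euclidean` inside `classify` gives the unweighted variant that the abstract also mentions.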

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.