Patent · US · Expired

Method and apparatus for measuring similarity among electronic documents

US6990628B1 · kind B1 · utility

Cited by: 202
References: 23
Claims: 34
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 14, 1999
Grant date: Jan 24, 2006
Priority date
Expiry date: Mar 28, 2021

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99936
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and apparatus are provided for determining when electronic documents stored in a large collection of documents are similar to one another. A plurality of similarity information is derived from the documents. The similarity information may be based on a variety of factors, including hyperlinks in the documents, text similarity, user click-through information, similarity in the titles of the documents or their location identifiers, and patterns of user viewing. The similarity information is fed to a combination function that synthesizes the various measures of similarity information into combined similarity information. Using the combined similarity information, an objective function is iteratively maximized in order to yield a generalized similarity value that expresses the similarity of particular pairs of documents. In an embodiment, the generalized similarity value is used to determine the proper category, among a taxonomy of categories in an index, cache or search system, into which certain documents belong.
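
As a rough illustration of the combination step and the categorization use described in the abstract, the following Python sketch merges several per-signal similarity scores into one combined score per document pair, then picks the category whose members are most similar to a given document. The abstract does not specify the combination function, so the weighted average used here is an assumption, as are the names `combine`, `best_category`, and `pair`, the weights, and the sample scores. The iterative maximization of an objective function is not modeled.

```python
from typing import Dict, FrozenSet, List

# An unordered pair of document identifiers.
Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Build an order-independent key for a document pair."""
    return frozenset((a, b))


def combine(
    measures: Dict[str, Dict[Pair, float]],
    weights: Dict[str, float],
) -> Dict[Pair, float]:
    """Hypothetical combination function: a weighted average of each
    signal's score for a pair. A signal with no score for a pair
    contributes 0.0. The patent abstract does not name this formula."""
    total_weight = sum(weights.values())
    all_pairs = set()
    for scores in measures.values():
        all_pairs.update(scores)
    return {
        p: sum(weights[name] * scores.get(p, 0.0)
               for name, scores in measures.items()) / total_weight
        for p in all_pairs
    }


def best_category(
    doc: str,
    categories: Dict[str, List[str]],
    combined: Dict[Pair, float],
) -> str:
    """Pick the category whose member documents have the highest mean
    combined similarity to `doc`. Assumes every category is non-empty."""
    def mean_similarity(members: List[str]) -> float:
        return sum(combined.get(pair(doc, m), 0.0) for m in members) / len(members)

    return max(categories, key=lambda name: mean_similarity(categories[name]))


# Illustrative scores only; not taken from the patent.
measures = {
    "hyperlinks": {pair("d1", "d2"): 1.0, pair("d1", "d3"): 0.0},
    "text": {pair("d1", "d2"): 0.5, pair("d1", "d3"): 0.25},
}
weights = {"hyperlinks": 1.0, "text": 1.0}
combined = combine(measures, weights)

categories = {"sports": ["d2"], "finance": ["d3"]}
chosen = best_category("d1", categories, combined)
print(chosen)
```

Other signals named in the abstract, such as click-through information or title similarity, would enter this sketch as additional entries in `measures` with their own weights.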

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.