Method and apparatus for measuring similarity among electronic documents
US6990628B1 · kind B1 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 14, 1999 |
| Grant date | Jan 24, 2006 |
| Priority date | — |
| Expiry date | Mar 28, 2021 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99936
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and apparatus are provided for determining when electronic documents stored in a large collection of documents are similar to one another. A plurality of similarity information is derived from the documents. The similarity information may be based on a variety of factors, including hyperlinks in the documents, text similarity, user click-through information, similarity in the titles of the documents or their location identifiers, and patterns of user viewing. The similarity information is fed to a combination function that synthesizes the various measures of similarity information into combined similarity information. Using the combined similarity information, an objective function is iteratively maximized in order to yield a generalized similarity value that expresses the similarity of particular pairs of documents. In an embodiment, the generalized similarity value is used to determine the proper category, among a taxonomy of categories in an index, cache or search system, into which certain documents belong.
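The abstract does not specify the combination function or the objective function, so what follows is only a minimal Python sketch of the overall pipeline under stated assumptions, not the patented method. The names `combine`, `generalize`, and `assign_category`, the equal `weights`, the damping parameter `alpha`, and the toy `links`/`text` matrices are all hypothetical. The combination function is assumed to be a weighted average of per-signal similarity matrices. In place of the patent's iteratively maximized objective function, the sketch substitutes a damped fixed-point propagation that spreads similarity through the combined graph until it stops changing. The last step places a document in the taxonomy category whose members are, on average, most similar to it.

```python
from typing import Dict, List

Matrix = List[List[float]]


def combine(measures: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Hypothetical combination function: a weighted average of the
    per-signal similarity matrices (hyperlinks, text, click-through, ...)."""
    names = list(measures)
    n = len(measures[names[0]])
    total = sum(weights[name] for name in names)
    return [[sum(weights[name] * measures[name][i][j] for name in names) / total
             for j in range(n)] for i in range(n)]


def generalize(combined: Matrix, alpha: float = 0.5,
               tol: float = 1e-9, max_iter: int = 1000) -> Matrix:
    """Stand-in for the patent's objective function (assumption): iterate
    S <- (1 - alpha) * C + alpha * P S P^T until the largest change is below
    tol. P is C with each row scaled to sum to 1, so every update is a
    contraction with factor alpha in the max norm and the loop converges."""
    n = len(combined)
    P = []
    for row in combined:
        s = sum(row)
        P.append([v / s for v in row] if s > 0 else [0.0] * n)
    S = [row[:] for row in combined]
    for _ in range(max_iter):
        # PS[i][l] = sum_k P[i][k] * S[k][l]
        PS = [[sum(P[i][k] * S[k][l] for k in range(n)) for l in range(n)]
              for i in range(n)]
        # new[i][j] = (1 - alpha) * C[i][j] + alpha * sum_l PS[i][l] * P[j][l]
        new = [[(1 - alpha) * combined[i][j]
                + alpha * sum(PS[i][l] * P[j][l] for l in range(n))
                for j in range(n)] for i in range(n)]
        delta = max(abs(new[i][j] - S[i][j])
                    for i in range(n) for j in range(n))
        S = new
        if delta < tol:
            break
    return S


def assign_category(doc: int, S: Matrix,
                    taxonomy: Dict[str, List[int]]) -> str:
    """Hypothetical rule: pick the category whose member documents have the
    highest average generalized similarity to doc."""
    def score(members: List[int]) -> float:
        others = [m for m in members if m != doc]
        if not others:
            return float("-inf")
        return sum(S[doc][m] for m in others) / len(others)
    return max(taxonomy, key=lambda c: score(taxonomy[c]))


# Toy data (hypothetical): documents 0 and 1 are related, as are 2 and 3.
links = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
text = [[1, 0.8, 0.1, 0.0], [0.8, 1, 0.0, 0.1],
        [0.1, 0.0, 1, 0.9], [0.0, 0.1, 0.9, 1]]
C = combine({"links": links, "text": text}, {"links": 1.0, "text": 1.0})
S = generalize(C)
print(assign_category(1, S, {"A": [0], "B": [2, 3]}))  # A
```

The propagation step lets two documents that share few direct signals still score as similar when their neighbors are similar; with `alpha = 0` the sketch reduces to the combined similarity alone.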
Source: USPTO / EPO open patent data.