Method for characterizing a document set using evaluation surrogates
US5774888A · kind A · utility
Assignee
Inventor
Key dates
| Filing date | Dec 30, 1996 |
| Grant date | Jun 30, 1998 |
| Priority date | — |
| Expiry date | Dec 30, 2016 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99945
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method is provided for determining the relevance of a document to one or more topics, each of which is specified by a topic profile. The document is tokenized into a stream of document tokens, and compound terms specified in the topic profiles are identified among the document tokens. The stream of document tokens is augmented for each identified compound term with a tagged compound term token specified in the topic profile. The augmented stream of document tokens is stopped to eliminate tokens representing common terms, redundant terms, and selected terms associated with tagged tokens. A similarity function is calculated between the resulting document representation and each of the topic profiles to provide an evaluation surrogate that includes measures of relevance of the document to each of the topic profiles.
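The steps in the abstract (tokenize, identify compound terms, augment with tagged tokens, stop, score against each profile) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not name a similarity function, so cosine similarity over a binary document vector is assumed here. The stop list, the `#`-prefixed tag format, the profile layout (`compounds`, `weights`), and all function names are hypothetical. "Redundant terms" is read as repeated tokens, and "selected terms associated with tagged tokens" as the constituent words of a matched compound; both are interpretations.

```python
import math
import re

# Assumed stop list; the abstract does not enumerate the common terms.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Split text into a stream of lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def augment(tokens, compounds):
    """Add a tagged token after each compound term found in the stream.

    `compounds` maps a tuple of words to its tag (hypothetical layout).
    Returns the augmented stream and the set of constituent words to stop.
    """
    out, drop = [], set()
    i = 0
    while i < len(tokens):
        out.append(tokens[i])
        for words, tag in compounds.items():
            n = len(words)
            if tuple(tokens[i:i + n]) == words:
                out.extend(tokens[i + 1:i + n])  # rest of the compound
                out.append(tag)                  # tagged compound token
                drop.update(words)
                i += n - 1
                break
        i += 1
    return out, drop


def stop(stream, drop):
    """Remove common terms, repeated tokens, and compound constituents."""
    seen, kept = set(), []
    for tok in stream:
        if tok in STOP_WORDS or tok in drop or tok in seen:
            continue
        seen.add(tok)
        kept.append(tok)
    return kept


def cosine(doc_terms, weights):
    """Cosine similarity between a binary document vector and profile weights."""
    dot = sum(w for term, w in weights.items() if term in doc_terms)
    norm_doc = math.sqrt(len(doc_terms))
    norm_profile = math.sqrt(sum(w * w for w in weights.values()))
    if norm_doc == 0 or norm_profile == 0:
        return 0.0
    return dot / (norm_doc * norm_profile)


def evaluation_surrogate(text, profiles):
    """Return one relevance score per topic profile for the given text."""
    compounds = {}
    for profile in profiles.values():
        compounds.update(profile["compounds"])
    stream, drop = augment(tokenize(text), compounds)
    terms = set(stop(stream, drop))
    return {name: cosine(terms, p["weights"]) for name, p in profiles.items()}


# Illustrative profiles and document (invented for this example).
profiles = {
    "finance": {
        "compounds": {("interest", "rate"): "#interest_rate"},
        "weights": {"#interest_rate": 2.0, "bank": 1.0},
    },
    "sports": {
        "compounds": {},
        "weights": {"goal": 1.0, "match": 1.0},
    },
}
scores = evaluation_surrogate("The bank raised the interest rate.", profiles)
```

In this sketch the tokenizer never emits `#`, so a tagged compound token cannot collide with an ordinary word. The returned dictionary plays the role of the evaluation surrogate: one relevance measure per topic profile.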
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.