Patent · US Expired

Method for characterizing a document set using evaluation surrogates

US5774888A · kind A · utility

Cited by: 61
References: 2
Claims: 12
Family size: 0

Assignee

Inventor

Key dates

Filing date: Dec 30, 1996
Grant date: Jun 30, 1998
Priority date
Expiry date: Dec 30, 2016

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99945
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method is provided for determining the relevance of a document to one or more topics, each of which is specified by a topic profile. The document is tokenized into a stream of document tokens, and compound terms specified in the topic profiles are identified among the document tokens. For each identified compound term, the stream of document tokens is augmented with a tagged compound term token specified in the topic profile. The augmented stream of document tokens is stopped to eliminate tokens representing common terms, redundant terms, and selected terms associated with tagged tokens. A similarity function is calculated between the resulting document representation and each of the topic profiles to provide an evaluation surrogate that includes measures of the relevance of the document to each of the topic profiles.
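
The steps named in the abstract (tokenize, identify compound terms, augment, stop, compute similarity) can be illustrated with a minimal Python sketch. It is not the patented implementation; several details are assumptions made only for illustration: the profile layout (a "compounds" map and a "weights" map), the "#" prefix on tagged tokens, the small stop list, the reading of "redundant terms" as repeated tokens, the reading of "selected terms associated with tagged tokens" as the constituent words of a matched compound, and the choice of cosine similarity as the similarity function.

```python
import math
import re

# Hypothetical stop list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "to", "is", "and", "in", "for"}


def tokenize(text):
    """Split text into a stream of lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def augment(tokens, compounds):
    """Add a tagged token after each compound term found in the stream.

    compounds maps a tuple of words to a tagged token (assumed layout).
    Returns the augmented stream and the positions of compound constituents.
    """
    out, constituent_idx = [], set()
    i = 0
    while i < len(tokens):
        for words, tag in compounds.items():
            if tuple(tokens[i:i + len(words)]) == words:
                constituent_idx.update(range(len(out), len(out) + len(words)))
                out.extend(words)
                out.append(tag)
                i += len(words)
                break
        else:  # no compound starts at position i
            out.append(tokens[i])
            i += 1
    return out, constituent_idx


def stop(tokens, constituent_idx):
    """Drop common words, compound constituents, and repeated tokens."""
    seen, kept = set(), []
    for i, tok in enumerate(tokens):
        if tok in STOP_WORDS or i in constituent_idx or tok in seen:
            continue
        seen.add(tok)
        kept.append(tok)
    return kept


def cosine(doc_tokens, weights):
    """Cosine similarity of a binary document vector and profile weights."""
    dot = sum(weights.get(t, 0.0) for t in doc_tokens)
    norm_doc = math.sqrt(len(doc_tokens))
    norm_prof = math.sqrt(sum(w * w for w in weights.values()))
    if norm_doc == 0 or norm_prof == 0:
        return 0.0
    return dot / (norm_doc * norm_prof)


def evaluation_surrogate(text, profiles):
    """Return {topic name: relevance score} for one document."""
    compounds = {}
    for profile in profiles.values():
        compounds.update(profile["compounds"])
    tokens, constituent_idx = augment(tokenize(text), compounds)
    representation = stop(tokens, constituent_idx)
    return {name: cosine(representation, p["weights"])
            for name, p in profiles.items()}


# Illustrative profiles; the names and weights are made up for this example.
profiles = {
    "ml": {
        "compounds": {("neural", "network"): "#neural_network"},
        "weights": {"#neural_network": 2.0, "neurons": 1.0},
    },
    "finance": {"compounds": {}, "weights": {"bond": 1.0}},
}
scores = evaluation_surrogate(
    "The neural network is a network of neurons.", profiles)
```

In this sketch the returned dictionary plays the role of the evaluation surrogate: one relevance measure per topic profile. Dropping constituents by position rather than by word means a later standalone "network" still survives stopping, which is one possible design choice among several.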

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.