Patent · US Expired

Method and apparatus for inferring the topical content of a document based upon its lexical content without supervision

US5659766A · kind A · utility

Cited by: 61
References: 4
Claims: 17
Family size: 0

Key dates

Filing date: Sep 16, 1994
Grant date: Aug 19, 1997
Expiry date: Sep 16, 2014

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/355
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An iterative method of determining the topical content of a document using a computer. The processing unit of the computer determines the topical content of documents presented to it in machine-readable form using information stored in computer memory. That information includes word clusters, a lexicon, and association strength values. The processing unit begins by generating an observed feature vector for the document being characterized, which indicates which words of the lexicon appear in the document. Next, the processing unit makes an initial prediction of the topical content of the document in the form of a topic belief vector. The processing unit uses the topic belief vector and the association strength values to predict which words of the lexicon should appear in the document. This prediction is represented as a predicted feature vector. The predicted feature vector is then compared to the observed feature vector to measure how well the topic belief vector models the topical content of the document. If the topic belief vector adequately models the topical content of the document, the processing unit's task is complete. On the other hand, if the topic belief…
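The abstract describes a predict, compare, and revise loop, but the excerpt does not say how the predicted feature vector is computed or how the topic belief vector is updated. The Python sketch below is a minimal illustration under loudly stated assumptions, not the patented method: each topic's row of association strengths stands in for a word cluster, topic contributions are combined with a soft-OR, fit is measured by squared error, and beliefs are revised by clipped gradient descent. All function names and the parameters `lr`, `tol`, and `max_iter` are hypothetical.

```python
from typing import List, Sequence, Set


def observed_feature_vector(doc_words: Set[str], lexicon: Sequence[str]) -> List[float]:
    """1.0 for each lexicon word that appears in the document, else 0.0."""
    return [1.0 if word in doc_words else 0.0 for word in lexicon]


def predict_features(beliefs: Sequence[float],
                     strengths: Sequence[Sequence[float]]) -> List[float]:
    """Predicted feature vector via an ASSUMED soft-OR of topic contributions.

    strengths[k][j] is a hypothetical association strength in [0, 1] between
    topic k and lexicon word j. Word j is predicted absent only if no topic
    produces it: r_j = 1 - prod_k (1 - m_k * c_kj).
    """
    predicted = []
    for j in range(len(strengths[0])):
        p_absent = 1.0
        for k, m in enumerate(beliefs):
            p_absent *= 1.0 - m * strengths[k][j]
        predicted.append(1.0 - p_absent)
    return predicted


def squared_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """ASSUMED goodness-of-fit measure: sum of squared differences."""
    return sum((d - r) ** 2 for d, r in zip(observed, predicted))


def infer_topics(observed: Sequence[float],
                 strengths: Sequence[Sequence[float]],
                 lr: float = 0.5, tol: float = 1e-3,
                 max_iter: int = 500) -> List[float]:
    """Iteratively refine a topic belief vector until it models the document."""
    n_topics = len(strengths)
    beliefs = [0.5] * n_topics  # initial prediction of topical content
    for _ in range(max_iter):
        predicted = predict_features(beliefs, strengths)
        if squared_error(observed, predicted) < tol:
            break  # beliefs adequately model the observed words
        new_beliefs = []
        for k in range(n_topics):
            grad = 0.0
            for j, (d, r) in enumerate(zip(observed, predicted)):
                # d r_j / d m_k = c_kj * prod_{i != k} (1 - m_i * c_ij)
                others = 1.0
                for i in range(n_topics):
                    if i != k:
                        others *= 1.0 - beliefs[i] * strengths[i][j]
                grad += -2.0 * (d - r) * strengths[k][j] * others
            # Gradient step, clipped so each belief stays in [0, 1]
            new_beliefs.append(min(1.0, max(0.0, beliefs[k] - lr * grad)))
        change = max(abs(a - b) for a, b in zip(new_beliefs, beliefs))
        beliefs = new_beliefs
        if change < 1e-9:
            break  # no further progress possible
    return beliefs
```

For example, with the lexicon `["ball", "goal", "vote", "election"]` and two hypothetical topics strongly associated with the first and last pairs of words, a document containing only "ball" and "goal" drives the first belief toward 1 and the second toward 0. Gradient descent appears here only as a simple way to shrink the mismatch between predicted and observed vectors; the update rule actually claimed is not given in the excerpt.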

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.