Patent · US Expired

Document information retrieval using global word co-occurrence patterns

US5675819A · kind A · utility

Cited by: 643
References: 3
Claims: 47
Family size: 0

Key dates

Filing date: Jun 16, 1994
Grant date: Oct 7, 1997
Expiry date: Jun 16, 2014

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99933
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and apparatus accesses relevant documents based on a query. A thesaurus of word vectors is formed for the words in the corpus of documents. The word vectors represent global lexical co-occurrence patterns and relationships between word neighbors. Document vectors, which are formed from the combination of word vectors, are in the same multi-dimensional space as the word vectors. A singular value decomposition is used to reduce the dimensionality of the document vectors. A query vector is formed from the combination of word vectors associated with the words in the query. The query vector and document vectors are compared to determine the relevant documents. The query vector can be divided into several factor clusters to form factor vectors. The factor vectors are then compared to the document vectors to determine the ranking of the documents within the factor cluster.
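The abstract describes a pipeline: a thesaurus of word vectors built from global co-occurrence, document and query vectors formed by combining word vectors, an SVD that reduces the document vectors, and a comparison of query and document vectors. The Python/NumPy sketch below illustrates that flow under several assumptions that do not come from the patent text: word vectors are raw co-occurrence counts within a fixed token `window`, "combination" means summation, the SVD is taken over the document-vector matrix and the same projection is reused for queries, and cosine similarity is the comparison. The abstract does not say how factor clusters are formed, so `rank_by_factors` takes hand-picked groups of query words as a stand-in. All function names (`build_thesaurus`, `build_index`, `rank`, `rank_by_factors`) are hypothetical.

```python
import numpy as np


def build_thesaurus(docs, window=2):
    # Assumption: word i's vector counts how often each vocabulary word
    # occurs within `window` tokens of word i, summed over the whole corpus.
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for i, w in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i:
                    counts[index[w], index[doc[j]]] += 1
    return index, counts


def combine(words, index, word_vectors):
    # Assumption: a document or query vector is the sum of its words'
    # vectors, so it lives in the same space as the word vectors.
    vec = np.zeros(word_vectors.shape[1])
    for w in words:
        if w in index:  # ignore words outside the corpus vocabulary
            vec += word_vectors[index[w]]
    return vec


def unit(x):
    # Scale rows (or a single vector) to unit length; zero vectors stay zero.
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def build_index(docs, k=2, window=2):
    index, word_vectors = build_thesaurus(docs, window)
    doc_vectors = np.array([combine(d, index, word_vectors) for d in docs])
    # Truncated SVD: keep the top-k right singular vectors as a basis and
    # project every document vector onto it (result: n_docs x k).
    _, _, vt = np.linalg.svd(doc_vectors, full_matrices=False)
    basis = vt[:k].T
    return index, word_vectors, basis, unit(doc_vectors @ basis)


def rank(query, index, word_vectors, basis, reduced_docs):
    # Project the query with the same basis, then score documents by cosine.
    q = unit(combine(query, index, word_vectors) @ basis)
    scores = reduced_docs @ q
    order = [int(i) for i in np.argsort(-scores)]
    return order, scores


def rank_by_factors(factors, index, word_vectors, basis, reduced_docs):
    # Hypothetical stand-in for factor clusters: each factor is a
    # hand-picked group of query words, ranked against the documents alone.
    return [rank(group, index, word_vectors, basis, reduced_docs)[0]
            for group in factors]


docs = [
    "stock market shares fell".split(),
    "investors sold stock shares".split(),
    "football team won match".split(),
    "team lost football match".split(),
]
model = build_index(docs, k=4)
print(rank(["stock"], *model)[0])
print(rank_by_factors([["stock"], ["football", "match"]], *model))
```

The toy corpus uses two disjoint vocabularies, and `k=4` reduces the 11-word space to 4 dimensions while preserving the full rank of the document matrix, so the finance query should rank documents 0 and 1 ahead of the sports documents. On a realistic corpus, `k` would be set far below the vocabulary size, which is where the dimensionality reduction matters.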

Source: USPTO / EPO open patent data.