Document information retrieval using global word co-occurrence patterns
US5675819A · kind A · utility
Assignee
Inventor
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 16, 1994 |
| Grant date | Oct 7, 1997 |
| Priority date | — |
| Expiry date | Jun 16, 2014 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99933
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and apparatus accesses relevant documents based on a query. A thesaurus of word vectors is formed for the words in the corpus of documents. The word vectors represent global lexical co-occurrence patterns and relationships between word neighbors. Document vectors, which are formed from the combination of word vectors, are in the same multi-dimensional space as the word vectors. A singular value decomposition is used to reduce the dimensionality of the document vectors. A query vector is formed from the combination of word vectors associated with the words in the query. The query vector and document vectors are compared to determine the relevant documents. The query vector can be divided into several factor clusters to form factor vectors. The factor vectors are then compared to the document vectors to determine the ranking of the documents within the factor cluster.
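The retrieval steps in the abstract can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not taken from the patent: lowercase whitespace tokenization, word vectors built from raw neighbor counts within a ±2-word window over the whole corpus, document and query vectors formed by summing unit-length word vectors, and cosine similarity as the comparison. The greedy threshold rule in `factor_clusters` is a hypothetical stand-in for the patent's factor clustering. The singular value decomposition step is omitted for brevity. All function names and the toy corpus are illustrative.

```python
import math


def tokenize(text):
    # Assumption: lowercase whitespace tokens with surrounding punctuation stripped.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def build_thesaurus(docs, window=2):
    """Map each word to its neighbor counts (within +/- window) over the whole corpus."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for toks in tokenized:
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vectors[w][index[toks[j]]] += 1.0
    return vectors, len(vocab)


def unit(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine(a, b):
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def combine(words, vectors, dim):
    """Sum unit-length word vectors; words missing from the thesaurus are skipped."""
    total = [0.0] * dim
    for w in words:
        if w in vectors:
            total = [t + x for t, x in zip(total, unit(vectors[w]))]
    return total


def rank(query_vec, doc_vecs):
    """Document indices ordered by decreasing cosine similarity to query_vec."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])


def factor_clusters(query_words, vectors, threshold=0.5):
    """Hypothetical greedy rule: join the first cluster whose summed vector is similar enough."""
    clusters = []
    for w in query_words:
        if w not in vectors:
            continue
        for c in clusters:
            summed = [sum(col) for col in zip(*(vectors[x] for x in c))]
            if cosine(vectors[w], summed) >= threshold:
                c.append(w)
                break
        else:  # no similar cluster found (or none exist yet)
            clusters.append([w])
    return clusters


docs = [
    "Cats chase mice in the barn.",
    "Dogs chase cats in the yard.",
    "Stocks fell as markets slid.",
    "Investors sold stocks as markets fell.",
]
vectors, dim = build_thesaurus(docs)
doc_vecs = [combine(tokenize(d), vectors, dim) for d in docs]

# One query vector for the whole query: the two finance documents rank first.
print(rank(combine(tokenize("markets stocks"), vectors, dim), doc_vecs))

# One factor vector per cluster of related query words, each ranked separately.
for cluster in factor_clusters(tokenize("cats markets"), vectors):
    print(cluster, rank(combine(cluster, vectors, dim), doc_vecs))
```

Because the word vectors record each word's neighbors across the whole corpus, a document can score well for a query even when it shares only related words with it. In a fuller version, the omitted SVD step could replace each document vector (and the query vector) with its projection onto the top few right singular vectors of the document-by-dimension matrix, so that the comparison runs in a smaller space.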
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).