Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix
US8290961B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 13, 2009 |
| Grant date | Oct 16, 2012 |
| Priority date | — |
| Expiry date | Mar 12, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/334
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
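The pipeline in the abstract can be illustrated with a minimal sketch. The abstract does not specify the morphological parser, the weighting function, or the factorization, so everything here is an assumption made for illustration: the naive suffix-stripping rule and the names `SUFFIXES`, `split_wordform`, `morpheme_by_document`, `weight`, and `low_rank` are hypothetical, the log-scaled tf-idf weighting is a stand-in, and truncated singular value decomposition is used as one common way to obtain a lower rank approximation.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical suffix list (assumption); a real system would use a
# proper morphological analyzer rather than this naive rule.
SUFFIXES = ("ing", "ed", "s")


def split_wordform(word):
    """Split a wordform into [stem] or [stem, affix] with a naive suffix rule."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return [word[: -len(suffix)], "-" + suffix]
    return [word]


def morpheme_by_document(corpus):
    """Count stems and affixes as separate rows, one column per document."""
    per_doc = []
    for doc in corpus:
        counter = Counter()
        for word in re.findall(r"[a-z]+", doc.lower()):
            counter.update(split_wordform(word))
        per_doc.append(counter)
    vocab = sorted(set().union(*per_doc))
    matrix = np.array([[c[m] for c in per_doc] for m in vocab], dtype=float)
    return vocab, matrix


def weight(matrix):
    """Apply log(1 + tf) * idf. This weighting function is an assumption."""
    n_docs = matrix.shape[1]
    doc_freq = np.count_nonzero(matrix, axis=1)
    idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1.0
    return np.log1p(matrix) * idf[:, np.newaxis]


def low_rank(matrix, k):
    """Return the rank-k approximation obtained from a truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


corpus = [
    "walking dogs walked",
    "dogs barked loudly",
    "cats walked and cats jumped",
]
vocab, counts = morpheme_by_document(corpus)
approx = low_rank(weight(counts), k=2)
```

In this toy corpus, "walking" and "walked" both contribute to a shared `walk` row, while `-ing` and `-ed` are counted in rows of their own, which is the separate enumeration of stems and affixes that the abstract describes. The term-by-term alignment matrix mentioned as an alternative is not covered by this sketch.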
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.