Patent · US Active

Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

US8290961B2 · kind B2 · utility

Cited by: 11
References: 8
Claims: 26
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 13, 2009
Grant date: Oct 16, 2012
Priority date
Expiry date: Mar 12, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/334
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
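The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy suffix-stripping rule in place of a real morphological analyzer, a TF-IDF-style weighting (the abstract only says "a weighting function"), and a truncated SVD as the factorization. The names `SUFFIXES`, `split_morphemes`, `morpheme_by_document`, `weight`, and `low_rank_approximation` are hypothetical and chosen for illustration only.

```python
import numpy as np
from collections import Counter

# Hypothetical toy suffix list; a real system would use a morphological analyzer.
SUFFIXES = ("ing", "ed", "es", "s")


def split_morphemes(wordform):
    """Split a wordform into a stem and at most one affix (toy rule)."""
    for suffix in SUFFIXES:
        if wordform.endswith(suffix) and len(wordform) > len(suffix) + 2:
            return [wordform[: -len(suffix)], "-" + suffix]
    return [wordform]


def morpheme_by_document(corpus):
    """Count stems and affixes separately: rows are morphemes, columns are documents."""
    counts = [
        Counter(m for word in doc.lower().split() for m in split_morphemes(word))
        for doc in corpus
    ]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[m] for c in counts] for m in vocab], dtype=float)
    return vocab, matrix


def weight(matrix):
    """Assumed weighting: log-scaled counts times an inverse document frequency."""
    n_docs = matrix.shape[1]
    doc_freq = (matrix > 0).sum(axis=1, keepdims=True)
    idf = np.log(n_docs / doc_freq) + 1.0
    return np.log1p(matrix) * idf


def low_rank_approximation(matrix, rank):
    """Factorize with SVD and keep only the `rank` largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


corpus = [
    "parsing parsed documents",
    "parses document",
    "retrieval ranks documents",
]
vocab, counts = morpheme_by_document(corpus)
weighted = weight(counts)
approx = low_rank_approximation(weighted, rank=2)
```

In this sketch, "parsing", "parsed", and "parses" all contribute to the single stem row `pars`, while their affixes are counted in separate rows, which mirrors the abstract's statement that stems and affixes are enumerated separately.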

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.