Patent · US Expired

Information retrieval and text mining using distributed latent semantic indexing

US7152065B2 · kind B2 · utility

Cited by: 78
References: 2
Claims: 28
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 1, 2003
Grant date: Dec 19, 2006
Priority date:
Expiry date: May 1, 2023

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99943
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The use of latent semantic indexing (LSI) for information retrieval and text mining operations is adapted to work on large heterogeneous data sets by first partitioning the data set into a number of smaller partitions having similar concept domains. A similarity graph network is generated in order to expose links between concept domains, which are then exploited in determining which domains to query as well as in expanding the query vector. LSI is performed on those partitioned data sets most likely to contain information related to the user query or text mining operation. In this manner, LSI can be applied to data sets that heretofore presented scalability problems. Additionally, the computation of the singular value decomposition of the term-by-document matrix can be accomplished at various distributed computers, increasing the robustness of the retrieval and text mining system while decreasing search times.
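
The general idea in the abstract (partition the collection, run LSI per partition, and query only the partitions likely to be relevant) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent text: the toy documents, the pre-assigned partition names, the helper names (`term_doc_matrix`, `lsi_fit`, `route`, `query_partition`), and above all the routing rule, which compares the query with each partition's summed term vector as a simple stand-in for the similarity graph network and query expansion that the abstract describes.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Raw term counts: rows are terms, columns are documents."""
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in index:
                A[index[tok], j] += 1.0
    return A

def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-by-document matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))  # never keep zero singular values
    return U[:, :k], s[:k], Vt[:k, :].T  # rows of V are documents

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def route(q, matrices):
    """Assumed routing rule: cosine between the query and each
    partition's summed term vector (NOT the patent's graph method)."""
    sims = {name: cosine(q, A.sum(axis=1)) for name, A in matrices.items()}
    return max(sims, key=sims.get), sims

def query_partition(q, model):
    """Fold the query into the latent space (q^T U S^-1) and score
    every document in the partition by cosine similarity."""
    U, s, V = model
    q_hat = (q @ U) / s
    return [cosine(q_hat, v) for v in V]

# Hypothetical, pre-partitioned toy collection.
partitions = {
    "astronomy": ["star galaxy telescope", "galaxy orbit planet",
                  "telescope planet star"],
    "cooking": ["recipe oven flour", "flour sugar recipe",
                "oven sugar bake"],
}
vocab = sorted({t for docs in partitions.values()
                for d in docs for t in d.split()})

# Each partition's SVD is independent of the others, so in a
# distributed setting each could be computed on a separate machine.
matrices = {n: term_doc_matrix(d, vocab) for n, d in partitions.items()}
models = {n: lsi_fit(A, k=2) for n, A in matrices.items()}

q = term_doc_matrix(["planet telescope"], vocab)[:, 0]
best, partition_sims = route(q, matrices)
scores = query_partition(q, models[best])
```

Because each partition is decomposed separately, the cost of the SVD depends on the size of a partition rather than on the whole collection, which is the scalability point the abstract makes.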

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.