Information retrieval and text mining using distributed latent semantic indexing
US7152065B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 1, 2003 |
| Grant date | Dec 19, 2006 |
| Priority date | — |
| Expiry date | May 1, 2023 |
Classification
- Technology area (CPC Y)Emerging Cross-Sectional Technologies
- CPC primaryY10S707/99943
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
The use of latent semantic indexing (LSI) for information retrieval and text mining operations is adapted to work on large heterogeneous data sets by first partitioning the data set into a number of smaller partitions having similar concept domains. A similarity graph network is generated in order to expose links between concept domains which are then exploited in determing which domains to query as well as in expanding the query vector. LSI is performed on those partitioned data sets most likely to contain information related to the user query or text mining operation. In this manner LSI can be applied to datasets that heretofore presented scalability problems. Additionally, the computation of the singular value decomposition of the term-by-document matrix can be accomplished at various distributed computers increasing the robustness of the retrieval and text mining system while decreasing search times.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.