Scalable probabilistic latent semantic analysis
US7844449B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 30, 2006 |
| Grant date | Nov 30, 2010 |
| Priority date | — |
| Expiry date | Mar 21, 2029 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F40/30
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A scalable two-pass scalable probabilistic latent semantic analysis (PLSA) methodology is disclosed that may perform more efficiently, and in some cases more accurately, than traditional PLSA, especially where large and/or sparse data sets are provided for analysis. The improved methodology can greatly reduce the storage and/or computational costs of training a PLSA model. In the first pass of the two-pass methodology, objects are clustered into groups, and PLSA is performed on the groups instead of the original individual objects. In the second pass, the conditional probability of a latent class, given an object, is obtained. This may be done by extending the training results of the first pass. During the second pass, the most likely latent classes for each object are identified.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.