Supervised semantic indexing and its extensions
US8359282B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 18, 2009 |
| Grant date | Jan 22, 2013 |
| Priority date | — |
| Expiry date | Jun 23, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/3334
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for determining a similarity between a document and a query includes providing a frequently used dictionary and an infrequently used dictionary in storage memory. For each word or gram in the infrequently used dictionary, n words or grams are correlated from the frequently used dictionary based on a first score. Features for a vector of the infrequently used words or grams are replaced with features from a vector of the correlated words or grams from the frequently used dictionary when the features from a vector of the correlated words or grams meet a threshold value. A similarity score is determined between weight vectors of a query and one or more documents in a corpus by employing the features from the vector of the correlated words or grams that met the threshold value.
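The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several choices here are assumptions made only for illustration: the "first score" is taken to be a document-level co-occurrence count, the threshold is applied to that score, the feature weights are raw term counts, and the similarity score is cosine similarity. All function names (`cooccurrence_scores`, `build_mapping`, `vectorize`, `cosine`) are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_scores(docs, rare_word, frequent):
    """Assumed 'first score': number of documents in which rare_word
    appears together with each word of the frequent dictionary."""
    scores = Counter()
    for doc in docs:
        tokens = set(doc)
        if rare_word in tokens:
            for word in tokens & frequent:
                scores[word] += 1
    return scores


def build_mapping(docs, frequent, infrequent, n=1, threshold=2):
    """For each infrequent word, keep up to n frequent words with the
    highest score, dropping any whose score is below the threshold."""
    mapping = {}
    for rare in infrequent:
        scores = cooccurrence_scores(docs, rare, frequent)
        # Sort by descending score, then alphabetically for determinism.
        top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        mapping[rare] = [word for word, score in top if score >= threshold]
    return mapping


def vectorize(tokens, frequent, mapping):
    """Build a sparse count vector over the frequent dictionary only.
    Infrequent tokens contribute through their correlated frequent words;
    tokens with no mapping are ignored."""
    vec = Counter()
    for token in tokens:
        if token in frequent:
            vec[token] += 1.0
        else:
            for word in mapping.get(token, []):
                vec[word] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed scoring)."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(query, document, frequent, mapping):
    """Score a query against a document in the mapped feature space."""
    return cosine(vectorize(query, frequent, mapping),
                  vectorize(document, frequent, mapping))
```

In this sketch a query containing only an infrequent word can still match documents that never contain it, because its feature is replaced by the features of the frequent words it was correlated with. Restricting the feature space to the frequent dictionary keeps the vectors small, which is one plausible reading of why the abstract separates the two dictionaries.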
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.