Scalable approach to information-theoretic string similarity using a guaranteed rank threshold
US10482128B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | May 15, 2017 |
| Grant date | Nov 19, 2019 |
| Priority date | — |
| Expiry date | Jan 31, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/951
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A string analysis tool for calculating a similarity metric between an input string and a plurality of strings in a collection to be searched. The string analysis tool may include optimizations that may reduce the number of calculations to be carried out when calculating the similarity metric for large volumes of data. In this regard, the string analysis tool may represent strings as features. As such, analysis may be performed relative to features (e.g., of either the input string or plurality of strings to be searched) such that features from the strings may be eliminated from consideration when identifying candidate strings from the collection for which a similarity metric is to be calculated. The elimination of features may be based on a minimum similarity metric threshold, wherein features that are incapable of contributing to a similarity metric above the minimum similarity metric threshold are eliminated from consideration.
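The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the patented method. Everything concrete here is an assumption made for illustration: character bigrams as the features, a logarithmic rarity weight, weighted Jaccard as the similarity metric, and the hypothetical function names `features`, `build_index`, and `search`. The sketch drops the lowest-weight query features whose combined weight stays below `threshold * weight(query)`. A string that shares only those dropped features cannot reach the threshold, so it is never scored.

```python
import math
from collections import defaultdict


def features(s, n=2):
    """Character n-grams of a padded, lowercased string (assumed feature scheme)."""
    padded = f"^{s.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(collection):
    """Return (feature -> string ids, per-string feature sets, feature weights)."""
    index = defaultdict(set)
    feats = []
    for sid, s in enumerate(collection):
        f = features(s)
        feats.append(f)
        for g in f:
            index[g].add(sid)
    # Assumed information-style weighting: rarer features weigh more.
    weights = {g: math.log(1 + len(collection) / len(ids)) for g, ids in index.items()}
    return index, feats, weights


def search(query, collection, index, feats, weights, threshold, prune=True):
    """Return (string, score) pairs with weighted Jaccard score >= threshold.

    The threshold is assumed to lie in (0, 1].
    """
    default = math.log(1 + len(collection))  # weight for features unseen in the collection

    def w(g):
        return weights.get(g, default)

    def score(a, b):
        union = sum(w(g) for g in a | b)
        return sum(w(g) for g in a & b) / union if union else 0.0

    q = features(query)
    if prune:
        # A string sharing only the dropped features overlaps the query by less than
        # threshold * weight(query), so its score cannot reach the threshold.
        budget = threshold * sum(w(g) for g in q)
        kept, dropped = [], 0.0
        for g in sorted(q, key=w):  # lightest features first
            if dropped + w(g) < budget:
                dropped += w(g)
            else:
                kept.append(g)
        candidates = set().union(*(index.get(g, ()) for g in kept))
    else:
        candidates = set(range(len(collection)))  # brute force, for comparison
    scored = ((collection[i], score(q, feats[i])) for i in candidates)
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: (-p[1], p[0]))


# Example usage with made-up data.
collection = ["stainless steel bolt", "stainless steel nut", "steel bolt m8",
              "copper pipe", "brass fitting", "stainless bolt"]
index, feats, weights = build_index(collection)
for name, s in search("stainles steel bolt", collection, index, feats, weights, 0.5):
    print(f"{s:.3f}  {name}")
```

Passing `prune=False` scores every string in the collection, which gives a simple way to check that the pruned search returns the same matches while consulting fewer index entries.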
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.