Simhash based spell correction
US8661341B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Jan 19, 2011 |
| Grant date | Feb 25, 2014 |
| Priority date | — |
| Expiry date | Sep 11, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/232
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus for performing simhash based spell correction are provided. A character string is simhashed to generate a simhashed character string. A plurality of substrings is extracted from the character string by applying a sliding window of at least two characters to the character string. The plurality of substrings are hashed to produce a plurality of corresponding hash values. Each hash value is processed to generate a simhashed character string. The simhashed character string is then compared with character strings within a simhashed dictionary dataset to determine at least one candidate to replace the character string. Processing each hash value includes extracting a set of lowest bits from each hash value, and mapping each set of lowest bits to the bitmask.
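The pipeline in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation. The abstract does not give a bitmask width, a hash function, a distance measure, or a threshold, so the 64-bit mask, the BLAKE2b hash, the Hamming-distance comparison, and `max_distance` are all assumptions chosen for illustration. The function names are hypothetical.

```python
import hashlib

MASK_BITS = 64  # assumed bitmask width (power of two); not stated in the abstract
WINDOW = 2      # the abstract requires a window of at least two characters


def substrings(word, window=WINDOW):
    """Slide a fixed-size window over the word and return every substring."""
    if len(word) < window:
        return [word]
    return [word[i:i + window] for i in range(len(word) - window + 1)]


def hash_value(sub):
    """Stable 64-bit hash of a substring (the hash choice is an assumption)."""
    digest = hashlib.blake2b(sub.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(word):
    """Map the lowest bits of each substring hash to one bit of the bitmask."""
    mask = 0
    for sub in substrings(word):
        low = hash_value(sub) & (MASK_BITS - 1)  # lowest log2(MASK_BITS) bits
        mask |= 1 << low                         # set that position in the mask
    return mask


def hamming(a, b):
    """Number of bit positions in which two bitmasks differ."""
    return bin(a ^ b).count("1")


def candidates(word, dictionary, max_distance=4):
    """Return dictionary words whose simhash is close to the word's simhash."""
    target = simhash(word)
    scored = [(hamming(target, simhash(entry)), entry) for entry in dictionary]
    return [entry for dist, entry in sorted(scored) if dist <= max_distance]


if __name__ == "__main__":
    print(candidates("helo", ["hello", "world", "help"]))
```

Under this reading, a misspelling that drops one letter changes only a few substrings, so its bitmask differs from the correct word's bitmask in only a few positions. A production system would precompute the bitmasks for the dictionary once, which corresponds to the simhashed dictionary dataset named in the abstract, rather than rehashing every entry per query as this sketch does.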
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.