Locality-sensitive hashing to clean and normalize text logs
US11244156B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 29, 2020 |
| Grant date | Feb 8, 2022 |
| Priority date | — |
| Expiry date | Oct 29, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F2218/12
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Techniques for improved text normalization are provided. Signatures are generated for a first word and a second word using a locality-sensitive hashing technique. A graph is constructed based on the first and second signatures, by creating a first node in the graph for the first word, creating a second node in the graph for the second word, and creating an edge in the graph connecting the first and second nodes upon determining that the first and second signatures match. A mapping from the first word to the second word is then generated based on the graph.
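The pipeline described in the abstract (signatures, then a graph, then a mapping) can be illustrated with a minimal sketch. It is not the patented implementation. The abstract does not name a specific locality-sensitive hashing scheme, so this sketch assumes MinHash over character bigrams with banding. It also assumes that two signatures "match" when any band is identical, and that each word maps to the most frequent word in its connected component. All function names and parameter values (`NUM_HASHES`, `BANDS`, `shingles`, `build_graph`, `build_mapping`) are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict
from itertools import combinations

NUM_HASHES = 16  # assumed signature length
BANDS = 4        # assumed band count; NUM_HASHES must be divisible by BANDS


def shingles(word, k=2):
    """Return the set of character k-grams of a word (the word itself if shorter than k)."""
    if len(word) < k:
        return {word}
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(word):
    """MinHash signature: for each seed, the minimum hash over the word's shingles."""
    grams = shingles(word)
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(NUM_HASHES))


def build_graph(words):
    """One node per distinct word; an edge wherever two words share a signature band."""
    graph = {w: set() for w in words}
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(list)
    for w in graph:
        sig = minhash_signature(w)
        for b in range(BANDS):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(w)
    for members in buckets.values():
        for a, b in combinations(members, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def build_mapping(graph, counts):
    """Map each word to the most frequent word in its connected component (assumed rule)."""
    mapping, seen = {}, set()
    for start in graph:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # Ties on frequency are broken alphabetically so the result is deterministic.
        canonical = max(component, key=lambda w: (counts[w], w))
        for w in component:
            mapping[w] = canonical
    return mapping


# Hypothetical log tokens: "errror" has the same bigram set as "error".
counts = Counter(["error", "error", "error", "errror", "timeout"])
graph = build_graph(list(counts))
mapping = build_mapping(graph, counts)
```

In this toy input, "error" and "errror" share an identical set of bigrams, so their MinHash signatures coincide and an edge is created between them. Words with only partly overlapping bigrams match with a probability that depends on the assumed band and row settings.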
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.