Systems and methods for tokenizing user-annotated names
US10552462B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Event | Date |
|---|---|
| Filing date | Oct 28, 2014 |
| Grant date | Feb 4, 2020 |
| Priority date | — |
| Expiry date | May 5, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/35
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A disclosed computer-implemented method for tokenizing user-annotated names may include (1) identifying an example set of user-annotated names, (2) creating a custom dictionary that includes known keywords by (a) extracting a set of known keywords from the example set of user-annotated names and (b) assigning a frequency score to each known keyword in the set of known keywords based on the respective frequency of each known keyword within the example set, and (3) enabling a computing device to tokenize an additional user-annotated name of arbitrary structure by performing a semantic analysis including (a) assigning, using the custom dictionary, a frequency score to a substring of the additional user-annotated name based on the substring matching a known keyword and (b) splitting the additional user-annotated name into tokens according to the permutation of substrings that received the highest combined frequency score. Various other methods, systems, and computer-readable media are also disclosed.
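The two phases in the abstract (build a frequency-scored dictionary from example names, then split a new name by the best-scoring combination of substrings) can be sketched as follows. This is a minimal illustration, not the patented implementation, and it rests on several assumptions the abstract does not state. Keywords are extracted by splitting example names on non-alphanumeric delimiters and camelCase boundaries. A keyword's frequency score is its relative frequency. The "combined" score of a candidate split is the sum of log frequency scores. Unknown substrings receive a hypothetical length-dependent penalty. The function names `extract_keywords`, `build_dictionary`, `segment`, and `tokenize` are also hypothetical. Instead of enumerating every split explicitly, the sketch uses dynamic programming. Because the combined score is additive, this finds the same top-scoring split.

```python
import math
import re
from collections import Counter

# Assumption: keywords are delimited by non-alphanumerics or camelCase boundaries.
_KEYWORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def extract_keywords(name):
    """Split one example name into lowercase keywords (step 2a, assumed rule)."""
    return [part.lower() for part in _KEYWORD_RE.findall(name)]


def build_dictionary(example_names):
    """Map each known keyword to a frequency score (step 2b).

    Assumption: the score is the keyword's relative frequency in the example set.
    """
    counts = Counter(kw for name in example_names for kw in extract_keywords(name))
    total = sum(counts.values())
    return {kw: count / total for kw, count in counts.items()}


def substring_score(sub, scores, floor):
    """Log-score a substring; unknown substrings are penalized by length (assumption)."""
    if sub in scores:
        return math.log(scores[sub])
    # Hypothetical penalty: 10x less likely per character than the rarest keyword.
    return math.log(floor) - len(sub) * math.log(10)


def segment(chunk, scores, floor, max_len=20):
    """Return the split of `chunk` with the highest summed log score."""
    n = len(chunk)
    # best[i] = (best combined score for chunk[:i], start index of its last token)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            total = best[start][0] + substring_score(chunk[start:end], scores, floor)
            if total > best[end][0]:
                best[end] = (total, start)
    # Walk the back-pointers to recover the tokens in order.
    tokens, end = [], n
    while end > 0:
        start = best[end][1]
        tokens.append(chunk[start:end])
        end = start
    return tokens[::-1]


def tokenize(name, scores, max_len=20):
    """Tokenize a name of arbitrary structure using the custom dictionary (step 3)."""
    floor = min(scores.values(), default=1e-6)
    tokens = []
    # Existing delimiters are honored first; each remaining run is segmented.
    for chunk in re.findall(r"[0-9a-z]+", name.lower()):
        tokens.extend(segment(chunk, scores, floor, max_len))
    return tokens


if __name__ == "__main__":
    examples = ["SQLServerBackup", "sql_server_logs", "web-server-01",
                "backupJob", "nightly_backup_job"]
    scores = build_dictionary(examples)
    print(tokenize("nightlysqlbackup", scores))  # ['nightly', 'sql', 'backup']
```

With this example set, a name with no delimiters at all, such as `nightlysqlbackup`, is split into the three known keywords. A substring absent from the dictionary (for example `prod` in `prod-sqlserver`) is kept as a single token, because each extra unknown token adds another penalty term.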
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.