Method and system for tokenizing documents
US10241998B1 · kind B1 · utility
- Cited by: 1
- References: 4
- Claims: 16
- Family size: 0
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 29, 2016 |
| Grant date | Mar 26, 2019 |
| Priority date | — |
| Expiry date | Aug 23, 2036 |
Classification
- Technology area (CPC section G): Physics
- Primary CPC: G06F40/126
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for tokenizing documents. The method includes obtaining a document comprising text to be tokenized, isolating a first string of consecutive characters in the document, searching, in a token tree, for an expression that matches the first string, making a determination that a matching expression exists in the token tree and, based on the determination, storing the matching expression as an extracted token.
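The abstract states the steps only in claim language. The following Python sketch is a rough illustration of those steps. It assumes the token tree is a character trie of known expressions, that a "string of consecutive characters" is a whitespace-delimited run, and that strings with no matching expression are simply skipped. The names `TokenTree`, `find`, and `tokenize` are hypothetical. This is not the patented implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TokenTreeNode:
    """One node of a hypothetical token tree, modeled here as a character trie."""
    children: dict[str, TokenTreeNode] = field(default_factory=dict)
    is_expression: bool = False  # True if the path to this node spells a stored expression


class TokenTree:
    """Hypothetical token tree: stores known expressions one character per level."""

    def __init__(self, expressions: list[str]) -> None:
        self.root = TokenTreeNode()
        for expr in expressions:
            self.insert(expr)

    def insert(self, expression: str) -> None:
        node = self.root
        for ch in expression:
            node = node.children.setdefault(ch, TokenTreeNode())
        node.is_expression = True

    def find(self, string: str) -> str | None:
        """Search the tree for an expression that exactly matches `string`."""
        node = self.root
        for ch in string:
            node = node.children.get(ch)
            if node is None:
                return None  # path breaks off: no matching expression
        # A path that ends on a non-terminal node is only a prefix, not a match.
        return string if node.is_expression else None


def tokenize(document: str, tree: TokenTree) -> list[str]:
    """Isolate strings, search the token tree, and store each match as an extracted token."""
    extracted: list[str] = []
    # Assumption: strings of consecutive characters are whitespace-delimited runs.
    for string in document.split():
        match = tree.find(string)
        if match is not None:  # determination: a matching expression exists
            extracted.append(match)
    # Assumption: unmatched strings are skipped (the abstract does not say).
    return extracted


if __name__ == "__main__":
    tree = TokenTree(["invoice", "total", "due"])
    print(tokenize("the invoice total is due", tree))  # prints ['invoice', 'total', 'due']
```

With a trie, each lookup costs time proportional to the length of the isolated string, however many expressions the tree holds. A prefix such as `tot` reaches a node but does not count as a match, because that node is not marked as the end of an expression.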
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.