Patent · US Active

Method and system for tokenizing documents

US10241998B1 · kind B1 · utility

Cited by: 1
References: 4
Claims: 16
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 29, 2016
Grant date: Mar 26, 2019
Priority date
Expiry date: Aug 23, 2036

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/126
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for tokenizing documents. The method includes obtaining a document comprising text to be tokenized, isolating a first string of consecutive characters in the document, searching, in a token tree, for an expression that matches the first string, making a determination that a matching expression exists in the token tree and, based on the determination, storing the matching expression as an extracted token.
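
The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that the "token tree" is a character-level trie, that a "string of consecutive characters" is a whitespace-delimited run, and that a match means the whole string equals a stored expression. The names `TokenTreeNode`, `TokenTree`, and `tokenize` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Optional


class TokenTreeNode:
    """One node of a character-level trie (an assumed shape for the token tree)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TokenTreeNode"] = {}
        # Set only on nodes where a stored expression ends.
        self.expression: Optional[str] = None


class TokenTree:
    """Hypothetical token tree holding the known expressions."""

    def __init__(self, expressions: List[str]) -> None:
        self.root = TokenTreeNode()
        for expression in expressions:
            self.insert(expression)

    def insert(self, expression: str) -> None:
        node = self.root
        for ch in expression:
            node = node.children.setdefault(ch, TokenTreeNode())
        node.expression = expression

    def search(self, string: str) -> Optional[str]:
        """Return the stored expression that equals `string`, or None."""
        node = self.root
        for ch in string:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.expression


def tokenize(document: str, tree: TokenTree) -> List[str]:
    """Isolate each string, search the tree, and store matches as extracted tokens."""
    extracted: List[str] = []
    for string in document.split():  # assumption: strings are whitespace-delimited
        match = tree.search(string)
        if match is not None:  # the determination that a matching expression exists
            extracted.append(match)
    return extracted
```

For example, with a tree built from `["token", "tree", "tokenize"]`, calling `tokenize("a token tree can tokenize text", tree)` keeps only the three stored expressions, in document order. A trie is used here because a lookup costs time proportional to the length of the string rather than to the number of stored expressions. The patent's claims may define the tree and the matching rule differently.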

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.