Patent · US Expired

Tokenizer for a natural language processing system

US7269547B2 · kind B2 · utility

Cited by: 9
References: 14
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 15, 2005
Grant date: Sep 11, 2007
Priority date
Expiry date: Jul 15, 2025

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/268
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present invention is a segmenter used in a natural language processing system. The segmenter segments a textual input string into tokens for further natural language processing. In accordance with one feature of the invention, the segmenter includes a tokenizer engine that proposes segmentations and submits them to a linguistic knowledge component for validation. In accordance with another feature of the invention, the segmentation system includes language-specific data that contains a precedence hierarchy for punctuation. If proposed tokens in the input string contain punctuation, they can illustratively be broken into subtokens based on the precedence hierarchy.
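
Illustrative sketch (not taken from the patent text): the propose-and-validate loop described in the abstract could look roughly like the Python below. All names here (PRECEDENCE, split_keep, segment, tokenize, is_valid) are hypothetical, the lexicon lookup merely stands in for the linguistic knowledge component, and the punctuation ordering is an assumed example of language-specific data rather than the hierarchy the patent defines.

```python
from typing import Callable, List, Sequence

# Assumed example of language-specific data: punctuation listed earlier
# is split on first. The real hierarchy is not given in the abstract.
PRECEDENCE: Sequence[str] = ("/", "-", ".")


def split_keep(token: str, mark: str) -> List[str]:
    """Split token on mark, keeping each mark as its own subtoken."""
    parts: List[str] = []
    for i, piece in enumerate(token.split(mark)):
        if i > 0:
            parts.append(mark)
        if piece:
            parts.append(piece)
    return parts


def segment(token: str,
            is_valid: Callable[[str], bool],
            precedence: Sequence[str] = PRECEDENCE) -> List[str]:
    """Propose a token; if validation rejects it, break it into subtokens."""
    if is_valid(token):
        return [token]
    for mark in precedence:
        if mark in token and token != mark:
            result: List[str] = []
            for sub in split_keep(token, mark):
                if sub == mark:
                    result.append(sub)
                else:
                    result.extend(segment(sub, is_valid, precedence))
            return result
    # No punctuation left to split on: keep the token unvalidated.
    return [token]


def tokenize(text: str,
             is_valid: Callable[[str], bool],
             precedence: Sequence[str] = PRECEDENCE) -> List[str]:
    """Whitespace-split the input, then refine each proposed token."""
    tokens: List[str] = []
    for proposed in text.split():
        tokens.extend(segment(proposed, is_valid, precedence))
    return tokens


# Stand-in for the linguistic knowledge component: a tiny lexicon.
LEXICON = {"well-known", "and", "or", "state", "of", "the", "art"}
print(tokenize("well-known and/or state-of-the-art", LEXICON.__contains__))
```

In this sketch "well-known" survives as a single token because the validator accepts it, while "and/or" and "state-of-the-art" are rejected and broken at the highest-precedence punctuation mark they contain.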

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.