Patent · US Expired

Method and apparatus for breaking words in a stream of text

US6035268A · kind A · utility

Cited by: 59
References: 5
Claims: 41
Family size: 0

Assignee

Inventors

Key dates

Filing date: Aug 21, 1997
Grant date: Mar 7, 2000
Priority date
Expiry date: Aug 21, 2017

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/53
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A word breaker utilizing a lexicon module and a processing module to identify word breaks in a stream of Asian (e.g., Japanese, Chinese, or Korean) language text. The lexicon module is a dictionary or database containing words native to the language of the input text. The processing module includes a plurality of analysis modules that operate on the input text. In particular, the processing module can include modules that analyze the input text using heuristic rules and statistical analysis to identify a first set of word breaks, thereby reducing the size of segments with undefined word breaks. The processing module also includes a database analysis module that identifies the remaining undefined word breaks in those smaller segments that have undergone heuristic or statistical analysis.
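
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. The abstract does not say which heuristic rules or lookup strategy are used, so the sketch assumes one simple heuristic (punctuation and whitespace are certain word breaks) and swaps in greedy longest-match lookup for the database analysis stage. The names LEXICON, heuristic_split, lexicon_split, and break_words, the max_len parameter, and the toy Japanese word list are all hypothetical.

```python
import unicodedata

# Hypothetical toy lexicon; a real lexicon module would be far larger.
LEXICON = {"私", "は", "学生", "です", "東京", "に", "住んで", "います"}


def heuristic_split(text):
    """Stage 1 (assumed heuristic): cut the stream at punctuation and
    whitespace, leaving shorter segments whose inner breaks are unknown."""
    segments, current = [], []
    for ch in text:
        if ch.isspace() or unicodedata.category(ch).startswith("P"):
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments


def lexicon_split(segment, lexicon, max_len=4):
    """Stage 2 (assumed strategy): greedy longest-match lookup in the lexicon.
    A single character is emitted as a fallback when nothing longer matches."""
    words, i = [], 0
    while i < len(segment):
        for size in range(min(max_len, len(segment) - i), 0, -1):
            candidate = segment[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def break_words(text, lexicon=LEXICON):
    """Run the heuristic stage first, then the lexicon stage on each segment."""
    words = []
    for segment in heuristic_split(text):
        words.extend(lexicon_split(segment, lexicon))
    return words


print(break_words("私は学生です。東京に住んでいます。"))
```

With the toy lexicon above, the call splits the input into 私, は, 学生, です, 東京, に, 住んで, います. Running the cheap stage first means the lexicon lookup only ever scans short segments, which is the size reduction the abstract describes.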

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.