Patent · US · Expired

Identification of words in Japanese text by a computer system

US5946648A · kind A · utility

95 Cited by

8 References

16 Claims

0 Family size

Assignee

Microsoft Corporation · US

Inventors

Patrick Halstead · Bellevue, US
Hisami Suzuki · Redmond, US

Key dates

Filing date	Jul 24, 1998
Grant date	Aug 31, 1999
Priority date	—
Expiry date	Jul 24, 2018

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/53
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A word breaking facility operates to identify words within a Japanese text string. The word breaking facility performs morphological processing to identify postfix bound morphemes and prefix bound morphemes. The word breaking facility also performs opheme matching to identify likely stem characters. A scoring heuristic is applied to determine an optimal analysis that includes a postfix analysis, a stem analysis, and a prefix analysis. The morphological analyses are stored in an efficient compressed format to minimize the amount of memory they occupy and maximize the analysis speed. The morphological analyses of postfixes, stems, and prefixes are performed in a right-to-left fashion. The word breaking facility may be used in applications that demand identity of selection granularity, autosummarization applications, content indexing applications, and natural language processing applications.
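The abstract describes a right-to-left analysis that peels postfix bound morphemes off the end of a string, then identifies a stem and an optional prefix, and finally picks the best candidate with a scoring heuristic. The following is a minimal, illustrative sketch of that overall shape only. It is not the patented method: the abstract gives neither the scoring rules nor the dictionary format, so the toy lexicons (`POSTFIXES`, `PREFIXES`, `STEMS`), the point-based `score`, and the helper names `postfix_splits`, `analyses`, and `break_word` are all hypothetical. It also uses plain dictionary lookup in place of opheme matching and does not model the compressed storage.

```python
from dataclasses import dataclass
from typing import Iterator

# Hypothetical toy lexicons for illustration only; a real word breaker would
# use large morpheme dictionaries (the abstract says these are stored compressed).
POSTFIXES = ("ました", "ます", "た", "て", "を", "が", "は", "に")  # bound postfixes
PREFIXES = ("お", "ご")  # bound prefixes
STEMS = frozenset({"書き", "読み", "茶", "本", "話"})  # known stems


@dataclass(frozen=True)
class Analysis:
    prefix: str
    stem: str
    postfixes: tuple[str, ...]  # stored in left-to-right order

    def score(self) -> int:
        # Hypothetical heuristic: a known stem earns a bonus plus its length,
        # and every bound morpheme (prefix or postfix) costs one point.
        bonus = 10 + len(self.stem) if self.stem in STEMS else 0
        return bonus - len(self.postfixes) - (1 if self.prefix else 0)

    def morphemes(self) -> list[str]:
        return ([self.prefix] if self.prefix else []) + [self.stem, *self.postfixes]


def postfix_splits(text: str, tail: tuple[str, ...] = ()) -> Iterator[tuple[str, tuple[str, ...]]]:
    """Strip postfixes right to left, yielding every (remainder, postfixes) split."""
    yield text, tail
    for p in POSTFIXES:
        # Keep the remainder non-empty so a stem can still be identified.
        if text.endswith(p) and len(text) > len(p):
            yield from postfix_splits(text[: -len(p)], (p,) + tail)


def analyses(text: str) -> Iterator[Analysis]:
    """Enumerate candidate prefix/stem/postfix analyses of text."""
    for rest, posts in postfix_splits(text):
        # Candidate with no prefix: the whole remainder is the stem.
        yield Analysis("", rest, posts)
        # Candidates that peel a prefix off the left of the remainder.
        for p in PREFIXES:
            if rest.startswith(p) and len(rest) > len(p):
                yield Analysis(p, rest[len(p):], posts)


def break_word(text: str) -> list[str]:
    """Return the morphemes of the highest-scoring analysis of text."""
    if not text:
        return []
    best = max(analyses(text), key=Analysis.score)
    return best.morphemes()


print(break_word("お茶を"))        # ['お', '茶', 'を']
print(break_word("読みましたが"))  # ['読み', 'ました', 'が']
```

Enumerating every split is simple and adequate for short phrases, but the number of candidates can grow quickly with long postfix chains; a production system would presumably prune candidates or share work between them, which this sketch does not attempt.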

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.