Patent · US Active

Chinese character-based parser

US7464024B2 · kind B2 · utility

Cited by: 5
References: 0
Claims: 16
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 16, 2004
Grant date: Dec 9, 2008
Priority date
Expiry date: Jun 26, 2026

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/53
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure over Chinese character sequences. The character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure converts word-based parse trees into character-based trees: character-level tags are derived from word-level part-of-speech tags, word-boundary information is encoded with a positional tag, and each word-level part-of-speech tag becomes a constituent label in the character-based tree. A maximum entropy parser is then built and tested.
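
The deterministic conversion described in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure itself. The tuple-based tree representation, the function names, and the positional markers b, m, e, and s (begin, middle, end, single-character word) joined to the part-of-speech tag with a hyphen are all assumptions made for illustration; the abstract only states that a positional tag encodes word-boundary information.

```python
from typing import List, Tuple, Union

# Assumed representation: a node is (label, children); a word-level
# preterminal is (pos_tag, word_string).
Tree = Tuple[str, Union[str, List["Tree"]]]


def positional_tags(n: int) -> List[str]:
    """Hypothetical position markers for a word of n characters."""
    if n == 1:
        return ["s"]                          # single-character word
    return ["b"] + ["m"] * (n - 2) + ["e"]    # begin, middle(s), end


def to_char_tree(node: Tree) -> Tree:
    """Convert a word-based tree into a character-based tree."""
    label, children = node
    if isinstance(children, str):             # preterminal: (POS, word)
        tags = positional_tags(len(children))
        leaves = [(f"{label}-{t}", ch) for ch, t in zip(children, tags)]
        return (label, leaves)                # word POS becomes a constituent label
    return (label, [to_char_tree(child) for child in children])


def char_leaves(node: Tree) -> List[Tuple[str, str]]:
    """Collect (character_tag, character) pairs from left to right."""
    label, children = node
    if isinstance(children, str):
        return [(label, children)]
    out: List[Tuple[str, str]] = []
    for child in children:
        out.extend(char_leaves(child))
    return out


def recover_words(char_tree: Tree) -> List[Tuple[str, str]]:
    """Read word boundaries and POS tags back off the character tags."""
    words: List[Tuple[str, str]] = []
    for tag, ch in char_leaves(char_tree):
        pos, _, position = tag.rpartition("-")
        if position in ("b", "s"):            # a new word starts here
            words.append((ch, pos))
        else:                                 # continue the current word
            word, word_pos = words[-1]
            words[-1] = (word + ch, word_pos)
    return words


word_tree: Tree = ("NP", [("NR", "上海"), ("NN", "人")])
char_tree = to_char_tree(word_tree)
```

Because the positional markers keep the word boundaries recoverable, calling recover_words(char_tree) on this example gives back the original segmentation and word-level tags, which is why a parser trained on such character-based trees would produce segmentation, tagging, and phrase structure in one pass.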

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.