Chinese character-based parser
US7464024B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 16, 2004 |
| Grant date | Dec 9, 2008 |
| Priority date | — |
| Expiry date | Jun 26, 2026 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/53
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags, and word-boundary information is encoded with a positional tag. Word-level part-of-speech tags become constituent labels in character-based trees. A maximum entropy parser is then built and tested.
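The deterministic tree conversion described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The tuple-based tree encoding, the function names (`position_tags`, `to_char_tree`, `words_from_char_tree`), the positional suffixes `b`/`m`/`e`/`u` (beginning, middle, end, single-character word), and the example sentence are all assumptions made for illustration; the abstract states only that a positional tag encodes word boundaries and that the word-level part-of-speech tag becomes a constituent label.

```python
from typing import Any, List, Tuple

# Hypothetical encoding: a node is (label, body).
# In a word-based tree, a leaf is (pos_tag, word_string).
# In a character-based tree, a leaf is (char_tag, single_character).
Node = Tuple[str, Any]


def position_tags(n: int) -> List[str]:
    """Positional suffixes for a word of n characters (assumed scheme)."""
    if n == 1:
        return ["u"]  # single-character word
    return ["b"] + ["m"] * (n - 2) + ["e"]  # beginning, middle(s), end


def to_char_tree(node: Node) -> Node:
    """Convert a word-based parse tree into a character-based tree."""
    label, body = node
    if isinstance(body, str):
        # Word leaf: the POS tag becomes a constituent label, and each
        # character receives a tag made of the POS tag plus its position.
        tags = position_tags(len(body))
        return (label, [(label + t, ch) for ch, t in zip(body, tags)])
    # Phrasal node: keep the label and convert each child recursively.
    return (label, [to_char_tree(child) for child in body])


def words_from_char_tree(node: Node) -> List[Tuple[str, str]]:
    """Recover (word, pos_tag) pairs, i.e. word boundaries, from a character tree."""
    label, body = node
    if all(isinstance(child[1], str) for child in body):
        # All children are character leaves, so this node spans one word.
        return [("".join(ch for _, ch in body), label)]
    words: List[Tuple[str, str]] = []
    for child in body:
        words.extend(words_from_char_tree(child))
    return words


# Illustrative word-based tree (example sentence chosen for this sketch).
word_tree: Node = (
    "IP",
    [("NP", [("NR", "北京")]), ("VP", [("VV", "欢迎"), ("PN", "你")])],
)
char_tree = to_char_tree(word_tree)
recovered = words_from_char_tree(char_tree)
```

Because the mapping is deterministic and the positional tags preserve word boundaries, the original segmentation and part-of-speech tags can be read back from the character-based tree, which is what lets a character-level parser produce word boundaries as a by-product of parsing.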
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.