Patent · US Active

Using source-channel models for word segmentation

US7493251B2 · kind B2 · utility

Cited by: 136
References: 5
Claims: 51
Family size: 0

Key dates

Filing date: May 30, 2003
Grant date: Feb 17, 2009
Expiry date: Jan 9, 2027

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/284
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.
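The following is a minimal sketch of the general source-channel idea the abstract describes. It is not the patented method. It assumes a bigram source model P(type_i | type_{i-1}) over entity types and a separate channel model P(chars | type) for each type. Dynamic programming then picks the segmentation and type sequence E that maximize P(E) · ∏ P(chunk_i | E_i). The entity types `WORD` and `NUM`, every probability value, and the function `segment` are hypothetical toy choices made for illustration.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical channel models P(chars | entity type); toy values only.
LEXICON: Dict[str, float] = {"the": 0.3, "cat": 0.2, "he": 0.1, "at": 0.1, "ca": 0.05, "t": 0.01}

def word_channel(s: str) -> float:
    # Lexicon words get their listed probability; anything else is impossible.
    return LEXICON.get(s, 0.0)

def number_channel(s: str) -> float:
    # Digit strings only; each additional digit multiplies the probability by 0.1.
    return 0.5 * 0.1 ** len(s) if s.isdigit() else 0.0

CHANNEL: Dict[str, Callable[[str], float]] = {"WORD": word_channel, "NUM": number_channel}

# Hypothetical source model: bigram P(type | previous type), with <s>/</s> boundaries.
SOURCE: Dict[Tuple[str, str], float] = {
    ("<s>", "WORD"): 0.8, ("<s>", "NUM"): 0.2,
    ("WORD", "WORD"): 0.6, ("WORD", "NUM"): 0.2, ("WORD", "</s>"): 0.2,
    ("NUM", "WORD"): 0.5, ("NUM", "NUM"): 0.0, ("NUM", "</s>"): 0.5,
}

def log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf

def segment(text: str, max_len: int = 8) -> List[Tuple[str, str]]:
    """Return the most probable list of (chunk, entity type) pairs covering text."""
    n = len(text)
    # best[i][t] = (log prob, backpointer) for the best analysis of text[:i]
    # whose last entity has type t; backpointer is (start index, previous type).
    best: List[Dict[str, Tuple[float, Optional[Tuple[int, str]]]]] = [{} for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(n):
        for prev, (score, _) in best[i].items():
            for j in range(i + 1, min(n, i + max_len) + 1):
                chunk = text[i:j]
                for etype, channel in CHANNEL.items():
                    # Source (transition) score plus channel (emission) score.
                    s = score + log(SOURCE.get((prev, etype), 0.0)) + log(channel(chunk))
                    if s > best[j].get(etype, (-math.inf, None))[0]:
                        best[j][etype] = (s, (i, prev))
    # Add the end-of-sentence transition and keep the best final type.
    final_score, final_type = max(
        ((score + log(SOURCE.get((t, "</s>"), 0.0)), t) for t, (score, _) in best[n].items()),
        default=(-math.inf, None),
    )
    if final_type is None or final_score == -math.inf:
        raise ValueError("no segmentation has nonzero probability")
    # Follow backpointers from the end of the text to the start.
    result: List[Tuple[str, str]] = []
    j, t = n, final_type
    while t != "<s>":
        i, prev = best[j][t][1]
        result.append((text[i:j], t))
        j, t = i, prev
    return result[::-1]

print(segment("thecat2003"))
```

Under these toy tables, `segment("thecat2003")` returns `[("the", "WORD"), ("cat", "WORD"), ("2003", "NUM")]`. The segmentation falls out of the chosen entity sequence, which matches how the abstract frames the task. The sketch leaves out the second pass mentioned in the abstract, in which organization name entities are identified from a first sequence of identified entities.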

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.