Patent · US Active

Fast text character set recognition

US7865355B2 · kind B2 · utility

Cited by: 8
References: 6
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 30, 2004
Grant date: Jan 4, 2011
Priority date
Expiry date: May 22, 2027

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/263
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods and apparatus, including computer program products, for identifying a language corresponding to a string of data include receiving a data string and dividing the data string into coded character sequences for each of a plurality of languages. A length of one or more coded character sequences varies among different languages for coded character sequences having a particular number of characters. The coded character sequences are analyzed to calculate, for each of the plurality of languages, a probability that the data string corresponds to the language. The calculated probabilities are compared among the languages, and a language is identified as corresponding to the data string based on the comparison.
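
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class `LanguageModel`, the functions `split_sequences` and `identify`, the toy training samples, the choice of Latin-1 and UTF-16-BE encodings, the two-character window, the `UNSEEN_PROB` floor, and the averaging of log-probabilities are all assumptions made for illustration. The sketch splits the same byte string once per language, using a window whose byte length depends on how many bytes that language's encoding is assumed to use per character, scores each split, and picks the best-scoring language.

```python
import math
from collections import Counter

# Assumed floor for sequences a model has never seen (a crude stand-in for smoothing).
UNSEEN_PROB = 1e-6


def split_sequences(data: bytes, bytes_per_char: int, n_chars: int = 2) -> list[bytes]:
    """Slide a window of n_chars characters over data, one character at a time.

    The window is n_chars * bytes_per_char bytes wide, so the same number of
    characters yields sequences of different byte lengths for different languages.
    """
    width = bytes_per_char * n_chars
    return [data[i:i + width] for i in range(0, len(data) - width + 1, bytes_per_char)]


class LanguageModel:
    """Hypothetical per-language model: counts of coded character sequences."""

    def __init__(self, name: str, encoding: str, bytes_per_char: int, sample: str):
        self.name = name
        self.bytes_per_char = bytes_per_char
        seqs = split_sequences(sample.encode(encoding), bytes_per_char)
        self.counts = Counter(seqs)
        self.total = len(seqs)

    def score(self, data: bytes) -> float:
        """Average log-probability of the sequences in data under this model."""
        seqs = split_sequences(data, self.bytes_per_char)
        if not seqs:
            return float("-inf")
        log_sum = 0.0
        for s in seqs:
            if s in self.counts:
                log_sum += math.log(self.counts[s] / self.total)
            else:
                log_sum += math.log(UNSEEN_PROB)
        # Averaging (an assumption) keeps scores comparable when languages
        # split the same data into different numbers of sequences.
        return log_sum / len(seqs)


def identify(data: bytes, models: list[LanguageModel]) -> tuple[str, dict[str, float]]:
    """Score data under every model and return the best language plus all scores."""
    scores = {m.name: m.score(data) for m in models}
    return max(scores, key=scores.get), scores


# Toy training data, for illustration only.
MODELS = [
    LanguageModel("english", "latin-1", 1,
                  "the quick brown fox jumps over the lazy dog and then the dog sleeps"),
    LanguageModel("japanese", "utf-16-be", 2,
                  "これは日本語の文章です。日本語の文字は二バイトで表されます。"),
]

if __name__ == "__main__":
    for raw in ("the dog jumps".encode("latin-1"), "日本語の文字".encode("utf-16-be")):
        best, scores = identify(raw, MODELS)
        print(best, scores)
```

A real system would need far larger training corpora and a principled smoothing scheme. The fixed floor is used here only to keep the sketch short.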

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.