Patent · US Active

Fast text character set recognition

US7865355B2 · kind B2 · utility

Cited by	8
References	6
Claims	18
Family size	0

Assignee

SAP AG · DE

Inventors

Ming Xu · Nanjing, CN
Nobuyoshi Mori · Hachioji, JP

Key dates

Filing date	Jul 30, 2004
Grant date	Jan 4, 2011
Priority date	—
Expiry date	May 22, 2027

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/263
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods and apparatus, including computer program products, for identifying a language corresponding to a string of data include receiving a data string and dividing the data string into coded character sequences for each of a plurality of languages. For coded character sequences having a particular number of characters, the length of one or more coded character sequences varies among different languages. The coded character sequences are analyzed to calculate, for each of the plurality of languages, a probability that the data string corresponds to the language. The calculated probabilities are compared among the languages, and a language is identified as corresponding to the data string based on the comparison.
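
The flow described in the abstract (split the data per language, score each language, compare) can be illustrated with a minimal Python sketch. It is not the patented implementation; everything specific in it is an assumption made for illustration: the two candidate languages and their encodings (Latin-1 for German, Shift-JIS for Japanese), the tiny training samples, the use of two-character sequences (bigrams), add-one smoothing with the constant `SMOOTHING_VOCAB`, and the length-normalized log score. All function names are hypothetical. The sketch does show the point that a two-character sequence occupies 2 bytes under a single-byte encoding but up to 4 bytes under Shift-JIS.

```python
import math
from collections import Counter


def split_single_byte(data):
    """Assumed single-byte encoding (e.g. Latin-1): every byte is one coded character."""
    return [data[i:i + 1] for i in range(len(data))]


def split_shift_jis(data):
    """Simplified Shift-JIS split: lead bytes 0x81-0x9F and 0xE0-0xFC start a 2-byte character."""
    chars, i = [], 0
    while i < len(data):
        lead = data[i]
        width = 2 if (0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


def bigrams(chars):
    """Coded sequences of two characters; their byte length depends on the encoding."""
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


# Hypothetical toy training samples, one per candidate language (illustration only).
SAMPLES = {
    "German": (split_single_byte,
               "Die schnelle Erkennung von Zeichensätzen ist nützlich".encode("latin-1")),
    "Japanese": (split_shift_jis,
                 "文字コードの高速な認識は便利です".encode("shift_jis")),
}

# Per-language model: the splitting rule plus bigram counts from the sample.
MODELS = {
    name: (split, Counter(bigrams(split(sample))))
    for name, (split, sample) in SAMPLES.items()
}

SMOOTHING_VOCAB = 65536  # assumed smoothing constant so unseen bigrams get a nonzero probability


def score(data, split, counts):
    """Average log-probability per bigram of `data` under one language model."""
    grams = bigrams(split(data))
    if not grams:
        return float("-inf")
    total = sum(counts.values())
    log_p = sum(math.log((counts[g] + 1) / (total + SMOOTHING_VOCAB)) for g in grams)
    return log_p / len(grams)  # normalize so languages with fewer sequences are not favored


def identify_language(data):
    """Score the byte string under every language and return the best-scoring name."""
    scores = {name: score(data, split, counts) for name, (split, counts) in MODELS.items()}
    return max(scores, key=scores.get)
```

With these toy samples, `identify_language("Zeichen erkennen".encode("latin-1"))` selects "German" and `identify_language("文字の認識".encode("shift_jis"))` selects "Japanese". A realistic system would need far larger frequency tables and more candidate languages than this sketch assumes.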

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.