Patent · US Expired

Identifying language attributes through probabilistic analysis

US7386438B1 · kind B1 · utility

Cited by: 234
References: 4
Claims: 27
Family size: 0

Assignee

Inventors

Key dates

Filing date: Aug 4, 2003
Grant date: Jun 10, 2008
Priority date
Expiry date: Oct 17, 2025

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/263
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and method for identifying language attributes through probabilistic analysis is described. A set of language classes and a plurality of training documents are defined. Each language class identifies a language and a character set encoding. Occurrences of one or more document properties within each training document are evaluated. For each language class, a probability for the document properties set conditioned on the occurrence of the language class is calculated. Byte occurrences within each training document are evaluated. For each language class, a probability for the byte occurrences conditioned on the occurrence of the language class is calculated.
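
The training and scoring steps in the abstract can be illustrated with a small naive-Bayes-style sketch. This is not the patented method, only one plausible reading of the abstract under stated assumptions: byte occurrences are counted as single-byte frequencies, both probabilities use add-one smoothing, all language classes are treated as equally likely a priori, and document properties are plain strings such as "tld=.de". The function names and the property format are hypothetical and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


def train(training_docs):
    """Collect per-class counts from (language_class, properties, raw_bytes) triples.

    language_class is a (language, encoding) pair, as in the abstract.
    properties is a set of strings (hypothetical format, e.g. "tld=.de").
    """
    byte_counts = defaultdict(Counter)   # class -> byte value -> occurrences
    prop_counts = defaultdict(Counter)   # class -> property -> docs having it
    doc_counts = Counter()               # class -> number of training docs
    for cls, props, data in training_docs:
        doc_counts[cls] += 1
        byte_counts[cls].update(data)    # iterating bytes yields ints 0..255
        prop_counts[cls].update(props)
    return byte_counts, prop_counts, doc_counts


def log_p_bytes(data, counts):
    """log P(bytes | class), assuming independent bytes and add-one smoothing."""
    total = sum(counts.values())
    return sum(math.log((counts[b] + 1) / (total + 256)) for b in data)


def log_p_props(props, counts, n_docs):
    """log P(properties | class), assuming independent properties, add-one smoothed."""
    return sum(math.log((counts[p] + 1) / (n_docs + 2)) for p in props)


def classify(props, data, model):
    """Return the language class with the highest combined log-probability.

    Assumes a uniform prior over classes, so no prior term is added.
    """
    byte_counts, prop_counts, doc_counts = model

    def score(cls):
        return (log_p_props(props, prop_counts[cls], doc_counts[cls])
                + log_p_bytes(data, byte_counts[cls]))

    return max(doc_counts, key=score)
```

Working in log space avoids numeric underflow when many small byte probabilities are multiplied, and the smoothing keeps a byte or property never seen in training from forcing a class probability to zero.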

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.