Identifying language attributes through probabilistic analysis
US7386438B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 4, 2003 |
| Grant date | Jun 10, 2008 |
| Priority date | — |
| Expiry date | Oct 17, 2025 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/263
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for identifying language attributes through probabilistic analysis is described. A set of language classes and a plurality of training documents are defined. Each language class identifies a language and a character set encoding. Occurrences of one or more document properties within each training document are evaluated. For each language class, a probability for the document properties set conditioned on the occurrence of the language class is calculated. Byte occurrences within each training document are evaluated. For each language class, a probability for the byte occurrences conditioned on the occurrence of the language class is calculated.
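The training step summarized in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function names (`train`, `byte_prob`, `prop_prob`), the representation of a language class as a `(language, charset)` tuple, the use of single bytes as the counting unit, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(docs):
    """Count byte and property occurrences per language class.

    docs: iterable of (language_class, raw_bytes, properties) triples,
    where language_class is a hypothetical (language, charset) tuple.
    """
    byte_counts = defaultdict(Counter)  # class -> byte value -> count
    prop_counts = defaultdict(Counter)  # class -> property -> number of docs
    doc_counts = Counter()              # class -> number of training docs
    for cls, data, props in docs:
        doc_counts[cls] += 1
        byte_counts[cls].update(data)        # iterating bytes yields ints 0..255
        prop_counts[cls].update(set(props))  # count each property once per doc
    return byte_counts, prop_counts, doc_counts


def byte_prob(model, cls, b):
    """Estimate P(byte b | class) with add-one smoothing (an assumption)."""
    byte_counts = model[0]
    total = sum(byte_counts[cls].values())
    return (byte_counts[cls][b] + 1) / (total + 256)


def prop_prob(model, cls, prop):
    """Estimate P(property present | class) with add-one smoothing."""
    _, prop_counts, doc_counts = model
    return (prop_counts[cls][prop] + 1) / (doc_counts[cls] + 2)


# Toy training set; the property strings are made-up examples.
docs = [
    (("en", "ascii"), b"aab", {"tld:com"}),
    (("ja", "shift_jis"), bytes([0x82, 0xA0, 0x82]), {"tld:jp"}),
]
model = train(docs)
```

With this toy data, `byte_prob(model, ("en", "ascii"), ord("a"))` evaluates to 3/259 and `prop_prob(model, ("en", "ascii"), "tld:com")` to 2/3. Smoothing keeps unseen bytes and properties from receiving zero probability, which matters if the per-class estimates are later multiplied together.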
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.