Language segmentation of multilingual texts
US9400787B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Nov 6, 2013 |
| Grant date | Jul 26, 2016 |
| Priority date | — |
| Expiry date | Sep 24, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/263
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. Further, web documents are annotated at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
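The steps in the abstract resemble decoding a hidden Markov model, so a minimal sketch of that reading may help. It is not the patented implementation. The function names `estimate_transitions` and `viterbi`, the additive smoothing, and the use of hard argmax counts to learn transitions are all assumptions made for illustration. The per-sentence probabilities are taken as given, standing in for the output of whatever initial model is used.

```python
import math

def estimate_transitions(initial, languages, smoothing=1.0):
    """Estimate P(next language | current language).

    Assumption: transitions are counted between the most likely language
    of each adjacent sentence pair, with additive smoothing. The abstract
    only says transitions are learned from the initial distribution.
    """
    counts = {a: {b: smoothing for b in languages} for a in languages}
    best = [max(languages, key=lambda lang: dist[lang]) for dist in initial]
    for prev, cur in zip(best, best[1:]):
        counts[prev][cur] += 1.0
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in languages}
        for a in languages
    }

def viterbi(initial, transitions, languages):
    """Return the highest-probability language sequence, one label per sentence.

    `initial` is a list of dicts mapping language -> probability for each
    sentence; `transitions[a][b]` is P(b follows a). Scores are kept in log
    space to avoid underflow on long documents.
    """
    floor = 1e-12  # guards against log(0)
    score = {lang: math.log(max(initial[0][lang], floor)) for lang in languages}
    back = []
    for dist in initial[1:]:
        new_score, pointers = {}, {}
        for cur in languages:
            # Best predecessor for `cur` given the scores so far.
            prev = max(
                languages,
                key=lambda p: score[p] + math.log(max(transitions[p][cur], floor)),
            )
            new_score[cur] = (
                score[prev]
                + math.log(max(transitions[prev][cur], floor))
                + math.log(max(dist[cur], floor))
            )
            pointers[cur] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    path = [max(languages, key=lambda lang: score[lang])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With transitions that favor staying in the same language, a sentence whose own scores are nearly tied tends to take the label of its neighbors. Calling `viterbi(initial, transitions, languages)` returns one label per sentence, which could then be attached to each sentence as the annotation the abstract describes.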
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.