Automated identification of documents as not belonging to any language
US8224642B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Nov 20, 2008 |
| Grant date | Jul 17, 2012 |
| Priority date | — |
| Expiry date | May 12, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/263
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
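The decision step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say which statistics the impostor profile holds or how they are compared. The sketch assumes that each score is a per-character log-likelihood under a language model, that the profile stores a mean and standard deviation per impostor language, and that a document is rejected when any impostor score lies more than `z_threshold` standard deviations from its expected value. The names `ImpostorStats`, `build_impostor_profile`, `classify`, and `z_threshold` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Optional


@dataclass
class ImpostorStats:
    """Expected score of an impostor-language model on genuine documents (assumed form)."""
    mean: float
    std: float


def build_impostor_profile(training_scores: Dict[str, List[float]]) -> Dict[str, ImpostorStats]:
    """Summarize impostor-model scores observed on documents known to be in one language.

    training_scores maps an impostor language to the scores its model assigned
    to the known-language training documents.
    """
    return {
        lang: ImpostorStats(mean(scores), pstdev(scores))
        for lang, scores in training_scores.items()
    }


def classify(
    scores: Dict[str, float],
    profiles: Dict[str, Dict[str, ImpostorStats]],
    z_threshold: float = 3.0,  # assumed cutoff, not taken from the patent
) -> Optional[str]:
    """Return the most likely language, or None for "no language"."""
    # Step 1: pick the language whose model scores the document highest.
    best = max(scores, key=scores.get)

    # Step 2: compare the impostor scores with what the profile expects
    # for a genuine document in the best-scoring language.
    for lang, stats in profiles[best].items():
        if lang not in scores:
            continue
        deviation = abs(scores[lang] - stats.mean)
        if stats.std > 0:
            z = deviation / stats.std
        else:
            z = 0.0 if deviation == 0 else float("inf")
        if z > z_threshold:
            return None  # impostor scores look unlike a real document in `best`
    return best


# Illustrative, made-up numbers: impostor scores of French and German models
# on known English documents.
profiles = {
    "en": build_impostor_profile({
        "fr": [-3.0, -3.2, -2.8],
        "de": [-3.5, -3.7, -3.3],
    })
}

real_doc = {"en": -1.5, "fr": -3.1, "de": -3.4}
gibberish = {"en": -4.0, "fr": -4.1, "de": -4.2}

print(classify(real_doc, profiles))
print(classify(gibberish, profiles))
```

In this toy data the second document still scores highest under English, but its French and German scores fall far outside the range the English impostor profile expects, so the sketch labels it as no language. A real system would need profiles for every candidate language and a comparison rule chosen from the patent's full description rather than the z-score test assumed here.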
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.