Patent · US Active

Automated identification of documents as not belonging to any language

US8224642B2 · kind B2 · utility

Cited by: 2
References: 10
Claims: 32
Family size: 0

Assignee

Inventor

Key dates

Filing date: Nov 20, 2008
Grant date: Jul 17, 2012
Priority date
Expiry date: May 12, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/263
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An “impostor profile” for a language is used to determine whether documents are in that language or no language. The impostor profile for a given language provides statistical information about the expected results of applying a language model for one or more other (“impostor”) languages to a document that is in fact in the given language. After a most likely language for a test document is identified, the impostor profile is used together with the scores for the test document in the various impostor languages to determine whether to identify the test document as being in the most likely language or in no language.
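
The decision flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract says only that the profile holds "statistical information", so the mean and standard deviation per impostor language, the z-score test, and the `max_z` threshold are all assumptions made here for illustration. The function names (`build_impostor_profile`, `classify`) and the sample log-likelihood scores are hypothetical, and the language models themselves are assumed to exist elsewhere and to return higher scores for better matches.

```python
from statistics import mean, stdev


def build_impostor_profile(training_scores):
    """Summarize impostor-language scores for documents known to be in one language.

    training_scores: list of dicts mapping impostor language -> model score.
    Returns a dict mapping impostor language -> (mean, standard deviation).
    Mean/stdev is an assumed choice of statistic, not taken from the patent.
    """
    langs = training_scores[0].keys()
    profile = {}
    for lang in langs:
        values = [scores[lang] for scores in training_scores]
        profile[lang] = (mean(values), stdev(values))
    return profile


def classify(doc_scores, profiles, max_z=3.0):
    """Return the most likely language, or None for "no language".

    doc_scores: dict mapping every language -> model score for the test document.
    profiles: dict mapping language -> its impostor profile.
    max_z: assumed rejection threshold on the z-score.
    """
    # Step 1: pick the most likely language (highest score).
    best = max(doc_scores, key=doc_scores.get)

    # Step 2: check the impostor-language scores against the profile of `best`.
    for lang, (mu, sigma) in profiles[best].items():
        z = abs(doc_scores[lang] - mu) / max(sigma, 1e-9)  # guard against sigma == 0
        if z > max_z:
            return None  # scores are atypical for a real document in `best`
    return best


# Hypothetical scores from French and German models applied to English documents.
english_training = [
    {"fr": -120.0, "de": -130.0},
    {"fr": -118.0, "de": -128.0},
    {"fr": -122.0, "de": -132.0},
]
profiles = {"en": build_impostor_profile(english_training)}

real_doc = {"en": -80.0, "fr": -121.0, "de": -129.0}
junk_doc = {"en": -200.0, "fr": -205.0, "de": -210.0}

print(classify(real_doc, profiles))
print(classify(junk_doc, profiles))
```

With these sample numbers, English is the top-scoring language for both documents. The first is accepted as English because its French and German scores fall within the profile, while the second is rejected as no language because its impostor scores lie far outside it.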

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.