Bootstrapping text classifiers by language adaptation
US8521507B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Feb 22, 2010 |
| Grant date | Aug 27, 2013 |
| Priority date | — |
| Expiry date | Jul 9, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/35
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Training data in one language is leveraged to develop classifiers for multiple languages when all of those classifiers perform the same kind of classification task, but on linguistically different sets of texts. This saves the cost of manually labeling a separate set of training data for each language. Classification knowledge is learned for a source language in which training data are available. That knowledge is transferred to a target language's classifier by integrating language transition knowledge, and the transferred model is then adjusted to better fit the target language. In one technique, leveraging one language's classification knowledge to generate a classifier for another language involves three steps: training a text classifier in the source language, transferring the learned classification knowledge from the source language to the target language using language translation techniques, and further tuning the transferred model to better fit target-language text.
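The abstract names three steps (train in the source language, transfer through translation knowledge, tune on the target language) but does not say which classifier, translation resource, or tuning procedure is used. The sketch below is one plausible instantiation under stated assumptions, not the patented method: it assumes a multinomial Naive Bayes classifier, a probabilistic bilingual lexicon giving P(t|s) for target word t and source word s, and expectation-maximization (EM) over unlabeled target-language documents as the tuning step. All function names, the lexicon, and the toy corpora are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Step 1 (assumed Naive Bayes): learn P(c) and P(w|c) from labeled source docs."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d)
        total = sum(counts.values()) + alpha * len(vocab)  # Laplace smoothing
        cond[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return prior, cond


def transfer(cond, lexicon):
    """Step 2: P(t|c) = sum_s P(s|c) * P(t|s), with P(t|s) from a (hypothetical) lexicon."""
    transferred = {}
    for c, probs in cond.items():
        t_probs = defaultdict(float)
        for s, p_s in probs.items():
            for t, p_ts in lexicon.get(s, {}).items():
                t_probs[t] += p_s * p_ts
        # Renormalize; source words with no translation drop out.
        # Assumes the lexicon covers at least one source word per class.
        z = sum(t_probs.values())
        transferred[c] = {t: p / z for t, p in t_probs.items()}
    return transferred


def posterior(doc, prior, cond, floor=1e-6):
    """P(c|doc); words the model has never seen get a small floor probability."""
    logs = {c: math.log(prior[c]) + sum(math.log(cond[c].get(w, floor)) for w in doc)
            for c in prior}
    top = max(logs.values())
    unnorm = {c: math.exp(v - top) for c, v in logs.items()}  # numerically stable softmax
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def predict(doc, prior, cond):
    post = posterior(doc, prior, cond)
    return max(post, key=post.get)


def adapt_em(prior, cond, target_docs, iterations=5, alpha=1.0):
    """Step 3 (assumed EM): tune the transferred model on unlabeled target-language text."""
    classes = list(prior)
    vocab = {w for d in target_docs for w in d} | {w for c in classes for w in cond[c]}
    for _ in range(iterations):
        # E-step: soft class labels for each target document under the current model.
        resp = [posterior(d, prior, cond) for d in target_docs]
        # M-step: re-estimate priors and word probabilities from the soft counts.
        prior = {c: (sum(r[c] for r in resp) + 1) / (len(resp) + len(classes))
                 for c in classes}
        new_cond = {}
        for c in classes:
            counts = defaultdict(float)
            for d, r in zip(target_docs, resp):
                for w in d:
                    counts[w] += r[c]
            total = sum(counts.values()) + alpha * len(vocab)
            new_cond[c] = {w: (counts[w] + alpha) / total for w in vocab}
        cond = new_cond
    return prior, cond


# Toy data: the labeled English corpus, the English->Spanish lexicon, and the
# unlabeled Spanish corpus are all hypothetical.
src_docs = [["goal", "match", "team"], ["team", "win", "match"],
            ["stock", "market", "price"], ["price", "bank", "market"]]
src_labels = ["sports", "sports", "finance", "finance"]
lexicon = {"goal": {"gol": 1.0}, "match": {"partido": 1.0}, "team": {"equipo": 1.0},
           "win": {"ganar": 1.0}, "stock": {"acciones": 1.0}, "market": {"mercado": 1.0},
           "price": {"precio": 1.0}, "bank": {"banco": 1.0}}
tgt_unlabeled = [["gol", "partido", "estadio"], ["equipo", "ganar"],
                 ["mercado", "precio", "inflación"], ["banco", "acciones"]]

prior_src, cond_src = train_nb(src_docs, src_labels)
cond_transferred = transfer(cond_src, lexicon)
prior_tgt, cond_tgt = adapt_em(prior_src, cond_transferred, tgt_unlabeled)
print(predict(["partido", "equipo"], prior_tgt, cond_tgt))  # sports
```

The EM step lets words that never appear in the lexicon (here "estadio" and "inflación") pick up class-specific probabilities from the unlabeled target text, which is one way to make the transferred model "better fit the target language." Other tuning choices, such as self-training on high-confidence predictions, would fit the abstract's description equally well.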
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.