Out-of-domain data augmentation for natural language processing
US12293155B2 · kind B2 · utility
Key dates
| Event | Date |
| --- | --- |
| Filing date | Apr 9, 2024 |
| Grant date | May 6, 2025 |
| Priority date | — |
| Expiry date | Apr 9, 2044 |
Classification
- Technology area (CPC H): Electricity
- CPC primary: H04L51/02
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: generating a data set of OOD examples; filtering out OOD examples from the data set of OOD examples; determining a difficulty value for each OOD example remaining in the filtered data set; and generating augmented batches of utterances that include utterances from the training set and utterances from the filtered OOD data set, based on the difficulty value of each OOD example. The machine-learning model is then trained on the augmented batches of utterances in accordance with a curriculum training protocol.
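The abstract does not say how OOD examples are generated, filtered, or scored. The Python sketch below shows one possible reading of the filtering, difficulty-scoring, and batching steps, under loudly stated assumptions: candidate OOD utterances are assumed to come from some upstream generator (not shown). Filtering drops candidates whose token-level Jaccard similarity to any training utterance reaches a hypothetical threshold `max_sim`. Difficulty is assumed to be that maximum similarity. The curriculum is realized by feeding OOD examples into batches from easiest to hardest. Every function name, the `out_of_domain` label, and the thresholds are hypothetical and are not taken from the patent.

```python
from typing import Iterable, List, Optional, Set, Tuple

Example = Tuple[str, str]  # (utterance, intent label)
OOD_LABEL = "out_of_domain"  # hypothetical label assigned to OOD utterances


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of an utterance."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity; 0.0 when both utterances are empty."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def max_similarity(candidate: str, in_domain: List[str]) -> float:
    """Highest similarity between a candidate and any training utterance."""
    return max((jaccard(candidate, u) for u in in_domain), default=0.0)


def filter_ood(candidates: Iterable[str], in_domain: List[str],
               max_sim: float = 0.8) -> List[str]:
    """Keep candidates whose similarity to every training utterance is
    below max_sim (assumed criterion: near-duplicates may carry an
    in-domain intent)."""
    return [c for c in candidates if max_similarity(c, in_domain) < max_sim]


def difficulty(candidate: str, in_domain: List[str]) -> float:
    """Assumed difficulty score: OOD examples lexically closer to the
    training set are treated as harder to reject."""
    return max_similarity(candidate, in_domain)


def curriculum_batches(train: List[Example], ood_candidates: Iterable[str],
                       batch_size: int = 4, ood_per_batch: int = 1,
                       max_sim: float = 0.8) -> List[List[Example]]:
    """Build batches that mix training examples with filtered OOD examples,
    introducing OOD examples in order of increasing difficulty."""
    if not 0 <= ood_per_batch < batch_size:
        raise ValueError("ood_per_batch must be in [0, batch_size)")
    in_domain = [utt for utt, _ in train]
    kept = filter_ood(ood_candidates, in_domain, max_sim)
    # Easiest (least similar) OOD examples come first: the assumed curriculum.
    ranked = iter(sorted(kept, key=lambda c: difficulty(c, in_domain)))
    n_train = batch_size - ood_per_batch
    batches: List[List[Example]] = []
    for start in range(0, len(train), n_train):
        batch = list(train[start:start + n_train])
        for _ in range(ood_per_batch):
            nxt: Optional[str] = next(ranked, None)
            if nxt is not None:
                batch.append((nxt, OOD_LABEL))
        batches.append(batch)
    return batches
```

With the defaults, each batch holds three labeled training utterances plus the next-easiest remaining OOD example. Any OOD candidates left over once the training set is exhausted go unused in this sketch. Lexical Jaccard similarity is used only because it needs no trained model. A similarity computed from the intent model's own embeddings could be substituted, but that choice is likewise an assumption rather than something the abstract states.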
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.