Patent · US Active

Out-of-domain data augmentation for natural language processing

US12026468B2 · kind B2 · utility

Cited by: 0
References: 5
Claims: 14
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 28, 2021
Grant date: Jul 2, 2024
Priority date
Expiry date: Jul 12, 2042

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H04L51/02
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Techniques are provided for out-of-domain data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: generating a data set of OOD examples, filtering out OOD examples from the data set of OOD examples, determining a difficulty value for each OOD example remaining within the filtered data set of OOD examples, and generating augmented batches of utterances comprising utterances from the training set of utterances and utterances from the filtered data set of OOD examples based on the difficulty value for each OOD example. Thereafter, the machine-learning model is trained using the augmented batches of utterances in accordance with a curriculum training protocol.
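
The steps named in the abstract (filter, score difficulty, build augmented batches, order them as a curriculum) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the token-overlap (Jaccard) similarity, the rule that an OOD example is dropped when it is too similar to an in-domain utterance, the definition of difficulty as the highest similarity to any in-domain utterance, the easy-to-hard staging, and all names such as OOD_LABEL, filter_ood and augmented_batches.

```python
import random

OOD_LABEL = "unresolvedIntent"  # hypothetical label assigned to every OOD example


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; an assumed stand-in for a real metric."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def difficulty(text, in_domain):
    """Assumed difficulty: highest similarity to any in-domain utterance."""
    return max(jaccard(text, utterance) for utterance, _ in in_domain)


def filter_ood(candidates, in_domain, max_sim=0.7):
    """Assumed filter: drop candidates that look too much like in-domain data."""
    return [c for c in candidates if difficulty(c, in_domain) < max_sim]


def augmented_batches(in_domain, ood_scored, num_stages=2,
                      batch_size=4, ood_per_batch=1, seed=0):
    """Return (stage, batch) pairs; later stages draw on harder OOD examples."""
    if ood_per_batch >= batch_size:
        raise ValueError("batch_size must exceed ood_per_batch")
    rng = random.Random(seed)
    ranked = sorted(ood_scored, key=lambda pair: pair[1])  # easiest first
    stage_size = max(1, -(-len(ranked) // num_stages))     # ceiling division
    step = batch_size - ood_per_batch                      # in-domain slots per batch
    batches = []
    for stage in range(num_stages):
        pool = ranked[stage * stage_size:(stage + 1) * stage_size]
        if not pool:
            continue
        shuffled = list(in_domain)
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), step):
            batch = shuffled[start:start + step]
            picks = rng.sample(pool, min(ood_per_batch, len(pool)))
            batch += [(text, OOD_LABEL) for text, _ in picks]
            batches.append((stage, batch))
    return batches


# Toy data, invented for this example.
in_domain = [
    ("check my account balance", "balance"),
    ("transfer money to savings", "transfer"),
    ("pay my credit card bill", "pay_bill"),
]
candidates = [
    "what is the weather today",
    "check my account balance now",  # near-duplicate of an in-domain utterance
    "pay my respects",
]

filtered = filter_ood(candidates, in_domain)
scored = [(text, difficulty(text, in_domain)) for text in filtered]
batches = augmented_batches(in_domain, scored)
```

A training loop would then consume the batches in list order, so the model sees the OOD examples judged easiest before those judged hardest. How the patent actually defines the filter, the difficulty value and the curriculum protocol is set out in its claims and description, not in this sketch.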

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.