Patent · US Active

Out-of-domain data augmentation for natural language processing

US12026468B2 · kind B2 · utility

0 Cited by

5 References

14 Claims

0 Family size

Assignee

Oracle International Corporation · US

Inventors

Elias Luqman Jalaluddin · Seattle, US
Vishal Vishnoi · Redwood City, US
Thanh Long Duong · Melbourne, AU
Mark Edward Johnson · Chatswood, AU
Poorya Zaremoodi · Melbourne, AU
Gautam Singaraju · Dublin, US
Ying Xu · Albion, AU
Vladislav Blinov · Melbourne, AU
Yu-Heng Hong · Melbourne, AU

Key dates

Filing date	Oct 28, 2021
Grant date	Jul 2, 2024
Priority date	—
Expiry date	Jul 12, 2042

Classification

Technology area (CPC H)	Electricity
CPC primary	H04L51/02
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Techniques are disclosed for out-of-domain data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: generating a data set of OOD examples, filtering out OOD examples from the data set of OOD examples, determining a difficulty value for each OOD example remaining within the filtered data set of OOD examples, and generating augmented batches of utterances comprising utterances from the training set of utterances and utterances from the filtered data set of OOD examples based on the difficulty value for each OOD example. Thereafter, the machine-learning model is trained using the augmented batches of utterances in accordance with a curriculum training protocol.
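
The steps named in the abstract (generate, filter, score difficulty, batch, train by curriculum) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how examples are filtered or how difficulty is computed, so the sketch assumes token-overlap (Jaccard) similarity to the in-domain utterances for both, and an easiest-first ordering for the curriculum. All names (Utterance, OOD_LABEL, filter_ood, score_difficulty, curriculum_batches) and the sample utterances are hypothetical.

```python
import random
from dataclasses import dataclass

OOD_LABEL = "out_of_domain"  # hypothetical label name, not taken from the patent


@dataclass
class Utterance:
    text: str
    intent: str
    difficulty: float = 0.0


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two utterances, from 0.0 to 1.0."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def max_overlap(text, in_domain):
    """Highest overlap between `text` and any in-domain utterance."""
    return max((jaccard(text, u.text) for u in in_domain), default=0.0)


def filter_ood(candidates, in_domain, threshold=0.8):
    """Assumed filter: drop candidates that look almost in-domain."""
    return [c for c in candidates if max_overlap(c, in_domain) < threshold]


def score_difficulty(ood_texts, in_domain):
    """Assumed difficulty: the closer to in-domain text, the harder."""
    return [Utterance(t, OOD_LABEL, max_overlap(t, in_domain)) for t in ood_texts]


def curriculum_batches(in_domain, ood, batch_size=3, ood_per_batch=1, seed=0):
    """Mix in-domain and OOD utterances, feeding easier OOD examples first."""
    rng = random.Random(seed)
    ordered = sorted(ood, key=lambda u: u.difficulty)
    batches = []
    for i in range(0, len(ordered), ood_per_batch):
        ood_slice = ordered[i:i + ood_per_batch]
        k = min(batch_size - len(ood_slice), len(in_domain))
        batches.append(rng.sample(in_domain, k) + ood_slice)
    return batches


in_domain = [
    Utterance("order a large pizza", "order_pizza"),
    Utterance("cancel my pizza order", "cancel_order"),
    Utterance("track my delivery", "track_order"),
]
candidates = [
    "what is the weather today",
    "order a large pizza",        # identical to an in-domain utterance
    "cancel my gym membership",
    "tell me a joke",
]

filtered = filter_ood(candidates, in_domain)
scored = score_difficulty(filtered, in_domain)
batches = curriculum_batches(in_domain, scored)

for n, batch in enumerate(batches, start=1):
    print(n, [(u.text, u.intent) for u in batch])
```

Each batch here holds two in-domain utterances followed by one OOD utterance, and later batches carry OOD utterances that share more tokens with the in-domain set. A real system would likely replace the overlap heuristic with a model-based score; that choice is outside what the abstract states.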

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.