Patent · US Active

Automatically augmenting and labeling conversational data for training machine learning models

US12321702B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 31, 2022
Grant date: Jun 3, 2025
Priority date
Expiry date: Jul 21, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L15/063
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method implemented via execution of computing instructions configured to run at one or more processors and stored at one or more non-transitory computer-readable media. The method can include generating training data for an intent classification machine learning model by: (a) determining, via a text-to-text machine learning model, one or more respective paraphrases for each sample phrase of training phrases; (b) generating, via a label generating machine learning model, labeled data based on unlabeled live logs by: (i) determining live-log samples from the unlabeled live logs based at least in part on: a respective timestamp of each live log of the unlabeled live logs, or random sampling; and (ii) generating, via the label generating machine learning model, the labeled data based on the live-log samples and one or more labeling functions; and (c) adding the one or more respective paraphrases for the each sample phrase of the training phrases and the labeled data to the training data. In certain embodiments, a respective quantity of the one or more respective paraphrases can vary for the each sample phrase of the training phrases. In some embodiments, the method further can includ…
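Illustrative sketch (not part of the patent record): steps (a) through (c) of the abstract can be pictured as a small data pipeline. The Python below is a minimal sketch under stated assumptions, not the claimed implementation. All names (`LiveLog`, `sample_live_logs`, `label_with_functions`, `build_training_data`) are hypothetical. The `paraphrase` callable stands in for the text-to-text machine learning model, and a simple majority vote over labeling functions stands in for the label generating machine learning model, whose actual design the abstract does not describe.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class LiveLog:
    """Hypothetical record for one unlabeled live-log utterance."""
    text: str
    timestamp: float  # e.g. seconds since the epoch


# A labeling function returns an intent label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]
Example = Tuple[str, str]  # (phrase, intent label)


def sample_live_logs(logs: List[LiveLog], since: Optional[float] = None,
                     k: Optional[int] = None, seed: int = 0) -> List[LiveLog]:
    """Step (b)(i): pick samples by timestamp cutoff and/or random sampling."""
    pool = [log for log in logs if since is None or log.timestamp >= since]
    if k is not None and k < len(pool):
        pool = random.Random(seed).sample(pool, k)
    return pool


def label_with_functions(samples: List[LiveLog],
                         labeling_functions: List[LabelingFunction]) -> List[Example]:
    """Step (b)(ii), simplified: majority vote over non-abstaining functions.

    Assumption: the vote replaces the patent's label generating model.
    Samples on which every function abstains are dropped.
    """
    labeled = []
    for log in samples:
        votes = Counter()
        for lf in labeling_functions:
            label = lf(log.text)
            if label is not None:
                votes[label] += 1
        if votes:
            labeled.append((log.text, votes.most_common(1)[0][0]))
    return labeled


def build_training_data(training_phrases: List[Example],
                        paraphrase: Callable[[str], List[str]],
                        logs: List[LiveLog],
                        labeling_functions: List[LabelingFunction],
                        since: Optional[float] = None,
                        k: Optional[int] = None) -> List[Example]:
    """Steps (a) and (c): add paraphrases and labeled live logs to the data."""
    data = list(training_phrases)
    for phrase, intent in training_phrases:
        # The number of paraphrases may differ from phrase to phrase.
        data.extend((p, intent) for p in paraphrase(phrase))
    samples = sample_live_logs(logs, since=since, k=k)
    data.extend(label_with_functions(samples, labeling_functions))
    return data
```

In this sketch each paraphrase inherits the intent label of its source phrase, and live-log samples that no labeling function can label are left out of the training data. Both are design choices made for the illustration.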

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.