Automatically augmenting and labeling conversational data for training machine learning models
US12321702B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jan 31, 2022 |
| Grant date | Jun 3, 2025 |
| Priority date | Not listed |
| Expiry date | Jul 21, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/063
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method implemented via execution of computing instructions configured to run at one or more processors and stored at one or more non-transitory computer-readable media. The method can include generating training data for an intent classification machine learning model by: (a) determining, via a text-to-text machine learning model, one or more respective paraphrases for each sample phrase of training phrases; (b) generating, via a label generating machine learning model, labeled data based on unlabeled live logs by: (i) determining live-log samples from the unlabeled live logs based at least in part on: a respective timestamp of each live log of the unlabeled live logs, or random sampling; and (ii) generating, via the label generating machine learning model, the labeled data based on the live-log samples and one or more labeling functions; and (c) adding the one or more respective paraphrases for the each sample phrase of the training phrases and the labeled data to the training data. In certain embodiments, a respective quantity of the one or more respective paraphrases can vary for the each sample phrase of the training phrases. In some embodiments, the method further can includ…
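The data-generation steps in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the patented implementation. `paraphrase` applies fixed templates in place of the text-to-text model. The keyword labeling functions and intent names are invented for the example. The label generating model is replaced by a plain majority vote over labeling functions, a common weak-supervision baseline. The rule giving intents with fewer seed phrases more paraphrases is an assumed policy, since the abstract says only that the quantity can vary.

```python
from __future__ import annotations

import random
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

ABSTAIN = None  # value a labeling function returns when it has no opinion
LabelingFn = Callable[[str], Optional[str]]


@dataclass
class LiveLog:
    text: str
    timestamp: datetime


def paraphrase(phrase: str, n: int) -> list[str]:
    """Hypothetical stand-in for the text-to-text model: fixed templates, no ML."""
    templates = ["please {p}", "can you {p}", "i want to {p}", "help me {p}"]
    return [t.format(p=phrase) for t in templates[:n]]


def paraphrases_per_phrase(intent_sizes: Counter, intent: str, target: int = 4) -> int:
    """Assumed policy: intents with fewer seed phrases get more paraphrases per phrase."""
    return max(1, target - intent_sizes[intent] + 1)


def sample_live_logs(logs: list[LiveLog], k: int,
                     since: Optional[datetime] = None, seed: int = 0) -> list[LiveLog]:
    """(b)(i) Pick up to k logs: newest ones since `since` if given, else a random sample."""
    if since is not None:
        recent = [log for log in logs if log.timestamp >= since]
        return sorted(recent, key=lambda log: log.timestamp, reverse=True)[:k]
    return random.Random(seed).sample(logs, min(k, len(logs)))


def lf_order_status(text: str) -> Optional[str]:
    # Hypothetical keyword rule for an "order_status" intent.
    return "order_status" if "where is my order" in text or "track" in text else ABSTAIN


def lf_cancel(text: str) -> Optional[str]:
    # Hypothetical keyword rule for a "cancel_order" intent.
    return "cancel_order" if "cancel" in text else ABSTAIN


def majority_vote(text: str, lfs: list[LabelingFn]) -> Optional[str]:
    """(b)(ii) Stand-in label model: majority vote over non-abstaining labeling functions."""
    votes = Counter(label for label in (lf(text) for lf in lfs) if label is not ABSTAIN)
    if not votes:
        return ABSTAIN
    (top, count), *rest = votes.most_common()
    if rest and rest[0][1] == count:
        return ABSTAIN  # tie: leave the sample unlabeled
    return top


def build_training_data(training_phrases: list[tuple[str, str]], live_logs: list[LiveLog],
                        lfs: list[LabelingFn], k: int = 100,
                        since: Optional[datetime] = None) -> list[tuple[str, str]]:
    """Return (text, intent) pairs built from seed phrases, paraphrases and labeled logs."""
    sizes = Counter(intent for _, intent in training_phrases)
    # (a) paraphrase every seed phrase; the number of paraphrases varies per phrase
    paraphrased: list[tuple[str, str]] = []
    for phrase, intent in training_phrases:
        n = paraphrases_per_phrase(sizes, intent)
        paraphrased.extend((p, intent) for p in paraphrase(phrase, n))
    # (b) label sampled live logs, dropping samples the label model abstains on
    labeled: list[tuple[str, str]] = []
    for log in sample_live_logs(live_logs, k, since):
        label = majority_vote(log.text, lfs)
        if label is not ABSTAIN:
            labeled.append((log.text, label))
    # (c) add both to the training data (assumption: seed phrases are kept as well)
    return list(training_phrases) + paraphrased + labeled
```

Live logs on which every labeling function abstains, or on which they tie, are simply dropped here. A learned label model, as the abstract describes, would instead weigh labeling functions by their estimated accuracy.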
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).