Generating training sets to train machine learning models
US11514691B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 12, 2019 |
| Grant date | Nov 29, 2022 |
| Priority date | — |
| Expiry date | Sep 25, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/40
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer system trains a machine learning model. A vector representation is generated for each document in a collection of documents. The documents are clustered based on the vector representations of the documents to produce a plurality of clusters. A training set is produced by selecting one or more documents from each cluster, wherein the selected documents represent a sample of the collection of documents to train the machine learning model. The machine learning model is trained by applying the training set to the machine learning model. Embodiments of the present invention further include a method and program product for training a machine learning model in substantially the same manner described above.
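The abstract describes a four-step pipeline: vectorize each document, cluster the vectors, sample one or more documents per cluster, and train on that sample. The Python sketch below illustrates that pipeline under several assumptions that do not come from the patent. It uses L2-normalized term-frequency vectors as the representation and k-means with farthest-point seeding as the clustering method. It selects the document(s) nearest each centroid and substitutes a toy nearest-centroid classifier for the unspecified model. The function names (`vectorize`, `kmeans`, `select_training_set`, `train_nearest_centroid`), the sample documents, and the topic labels are all hypothetical.

```python
import math
from collections import Counter


def vectorize(docs):
    """Map each document to an L2-normalized term-frequency vector.

    Assumption: the abstract does not fix the representation; bag-of-words
    term frequencies are only a simple stand-in.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard empty documents
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50):
    """Plain k-means with deterministic farthest-point seeding.

    Assumption: k-means is one possible clustering method; the abstract only
    says documents are clustered by their vector representations.
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing seed.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    assignments = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        updated = []
        for j in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == j]
            # New centroid = member mean; keep the old one if the cluster is empty.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = updated
    return assignments, centroids


def select_training_set(vectors, k, per_cluster=1):
    """Indices of up to `per_cluster` documents per cluster, nearest the centroid first."""
    assignments, centroids = kmeans(vectors, k)
    selected = []
    for j in range(k):
        members = [i for i, a in enumerate(assignments) if a == j]
        members.sort(key=lambda i: sq_dist(vectors[i], centroids[j]))
        selected.extend(members[:per_cluster])
    return sorted(selected)


def train_nearest_centroid(vectors, labels):
    """Toy stand-in for the unspecified model: one mean vector per class."""
    counts = Counter(labels)
    sums = {}
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    means = {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}
    # The trained "model" predicts the class whose mean vector is closest.
    return lambda vec: min(means, key=lambda lab: sq_dist(vec, means[lab]))


# Hypothetical collection and labels (as if a person annotated only the sample).
docs = [
    "invoice payment due amount",
    "invoice amount total payment",
    "payment invoice overdue amount",
    "football match goal score",
    "match score goal referee",
    "goal football league match",
    "python code function bug",
    "code bug python compile",
]
topics = ["billing"] * 3 + ["sports"] * 3 + ["software"] * 2

vectors = vectorize(docs)
picked = select_training_set(vectors, k=3)
model = train_nearest_centroid([vectors[i] for i in picked], [topics[i] for i in picked])
predictions = [model(v) for v in vectors]
```

Taking the document nearest each centroid spreads a small sample across every cluster, so no group of similar documents is left unrepresented in the training set. Raising `per_cluster` trades a larger sample, and more labeling effort, for more coverage within each cluster.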
Source: USPTO / EPO open patent data.