System, method, and computer program product for cleaning noisy data from unlabeled datasets using autoencoders
US11948064B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 2, 2022 |
| Grant date | Apr 2, 2024 |
| Priority date | — |
| Expiry date | Sep 2, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/094
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and computer program products are provided for cleaning noisy data from unlabeled datasets using autoencoders. A method includes receiving training data including noisy samples and other samples. An autoencoder network is trained based on the training data to increase a first metric based on the noisy samples and to reduce a second metric based on the other samples. Unlabeled data including unlabeled samples is received. A plurality of third outputs is generated by the autoencoder network based on the plurality of unlabeled samples. For each respective unlabeled sample, a respective third metric is determined based on the respective unlabeled sample and a respective third output, and whether to label the respective unlabeled sample as noisy or clean is determined based on the respective third metric and a threshold. Each respective unlabeled sample determined to be labeled as noisy is cleaned.
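The labeling step described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than details taken from the patent: the metric is taken to be mean squared reconstruction error, a sample is labeled noisy when its metric exceeds the threshold, and the training objective is written as the simple difference of the two mean errors. The names `reconstruction_error`, `training_objective`, `label_samples`, and `toy_autoencoder` are hypothetical, and `toy_autoencoder` is only a stand-in for a trained network. The cleaning step itself is not shown, since the abstract does not say how it is done.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Autoencoder = Callable[[Vector], List[float]]


def reconstruction_error(sample: Vector, output: Vector) -> float:
    """Mean squared error between a sample and its reconstruction (assumed metric)."""
    return sum((s - o) ** 2 for s, o in zip(sample, output)) / len(sample)


def training_objective(
    autoencoder: Autoencoder,
    noisy_samples: List[Vector],
    other_samples: List[Vector],
) -> float:
    """Assumed loss to minimize: it falls as error on noisy samples rises
    and as error on the other samples drops."""
    noisy_err = sum(
        reconstruction_error(x, autoencoder(x)) for x in noisy_samples
    ) / len(noisy_samples)
    other_err = sum(
        reconstruction_error(x, autoencoder(x)) for x in other_samples
    ) / len(other_samples)
    return other_err - noisy_err


def label_samples(
    autoencoder: Autoencoder,
    unlabeled_samples: List[Vector],
    threshold: float,
) -> List[str]:
    """Label each sample by comparing its per-sample metric to the threshold."""
    labels = []
    for x in unlabeled_samples:
        metric = reconstruction_error(x, autoencoder(x))
        # Assumption: a high reconstruction error indicates a noisy sample.
        labels.append("noisy" if metric > threshold else "clean")
    return labels


def toy_autoencoder(x: Vector) -> List[float]:
    """Hypothetical stand-in for a trained network: reproduces values in
    [0, 1] exactly and clips everything else into that range."""
    return [min(max(v, 0.0), 1.0) for v in x]


if __name__ == "__main__":
    unlabeled = [[0.2, 0.5], [3.0, -2.0]]
    print(label_samples(toy_autoencoder, unlabeled, threshold=0.5))
```

With the toy stand-in, the in-range sample is reconstructed exactly and falls below the threshold, while the out-of-range sample is reconstructed poorly and is flagged. In a real system the threshold would have to be chosen from data, which this sketch does not address.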
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.