Automatically selecting relevant data based on user specified data and machine learning characteristics for data integration
US12190215B1 · kind B1 · utility
Key dates
| Date type | Date |
|---|---|
| Filing date | Oct 25, 2023 |
| Grant date | Jan 7, 2025 |
| Priority date | — |
| Expiry date | Oct 25, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N7/01
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for automatically selecting data for machine learning datasets is provided. The method comprises receiving an input dataset and user-specified data quality metrics. The input dataset is matched to a first subset of candidate datasets in a repository according to schema characteristics. A second subset of candidate datasets, each having a distance from the input dataset above a specified threshold, is selected from the first subset. The second subset of candidate datasets is merged into a merged dataset. Top-ranked samples scoring above a specified second threshold are identified from the merged dataset based on the user-specified data quality metrics. The input dataset, augmented with the top-ranked samples, is returned to the user.
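The abstract describes a five-step pipeline: schema matching, distance filtering, merging, quality-based ranking, and augmentation. The Python sketch below walks through those steps under assumptions the abstract does not specify. Datasets are lists of rows (dicts of floats); a "schema match" means the candidate contains every input column; distance is the Euclidean distance between per-column means; and a sample's quality score is the unweighted mean of the user's metric functions. All function names, metrics, and thresholds here are hypothetical illustrations, not the patented implementation.

```python
from collections.abc import Callable
from math import sqrt
from statistics import mean

Row = dict[str, float]
Dataset = list[Row]


def schema_matches(input_ds: Dataset, candidate: Dataset) -> bool:
    """Hypothetical schema criterion: the candidate contains every input column."""
    return bool(candidate) and set(input_ds[0]) <= set(candidate[0])


def dataset_distance(a: Dataset, b: Dataset, columns: list[str]) -> float:
    """Hypothetical distance: Euclidean distance between per-column means."""
    return sqrt(sum((mean(r[c] for r in a) - mean(r[c] for r in b)) ** 2 for c in columns))


def select_and_augment(
    input_ds: Dataset,
    repository: list[Dataset],
    quality_metrics: dict[str, Callable[[Row], float]],
    distance_threshold: float,
    quality_threshold: float,
) -> Dataset:
    columns = list(input_ds[0])
    # First subset: candidates whose schema matches the input dataset.
    first_subset = [d for d in repository if schema_matches(input_ds, d)]
    # Second subset: matched candidates farther from the input than the threshold.
    second_subset = [
        d for d in first_subset
        if dataset_distance(input_ds, d, columns) > distance_threshold
    ]
    # Merge: concatenate the second subset, projected onto the input columns.
    merged = [{c: row[c] for c in columns} for d in second_subset for row in d]
    # Rank: score each sample by the mean of the (assumed) user-specified metrics.
    scored = [(mean(m(row) for m in quality_metrics.values()), row) for row in merged]
    top = [row for score, row in sorted(scored, key=lambda p: p[0], reverse=True)
           if score > quality_threshold]
    # Return the input dataset augmented with the top-ranked samples.
    return input_ds + top


# Example usage with made-up data and metrics.
input_ds = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 3.0}]
repository = [
    [{"x": 1.5, "y": 2.5, "z": 0.0}],              # schema matches, distance 0: rejected
    [{"x": 5.0, "y": 6.0}, {"x": 12.0, "y": 6.0}],  # matches and is far: kept
    [{"x": 9.0}],                                   # missing column "y": rejected
]
metrics = {
    "x_in_range": lambda r: 1.0 if 0.0 <= r["x"] <= 10.0 else 0.0,
    "y_below_10": lambda r: 1.0 if r["y"] < 10.0 else 0.0,
}
result = select_and_augment(input_ds, repository, metrics,
                            distance_threshold=1.0, quality_threshold=0.75)
print(result)  # [{'x': 1.0, 'y': 2.0}, {'x': 2.0, 'y': 3.0}, {'x': 5.0, 'y': 6.0}]
```

The sketch follows the abstract literally by keeping candidates whose distance is *above* the threshold. In the example, the second repository dataset survives both filters; of its two rows, only `{"x": 5.0, "y": 6.0}` scores above 0.75, so it is the only sample appended to the input.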
Source: USPTO / EPO open patent data.