Patent · US Active

Automatically selecting relevant data based on user specified data and machine learning characteristics for data integration

US12190215B1 · kind B1 · utility

Cited by: 0
References: 4
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 25, 2023
Grant date: Jan 7, 2025
Priority date
Expiry date: Oct 25, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N7/01
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Automatically selecting data for machine learning datasets is provided. The method comprises receiving an input dataset and user-specified data quality metrics. The input dataset is matched to a first subset of candidate datasets in a repository according to schema characteristics. A second subset of candidate datasets, each having a distance from the input dataset above a specified threshold, is selected from the first subset. The second subset of candidate datasets is merged into a merged dataset. Top-ranked samples scoring above a specified second threshold are identified from the merged dataset based on the user-specified data quality metrics. The input dataset, augmented with the top-ranked samples, is returned to the user.
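
The steps in the abstract can be sketched as a small pipeline. This is an illustrative reading only, not the patented implementation: the abstract does not say how schema matching, distance, or ranking are computed, so the sketch assumes Jaccard overlap of column names for schema matching, Euclidean distance between column means as the dataset distance, and a user-supplied scoring function as the quality metric. All function and parameter names (schema_match, dataset_distance, augment, min_overlap, dist_threshold, score_threshold) are hypothetical. Datasets are assumed to be non-empty lists of dicts with numeric values.

```python
from math import sqrt

def schema_match(input_rows, candidates, min_overlap=0.5):
    """First subset: candidates whose column names overlap the input's.
    Jaccard overlap is an assumed stand-in for 'schema characteristics'."""
    cols = set(input_rows[0])
    matched = []
    for cand in candidates:
        cand_cols = set(cand[0])
        jaccard = len(cols & cand_cols) / len(cols | cand_cols)
        if jaccard >= min_overlap:
            matched.append(cand)
    return matched

def dataset_distance(a_rows, b_rows):
    """Assumed distance: Euclidean distance between the per-column means
    of the columns the two datasets share."""
    shared = sorted(set(a_rows[0]) & set(b_rows[0]))
    mean_a = [sum(r[c] for r in a_rows) / len(a_rows) for c in shared]
    mean_b = [sum(r[c] for r in b_rows) / len(b_rows) for c in shared]
    return sqrt(sum((x - y) ** 2 for x, y in zip(mean_a, mean_b)))

def augment(input_rows, candidates, quality, dist_threshold, score_threshold):
    """Return the input dataset augmented with the top-ranked samples."""
    # Step 1: match the input to a first subset by schema.
    first = schema_match(input_rows, candidates)
    # Step 2: keep candidates whose distance from the input is ABOVE the
    # threshold, following the abstract's wording literally.
    second = [c for c in first if dataset_distance(input_rows, c) > dist_threshold]
    # Step 3: merge the second subset into one dataset.
    merged = [row for cand in second for row in cand]
    # Step 4: rank samples by the user-specified quality metric and keep
    # those scoring above the second threshold.
    ranked = sorted(merged, key=quality, reverse=True)
    top = [row for row in ranked if quality(row) > score_threshold]
    # Step 5: return the augmented input.
    return input_rows + top
```

Passing the quality metric as a callable keeps the sketch agnostic about which metrics a user specifies; a real system would likely combine several metrics and use a richer notion of schema and distance than the ones assumed here.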

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.