Patent · US Active

Identifying source datasets that fit a transfer learning process for a target domain

US11308077B2 · kind B2 · utility

Cited by	0
References	0
Claims	20
Family size	0

Assignee

International Business Machines Corporation · US

Inventors

Bar Haim · Ashkelon, IL
Andrey Finkelshtein · Beer Sheva, IL
Eitan Menahem · Beer Sheva, IL
Noga Agmon · Givat Shmuel, IL

Key dates

Filing date	Jul 21, 2020
Grant date	Apr 19, 2022
Priority date	—
Expiry date	Oct 13, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a target dataset relating to a target domain of interest. Each dataset is arranged in a tabular format including columns and rows, and the source datasets and the target dataset include a same feature space. The method also includes pre-processing, via a processor of the computing system, each source-target dataset pair to remove non-intersecting columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and summarizing the calculated similarity scores to identify one or more source datasets that are most similar to the target dataset.
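The pipeline described in the abstract (intersect the columns of each source-target pair, compute at least two similarity scores, then summarize them into a ranking) can be illustrated with a minimal Python sketch. The abstract does not disclose the actual scoring formulas, so everything concrete here is an assumption made for illustration: the function names, the column score based on gaps between column means, the row score based on nearest-row Euclidean distance, and the use of a plain average as the summary. It also assumes every column is numeric and every table has at least one row.

```python
import math
from statistics import mean


def intersect_columns(source, target):
    """Pre-processing: keep only the columns present in both tables.

    Tables are dicts mapping a column name to a list of numeric values.
    """
    shared = [name for name in source if name in target]
    return ({name: source[name] for name in shared},
            {name: target[name] for name in shared})


def rows(table):
    """Return the table's rows as tuples, in column order."""
    return list(zip(*table.values()))


def column_similarity(source, target):
    """Assumed metric: 1 / (1 + mean absolute gap between column means)."""
    gaps = [abs(mean(source[name]) - mean(target[name])) for name in source]
    return 1.0 / (1.0 + mean(gaps))


def row_similarity(source, target):
    """Assumed metric: 1 / (1 + mean distance to the nearest source row)."""
    source_rows = rows(source)
    nearest = [min(math.dist(t, s) for s in source_rows) for t in rows(target)]
    return 1.0 / (1.0 + mean(nearest))


def rank_sources(sources, target):
    """Score each source-target pair and sort from most to least similar."""
    summary = []
    for name, source in sources.items():
        src, tgt = intersect_columns(source, target)
        if not src:  # no shared columns, so nothing to compare
            continue
        scores = (column_similarity(src, tgt), row_similarity(src, tgt))
        summary.append((name, mean(scores)))  # assumed summary: plain average
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "zip" exists only in the target and is dropped.
target = {"age": [30, 40], "income": [50, 60], "zip": [1, 2]}
sources = {
    "near": {"age": [31, 41], "income": [50, 60]},
    "far": {"age": [70, 80], "income": [10, 20]},
}
ranking = rank_sources(sources, target)
print(ranking)
```

Both scores are mapped into the interval (0, 1] so that they can be averaged directly. A real implementation would likely normalize the columns first, since raw distances are dominated by whichever feature has the largest scale.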

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.