Patent · US Active

Method and system for identifying duplicate columns using statistical, semantics and machine learning techniques

US11561944B2 · kind B2 · utility

Cited by: 0
References: 3
Claims: 9
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 29, 2020
Grant date: Jan 24, 2023
Priority date
Expiry date: Jul 15, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N7/02
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

With the availability of huge amounts of data, it has become difficult to identify and manage duplicate data, especially when the data is spread across a plurality of columns. A method and system for identifying duplicate columns using statistical, semantic, and machine learning techniques are provided. The system provides a design framework to compare huge datasets at the column level and identify potential duplicate columns based not on the column title but on all of the column's values. The disclosure can compare values in multiple columns and identify potential duplicate columns, where values are compared not only for an exact match but also for a semantic match, smart match, fuzzy match, match after unit-of-measure (UOM) conversion, and so on, using statistical, semantic, and machine learning techniques.
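
The value-based comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the patented method: the function names (`normalize`, `exact_score`, `fuzzy_score`, `uom_score`, `find_duplicate_columns`), the 0.8 threshold, the use of Jaccard overlap and `difflib` string similarity, and the caller-supplied conversion factor are all assumptions made for illustration. The semantic, smart, and machine learning matching mentioned in the abstract are not covered.

```python
import math
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value):
    """Lowercase and trim a cell value so formatting differences are ignored."""
    return str(value).strip().lower()


def exact_score(col_a, col_b):
    """Jaccard overlap of the two columns' normalized distinct values."""
    a = {normalize(v) for v in col_a}
    b = {normalize(v) for v in col_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuzzy_score(col_a, col_b, min_ratio=0.8):
    """Fraction of distinct values in col_a with a near match in col_b."""
    a = {normalize(v) for v in col_a}
    b = {normalize(v) for v in col_b}
    if not a or not b:
        return 0.0
    hits = sum(
        1 for x in a
        if any(SequenceMatcher(None, x, y).ratio() >= min_ratio for y in b)
    )
    return hits / len(a)


def uom_score(col_a, col_b, factor, rel_tol=1e-9):
    """Fraction of numeric values in col_a found in col_b after scaling by factor."""
    if not col_a or not col_b:
        return 0.0
    hits = sum(
        1 for x in col_a
        if any(math.isclose(x * factor, y, rel_tol=rel_tol) for y in col_b)
    )
    return hits / len(col_a)


def find_duplicate_columns(table, threshold=0.8):
    """Return column-name pairs whose values match exactly or fuzzily.

    `table` maps a column name to its list of values; names are never compared.
    """
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(table.items(), 2):
        score = max(exact_score(col_a, col_b), fuzzy_score(col_a, col_b))
        if score >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical sample data: "city" and "town" hold the same values under different titles.
sample = {
    "city": ["London", "Paris", "Berlin"],
    "town": ["london ", "PARIS", "Berlin"],
    "country": ["UK", "France", "Germany"],
}
duplicates = find_duplicate_columns(sample)
```

Note that `fuzzy_score` as written is asymmetric (it measures how much of the first column is covered by the second) and compares every value pair, so a real system working on huge datasets would likely need sampling, blocking, or column statistics to prune candidates first.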

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.