Patent · US Active

Dynamic record blocking

US8645399B2 · kind B2 · utility

Cited by: 21
References: 24
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 12, 2012
Grant date: Feb 4, 2014
Priority date
Expiry date: Jan 12, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/215
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Dynamic blocking determines which pairs of records in a data set should be examined as potential duplicates. Records are grouped into blocks by shared properties that are indicators of duplication. Blocks that are too large to be processed efficiently are further subdivided by other properties chosen in a data-driven way. We demonstrate the viability of this algorithm for large data sets, and have scaled the system to billions of records on an 80-node Hadoop cluster.
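
The blocking idea in the abstract can be illustrated with a small single-machine sketch. It is not the patented implementation. The function names, the `max_block_size` threshold, the fixed property ordering, and the rule that an oversized block is subdivided by every later property in that ordering are all assumptions made for illustration; the abstract says only that the subdividing properties are chosen in a data-driven way. The sketch also drops oversized blocks that cannot be subdivided further, which is another assumption, and it makes no attempt to reproduce the Hadoop deployment.

```python
from collections import defaultdict
from itertools import combinations


def split_by(records, ids, prop):
    """Group record ids by their value for one property, skipping missing values."""
    groups = defaultdict(list)
    for rid in ids:
        value = records[rid].get(prop)
        if value is not None:
            groups[value].append(rid)
    return list(groups.values())


def dynamic_blocks(records, properties, max_block_size):
    """Return blocks (sorted lists of record ids) of at most max_block_size records.

    Assumption: an oversized block is subdivided by each property that comes
    later in `properties` than the one last used to form it. Oversized blocks
    with no later property left are dropped.
    """
    accepted = []
    # Work items: (record ids in the block, index of the last property used).
    pending = [
        (ids, i)
        for i, prop in enumerate(properties)
        for ids in split_by(records, list(records), prop)
    ]
    while pending:
        ids, last = pending.pop()
        if len(ids) < 2:
            continue  # a single record yields no pairs to compare
        if len(ids) <= max_block_size:
            accepted.append(sorted(ids))
            continue
        # Block is too large: intersect it with each later property.
        for j in range(last + 1, len(properties)):
            for sub in split_by(records, ids, properties[j]):
                pending.append((sub, j))
    return accepted


def candidate_pairs(blocks):
    """Collect the distinct record pairs that share at least one accepted block."""
    pairs = set()
    for block in blocks:
        pairs.update(combinations(block, 2))
    return pairs


# Hypothetical sample data: four person records with three blocking properties.
records = {
    1: {"last": "smith", "zip": "10001", "first": "john"},
    2: {"last": "smith", "zip": "10001", "first": "jon"},
    3: {"last": "smith", "zip": "94105", "first": "mary"},
    4: {"last": "jones", "zip": "94105", "first": "mary"},
}
properties = ["last", "zip", "first"]

pairs = candidate_pairs(dynamic_blocks(records, properties, max_block_size=2))
print(sorted(pairs))
```

With a threshold of 2, the three-record "smith" block counts as oversized and is subdivided, so only pairs that also agree on a second property survive. Raising the threshold to 3 keeps that block whole and produces more candidate pairs, which shows the trade-off between recall and comparison cost that blocking is meant to manage.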

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.