Dynamic record blocking
US8645399B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Jan 12, 2012 |
| Grant date | Feb 4, 2014 |
| Priority date | — |
| Expiry date | Jan 12, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/215
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Dynamic blocking determines which pairs of records in a data set should be examined as potential duplicates. Records are grouped together into blocks by shared properties that are indicators of duplication. Blocks that are too large to be efficiently processed are further subdivided by other properties chosen in a data-driven way. We demonstrate the viability of this algorithm for large data sets. We have scaled this system up to work on billions of records on an 80-node Hadoop cluster.
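The blocking step in the abstract can be illustrated with a small single-process sketch. It makes several assumptions that do not come from the patent. Records are Python dicts. Each blocking property is an exact-match attribute value. "Data-driven" subdivision is taken to mean picking the unused attribute whose largest resulting sub-block is smallest. A block that is still oversized after every attribute has been used is discarded. The names `dynamic_blocking`, `refine`, `group_by`, and `max_size` are hypothetical. The patented method may choose subdivision properties differently, and the abstract describes a Hadoop deployment rather than in-memory Python.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Record = Dict[str, Hashable]


def group_by(ids: Sequence[int], records: Sequence[Record], attr: str) -> Dict[Hashable, List[int]]:
    """Group record ids by their value for `attr`; records lacking the attribute are skipped."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i in ids:
        value = records[i].get(attr)
        if value is not None:
            groups[value].append(i)
    return groups


def refine(block: List[int], records: Sequence[Record], used: FrozenSet[str],
           candidates: Sequence[str], max_size: int, out: List[List[int]]) -> None:
    """Keep small blocks; recursively split oversized ones by another attribute."""
    if len(block) < 2:
        return  # a singleton yields no candidate pairs
    if len(block) <= max_size:
        out.append(block)
        return
    remaining = [a for a in candidates if a not in used]
    if not remaining:
        return  # assumption: an oversized block with no attributes left is dropped

    def largest_sub_block(attr: str) -> int:
        # Size of the biggest piece this attribute would leave behind.
        groups = group_by(block, records, attr)
        return max((len(g) for g in groups.values()), default=0)

    # Hypothetical data-driven choice: the split that best shrinks the worst case.
    best = min(remaining, key=largest_sub_block)
    for sub in group_by(block, records, best).values():
        refine(sub, records, used | {best}, candidates, max_size, out)


def dynamic_blocking(records: Sequence[Record], blocking_keys: Sequence[str],
                     max_size: int) -> Set[Tuple[int, int]]:
    """Return the set of record-id pairs that should be compared as potential duplicates."""
    blocks: List[List[int]] = []
    for key in blocking_keys:
        for block in group_by(range(len(records)), records, key).values():
            refine(block, records, frozenset({key}), blocking_keys, max_size, blocks)
    pairs: Set[Tuple[int, int]] = set()
    for block in blocks:
        pairs.update(combinations(sorted(block), 2))
    return pairs


if __name__ == "__main__":
    people = [
        {"last": "smith", "zip": "98101", "first": "ann"},
        {"last": "smith", "zip": "98101", "first": "ann"},
        {"last": "smith", "zip": "10001", "first": "bob"},
        {"last": "smith", "zip": "10001", "first": "bo"},
        {"last": "jones", "zip": "98101", "first": "cy"},
    ]
    # The "smith" block (4 records) exceeds max_size=2 and is split by zip code.
    print(sorted(dynamic_blocking(people, ["last", "zip", "first"], max_size=2)))
    # → [(0, 1), (2, 3)]
```

With `max_size=2` the oversized surname block is subdivided, so only pairs that also share a zip code survive. Raising `max_size` to 4 keeps the whole "smith" block and yields eight candidate pairs. The threshold trades recall against the quadratic cost of comparing every pair inside a block.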
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).