Cost-sensitive alternating decision trees for record linkage
US8949158B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 25, 2011 |
| Grant date | Feb 3, 2015 |
| Priority date | — |
| Expiry date | Jun 11, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F18/214
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Record linkage (RL) is the task of identifying two or more records that refer to the same entity (e.g., a person or a company). RL models can be based on cost-sensitive alternating decision trees (ADTree), an algorithm that uniquely combines boosting and decision tree algorithms to create shorter and easier-to-interpret linking rules. These models can be naturally trained to operate at industrial precision/recall operating points, and the shorter output rules are clear enough that the model can effectively explain its decisions to non-technical users via score aggregation or visualization. The models significantly outperform other baselines at the desired industrial operating points, and the improved understanding of the models' decisions led to faster debugging and feature development cycles.
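To make the "score aggregation" idea in the abstract concrete, here is a minimal sketch of how an alternating decision tree can score a candidate record pair: every rule whose precondition holds contributes one prediction value, and the final score is the sum of those contributions. The feature names, rule set, score values, and threshold are all hypothetical, chosen for illustration and not taken from the patent. The sketch covers scoring only; it does not attempt the cost-sensitive boosting procedure used to train the tree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


@dataclass
class Rule:
    """One splitter node with its two prediction values (hypothetical layout)."""
    name: str
    precondition: Callable[[Features], bool]  # must hold for the rule to fire
    condition: Callable[[Features], bool]     # the split test itself
    score_if_true: float
    score_if_false: float


def score_pair(
    root_score: float, rules: List[Rule], features: Features
) -> Tuple[float, List[Tuple[str, float]]]:
    """Sum the prediction values of every reachable rule.

    Returns the total score and the per-rule contributions, which is
    what lets the decision be explained rule by rule.
    """
    contributions = [("root", root_score)]
    for rule in rules:
        if not rule.precondition(features):
            continue  # rule is unreachable for this pair; it adds nothing
        hit = rule.condition(features)
        contributions.append(
            (rule.name, rule.score_if_true if hit else rule.score_if_false)
        )
    total = sum(value for _, value in contributions)
    return total, contributions


def always(_: Features) -> bool:
    return True


# Illustrative values only (assumed, not from the patent).
ROOT_SCORE = -0.25
THRESHOLD = 0.0  # assumed decision threshold; raise it to favor precision
RULES = [
    Rule("name_sim >= 0.8", always, lambda f: f["name_sim"] >= 0.8, 1.5, -1.0),
    Rule("birth_year_match", always, lambda f: f["birth_year_match"] == 1.0, 0.75, -0.5),
    Rule(
        "zip_match (given similar names)",
        lambda f: f["name_sim"] >= 0.8,   # only evaluated under the name rule
        lambda f: f["zip_match"] == 1.0,
        0.5,
        -0.25,
    ),
]

pair = {"name_sim": 0.9, "birth_year_match": 1.0, "zip_match": 0.0}
total, contributions = score_pair(ROOT_SCORE, RULES, pair)
is_match = total > THRESHOLD

for name, value in contributions:
    print(f"{name:35s} {value:+.2f}")
print(f"{'total':35s} {total:+.2f}  match={is_match}")
```

Because each rule's contribution is listed separately, a reviewer can see which comparisons pushed the pair toward or away from a match, which is the kind of explanation the abstract describes.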
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.