Integrated fuzzy joins in database management systems
US9317544B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 5, 2011 |
| Grant date | Apr 19, 2016 |
| Priority date | — |
| Expiry date | Dec 17, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/2458
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A fuzzy joins system that is integrated in a database system generates fuzzy joins between records from two datasets. The fuzzy joins system includes a tokenizer to generate tokens for data records and a transformer to find transforms for the tokens. The fuzzy joins system invokes a signature generator, running within a runtime layer of the database system, to generate signatures for data records based on the tokens and their transforms. Subsequently, an equi-join operation joins the records from the two datasets with at least one equal signature. A similarity calculator, running within a runtime layer of the database system, computes a similarity measure using the token information of the joined records. If the similarity measure for any two records is above a threshold, the fuzzy joins system generates a fuzzy join between such two records.
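The pipeline described in the abstract (tokenize, transform, generate signatures, equi-join on signatures, then filter by similarity) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration and not taken from the patent: the function names, the whitespace tokenizer, the small `TRANSFORMS` table, the use of each token as its own signature, and Jaccard similarity as the measure. The sketch also runs in plain Python rather than inside a database runtime layer.

```python
from collections import defaultdict

# Hypothetical transform table: maps a token to a canonical form.
TRANSFORMS = {"st": "street", "ave": "avenue"}

def tokenize(text):
    """Split a record into a set of lowercase whitespace-delimited tokens."""
    return set(text.lower().split())

def transform(tokens):
    """Replace each token with its canonical form when one is known."""
    return {TRANSFORMS.get(t, t) for t in tokens}

def signatures(tokens):
    """Assumed signature scheme: every token is its own signature.
    Real schemes would be more selective to keep the candidate set small."""
    return tokens

def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def fuzzy_join(left, right, threshold=0.5):
    left_tokens = [transform(tokenize(r)) for r in left]
    right_tokens = [transform(tokenize(r)) for r in right]

    # Index right-hand records by signature; this plays the role of the equi-join.
    index = defaultdict(set)
    for j, toks in enumerate(right_tokens):
        for sig in signatures(toks):
            index[sig].add(j)

    result = []
    for i, toks in enumerate(left_tokens):
        # Candidates share at least one equal signature with the left record.
        candidates = set()
        for sig in signatures(toks):
            candidates |= index.get(sig, set())
        # Keep only candidate pairs whose similarity is above the threshold.
        for j in sorted(candidates):
            if jaccard(toks, right_tokens[j]) > threshold:
                result.append((left[i], right[j]))
    return result

left = ["12 Main St", "5 Oak Ave"]
right = ["12 main street", "7 Oak Road", "99 Elm Blvd"]
pairs = fuzzy_join(left, right)
```

In this example, "5 Oak Ave" and "7 Oak Road" share the signature `oak` and so survive the equi-join step, but their Jaccard similarity of 0.2 falls below the threshold, so only the "12 Main St" pair is kept.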
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.