Systems and methods for automatic clustering and canonical designation of related data in various data structures
US10127289B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 10, 2016 |
| Grant date | Nov 13, 2018 |
| Priority date | — |
| Expiry date | Aug 10, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F18/23
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Computer-implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing the respective pairs to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
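The pipeline described in the abstract (pair the records, score each pair, merge overlapping pairs into clusters, then pick a canonical name) can be illustrated with a minimal Python sketch. This is not the patented implementation. Every name here is hypothetical: the `name` field, the `pair_probability` function, and the `threshold` value are assumptions. Plain string similarity stands in for whatever pair-scoring model the patent actually uses, and union-find stands in for the step that merges overlapping pairs.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def pair_probability(a, b):
    """Hypothetical stand-in for the pair-scoring step.

    Returns a similarity in [0, 1] between the two records' names; a real
    system would likely use a trained model over many fields.
    """
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def cluster_records(records, threshold=0.8):
    """Group records whose pairwise score meets the (assumed) threshold.

    Overlapping pairs are merged with union-find, so if A matches B and
    B matches C, all three land in one cluster.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if pair_probability(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


def canonical_name(cluster):
    """Pick the most frequent name in the cluster; longer names win ties."""
    counts = Counter(record["name"] for record in cluster)
    return max(counts, key=lambda name: (counts[name], len(name)))


if __name__ == "__main__":
    sample = [
        {"name": "Joe's Pizza"},
        {"name": "Joes Pizza"},
        {"name": "Joe's Pizza"},
        {"name": "Acme Hardware"},
    ]
    for cluster in cluster_records(sample):
        print(canonical_name(cluster), len(cluster))
```

Comparing every pair is quadratic in the number of records, so a production system would presumably restrict pairing to candidate records that share some key. The abstract does not specify such a step, so it is omitted here.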
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.