Patent · US Active

Identifying duplicate entities

US11436532B2 · kind B2 · utility

2Cited by
0References
18Claims
0Family size

Assignee

Inventors

Key dates

Filing dateDec 4, 2019
Grant dateSep 6, 2022
Priority date
Expiry dateFeb 26, 2041

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06F16/2365
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

The disclosed embodiments provide a system that identifies duplicate entities. During operation, the system selects training data for a first machine learning model based on confidence scores representing likelihoods that pairs of entities in an online system are duplicates. Next, the system updates parameters of the first machine learning model based on features and labels in the training data. The system then identifies a first subset of additional pairs of the entities as duplicate entities based on scores generated by the first machine learning model from values of the features for the additional pairs and a first threshold associated with the scores. The system also determines a canonical entity in each of the duplicate entities based on additional features. Finally, the system updates content outputted in a user interface of the online system based on the identified first subset of the additional pairs.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.